Sangwon (Justin) Hyun

About me

Post-doctoral Researcher (I am on the 2021-2022 academic job market!)
Department of Data Sciences and Operations
University of Southern California
CV / Github / Google Scholar

I'm a post-doc working with Jacob Bien. My current focus is statistical analysis and modeling of ocean data, as part of the CBIOMES initiative of the Simons Foundation.

I also work on epidemiological forecasting, and have been part of the COVID-19 response effort of the DELPHI group.

I am a statistician passionate about doing high-quality data science research by bringing my statistical background to in-depth scientific collaborations. My PhD training was as a methods/theory-focused statistician, but through my post-doctoral research, I have developed expertise as a committed applied statistician. I strongly believe that the best way to do high-impact data science is for statisticians to embed themselves deeply in the domain of application, perhaps more so than is traditional. In the growing interdisciplinary field of data science, a deep grounding in application is increasingly the methodological statistician’s most effective way of ensuring true and tangible impact on a domain.

Favorite quote
From The Universe and the Teacup by K.C. Cole: "So the improved Newtonian Universe must cease and grow cold," says Thomasina's tutor Septimus in Stoppard's Arcadia, suddenly realizing the import of his student's mathematical discovery that disorder is the inevitable direction of things. "Dear me."

"Yes," Thomasina replies, "we must hurry if we are going to dance."


B.S. in Statistics and Mathematics, University of Michigan - Ann Arbor (2013)
M.S. in Statistics, Carnegie Mellon University (2014)
Ph.D. in Statistics, Carnegie Mellon University (2018)

My PhD advisors are Max G'Sell and Ryan Tibshirani.

My ORCID ID is 0000-0003-0377-897X.


My research interests include oceanographic data analysis, epidemiological modeling/forecasting, changepoint detection, and selective inference.

Ocean data analysis

Ocean Mover’s Distance: Using Optimal Transport for Analyzing Oceanographic Data
Sangwon Hyun, Aditya Mishra, Christopher L. Follett, Bror Jonsson, Gemma Kulk, Gael Forget, Marie-Fanny Racault, Thomas Jackson, Stephanie Dutkiewicz, Christian L. Müller, and Jacob Bien
Submitted to Proceedings of the Royal Society A (2021)

Density-based Approaches to Evaluate Biogeochemical Models by Satellite Derived Chlorophyll
Bror F. Jönsson, Christopher L. Follett, Jacob Bien, Stephanie Dutkiewicz, Sangwon Hyun, Gemma Kulk, Gael Forget, Christian L. Müller, Marie-Fanny Racault, Christopher N. Hill, Thomas Jackson, Shubha Sathyendranath
Submitted to AGU Biogeochemical Cycles (2021)

A Flexible Bayesian Approach to Estimating Size-structured Matrix Population Models
Kristof Glauninger, Jann Paul Mattern, Greg Britten, John Casey, Sangwon Hyun, Zhen Wu, E. Virginia Armbrust, Zaid Harchaoui, François Ribalet
To appear, PLOS Computational Biology (2021)

Modeling Cell Populations Measured By Flow Cytometry With Covariates Using Sparse Mixture of Regressions
Sangwon Hyun, Mattias Rolf Cape, François Ribalet, Jacob Bien
Minor revision with Annals of Applied Statistics (2020)

Changepoint detection and selective inference

Post-selection Inference for Changepoint Detection Algorithms with Application to Copy Number Variation Data
Sangwon Hyun, Kevin Lin, Max G'Sell, Ryan Tibshirani
Biometrics (2020)

Exact Post-Selection Inference for the Generalized Lasso Path
Sangwon Hyun, Max G'Sell, Ryan Tibshirani
Electronic Journal of Statistics (2018)

Infectious disease modeling and forecasting

An Open Repository of Real-Time COVID-19 Indicators
Various authors
PNAS (2021)

Can Auxiliary Indicators Improve COVID-19 Forecasting and Hotspot Prediction?
Daniel J. McDonald, Jacob Bien, Alden Green, Addison J. Hu, Nat DeFries, Sangwon Hyun, Natalia L. Oliveira, James Sharpnack, Jingjing Tang, Robert Tibshirani, Valérie Ventura, Larry Wasserman, Ryan Tibshirani
PNAS (2021)

Nonmechanistic Forecasts of Seasonal Influenza with Iterative One-week-ahead Distributions
Logan Brooks, David Farrow, Sangwon Hyun, Ryan Tibshirani, Roni Rosenfeld
PLOS Computational Biology (2018)

A Human Judgment Approach to Epidemiological Forecasting
David Farrow, Logan Brooks, Sangwon Hyun, Ryan Tibshirani, Donald S. Burke
PLOS Computational Biology (2017)

Flexible Modeling of Epidemics with an Empirical Bayes Framework
Logan Brooks, David Farrow, Sangwon Hyun, Ryan Tibshirani, Roni Rosenfeld
PLOS Computational Biology (2015)

Risk of Dengue for Tourists and Teams during the World Cup 2014 in Brazil
Wilbert Van Panhuis, Sangwon Hyun, Kayleigh Blaney, Ernesto T. A. Marques Jr, Giovanini E. Coelho, João Bosco Siqueira Jr, Ryan Tibshirani, Jarbas B. da Silva Jr, Roni Rosenfeld
PLOS Neglected Tropical Diseases (2014)

Ongoing work

Environmental Drivers of Marine Cell Populations Measured by Flow Cytometry
Sangwon Hyun, Timothy Coleman, François Ribalet, Jacob Bien

Trend Filtering for Mixture Analysis of Flow Cytometry
Timothy Coleman, Sangwon Hyun, François Ribalet, Jacob Bien

Using Aspect Bernoulli Matrix Decomposition for Analyzing Metagenomic Data from the Ocean
Ryan Reynolds, Sangwon Hyun, Naomi Levine, Jacob Bien

Collaborative publications

Collaborative efforts to forecast seasonal influenza in the United States, 2015–2016
Various authors
Scientific Reports (2019)

Results from the Second Year of a Collaborative Effort to Forecast Influenza Seasons in the United States
Various authors
Epidemics (2018)

An Open Challenge to Advance Probabilistic Forecasting for Dengue Epidemics
Various authors
Proceedings of the National Academy of Sciences (2019)


JSM 2021 talk
'Optimal Transport for Analyzing Ocean Data'
Seattle, WA

Center for Computational Mathematics (CCM) talk (2020)
'Sparse Multivariate Mixture of Regressions Modeling for Flow Cytometry Data'
Flatiron Institute, New York, NY

JSM 2020 talk
'Joint Modeling of Continuous Flow Cytometry Data With Environmental Covariates'
(Session on Ocean Statistical Methodology and Application)
Philadelphia, PA

EcoSta 2021 talk
'Sparse Gaussian mixture regression with application to flow cytometry data analysis'
Hong Kong (Held virtually)

SLDS 2020 invited talk
'Joint Modeling of Continuous Flow Cytometry Data With Environmental Covariates'
Newport Beach, CA (Canceled)

CMStatistics 2019 invited talk
'Joint Modeling of Continuous Flow Cytometry Data With Environmental Covariates'
London, UK

SDSS 2018 poster session
'Knockoff variable selection for changepoint detection'
Reston, VA

JSM 2017 talk
'On changepoint inference using Binary Segmentation Inference'
Session on New Developments in Time Series Analysis and Change Point Detection
Baltimore, MD

Invited talk
'Forecasting of dengue risk in 2016 for Southeast Asia'
2016 Southeast Asia regional meeting on climate and dengue forecasting
Kuala Lumpur, Malaysia

2016 AAAS annual meeting poster
'Epidemiological Forecasting with Statistical Models'
Best student poster (Technology, Engineering and Math)
Washington DC

JSM 2016 talk
'On changepoint inference after selection'
(Session on modern inference for selected models)
Chicago, IL

INFORMS 2016 talk
'On changepoint inference after selection'
Session on Detection of Structure and Anomalous Patterns in Data
Nashville, TN


I have taught the following two courses:
36-220 Engineering Statistics and Quality Control (Summer 2015)
36-217 Probability Theory and Random Processes (Summer 2016)

I was also a teaching assistant for several statistics courses at the undergraduate and graduate levels:
36-217 Probability theory and random processes,
36-225 & 36-226 Mathematical statistics sequence for undergraduates,
36-350 Undergraduate statistical computing,
36-402 Undergraduate advanced data analysis,
36-617 Applied linear models,
36-618 Topics in statistics,
36-725 Convex Optimization,
36-750 Statistical Computing.

Teaching statistics and data science
I'm interested in how to improve teaching in statistics and data science. I was involved in the CMU statistics department's effort to revamp and improve the undergraduate curriculum, starting with introductory courses. I co-organized two seminar courses, both numbered 36-764, in which we discussed literature on learning, created a repository of assessment test questions (focused on testing conceptual understanding), and interviewed students in order to probe their misunderstandings and improve the test questions. See more details on the group's website.

Assessment of Student Learning and Misconception Identification in Intro Statistics
Poster presentation, Eberly Teaching and Learning Summit 2017, Pittsburgh, PA

Identifying misconceptions of introductory data science using a think-aloud protocol
Poster presentation, eCOTS 2018, Pittsburgh, PA


Work Experience
In the summer of 2015, I went to NASA Research to do some top secret stuff (Gaussian process modeling of aeronautics data).

In the past, I have worked in statistical consulting at the University of Michigan (CSCAR), and have interned at a finance firm.

And more
I enjoy playing games that involve bouncy spheres, bicycling, weightlifting, reading books, and cooking.
I'm an avid novice user of Emacs and Org mode. I am proficient in high-performance computing using the Slurm Workload Manager.
Fun fact: I served 2 years in the military at the Joint Security Area, at the border of North/South Korea.

Author: Sangwon Hyun

Created: 2022-01-08 Sat 00:48