Issue Description
Difficulty: Intermediate
Time: 24 hours
Description:
This issue is to write a blog post introducing patient-level prediction (PLP) in observational health research, with a focus on workflows enabled by the JuliaHealth ecosystem. The post will explain the basics of observational health research and the OMOP Common Data Model (CDM), walk through constructing cohorts and setting up a PLP pipeline, and demonstrate how JuliaHealth tools such as HealthBase.jl, OHDSIAPI.jl, and OHDSICohortExpressions.jl can be combined with general Julia ecosystem packages such as MLJ.jl and FunSQL.jl for reproducible, scalable prediction modeling.
The goal is to showcase how Julia can support standardized, open-source methodologies for observational health research while providing clear and practical examples.
Requirements
- Introduce observational health research and its role in understanding real-world patient data.
- Explain phenotype definitions and the importance of reproducibility.
- Introduce the OMOP CDM and its purpose in standardizing health data.
- Explain patient-level prediction (PLP), drawing on the OHDSI framework.
- Present an example research question (e.g., hypertension → diabetes progression).
- Demonstrate how to:
  - Initialize a new observational health study using HealthBase.jl.
  - Download cohort definitions from OHDSI ATLAS using OHDSIAPI.jl.
  - Translate cohort definitions to SQL using OHDSICohortExpressions.jl.
  - Run SQL against an OMOP CDM database with FunSQL.jl and DBInterface.jl.
- Show how to extract patient-level features from multiple OMOP CDM tables (e.g., condition_occurrence, drug_exposure, procedure_occurrence, observation, measurement, person) within a defined lookback window.
- Demonstrate attaching outcome labels using target and outcome cohorts while ensuring proper temporal ordering: features come only from before the target cohort index date, and outcomes count only if they occur after it.
- Preprocess features for modeling: handle missing values, standardize numeric features, and encode categorical variables.
- Split data into training and test sets for evaluation.
- Train and evaluate multiple models (e.g., logistic regression with L1 regularization, random forest, XGBoost) using MLJ.jl, and compare them using performance metrics such as the area under the ROC curve (AUC).
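The temporal logic behind the feature-extraction and labeling steps can be illustrated without a database. The following is a minimal sketch in plain Julia (only the Dates standard library), assuming hypothetical in-memory rows in place of OMOP CDM tables. The struct and function names, the 365-day lookback, and the 1 to 1095-day time-at-risk window are illustrative assumptions, not the API of HealthBase.jl, FunSQL.jl, or any other JuliaHealth package; in the actual workflow these filters would be expressed as SQL against the CDM.

```julia
using Dates

# Hypothetical in-memory stand-ins for OMOP CDM rows (not a HealthBase.jl or
# FunSQL.jl API); a real study would query the CDM tables with SQL instead.
struct TargetRow
    person_id::Int
    index_date::Date      # target cohort start (e.g., first hypertension diagnosis)
end

struct EventRow
    person_id::Int
    concept_id::Int       # e.g., a condition_concept_id from condition_occurrence
    event_date::Date
end

# Illustrative window lengths in days (assumptions, not OHDSI defaults).
const LOOKBACK_DAYS = 365
const TAR_START_DAYS = 1
const TAR_END_DAYS = 3 * 365

# Binary covariates: 1 if the concept was recorded in the lookback window
# [index - LOOKBACK_DAYS, index - 1], i.e., strictly before the index date.
function lookback_features(targets::Vector{TargetRow}, events::Vector{EventRow},
                           concept_ids::Vector{Int})
    X = zeros(Int, length(targets), length(concept_ids))
    col = Dict(c => j for (j, c) in enumerate(concept_ids))
    for (i, t) in enumerate(targets)
        lo = t.index_date - Day(LOOKBACK_DAYS)
        hi = t.index_date - Day(1)
        for e in events
            if e.person_id == t.person_id && haskey(col, e.concept_id) &&
               lo <= e.event_date <= hi
                X[i, col[e.concept_id]] = 1
            end
        end
    end
    return X
end

# Outcome labels: 1 if any outcome cohort entry falls in the time-at-risk
# window [index + TAR_START_DAYS, index + TAR_END_DAYS]; outcomes on or before
# the index date are never counted as positives.
function outcome_labels(targets::Vector{TargetRow},
                        outcome_dates::Dict{Int,Vector{Date}})
    y = zeros(Int, length(targets))
    for (i, t) in enumerate(targets)
        lo = t.index_date + Day(TAR_START_DAYS)
        hi = t.index_date + Day(TAR_END_DAYS)
        dates = get(outcome_dates, t.person_id, Date[])
        y[i] = any(d -> lo <= d <= hi, dates) ? 1 : 0
    end
    return y
end

# Toy data with a hypothetical concept_id 1001 (not a real OMOP concept).
targets = [TargetRow(1, Date(2020, 1, 1)), TargetRow(2, Date(2020, 6, 1))]
events = [EventRow(1, 1001, Date(2019, 6, 1)),   # inside person 1's lookback
          EventRow(2, 1001, Date(2020, 7, 1))]   # after person 2's index: ignored
outcomes = Dict(1 => [Date(2021, 3, 1)],         # inside time-at-risk
                2 => [Date(2020, 5, 1)])         # before index: not a positive
X = lookback_features(targets, events, [1001])
y = outcome_labels(targets, outcomes)
```

Here `X` is the 2×1 matrix `[1; 0]` and `y` is `[1, 0]`: person 2's post-index diagnosis never leaks into the covariates, and their pre-index outcome never becomes a positive label. In a real study, people with a prior outcome would typically be excluded from the target cohort rather than labeled negative.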
Expected Outcomes
- A blog post draft introducing PLP in observational health research.
- Practical Julia code examples showing cohort construction, feature extraction, and execution.
- A reproducible workflow guide that readers can adapt for their own data.
- References to foundational work (e.g., Reps et al. 2018) and links to JuliaHealth resources.
- End-to-end demonstration of constructing a binary classification dataset from OMOP CDM cohorts, including covariates and outcome labels.
- Example preprocessing and modeling steps implemented in Julia with MLJ.jl, showing how predictive performance can be compared across different algorithms.
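MLJ.jl provides these building blocks itself (for example `partition` for splitting and an `auc` measure), but hand-rolling them once in base Julia makes the computations explicit. The sketch below uses only the Random and Statistics standard libraries; all function names are hypothetical, and median imputation plus z-scoring is one reasonable preprocessing choice among several. Preprocessing statistics are learned on the training rows and reused on the test rows to avoid leakage, and AUC is computed as the Mann-Whitney statistic.

```julia
using Random, Statistics

# Randomly split row indices 1:n into (train, test) index vectors.
function train_test_split(n::Int, test_frac::Float64; rng = Random.default_rng())
    perm = randperm(rng, n)
    n_test = round(Int, test_frac * n)
    return perm[n_test+1:end], perm[1:n_test]
end

# Learn per-column medians (for imputation), means, and standard deviations
# from the training rows only, so no test-set information leaks in.
function fit_preprocessor(Xtrain::AbstractMatrix)
    med = [median(skipmissing(c)) for c in eachcol(Xtrain)]
    imputed = coalesce.(Xtrain, permutedims(med))
    mu = vec(mean(imputed; dims = 1))
    sd = vec(std(imputed; dims = 1))
    sd[sd .== 0] .= 1.0    # constant columns: leave unscaled instead of dividing by 0
    return (med = med, mu = mu, sd = sd)
end

# Impute with the training medians, then z-score with the training mean and sd.
function apply_preprocessor(p, X::AbstractMatrix)
    imputed = coalesce.(X, permutedims(p.med))
    return (imputed .- permutedims(p.mu)) ./ permutedims(p.sd)
end

# One-hot encode a categorical vector against a fixed list of levels
# (the levels should come from the training data).
onehot(v::AbstractVector, levels::AbstractVector) =
    [x == l ? 1.0 : 0.0 for x in v, l in levels]

# AUC as the Mann-Whitney statistic: the probability that a randomly chosen
# positive is scored above a randomly chosen negative (ties count one half).
function mann_whitney_auc(scores::AbstractVector{<:Real}, labels::AbstractVector{<:Integer})
    pos = scores[labels .== 1]
    neg = scores[labels .== 0]
    (isempty(pos) || isempty(neg)) && error("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos, q in neg
        wins += p > q ? 1.0 : (p == q ? 0.5 : 0.0)
    end
    return wins / (length(pos) * length(neg))
end

# Toy usage with made-up numbers (not study results).
X = [1.0 missing; 2.0 10.0; missing 30.0; 4.0 20.0]
train, test = train_test_split(size(X, 1), 0.25; rng = MersenneTwister(1))
prep = fit_preprocessor(X[train, :])
Ztrain = apply_preprocessor(prep, X[train, :])
Ztest = apply_preprocessor(prep, X[test, :])
sex = onehot(["M", "F", "M", "F"], ["F", "M"])
scores = [0.9, 0.2, 0.7, 0.4]
labels = [1, 0, 1, 0]
mann_whitney_auc(scores, labels)
```

A model's predicted probabilities for the positive class on the test rows would serve as `scores`, so the same function can compare logistic regression, random forest, and XGBoost on equal footing.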
Notes
Reference Materials
- OHDSI PLP framework: Reps, J. M., Schuemie, M. J., Suchard, M. A., Ryan, P. B., Rijnbeek, P. R., & Madigan, D. (2018). Design and implementation of a standardized framework to generate and evaluate patient-level prediction models using observational healthcare data. JAMIA, 25(8), 969–975. https://doi.org/10.1093/jamia/ocy032
- OHDSI Common Data Model
- ATLAS Demo Tool
- JuliaHealth GitHub