[WORKFLOW] Patient Level Prediction with Observational Health Tooling #1

@TheCedarPrince

Difficulty: Intermediate
Time: 24 hours

Description:
This issue tracks writing a blog post that introduces patient-level prediction (PLP) in observational health research, with a focus on workflows enabled by the JuliaHealth ecosystem. The post will explain the basics of observational health and the OMOP CDM, walk through constructing cohorts and setting up a PLP pipeline, and demonstrate how JuliaHealth tools such as HealthBase.jl, OHDSIAPI.jl, and OHDSICohortExpressions.jl can be combined with general Julia ecosystem packages such as MLJ.jl and FunSQL.jl for reproducible, scalable prediction modeling.

The goal is to showcase how Julia can support standardized, open-source methodologies for observational health research while providing clear and practical examples.

Requirements

  • Introduce observational health research and its role in understanding real-world patient data.
  • Explain phenotype definitions and the importance of reproducibility.
  • Introduce the OMOP CDM and its purpose in standardizing health data.
  • Explain patient-level prediction (PLP), drawing on the OHDSI framework.
  • Present an example research question (e.g., hypertension → diabetes progression).
  • Demonstrate how to:
    • Initialize a new observational health study using HealthBase.jl.
    • Download cohort definitions from OHDSI ATLAS using OHDSIAPI.jl.
    • Translate cohort definitions to SQL using OHDSICohortExpressions.jl.
    • Run SQL against an OMOP CDM database with FunSQL.jl and DBInterface.jl.
  • Show how to extract patient-level features from multiple OMOP CDM tables (e.g., condition_occurrence, drug_exposure, procedure_occurrence, observation, measurement, person) within a defined lookback window.
  • Demonstrate attaching outcome labels using target and outcome cohorts while ensuring proper temporal ordering.
  • Preprocess features for modeling: handle missing values, standardize numeric features, and encode categorical variables.
  • Split data into training and test sets for evaluation.
  • Train and evaluate multiple models (e.g., logistic regression with L1 regularization, random forest, XGBoost) using MLJ.jl and compute performance metrics such as AUC.
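
To make the lookback-window and temporal-ordering requirements above concrete, here is a minimal sketch using only Julia's `Dates` standard library. It does not use HealthBase.jl, OHDSIAPI.jl, OHDSICohortExpressions.jl, or FunSQL.jl; the in-memory rows, the field names (`index_date`, `outcome_date`, `start_date`), the placeholder concept ID `1001`, the helper names, and the 365-day windows are all assumptions chosen for illustration. In the actual post these rows would come from queries against an OMOP CDM database.

```julia
using Dates

# Hypothetical rows standing in for target-cohort, outcome-cohort, and
# condition_occurrence query results (all values are made up).
target = [
    (person_id = 1, index_date = Date(2020, 1, 10)),
    (person_id = 2, index_date = Date(2020, 3, 5)),
    (person_id = 3, index_date = Date(2020, 6, 1)),
]
outcome = [
    (person_id = 1, outcome_date = Date(2020, 9, 1)),   # after index, inside time-at-risk
    (person_id = 2, outcome_date = Date(2019, 12, 1)),  # before index, must not count
]
conditions = [
    (person_id = 1, concept_id = 1001, start_date = Date(2019, 11, 2)),
    (person_id = 1, concept_id = 1001, start_date = Date(2018, 1, 1)),  # older than lookback
    (person_id = 3, concept_id = 1001, start_date = Date(2020, 5, 20)),
    (person_id = 3, concept_id = 1001, start_date = Date(2020, 7, 1)),  # after index, excluded
]

# Outcome label: true only if the outcome occurs strictly after the index date
# and no later than the end of the time-at-risk window.
function has_outcome(t, outcomes; time_at_risk)
    any(o -> o.person_id == t.person_id &&
             t.index_date < o.outcome_date <= t.index_date + time_at_risk,
        outcomes)
end

# Feature: number of events inside the lookback window, strictly before index,
# so that no information from after the prediction time leaks into covariates.
function count_in_lookback(t, events; lookback)
    count(e -> e.person_id == t.person_id &&
               t.index_date - lookback <= e.start_date < t.index_date,
          events)
end

dataset = [(person_id = t.person_id,
            n_conditions = count_in_lookback(t, conditions; lookback = Day(365)),
            outcome = has_outcome(t, outcome; time_at_risk = Day(365)))
           for t in target]

foreach(println, dataset)
```

The same two predicates (events before index for covariates, events after index for labels) would apply per table when features are pulled from drug_exposure, procedure_occurrence, measurement, and the other tables listed above.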

Expected Outcomes

  1. A blog post draft introducing PLP in observational health research.
  2. Practical Julia code examples showing cohort construction, feature extraction, and execution.
  3. A reproducible workflow guide that readers can adapt for their own data.
  4. References to foundational work (e.g., Reps et al. 2018) and links to JuliaHealth resources.
  5. End-to-end demonstration of constructing a binary classification dataset from OMOP CDM cohorts, including covariates and outcome labels.
  6. Example preprocessing and modeling steps implemented in Julia with MLJ.jl, showing how predictive performance can be compared across different algorithms.
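
As a reference point for the model-comparison outcome, the sketch below computes AUC directly from its pairwise definition: the probability that a randomly chosen positive patient receives a higher score than a randomly chosen negative one, with ties counted as one half. It is plain Julia with no packages, and it is not MLJ.jl's implementation; the function name `pairwise_auc`, the labels, and the two score vectors are hypothetical. The blog post itself would be expected to use the metric provided through MLJ.jl.

```julia
# AUC via exhaustive positive/negative pair comparison (O(n_pos * n_neg)),
# which is fine for a small illustration but not for large cohorts.
function pairwise_auc(scores::AbstractVector{<:Real}, labels::AbstractVector{Bool})
    pos = scores[labels]     # scores of patients with the outcome
    neg = scores[.!labels]   # scores of patients without the outcome
    (isempty(pos) || isempty(neg)) &&
        throw(ArgumentError("AUC needs at least one positive and one negative label"))
    wins = sum((p > n) + 0.5 * (p == n) for p in pos, n in neg)
    return wins / (length(pos) * length(neg))
end

# Made-up held-out labels and predicted probabilities from two hypothetical models.
labels  = [true, false, true, false, false]
model_a = [0.9, 0.2, 0.6, 0.7, 0.1]
model_b = [0.8, 0.3, 0.7, 0.4, 0.2]

println("model A AUC = ", pairwise_auc(model_a, labels))
println("model B AUC = ", pairwise_auc(model_b, labels))
```

Evaluating every candidate model on the same held-out labels, as done here, is what makes the resulting AUC values comparable across algorithms.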

Notes

Reference Materials

Metadata

Labels: documentation (Improvements or additions to documentation), workflow (Workflow supported by JuliaHealth)
