Primary Research Interests

1. Precision Phenotyping of Complex, Heterogeneous Conditions

Heterogeneity in complex conditions points to differences in underlying biology or disease etiology, as well as in environmental effects. We use machine learning methods to characterize phenotypes more precisely, exploiting this heterogeneity to better understand the biological and environmental factors of disease.

2. Safe and Effective Deployment of Machine Learning in the Clinic

The rapid advancement of AI tools in clinical care demands robust strategies in implementation science, effective deployment, enhanced physician-computer interaction, and thorough impact assessment. As cutting-edge models become more prevalent, there is a tendency to overvalue their positive effects without adequately weighing potential drawbacks, such as physician burnout and shifts in health equity. Our group is committed to devising tactics to refine models for specific healthcare environments, integrating them smoothly into clinical routines, and rigorously measuring their efficacy. Without rigorous assessments, including strategies like randomization, we cannot be sure these tools are leading to better healthcare. We are developing methods to address these and related problems.

Select Publications

See a full list on Google Scholar

Synthetic Data Distillation Enables the Extraction of Clinical Information at Scale

Elizabeth Geena Woo*, Michael C Burkhart*, Emily Alsentzer, Brett Beaulieu-Jones. Preprint (2024)
*co-first authors

Our team demonstrated that synthetic data distillation can fine-tune smaller, open-source large language models (LLMs) to achieve performance similar to larger models in extracting clinical information. The fine-tuned smaller model outperforms its base version and sometimes even the larger model. This approach will enable more scalable and cost-efficient clinical information extraction, improving tasks like patient phenotyping.

Disease progression strikingly differs in research and real-world Parkinson’s populations

Brett K Beaulieu-Jones, Francesca Frau, Sylvie Bozzi, Karen J Chandross, M Judith Peterschmitt, Caroline Cohen, Catherine Coulovrat, Dinesh Kumar, Mark J Kruger, Scott L Lipnick, Lane Fitzsimmons, Isaac S Kohane, Clemens R Scherzer. npj Parkinson's Disease (2024)

Our team compared Parkinson's disease (PD) progression across research and real-world populations, using real-world data (RWD) and large language models for detailed characterization. We found that patients in real-world settings are diagnosed later and start treatment later than those in research populations, and that motor and cognitive progression is faster in real-world cohorts. The study highlights the differences between research and real-world populations, emphasizing the need to use diverse data sources and account for biases in clinical trial design and analyses.

Predicting seizure recurrence after an initial seizure-like episode from routine clinical notes using large language models: a retrospective cohort study

Beaulieu-Jones, Brett K., Mauricio F Villamar, Phil Scordis, Ana Paula Bartmann, Waqar Ali, Benjamin D Wissel, Emily Alsentzer, Johann de Jong, Arijit Patra, Isaac Kohane. Lancet Digital Health (2023)

Our team demonstrated that machine learning models, particularly large language models pre-trained on domain-specific data, are highly effective in predicting seizure recurrence in children after an initial seizure-like event. These models outperformed traditional structured data approaches, indicating that clinical notes contain significant information useful for predicting seizure recurrence.

Phenotypic overlap between rare disease patients and variant carriers in a large population cohort informs biological mechanisms

Lane Fitzsimmons, Undiagnosed Diseases Network, Brett Beaulieu-Jones*, Shilpa Nadimpalli Kobren*. Preprint (in press) (2024)
*co-corresponding

The biological mechanisms causing extreme symptoms in rare disease patients are complex and often elusive. This study analyzes genotype and phenotype data from the UK Biobank to understand the pathways leading to seizures in undiagnosed patients from the Undiagnosed Diseases Network. By examining milder, related symptoms in UK Biobank participants with similar genetic variants, the study aims to shed light on the molecular mechanisms behind these rare conditions.

Characterizing the connection between Parkinson's disease progression and healthcare utilization

Lane Fitzsimmons, Francesca Frau, Sylvie Bozzi, Karen Chandross, Brett Beaulieu-Jones. Preprint (2024)

This study analyzed Parkinson's disease (PD) progression by examining clinical events across different Hoehn & Yahr (H&Y) stages extracted using natural language processing. It provides a view of healthcare utilization at different H&Y stages, models expected H&Y progression, and demonstrates the potential value of a therapeutic that would slow progression.

Machine learning for patient risk stratification: standing on, or looking over, the shoulders of clinicians?

Beaulieu-Jones, Brett K., William Yuan, Gabriel A. Brat, Andrew L. Beam, Griffin Weber, Marshall Ruffin, and Isaac S. Kohane. npj Digital Medicine (2021)

We trained deep learning models on clinician-initiated administrative data for 42.9 million admissions and found performance close to full EMR-based benchmarks for inpatient outcomes. These models rely heavily on clinical behavior, and should not be used for individualized clinical decision making. For meaningful clinical guidance, models should outperform these benchmarks using data sources that capture patient state rather than clinician actions (i.e., looking over their shoulder).

Examining the use of real‐world evidence in the regulatory process

Beaulieu‐Jones, Brett K., Samuel G. Finlayson, William Yuan, Russ B. Altman, Isaac S. Kohane, Vinay Prasad, and Kun‐Hsing Yu. Clinical Pharmacology & Therapeutics (2020)

The 21st Century Cures Act requires the US FDA to create guidelines for using real-world evidence (RWE) in the regulatory process. While RWE has led to crucial medical findings, it faces challenges in proving treatment efficacy compared to randomized controlled trials. In this review article, we summarized the advantages and limitations of RWE, identified the key opportunities for RWE, and pointed the way forward to maximize the potential of RWE for regulatory purposes.

Privacy-preserving generative deep neural networks support clinical data sharing

Beaulieu-Jones, Brett K., Zhiwei Steven Wu, Chris Williams, Ran Lee, Sanjeev P. Bhavnani, James Brian Byrd, and Casey S. Greene. Circulation: Cardiovascular Quality and Outcomes (2019)

Our team has developed a method using deep neural networks to generate synthetic data that closely resembles real participants from the SPRINT trial, ensuring privacy while maintaining the utility of the data for research. This technique allows for the sharing of clinical data with researchers for secondary analysis without risking patient privacy.

Reproducibility of computational workflows is automated using continuous analysis

Beaulieu-Jones, Brett K., and Casey S. Greene. Nature Biotechnology (2017)

Continuous analysis is a workflow that integrates Docker container technology with continuous integration to automatically rerun computational analyses upon any changes in source code or data. This approach facilitates effortless reproducibility of research results for peers and provides an audit trail for data analyses, enhancing transparency and reliability in scientific studies.

Semi-supervised learning of the electronic health record for phenotype stratification

Beaulieu-Jones, Brett K., and Casey S. Greene. Journal of Biomedical Informatics (2016)

We developed a semi-supervised learning technique to improve the extraction of phenotypes from electronic health records, aiding in the identification of disease subtypes and genetic associations. This method has shown promise in enhancing classification accuracy and predicting patient outcomes, even with limited high-quality data.

People

We're always looking to add talented & curious students, postdocs, programmers and data scientists.

Brett Beaulieu-Jones, PhD

Assistant Professor

Sections of Biomedical Data Science & Genetic Medicine

Ming-Chieh (Eddie) Liu, MS

Research Associate

Eddie is examining the impact and potential of diverging predictions between different large-scale clinical AI models for the same patients.

Bashar Ramadan, MBBS

Attending Physician

Bashar provides clinical expertise for numerous projects in the group, especially in evaluating divergence between predictive models.

David Chen, MS

Data Scientist

David is working on "Argos", a project developing tools and methods to make the evaluation and monitoring of clinical AI easier.

Anna Zink, PhD

Principal Researcher (Center for Applied AI - Booth)

Anna is working closely with the lab on a multi-site project to evaluate the performance and fairness of machine learning models in the clinic and to develop scalable frameworks for this going forward.

Geena Woo, BA

PhD Candidate (co-advised with Gilad Lab)

Geena is using machine learning and causal inference approaches to better understand the impact of maternal exposures on children.

Michael Burkhart, PhD

Senior Data Scientist

Michael is working on multiple projects to improve clinical information extraction and to better understand foundational clinical AI models.

Rowan Hussein, BA

Currently: Med Student

Rowan is working to harmonize governance frameworks in order to provide a practical and actionable roadmap for health systems.

Tom Statchen, BS

Currently: Med Student

Tom is working to evaluate the performance and failure cases of deployed generative AI systems in the health system.

Sahil Sethi, BS

Currently: Med Student

Sahil is working to build predictive models to support the operational side of hospitals (e.g., unexpected payor denials).

Krish Shah, BS

Currently: MS Student - Biomedical Data Science

Krish is developing machine learning and statistical methods to better understand disease progression.

Alumni, Close Collaborators & Thesis Examinees

Sylvia Edoigiawerie, PhD

Currently: Completing the MD portion of her MD/PhD at UChicago

Sylvia completed her PhD with Dr. Naoum Issa and collaborates closely with the lab on her projects developing methods to improve seizure detection in neonates.

Lane Fitzsimmons, BS

Lane was a research associate and continues to be a collaborator. She is currently a medical student at the Renaissance School of Medicine at Stony Brook University.

Temidayo Adeluwa, MS

Temi is a PhD Candidate in the Genetics, Genomics and Systems Biology Program at UChicago, working with Dr. Haky Im. I chair his thesis committee.

Jessica De Freitas, PhD

I served as Jess's external examiner for her thesis on "Leveraging Electronic Health Records and Electrocardiograms for Disease Phenotyping" in 2023. She is currently a Machine Learning Scientist at Tempus Labs, Inc.

Yidi Huang, MS

Yidi was a research associate and is currently a PhD Student in Genomics and Computational Biology at the University of Pennsylvania, the program I graduated from.

Mohammed Saqib, BS

Mohammed was a research associate and is currently a PhD Student at the University of Pennsylvania, where he's working on neuroimaging.

Open Positions

We are actively recruiting multiple students, postdocs and/or data scientists. Get in touch if you're driven to work on these problems, or propose your own ideas tied to our research interests. When reaching out, it is incredibly helpful if you explain what led you to contact me and how our interests overlap. I'm open to creative extensions of the research interests listed above (e.g., if you are interested in a different area but would be using similar methods).

Active / Recent Funding and Support

We are extremely grateful to the organizations that support and have supported our work!

Get in touch