Predicting COVID19 Risk: A Retrospective Cohort Study Using the National COVID Cohort Collaborative (N3C)

Poster
Authors: Barros, Andrew, University of Virginia; Adams, Jason, Data Science, University of Virginia; Link, Robert, Data Science, University of Virginia
Abstract:

RATIONALE
We aimed to create and critically examine models to predict the risk of hospitalization at the time of COVID19 diagnosis.

PARTICIPANTS
We used the N3C dataset to identify patients diagnosed with COVID19 from the start of the pandemic through May 11, 2023. We excluded patients younger than 16 years, patients contributed by a data partner with >10% missing geographic data, and patients who were hospitalized on the calendar day of their test.
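The three exclusion rules above can be sketched as a simple filter. This is a minimal illustration, not the study's actual N3C pipeline: the `Patient` record, its field names, and the choice to compute each partner's missing-geography fraction before applying the other exclusions are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Patient:
    # Hypothetical record layout; the real N3C schema differs.
    partner_id: str
    age: int
    zip_code: Optional[str]      # None when geographic data is missing
    test_date: date
    admit_date: Optional[date]   # None when the patient was never hospitalized


def apply_exclusions(patients: List[Patient],
                     min_age: int = 16,
                     max_missing_geo: float = 0.10) -> List[Patient]:
    """Apply the three exclusion rules described in PARTICIPANTS."""
    total = defaultdict(int)
    missing = defaultdict(int)
    for p in patients:
        total[p.partner_id] += 1
        if p.zip_code is None:
            missing[p.partner_id] += 1

    # Assumption: the missing fraction is measured per data partner over
    # all of that partner's patients, and ">10%" is a strict inequality.
    excluded_partners = {
        pid for pid in total if missing[pid] / total[pid] > max_missing_geo
    }

    return [
        p for p in patients
        if p.age >= min_age
        and p.partner_id not in excluded_partners
        and p.admit_date != p.test_date   # drop same-calendar-day admissions
    ]
```

Note that a patient with a missing zip code is kept as long as their data partner stays at or below the 10% threshold; the rule removes partners, not individual records.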

OUTCOMES
Our primary outcome was hospitalization within the 16 days following a COVID19 diagnosis.
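As a sketch of how this outcome could be turned into a binary label, the function below marks a patient as positive when an admission falls on days 1 through 16 after diagnosis. The function name and the exact day-counting convention are assumptions; the abstract states only "within the 16 days following" a diagnosis and that same-day admissions were excluded from the cohort.

```python
from datetime import date
from typing import Optional


def hospitalized_within_window(diagnosis_date: date,
                               admit_date: Optional[date],
                               window_days: int = 16) -> bool:
    """Return True if admission falls 1..window_days days after diagnosis."""
    if admit_date is None:
        return False  # never hospitalized
    days_after = (admit_date - diagnosis_date).days
    # Assumption: day 0 (the diagnosis day) does not count as an outcome.
    return 1 <= days_after <= window_days
```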

MEASURES
We included demographics, comorbid conditions, medication exposures, and zip-code level social determinants of health.

MODEL DEVELOPMENT
We compared two methods: a gradient boosted tree (GBT) ensemble model (LightGBM, Redmond, Washington) and a machine-learning-optimized sparse scorecard model (FasterRISK, Durham, North Carolina). We used 80% of the data for training and the remaining 20% for model validation.
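The 80/20 partition can be illustrated with a seeded random shuffle. The abstract does not say whether the split was random, stratified, or grouped by data partner, so the simple random split below is an assumption, as are the function name and the seed. Model fitting itself is omitted.

```python
import random
from typing import Iterable, List, Tuple


def train_validation_split(patient_ids: Iterable[int],
                           train_fraction: float = 0.8,
                           seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle patient identifiers and cut them into two disjoint sets."""
    shuffled = list(patient_ids)
    random.Random(seed).shuffle(shuffled)   # seeded for reproducibility
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


train_ids, validation_ids = train_validation_split(range(1000))
```

Splitting on patient identifiers rather than on individual records keeps every patient in exactly one of the two sets, so the validation metrics are computed on people the model has not seen.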

RESULTS
Our final training cohort consisted of 3.6 million patients with a 1.7% rate of hospitalization. Five data partners were excluded for >10% missing patient geographic data. The GBT model had a validation set AUROC of 0.773 and a Brier skill score of 0.021. The sparse scorecard model performed slightly worse, with a validation AUROC of 0.735 and a Brier skill score of 0.013. Performance of the GBT model varied across subgroups. By gender, the model had a validation set AUROC of 0.770 in men and 0.777 in women. By patient self-reported race, the model had an AUROC of 0.787 in white subjects and 0.748 in Black subjects.
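For readers less familiar with the two metrics, the sketch below computes AUROC, the Brier skill score, and per-subgroup AUROC in plain Python. It assumes the common definition of the Brier skill score, in which the reference forecast assigns the observed event rate to every patient; the abstract does not state which reference was used. The pairwise AUROC is quadratic in sample size and is meant only for illustration, not for cohorts of this scale.

```python
from typing import Dict, Hashable, Sequence


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted risk and observed outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def brier_skill_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """1 - BS_model / BS_reference; 0 means no better than the event rate."""
    prevalence = sum(y_true) / len(y_true)
    # Assumed reference: predict the observed prevalence for everyone.
    reference = brier_score(y_true, [prevalence] * len(y_true))
    return 1.0 - brier_score(y_true, y_prob) / reference


def auroc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Probability that a random case is ranked above a random non-case."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    # Ties between a case and a non-case count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auroc_by_group(y_true: Sequence[int],
                   y_prob: Sequence[float],
                   groups: Sequence[Hashable]) -> Dict[Hashable, float]:
    """AUROC computed separately within each subgroup label."""
    result = {}
    for g in set(groups):
        idx = [i for i, v in enumerate(groups) if v == g]
        result[g] = auroc([y_true[i] for i in idx], [y_prob[i] for i in idx])
    return result
```

For intuition, if the validation event rate is close to the 1.7% reported for the training cohort, the reference Brier score is about 0.017 × 0.983 ≈ 0.0167, so a skill score of 0.021 corresponds to a small absolute improvement over predicting the base rate for everyone.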

CONCLUSIONS
Hospitalization can be predicted with reasonable discrimination and modest overall accuracy. Most of the risk factors the models identified are consistent with accepted risk factors. The social determinants identified are likely proxies for latent causes such as social vulnerability, structural racism, and historical injustice that continue to affect the health of people today.

Keywords:
iTHRIV, COVID, N3C
Language:
English
Publisher:
University of Virginia
Published Date:
December 06, 2023
Sponsoring Agency:
integrated Translational Health Research Institute of Virginia (iTHRIV)