N3C Logic Liaison Templates: Transforming Complex Health Data into Analytics Ready Tables

Poster
Authors: Zhou, Andrea, MD-THRV Trans Hlth Res Inst Va, University of Virginia; French, Evan, C. Kenneth and Dianne Wright Center for Clinical and Translational Research, Virginia Commonwealth University; Moffitt, Richard, Hematology and Medical Oncology, Emory University; Loomba, Johanna, MD-THRV Trans Hlth Res Inst Va, University of Virginia
Abstract:

The National COVID Cohort Collaborative (N3C) is one of the largest collections of clinical data for COVID-19 research in the United States [1]. It holds data from more than 20 million patients in the cloud-based N3C Data Enclave, which promotes accessibility. However, security constraints of the platform preclude the use of open source Observational Health Data Sciences and Informatics (OHDSI) software for analyzing observational patient-level data, leaving users with a limited toolset (primarily SparkR, PySpark, or SQL) to transform the billions of rows of data into a usable form for their specific research question. Though the data are harmonized into a single common data model, elements are missing both within and across data-contributing partners because data collection protocols differ across departments and sites, respectively.
N3C Logic Liaisons use their experience with Enclave tools and familiarity with the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) to support N3C clinicians, analysts, and data scientists [2]. We create code templates that provide over 120 commonly used variables (including demographics, common comorbidities, observation periods, and data extraction details) along with a method for quickly adding custom elements. These variables are generated through Code Workbooks and Templates that use specific Concept Sets (lists of key variables from standard vocabularies) to identify and extract the data needed to answer research questions. Since its release in October 2021, the COVID+ Patients fact table template has been used by 48 different projects, and since its release in May 2022, the All Patients fact table template has been used by 36 different projects. After being documented and peer-reviewed for quality, templates are published to the N3C Knowledge Store, where they can be accessed by the entire N3C community. We also develop ancillary templates to augment, examine, and clean the data prior to analysis.
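The idea of deriving patient-level variables from Concept Sets can be illustrated with a minimal sketch. It is written in plain Python rather than the Spark-based Code Workbooks used in the Enclave, and it is not the Logic Liaison template code: the function name, variable names, and concept IDs are hypothetical placeholders. The only OMOP details assumed are the person_id and condition_concept_id columns of the condition_occurrence table.

```python
from collections import defaultdict

# Hypothetical concept sets: variable name -> set of OMOP concept IDs.
# The IDs are placeholders for illustration, not curated N3C Concept Sets.
CONCEPT_SETS = {
    "diabetes_indicator": {1001, 1002},
    "hypertension_indicator": {2001},
}


def build_fact_table(person_ids, condition_rows, concept_sets=CONCEPT_SETS):
    """Return one row per patient with a 0/1 flag per concept set.

    condition_rows is an iterable of (person_id, condition_concept_id)
    pairs, mimicking two columns of the OMOP condition_occurrence table.
    """
    # Collect the concept IDs recorded for each patient.
    seen = defaultdict(set)
    for person_id, concept_id in condition_rows:
        seen[person_id].add(concept_id)

    table = []
    for person_id in sorted(person_ids):
        row = {"person_id": person_id}
        for variable, concept_ids in concept_sets.items():
            # Flag is 1 if any of the patient's records is in the concept set.
            row[variable] = int(bool(seen[person_id] & concept_ids))
        table.append(row)
    return table
```

Under these assumptions, a custom element is added by passing a concept_sets dictionary with one more entry, for example {**CONCEPT_SETS, "asthma_indicator": {3001}}, which yields one more column without changing the function.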
The two main Logic Liaison fact table templates have been used 339 times across all Enclave projects, eliminating redundant efforts and accelerating project timelines. These templates aligned COVID researchers’ definitions of complex variables such as COVID-associated hospitalization, SARS-CoV-2 vaccination, and reinfection while supporting customization of these variables. Given the successes of these data quality, fact table, and ancillary fact templates, we are expanding the breadth of these templates as the data in the Enclave grows to include more sources such as linked Medicare and Medicaid claims data. We will also repurpose all disease-agnostic elements of our templates for use in the N3C Clinical Pilot, demonstrating that they can be used to accelerate any research done using the OMOP CDM.

Keywords:
biomedical informatics, electronic health record data, code templates, OMOP
Language:
English
Publisher:
University of Virginia
Published Date:
November 14, 2023
Sponsoring Agency:
integrated Translational Health Research Institute of Virginia (iTHRIV)