Massive amounts of data, and the computing power to process and analyze them, are critical to 21st-century biomedical research, and the COVID-19 pandemic has made the need for robust information infrastructure more urgent than ever.
Because COVID is a new disease with no historical data, and because any single health system sees only a limited number of cases, researchers can’t rely on data collected solely at their home institution to make solid scientific recommendations. This limitation matters most when studying how COVID affects subpopulations, such as people with specific health conditions or age groups that respond differently to the virus, where the number of affected patients is even smaller.
Researchers at Wake Forest School of Medicine and their colleagues around the world are combining their efforts to pioneer new data networks that fast-forward scientific understanding of COVID and how to combat it. Data networks are computer systems at many locations and institutions that are connected to one another. They allow resources to be shared, such as centrally stored software and a central database. COVID research is producing imaginative networking solutions that will contribute to bringing the pandemic under control.
A wide variety of data platforms, the individual software programs used by specific systems, exist throughout the biomedical research community. One major focus of the current COVID-related information infrastructure efforts is to streamline and coordinate the data so that researchers in different organizations can all access the same data at the same time and in the same format.
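As a rough illustration of what getting data into the same format can involve, the Python sketch below converts test records from two hypothetical sites into one shared layout. The site names, field names, result codes, and the `to_common_format` helper are all invented for this example; they are assumptions and do not reflect the actual data models used by any of the networks described here.

```python
from datetime import datetime

# Hypothetical per-site mappings: source field name -> common field name.
SITE_FIELD_MAPS = {
    "site_a": {"pt_id": "patient_id", "test_dt": "test_date", "result": "covid_result"},
    "site_b": {"PatientKey": "patient_id", "CollectedOn": "test_date", "PCR": "covid_result"},
}

# Hypothetical date formats used by each site's export.
SITE_DATE_FORMATS = {"site_a": "%m/%d/%Y", "site_b": "%Y-%m-%d"}

# Hypothetical local result codes, mapped to one shared vocabulary.
RESULT_CODES = {
    "pos": "positive",
    "detected": "positive",
    "neg": "negative",
    "not detected": "negative",
}


def to_common_format(site, record):
    """Rename fields, normalize the date to ISO 8601, and unify result codes."""
    mapping = SITE_FIELD_MAPS[site]
    common = {mapping[key]: value for key, value in record.items() if key in mapping}
    parsed = datetime.strptime(common["test_date"], SITE_DATE_FORMATS[site])
    common["test_date"] = parsed.date().isoformat()
    common["covid_result"] = RESULT_CODES[common["covid_result"].strip().lower()]
    return common


row_a = to_common_format(
    "site_a", {"pt_id": "A-001", "test_dt": "04/15/2020", "result": "POS"}
)
row_b = to_common_format(
    "site_b", {"PatientKey": "B-417", "CollectedOn": "2020-04-16", "PCR": "Not Detected"}
)
print(row_a)
print(row_b)
```

Once both rows share the same field names, date format, and result vocabulary, they can be stored and queried together, which is the practical meaning of having everyone work from the same format.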
“The main thing is to get everybody on the same page from a data perspective,” explains Brian Ostasiewski, informatics program director for Wake Forest’s Clinical & Translational Science Institute (CTSI). Informatics is the science of how to use data, information and knowledge to improve human health and the delivery of health care services. Ostasiewski is a leading informatics practitioner in his role at the CTSI, and the demands of COVID research infrastructure are immediate and intensive.
“There are different data standards for each type of data, and a major advantage of the networks we participate in is that by consolidating large amounts of data from many different sources, we can significantly increase the statistical power of COVID studies and thus increase the value of the new knowledge we gain about this devastating pandemic,” he says.
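The point about statistical power can be made concrete with a small calculation. The Python sketch below uses a standard normal approximation for a two-sided test comparing two proportions and shows how the chance of detecting a real difference grows with the number of patients per group. The outcome rates (10% versus 13%) and the group sizes are hypothetical values chosen only for illustration, and `approx_power` is a helper written for this example; none of these come from the studies described here.

```python
from statistics import NormalDist


def approx_power(p1, p2, n_per_group, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two proportions.

    Uses the normal approximation with unpooled variance and ignores the
    negligible chance of rejecting in the wrong direction.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    std_err = (p1 * (1 - p1) / n_per_group + p2 * (1 - p2) / n_per_group) ** 0.5
    return std_normal.cdf(abs(p1 - p2) / std_err - z_crit)


# Hypothetical scenario: one site alone versus progressively larger pooled networks.
for n in (200, 2000, 20000):
    print(n, round(approx_power(0.10, 0.13, n), 3))
```

Under these assumed rates, a single small cohort would miss the difference most of the time, while a pooled data set of tens of thousands of patients would detect it almost every time.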
The Informatics group within the CTSI has been participating in these Clinical Research Networks (CRNs) for almost a decade, dating back to the original SHRINE National Pilot that first validated a software infrastructure for federating de-identified data. Leveraging this experience, Wake Forest is currently contributing to the development of four COVID data infrastructure programs.
STAR PCORI
STAR PCORI stands for Stakeholders, Technology, And Research CRN and Patient-Centered Outcomes Research Institute. The STAR CRN is a partnership of eight health systems throughout the U.S. It is committed to improving health care delivery by supporting health systems research, patient-centered outcomes research and pragmatic clinical trials. The STAR CRN provides data and research infrastructure to support these three objectives in research groups across the country.
The network is advancing more complete and comprehensive data sets for research by linking clinical data from its participating sites to complementary data sources, including Medicare and Medicaid claims as well as claims from private health plans and state health departments. As a result, researchers can work with data that is more robust and more immediately useful in their efforts to closely monitor trends in the pandemic. One of its programs, the PCORnet mini-CDM, began in April in response to the COVID pandemic. It is a data warehouse focused on COVID patients, with data updated more frequently (twice weekly instead of quarterly). “This type of data repository allows us to identify COVID patients and aggregate their data a lot faster than usual,” says Ostasiewski.
Following an initial proof-of-concept initiative, STAR PCORI is now running two more-permanent projects using the infrastructure. “One is a surveillance project to keep track of how people are testing in different places in general,” Ostasiewski says. “The other is a particular co-morbidity, focusing on a disease involved with COVID that some people experience.”
N3C
N3C, the National COVID Cohort Collaborative, is a partnership among the Clinical and Translational Science Awards (CTSA) hub institutions, of which Wake Forest is one. It exists to centralize de-identified COVID data and has established a data enclave: a secure workspace to which researchers can request access and within which they can do all of their work.
“It’s really handy because all of our researchers here and at all of the other participating institutions get access to the resource and can work with the data and share different analysis techniques without worrying about security issues that could arise from individual studies exchanging data by varying means,” Ostasiewski explains.
COMBATCOVID
COMBATCOVID, the Consortium for Multisite Biomedical Analytics and Trials on COVID-19, is a network similar to N3C. It is being coordinated by New York University, with Wake Forest poised to contribute data. Once funding is secured and the network launches, it will bring together electronic health record (EHR) data from multiple participating CTSA institutions in a shared, centralized database. Biospecimen data from COVID-19 patients will also be shared and linked to the corresponding EHR data.
Although it is similar to N3C, COMBATCOVID is distinguished by its focus on answering more specific questions about COVID illness, rather than providing a generalized research infrastructure. These questions aim to uncover trends in areas such as risks, treatments, and outcomes.
4CE
Wake Forest is also participating in an international consortium for EHR data-driven COVID-19 studies called 4CE, the Consortium for Clinical Characterization of COVID-19 by EHR. The goal of that effort is to inform doctors, epidemiologists and the public about COVID-19 patients using data acquired through the health care process. The consortium consists of 96 hospitals across five countries. The group has focused on temporal changes in key laboratory test values. Its first publication came in August 2020, reporting on more than 27,000 COVID cases and more than 187,000 test results. The large volume of data made it possible to build a framework that captures the trajectory of the disease in patients and their responses to interventions.
Generally, these initiatives are not inherently different from the data studies and cooperative agreements that biomedical researchers have been establishing and using for years, but the urgent need for data during the pandemic has accelerated the pace of acquiring and crunching COVID data and expanded the breadth of the collaborations. Each of these networks, like the many more in various stages of conception, serves a unique purpose in the quest for knowledge.
That knowledge will emerge in several forms, from scientific publications to medical community alerts. With new abilities and formats to aggregate herculean amounts of data, information is driving the crusade to conquer COVID and future pandemics.