Benchmarking for Comparison of Statistical and Machine Learning Methods
Dr. Speiser’s work focuses on benchmarking studies that compare statistical and machine learning methods across many real-world datasets. The results are aggregated into data-driven recommendations for optimal methodology, with the aim of improving rigor in clinical and translational research. The first benchmarking study, funded by a KL2 grant from the Wake Forest CTSI, compared random forest variable selection methods for categorical outcomes and has been cited nearly 1,000 times (PMID: 32968335). The team is currently conducting a similar study comparing random forest variable selection methods for continuous outcomes, funded by a pilot grant from the Wake Forest CTSI Biostatistics, Epidemiology & Research Design Program. These tools are applied to longitudinal, repeated-measures data, including applications in aging research in which predictors and outcomes for older adults are collected from year to year. The team is also developing an R package that shares code for processing and harmonizing longitudinal data from studies of aging, along with synthetic datasets derived from these studies to facilitate benchmarking analyses.

The photo is from the Joint Statistical Meetings in August 2022, where an invited session (and packed house!) entitled Practical Recommendations for Prediction Modeling that Advance Innovation featured speakers Jaime Speiser, Nate O’Connell, Byron Jaeger, and Joe Rigdon.
Collaborators: Jaime Speiser, Mike Miller, Eddie Ip, Janet Tooze, Nate O’Connell, Byron Jaeger, and Garrett Bullock, all of BDS; Kate Callahan and Denise Houston, Gerontology and Geriatric Medicine; and David Miller, Implementation Science.