
 Q4-2024 Research Roundup

Marie Davidian 

Title: Next-Generation SMARTs for Discovery and Evaluation of Sequential Cancer Therapeutic Strategies

Period: 2023–2028

Funding agency: National Institutes of Health R01 CA280970

PI: E. Laber, Duke University

Description: Treatment of cancer is an ongoing process during which clinicians make a series of decisions at critical points in a patient’s disease, synthesizing baseline and evolving patient information with the goal of optimizing expected long-term patient benefit. An evidence-based approach to optimizing decision making is to study entire sequential treatment strategies, which can be formalized as treatment regimes. A treatment regime is a sequence of decision rules, each of which is associated with a key decision and uses accrued information on a patient to select a treatment option from among the feasible options for that patient. An optimal regime is one that maximizes expected patient benefit in the population. Sequential multiple assignment randomized trials (SMARTs), in which subjects are randomized at each of several key decision points to feasible treatment options based on their accrued information, are ideally suited to discovery and evaluation of treatment regimes, and a number of SMARTs in cancer have been conducted. At the same time, great innovations have been made in cancer clinical trials; platform and response-adaptive trials that seek to optimize treatment for both participants and future patients, and that allow incorporation of new options and elimination of ineffective options, are increasingly being conducted. Realizing the potential of SMARTs to advance optimal sequential decision making in cancer treatment thus requires a next generation of design and analysis methods that incorporate similar innovations in the more complex setting of multiple decisions and repeated randomization of subjects, and that address current cancer research priorities. The goal of this project is to develop a comprehensive statistical framework for next-generation SMARTs in cancer research, including (1) methods for design and analysis of platform SMARTs that use response-adaptive randomization to favor optimal treatment assignments and allow introduction of new treatments and discontinuation of ineffective treatments at any decision point; (2) methods for design and analysis of SMARTs involving multi-component/multi-modal treatments at each decision point; (3) methods that merge a SMART with a micro-randomized trial to allow joint optimization of sequential therapeutic decisions and selection of supportive mHealth interventions that address the adverse consequences of cancer therapy; and (4) methods for interim analysis of SMARTs, for which little methodology is available. My role as Co-Investigator is to work on development of the methodology and an associated software package.
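To make the notion of a treatment regime concrete, the minimal sketch below represents a two-stage regime as a pair of decision rules applied to accrued patient information. It is illustrative only, not project code; the treatment names, biomarker, threshold, and response categories are hypothetical.

```python
# Illustrative sketch (not project code): a two-stage treatment regime as a
# sequence of decision rules. Treatment names, biomarker, and threshold are
# hypothetical.

def stage1_rule(baseline):
    """First decision: choose induction therapy from baseline information."""
    return "therapy_A" if baseline["biomarker"] > 1.5 else "therapy_B"

def stage2_rule(baseline, stage1_treatment, response):
    """Second decision: choose the next treatment using accrued information."""
    if response == "responder":
        return "maintenance"
    # Non-responders switch to the option not tried at stage 1.
    return "therapy_B" if stage1_treatment == "therapy_A" else "therapy_A"

# Applying the regime to one (hypothetical) patient record:
patient = {"biomarker": 2.1}
t1 = stage1_rule(patient)                        # -> "therapy_A"
t2 = stage2_rule(patient, t1, "non-responder")   # -> "therapy_B"
print(t1, t2)
```

In a SMART, patients are randomized among the feasible options at each such decision point, which is what allows regimes like this one to be estimated and compared.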

Emily Hector 

Title: New data integration approaches for efficient and robust meta-estimation, model fusion and transfer learning (PI Hector). 

Period: 2024-2029

Funding agency: NSF DMS 2337943

Role: PI.

Description: Statistical science aims to learn about natural phenomena by drawing generalizable conclusions from an aggregate of similar experimental observations. With the recent “Big Data” and “Open Science” revolutions, scientists have shifted their focus from aggregating individual observations to aggregating massive publicly available datasets. This endeavor is premised on the hope of improving the robustness and generalizability of findings by combining information from multiple datasets. For example, combining data on rare disease outcomes across the United States can paint a more reliable picture than basing conclusions only on a small number of cases in one hospital. Similarly, combining data on disease risk factors across the United States can distinguish local from national health trends. To date, statistical approaches to these data aggregation objectives have been confined to simple settings of limited practical utility. In response to this gap, this project develops new methods for aggregating information from multiple datasets in three distinct data integration problems grounded in scientific practice. The developed approaches are intuitive, principled and robust to substantial differences between datasets, and are broadly applicable in medical, economic and social sciences, among others. Among other applications, the project will deliver new tools to extract health insights from large electronic health records databases. The project will support undergraduate and graduate student training, course development, and the recruitment and professional mentoring of under-represented minorities in statistics. Further, the project will impact STEM education through a data science teacher training program in underserved communities.
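As a point of reference for the meta-estimation aim, the sketch below shows the classical fixed-effect, inverse-variance-weighted pooling of per-dataset estimates. It is a standard textbook baseline, not the new methodology developed in this project, and the per-dataset estimates and standard errors are hypothetical.

```python
# Illustrative baseline (not the project's methodology): fixed-effect,
# inverse-variance-weighted meta-estimation across several datasets.
import numpy as np

def meta_estimate(estimates, std_errors):
    """Pool per-dataset estimates into one estimate and its standard error."""
    w = 1.0 / np.asarray(std_errors) ** 2        # precision weights
    pooled = np.sum(w * np.asarray(estimates)) / np.sum(w)
    pooled_se = np.sqrt(1.0 / np.sum(w))
    return pooled, pooled_se

# e.g., a rare-disease log odds ratio estimated separately at three hospitals:
est, se = meta_estimate([0.42, 0.35, 0.58], [0.20, 0.15, 0.30])
print(f"pooled estimate {est:.3f} (SE {se:.3f})")
```

Pooling of this kind assumes the datasets estimate a common quantity; the project's methods are aimed precisely at the harder settings where datasets differ substantially.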

Srijan Sengupta

Title: Statistical uncertainty quantification for large language knowledge graphs via conformal prediction

Period: 01/2024 – 12/2024

Funding agency: National Security Agency (via Laboratory for Analytic Sciences)

Role: PI

Description: A knowledge graph (KG) is a graph-based representation of information about various entities (such as individuals, organizations, locations, or concepts) and their relationships. The transformation of unstructured language data into a structured KG facilitates downstream tasks like information retrieval, association mining, question answering, and machine reasoning. A celebrated example is the Google Knowledge Graph of 800 billion facts on 8 billion entities, from which Google serves relevant information in an infobox beside its search results. KG construction involves three steps: (i) Named Entity Recognition (NER), i.e., identifying mentions of named entities; (ii) Named Entity Disambiguation (NED), i.e., determining the identity of a named entity from context (e.g., “Jordan” could mean the country, the basketball player, or someone else); and (iii) Relationship Extraction (RE), i.e., determining the relationships among these entities. These steps typically employ black-box machine learning tools that do not produce statistically interpretable uncertainty quantification metrics, making it impossible to objectively assess how reliable the KG is. In collaboration with Karl Pazdernik, our team is developing statistically principled uncertainty quantification techniques for KGs by modeling how uncertainty propagates through the three steps. Our approach builds on conformal prediction, a highly flexible statistical framework that produces valid measures of uncertainty for individual predictions in a distribution-free manner; this robustness to distributional assumptions makes conformal prediction particularly suitable for black-box machine learning models. This work is supported by a one-year grant from the Laboratory for Analytic Sciences.
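For intuition, here is a minimal sketch of split conformal prediction applied to a generic black-box classifier, such as an entity-type or relation classifier in a KG pipeline. It is not the project's code: the calibration and test probability arrays are hypothetical placeholders for model output, and the score function shown is the simplest standard choice.

```python
# Illustrative sketch (assumptions, not project code): split conformal
# prediction sets for a black-box classifier with ~(1 - alpha) coverage.
import numpy as np

def conformal_sets(calib_probs, calib_labels, test_probs, alpha=0.1):
    """Return, for each test point, the set of labels retained at level alpha."""
    n = len(calib_labels)
    # Nonconformity score: 1 minus the probability assigned to the true label.
    scores = 1.0 - calib_probs[np.arange(n), calib_labels]
    # Finite-sample-adjusted empirical quantile of the calibration scores.
    q_level = np.ceil((n + 1) * (1 - alpha)) / n
    q_hat = np.quantile(scores, min(q_level, 1.0), method="higher")
    # Keep every label whose score does not exceed the threshold.
    return [np.where(1.0 - p <= q_hat)[0] for p in test_probs]

# Toy example with 3 candidate labels (e.g., person / location / organization):
rng = np.random.default_rng(0)
calib_probs = rng.dirichlet(np.ones(3), size=200)   # hypothetical model output
calib_labels = rng.integers(0, 3, size=200)         # hypothetical true labels
test_probs = rng.dirichlet(np.ones(3), size=5)
print(conformal_sets(calib_probs, calib_labels, test_probs))
```

The guarantee is marginal coverage of the prediction sets regardless of the underlying model; propagating such set-valued uncertainty through NER, NED, and RE jointly is the harder problem the project addresses.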

Jung-Ying Tzeng

Title: Prenatal Stress and Diet, and the Fetal Epigenome

Period: 09/2022 – 03/2027

Funding agency: NIH R01MD017696 (Hoyo)

Goals: The goal of this project is to identify early-acquired epigenetic mechanisms that link maternal social stress to metabolic dysfunction in children and to determine whether a Mediterranean diet mitigates these effects, leveraging existing cohort resources.