
Research Themes


Second Primary Lung Cancer

The number of lung cancer survivors has been increasing with early detection through screening and therapeutic advances. These lung cancer survivors now have a 4-6 times higher risk of developing second primary lung cancer (SPLC) compared to the risk of developing initial primary lung cancer (IPLC) in the general population.

      What are the risk factors for SPLC? Who among lung cancer survivors is at increased risk of SPLC? What types of interventions can we provide them?


Disparity in Lung Cancer Screening

The National Lung Screening Trial (NLST) demonstrated that annual low-dose computed tomography (LDCT) screening reduces lung cancer-specific mortality by 20%. Accordingly, the national guidelines by the U.S. Preventive Services Task Force (USPSTF) recommend annual LDCT screening for high-risk individuals, defined by cumulative smoking exposure. However, the uptake of LDCT screening remains very low, at around 6%. Who is less likely to undergo screening? What are the barriers? How do we improve uptake of LDCT screening?

          Furthermore, the national guidelines based on smoking exposure have been criticized because they may miss high-risk individuals with non-smoking risk factors (e.g., family history), especially among racial and ethnic minorities. What is the current status of racial disparities in lung cancer screening? What can we do to reduce the disparity?


Frailty among Cancer Survivors

Lung cancer is a disease of the elderly, with a median age at diagnosis of 71 years. With breakthroughs in therapeutics over the last decade, we have seen a dramatic increase in the total number of lung cancer survivors, particularly among the elderly. Increased survival and population aging pose a challenge to delivering optimal ongoing care.

        The problem of population aging stems from the gap between life-span and health-span. The frailty period is the time the elderly spend between the end of their health-span and the end of their life-span. It is characterized by physiological, mental, and physical declines, all of which lead to an impaired health-related quality of life.

         Cancer survivors are known to be frail for longer and more severely than people of the same age in the general population. What causes accelerated and severe frailty? Long-term harms of certain cancer treatments? Cancer patients’ financial toxicity? Persistent pain and symptom burden? What can we do to prevent or mitigate frailty, so that older lung cancer survivors can age healthily with a better quality of life?

Data Sources


The Multiethnic Cohort (MEC)

The MEC is a large population-based prospective cohort consisting of >215,000 adults from five representative racial groups in the U.S., followed since 1993-96.

      Participant characteristics have been collected through baseline and follow-up questionnaires. Incident cancers and deaths are identified via linkage to the SEER program and the National Death Index (NDI), respectively. Bio-specimen collection for genomic analyses began in 1995.

      The MEC has been a valuable resource for investigating racial disparities in cancer screening, as well as for answering research questions about multiple malignancies among cancer survivors.

Cancer Screening Trials

The Prostate, Lung, Colorectal, and Ovarian (PLCO) Cancer Screening Trial and the National Lung Screening Trial (NLST) have been used as discovery and validation cohorts for my prediction models for second primary lung cancer (SPLC) among lung cancer survivors.

      Given the distinctive participant characteristics of the NLST (individuals with a heavy smoking history of at least 30 pack-years), the NLST dataset has also been used as a method-development dataset for causal inference in survival data in the presence of competing risks.


The SEER Cancer Registry & SEER-Linked Databases

The Surveillance, Epidemiology, and End Results (SEER) program is an authoritative source of information on cancer incidence and mortality in the U.S., covering approximately 35% of the entire U.S. population. We have utilized the SEER cancer registry to examine the survival impact of diagnosis of multiple primary cancers among cancer survivors, and to investigate the standardized incidence ratio of multiple primary cancers to initial primary cancers. 

       In addition, we are using SEER-linked databases, such as SEER-Medicare and the SEER-Medicare Health Outcomes Survey (SEER-MHOS), to investigate the effect of dynamic risk factors on clinical outcomes, and to provide real-world evidence on clinical interventions by emulating trials.

UK Biobank

UK Biobank is a very large and detailed prospective study with over 500,000 participants aged 40–69 years when recruited in 2006–2010. The study has collected and continues to collect extensive phenotypic and genotypic detail, including data from questionnaires, blood-based assays, imaging, and longitudinal follow-up for a wide range of health-related outcomes.

      Using the extensive UK Biobank data, we have validated known SPLC risk factors (e.g., smoking) and have investigated additional risk factors associated with SPLC, including air pollution and genetic heritability.


Electronic Health Records (EHR)

The EHRs are real-time patient records, including structured medical histories and free-text note-based information. Many patients have accumulated both types of EHRs throughout their lives, and successfully integrating them is vital to understanding the patient’s health history.

      My expertise in real-world data analytics has given me a leading role in integrating structured records and free-text EHRs at Stanford. In addition, I will play a crucial role in constructing the Stanford cohort database for SPLC research, based on Stanford EHRs linked to the California Cancer Registry.

Cancer Screening Databases

Once a cancer screening program (such as biennial mammography screening for women aged 50-74 years) is introduced and its participants are followed over time, the cancer screening database compiles cancer-free cases (screen-negative), screen-detected cases (screen-positive and cancer diagnosed), and interval cancer cases via linkage to the cancer registry.

     A cancer screening database provides various opportunities to evaluate cancer screening programs in terms of efficiency (e.g., screening rate, detection rate, sensitivity, and specificity) and effect in reducing cancer mortality. Furthermore, we can estimate the screening opportunity window across various risk factors through a simple modeling approach. The screening opportunity window informs the optimal timing and frequency of cancer screening.
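
The efficiency measures listed above can be sketched from the counts such a database compiles. The following is a minimal illustration, not the group's actual pipeline: the class name, field names, and numbers are all hypothetical. It assumes that interval cancers are treated as false negatives and that screen-positive participants without a cancer diagnosis are recorded as false positives.

```python
from dataclasses import dataclass


@dataclass
class ScreeningCounts:
    """Hypothetical aggregate counts from one screening round."""
    eligible: int          # target population of the program
    screened: int          # participants who were actually screened
    screen_detected: int   # screen-positive and cancer diagnosed (true positives)
    false_positive: int    # screen-positive, no cancer diagnosed (assumed available)
    interval_cancers: int  # screen-negative, cancer found before next screen


def screening_metrics(c: ScreeningCounts) -> dict:
    """Compute basic program-efficiency measures from aggregate counts."""
    # Everyone screened who is neither a case nor a false positive.
    true_negative = (
        c.screened - c.screen_detected - c.false_positive - c.interval_cancers
    )
    return {
        "screening_rate": c.screened / c.eligible,
        "detection_rate_per_1000": 1000 * c.screen_detected / c.screened,
        # Interval cancers are counted as missed cases (false negatives).
        "sensitivity": c.screen_detected
        / (c.screen_detected + c.interval_cancers),
        "specificity": true_negative / (true_negative + c.false_positive),
    }


# Made-up numbers, for illustration only.
counts = ScreeningCounts(
    eligible=20_000,
    screened=10_000,
    screen_detected=50,
    false_positive=950,
    interval_cancers=10,
)
metrics = screening_metrics(counts)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Treating interval cancers as false negatives is a common convention, but the appropriate definition depends on the screening interval and on how completely the registry linkage captures cases.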


Methodologies


Risk Predictive Modeling

Accurate prediction of disease risk is essential for effective clinical decision-making. Prediction models are commonly used to estimate the risk of the event of interest at a fixed time point (such as the time of cancer diagnosis) and thus may fail to provide updated risk estimates as circumstances change. Temporal changes in patients’ data or risk factors can alter their subsequent risk.

      To incorporate these patient dynamics effectively, we have used and extended the landmark model for survival data in the presence of competing risks. Check it out here: https://github.com/thehanlab/dynamicLM
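
The data-preparation step behind landmarking can be sketched as follows. This is a generic illustration of the standard idea, not the dynamicLM API: the function name, record layout, and example values are hypothetical. At each landmark time, it keeps the subjects still at risk, uses the most recent covariate value measured up to that time, and censors follow-up at the end of a fixed prediction window.

```python
def build_landmark_dataset(subjects, landmarks, window):
    """Stack one row per (subject, landmark) for subjects still at risk.

    subjects: list of dicts with keys
        "id", "time" (event or censoring time),
        "event" (0 = censored, 1 = event of interest, 2 = competing event),
        "measurements" (list of (time, value) pairs sorted by time).
    """
    rows = []
    for s in landmarks:
        for subj in subjects:
            if subj["time"] <= s:
                continue  # event or censoring already occurred by time s
            past = [v for (t, v) in subj["measurements"] if t <= s]
            if not past:
                continue  # no covariate value observed by time s
            horizon = s + window
            if subj["time"] > horizon:
                time, event = horizon, 0  # administrative censoring
            else:
                time, event = subj["time"], subj["event"]
            rows.append({
                "id": subj["id"],
                "landmark": s,
                "covariate": past[-1],  # last value carried forward
                "time": time,
                "event": event,
            })
    return rows


# Hypothetical example: covariate could be cumulative pack-years.
subjects = [
    {"id": "A", "time": 6.0, "event": 1,
     "measurements": [(0, 20), (2, 25), (4, 30)]},
    {"id": "B", "time": 1.5, "event": 2,
     "measurements": [(0, 10)]},
    {"id": "C", "time": 9.0, "event": 0,
     "measurements": [(0, 0), (3, 5)]},
]
landmark_rows = build_landmark_dataset(subjects, landmarks=[0, 2, 4], window=3)
for row in landmark_rows:
    print(row)
```

A competing-risks regression model would then be fitted to the stacked rows, typically with the landmark time as a covariate or stratum. That modeling step is omitted here.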

CISNET Microsimulation

As a project of the Cancer Intervention and Surveillance Modeling Network (CISNET), several microsimulation models, including the Stanford model, have been developed and validated to capture long-term population-level benefits (e.g., mortality reduction) and harms (e.g., false positives, overdiagnosis) of lung cancer screening strategies. However, no microsimulation model incorporates the component of SPLC diagnosis.

       We aim to extend the Stanford microsimulation model to include SPLC and to estimate the survival impact of SPLC diagnosis on lung cancer-specific and overall mortality.


Competing Risk Survival Analysis

Competing risks occur frequently in the analysis of survival data. Other causes of failure (i.e., competing events) may preclude the occurrence of the event of interest. For example, cancer survivors often die from other causes, such as comorbidities (competing risks of death), before developing a disease relapse (the event of interest). Ignoring competing risks in causal inference and prediction, or using an inappropriate regression method, can lead to over- or underestimation of the true risk.

      We have used both cause-specific Cox and sub-distribution hazards models to address these issues, together with inverse probability weighting.
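
The overestimation that arises when competing events are ignored can be seen in a small nonparametric example. The sketch below does not implement the regression models named above. It uses the simpler Aalen-Johansen estimator of the cumulative incidence function and compares it with the naive 1 minus Kaplan-Meier estimate that treats competing events as censoring. The function name and data are hypothetical.

```python
def cumulative_incidence(times, events, cause=1):
    """Aalen-Johansen CIF for `cause`, alongside the naive 1 - KM estimate.

    events: 0 = censored, 1 = event of interest, 2 = competing event.
    Returns a list of (time, cif, naive_one_minus_km) tuples.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    overall_surv = 1.0  # probability of being free of ANY event so far
    naive_surv = 1.0    # KM that wrongly censors competing events
    cif = 0.0
    out = []
    i = 0
    while i < len(data):
        t = data[i][0]
        tied = [e for (tt, e) in data if tt == t]
        d_cause = sum(1 for e in tied if e == cause)
        d_any = sum(1 for e in tied if e != 0)
        # Increment: P(event-free just before t) * cause-specific hazard at t.
        cif += overall_surv * d_cause / n_at_risk
        overall_surv *= 1 - d_any / n_at_risk
        naive_surv *= 1 - d_cause / n_at_risk
        out.append((t, cif, 1 - naive_surv))
        n_at_risk -= len(tied)
        i += len(tied)
    return out


# Hypothetical follow-up times (years) and event codes.
times = [1, 2, 3, 4, 5, 6]
events = [2, 1, 0, 2, 1, 0]
curve = cumulative_incidence(times, events, cause=1)
final_time, final_cif, final_naive = curve[-1]
print(f"CIF at t={final_time}: {final_cif:.3f}; naive 1-KM: {final_naive:.3f}")
```

With these made-up data the naive estimate is larger than the cumulative incidence, because subjects removed by the competing event are treated as if they could still experience the event of interest.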

Standardized Incidence Ratio

The standardized incidence ratio (SIR) is a simple but insightful measure of disease burden: it indicates whether more or fewer cancer cases occur in a population of interest than would be expected from the rates in a referent population. It can also be used to compare the risk of developing a second primary cancer among cancer survivors with the risk of developing an initial primary cancer in the general population.
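
As a worked illustration, the SIR is the number of observed cases divided by the number expected if the referent population's stratum-specific rates applied to the person-years of the population of interest. The sketch below uses hypothetical age strata, rates, and counts. The function name is illustrative only.

```python
def standardized_incidence_ratio(observed, person_years, reference_rates):
    """Return (SIR, expected cases).

    person_years: person-years at risk per stratum in the study population.
    reference_rates: incidence per person-year per stratum in the
        referent population (same stratum keys).
    """
    expected = sum(
        person_years[stratum] * reference_rates[stratum]
        for stratum in person_years
    )
    return observed / expected, expected


# Made-up numbers: survivors' person-years and general-population rates.
person_years = {"50-59": 1000, "60-69": 2000, "70-79": 1500}
reference_rates = {"50-59": 0.001, "60-69": 0.002, "70-79": 0.004}
sir, expected = standardized_incidence_ratio(
    observed=44, person_years=person_years, reference_rates=reference_rates
)
print(f"expected cases: {expected:.1f}, SIR: {sir:.2f}")
```

An SIR above 1 indicates more cases than expected. In practice the strata usually combine age, sex, calendar period, and race, and a confidence interval is reported alongside the point estimate.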


Target Trial Emulation

With recent advances in therapeutics and early detection technologies, cancer survivors are increasing in number. Despite the importance of evaluating long-term benefits or harms of the emerging technologies among cancer survivors, conducting randomized clinical trials might not be feasible to investigate the long-term effects across different patient groups including vulnerable populations, such as elderly lung cancer survivors.

       We emulated a hypothetical target trial using real-world electronic health data to examine the effectiveness of continued computed tomography (CT) surveillance in reducing lung cancer-specific mortality among 5-year survivors of lung cancer diagnosed at Stanford Health Care.

Eligibility-to-Incidence (E-I) Ratio

Although racial disparities in lung cancer screening have frequently been examined by evaluating differences in screening eligibility, eligibility in a given racial group is an incomplete disparity indicator unless the actual cancer risk is taken into account. First described by Pinsky et al. (2022), the eligibility-to-incidence (E-I) ratio is a useful metric for evaluating potential disparity in cancer screening. A lower E-I ratio indicates that a group is underserved by screening (low eligibility) despite a high level of cancer risk (incidence).
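
A minimal sketch of the idea follows. It assumes that the E-I ratio is the proportion of a group eligible for screening divided by that group's lung cancer incidence rate, and that groups are compared relative to a reference group. The exact definition in Pinsky et al. (e.g., age range, units, or standardization) may differ, and all names and numbers here are hypothetical.

```python
def ei_ratio(n_eligible, n_population, n_cases, person_years):
    """Eligibility proportion divided by incidence rate (assumed definition)."""
    eligibility = n_eligible / n_population
    incidence = n_cases / person_years
    return eligibility / incidence


# Made-up groups, for illustration only.
groups = {
    "reference": dict(n_eligible=1500, n_population=10_000,
                      n_cases=60, person_years=100_000),
    "comparison": dict(n_eligible=800, n_population=10_000,
                       n_cases=70, person_years=100_000),
}
ratios = {name: ei_ratio(**g) for name, g in groups.items()}

# Express each group's E-I ratio relative to the reference group so that
# the result does not depend on the units chosen for incidence.
relative = {name: r / ratios["reference"] for name, r in ratios.items()}
for name, value in relative.items():
    print(f"{name}: relative E-I ratio = {value:.2f}")
```

In this hypothetical example the comparison group has lower eligibility despite higher incidence, so its relative E-I ratio falls well below 1, which is the pattern the metric is meant to flag.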
