Stata2008, Nov. 13-14, San Francisco, USA. Challenges in survival analysis with large datasets. Noori Akhtar-Danesh, PhD, McMaster University, Hamilton, Canada. [email protected]

Challenges in survival analysis with large datasets

  • Challenges in survival analysis with large datasets
    Noori Akhtar-Danesh, PhD, McMaster University, Hamilton, Canada. [email protected]

  • Background
    The Canadian Community Health Survey, Cycle 3.1 (CCHS-3.1) is a large cross-sectional survey that includes information on over 130,000 Canadians. Preliminary results show that 63% of Canadians aged 12 years or older have ever smoked a whole cigarette.

  • Objectives & Challenges
    The main objective was to investigate the age of smoking initiation in relation to gender and place of birth. We compared different survival analysis techniques, including Cox regression and the available parametric methods. A further aim was to highlight some challenges that we encountered in the search for an appropriate model.

  • Challenge: PH Assumption
    In large datasets, test-based assessment of the proportional hazards (PH) assumption is challenging because the Schoenfeld test can be significant even for very small rhos, simply because of the sample size. For the CCHS-3.1 dataset, the Schoenfeld test is significant for both the Sex and Birth Place variables, with small rhos.
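
The effect of sample size on such a test can be illustrated with a minimal sketch. It does not reproduce the Schoenfeld test itself; it assumes a generic test of zero correlation based on the Fisher z approximation, and the value rho = 0.02 and the sample sizes are illustrative choices, not figures from the slides.

```python
import math

def correlation_p_value(rho, n):
    """Two-sided p-value for H0: correlation = 0 (Fisher z approximation)."""
    z = math.atanh(rho) * math.sqrt(n - 3)
    # Two-sided normal tail probability: P(|Z| > |z|) = erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2))

if __name__ == "__main__":
    # Same tiny correlation, increasingly large samples (illustrative values).
    for n in (100, 1_000, 10_000, 80_000):
        print(f"rho = 0.02, n = {n:>6}: p = {correlation_p_value(0.02, n):.3g}")
```

With a fixed rho, the p-value shrinks steadily as n grows, so at survey scale a negligible departure from proportionality can still be flagged as significant.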

  • Challenge: PH Assumption
    However, the log(-log) plot showed nearly parallel lines for these variables, which indicates that the PH assumption is satisfied.
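
Why parallel lines support the PH assumption can be checked numerically. Under proportional hazards, S1(t) = S0(t)^HR, so log(-log S1(t)) - log(-log S0(t)) = log(HR) at every t. The sketch below assumes a Weibull baseline with made-up parameters and a hypothetical log hazard ratio of 0.3; none of these values come from the CCHS-3.1 analysis.

```python
import math

def weibull_survival(age, scale, shape, log_hr=0.0):
    """S(t) = exp(-scale * t**shape * exp(log_hr)); log_hr shifts the hazard proportionally."""
    cumulative_hazard = scale * age ** shape * math.exp(log_hr)
    return math.exp(-cumulative_hazard)

def log_minus_log(survival):
    """The log(-log S) transform used on the vertical axis of the diagnostic plot."""
    return math.log(-math.log(survival))

def vertical_gaps(ages, scale, shape, log_hr):
    """Vertical distance between the two log(-log) curves at each age."""
    gaps = []
    for age in ages:
        baseline = weibull_survival(age, scale, shape)
        comparison = weibull_survival(age, scale, shape, log_hr)
        gaps.append(log_minus_log(comparison) - log_minus_log(baseline))
    return gaps

if __name__ == "__main__":
    # Hypothetical parameters: the gap equals log_hr at every age.
    print(vertical_gaps([12, 15, 18, 25], scale=1e-4, shape=3.0, log_hr=0.3))
```

A gap that stays constant over age is what parallel curves look like on the plot; a gap that drifts with age would suggest non-proportional hazards.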

  • Challenge: PH Assumption
    Perhaps we need to specify a minimum value of the correlation, for instance r = 0.33, for a result to be accepted as significant (as is common in fields such as factor analysis).
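
One way to read this suggestion is as a two-part decision rule: a covariate is flagged only when the test is statistically significant and the correlation is also large in absolute value. The sketch below is an interpretation, not the author's procedure; the function name and the 0.05 significance level are assumptions, while the 0.33 cutoff is the example given on the slide.

```python
def flag_ph_violation(rho, p_value, min_abs_rho=0.33, alpha=0.05):
    """Flag non-proportionality only if the test is significant AND |rho| reaches the cutoff."""
    return p_value < alpha and abs(rho) >= min_abs_rho

if __name__ == "__main__":
    # A tiny rho with a very small p-value (typical of a huge sample) is not flagged.
    print(flag_ph_violation(rho=0.02, p_value=1e-8))
    # A sizeable rho with a small p-value is flagged.
    print(flag_ph_violation(rho=-0.40, p_value=0.001))
```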

  • Challenge: PH Assumption
    However, when the survey design is incorporated into the analysis, the PH test is no longer significant for these variables individually, although the global test remains significant (in Stata).
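
The slides do not say why the design-based test behaves differently. One plausible contributor, stated here as an assumption, is that unequal sampling weights reduce the effective sample size and therefore widen the standard errors. Kish's approximation, n_eff = (sum of w)^2 / (sum of w^2), gives a rough sense of the size of this effect; it ignores clustering and stratification, and the weights below are made up.

```python
def kish_effective_sample_size(weights):
    """Kish's approximation: (sum of weights)**2 / (sum of squared weights)."""
    total = sum(weights)
    total_of_squares = sum(w * w for w in weights)
    return total * total / total_of_squares

if __name__ == "__main__":
    equal = [2.5] * 10          # equal weights: no loss of information
    unequal = [1, 1, 1, 9]      # one respondent dominates the weighted total
    print(kish_effective_sample_size(equal))
    print(kish_effective_sample_size(unequal))
```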

  • Challenge: Appropriate parametric model

  • Challenge: Parametric models
    We fitted different parametric models that incorporated the survey design and sampling weights (using the svy: prefix). A Weibull model with frailty appeared to be the best model. However, we were not able to draw diagnostic graphs or obtain an overall goodness-of-fit (GOF) test because of the large sample size.
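
For orientation, the population survival function implied by a Weibull model with frailty can be written in closed form when the frailty is gamma distributed with mean 1 and variance theta: S(t) = (1 + theta * H(t))^(-1/theta), where H(t) is the Weibull cumulative hazard. The slides do not state which frailty distribution was used, so the gamma choice and all parameter values below are assumptions.

```python
import math

def weibull_cumulative_hazard(age, scale, shape):
    """H(t) = scale * t**shape."""
    return scale * age ** shape

def weibull_gamma_frailty_survival(age, scale, shape, theta):
    """Population survival with a gamma frailty (mean 1, variance theta)."""
    hazard = weibull_cumulative_hazard(age, scale, shape)
    if theta == 0:
        # No unobserved heterogeneity: ordinary Weibull survival.
        return math.exp(-hazard)
    return (1.0 + theta * hazard) ** (-1.0 / theta)

if __name__ == "__main__":
    # Hypothetical parameters; larger theta flattens the tail of the curve.
    for theta in (0.0, 0.5, 2.0):
        s = weibull_gamma_frailty_survival(20, scale=1e-4, shape=3.0, theta=theta)
        print(f"theta = {theta}: S(20) = {s:.4f}")
```

As theta approaches 0 the expression reduces to the ordinary Weibull survival function, which is why a test of theta = 0 is a natural check on whether the frailty term is needed.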

  • Using Cure Fraction Models
    One main assumption in survival analysis is that eventually everyone will experience the event. However, we have a large proportion (37%) of censored individuals (those who never started smoking) in the CCHS-3.1 dataset.

  • Using Cure Fraction Models
    Therefore, it is more appropriate to use a cure fraction model (Lambert 2007; Stata Journal 7(3), pp. 1-25). In this model, both the cure fraction (the proportion who do not experience the event) and the time to failure (age of smoking initiation) depend, separately, on the explanatory variables.

  • Using Cure Fraction Models
    We used the strsnmix command in Stata to fit a non-mixture model (Lambert 2007; Stata Journal 7(3), pp. 1-25). Challenge: sampling weights cannot be incorporated in the estimation.
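
The two-part structure of a non-mixture cure model can be sketched as follows. The survival function is S(t) = pi^F(t), where pi is the cure fraction and F(t) is a distribution function, so S(t) levels off at pi rather than falling to 0. The sketch assumes a Weibull F(t), a logistic link for pi, and a log-linear Weibull scale; every coefficient is hypothetical and none is an estimate from the CCHS-3.1 data.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def weibull_cdf(age, scale, shape):
    """F(t) = 1 - exp(-scale * t**shape)."""
    return 1.0 - math.exp(-scale * age ** shape)

def nonmixture_survival(age, cure_fraction, scale, shape):
    """Non-mixture cure model: S(t) = pi ** F(t), which tends to pi as t grows."""
    return cure_fraction ** weibull_cdf(age, scale, shape)

# Hypothetical coefficients: one set for the cure fraction, one for the timing.
COEF = {
    "cure_const": -0.7, "cure_female": 0.4,            # logit of the cure fraction
    "log_scale_const": -9.0, "log_scale_female": -0.2,  # log of the Weibull scale
    "shape": 3.0,
}

def prob_not_started_by(age, female, coef=COEF):
    """Probability of not having started smoking by a given age."""
    cure_fraction = logistic(coef["cure_const"] + coef["cure_female"] * female)
    scale = math.exp(coef["log_scale_const"] + coef["log_scale_female"] * female)
    return nonmixture_survival(age, cure_fraction, scale, coef["shape"])

if __name__ == "__main__":
    for age in (12, 16, 20, 40):
        print(age, round(prob_not_started_by(age, female=0), 3),
              round(prob_not_started_by(age, female=1), 3))
```

Because the covariate enters the cure fraction and the Weibull scale through separate coefficients, it can change the long-run proportion of never-smokers without changing the timing of initiation, and vice versa.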

  • Conclusion
    Survival analysis for large datasets with sampling weights cannot be conducted easily.
    Common challenges:
      • Assessment of the PH assumption
      • Model diagnostics
    Use of cure fraction models may not be appropriate because sampling weights cannot be incorporated in the estimation.
