Survival Analysis 101: An Easy Start Guide

This comprehensive methods paper, titled “Survival analysis 101: an easy start guide to analysing time-to-event data,” was authored by Quin E. Denfeld, Debora Burger, and Christopher S. Lee, and published in the European Journal of Cardiovascular Nursing in 2023. The article serves as an accessible and practical guide for researchers, particularly those in cardiovascular nursing and health-related fields, who are interested in employing survival analysis, also widely known as time-to-event or event history analysis. Its core objective is to demystify this powerful statistical technique by offering a step-by-step methodology accompanied by illustrative example data.

Understanding the Purpose and Unique Features of Survival Analysis:

Survival analysis is a fundamental approach for describing, explaining, and/or predicting the occurrence and timing of events. What makes it unique and particularly valuable is its ability to simultaneously consider two critical aspects: whether an event happened (a binary outcome) and precisely when it happened (a continuous outcome). This dual consideration is essential for research questions that delve into both the occurrence and the timing of specific outcomes.

The authors highlight the distinct advantages of survival analysis by contrasting it with more commonly used regression models. For instance, logistic regression, while capable of indicating if an event occurred, cannot provide insights into the timing of events or facilitate comparisons of time-to-event between different groups (e.g., younger vs. older patients). This limitation means logistic regression fails to address how event risk evolves over a study period. Conversely, linear regression, if applied to time as a dependent variable, faces significant challenges. Primarily, time-to-event data often exhibit a skewed distribution, violating a core assumption of linear regression. More critically, it struggles with “right censoring,” a common scenario where participants do not experience the event during the study period, leading to incomplete time data. Survival analysis is specifically designed to address the unique nature of time-to-event data and effectively handle censoring.
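To see why right censoring breaks a naive analysis, consider a minimal sketch with hypothetical numbers (not data from the paper): if censored follow-up times are treated as if they were event times, the average time to event is badly underestimated.

```python
# Hypothetical illustration of right censoring: five participants with
# "true" event times (unknowable in practice) and a 12-month study window.
true_times = [2, 5, 8, 30, 60]  # months until the event actually occurs
study_length = 12               # follow-up ends at 12 months

# Observed data: time is capped at study end; event = 1 only if observed.
observed = [(min(t, study_length), int(t <= study_length)) for t in true_times]
# observed -> [(2, 1), (5, 1), (8, 1), (12, 0), (12, 0)]

true_mean = sum(true_times) / len(true_times)             # 21.0 months
naive_mean = sum(t for t, _ in observed) / len(observed)  # 7.8 months

# Treating the two censored times as event times understates the true
# mean event time by roughly a factor of three in this toy example.
print(f"true mean event time:  {true_mean:.1f} months")
print(f"naive mean from censored data: {naive_mean:.1f} months")
```

Survival analysis avoids this bias by keeping the event indicator and the follow-up time as separate pieces of information rather than collapsing them into a single outcome.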

The paper also introduces key terminology integral to understanding survival analysis:

  • Event (or failure): A clearly defined, unambiguous change between two mutually exclusive states (e.g., alive or dead).
  • Censoring: Incomplete or unobserved event time, meaning the event did not occur during the study period.
  • Truncation: The use of event occurrence or non-occurrence for participant selection.
  • Survivor function: The probability of surviving past a particular time point.
  • Hazard rate: A measure of the risk of an event occurring during a specific time period.
  • Hazard ratio: The ratio of two hazard rates, often comparing two different groups.
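The last two terms can be made concrete with a short sketch using hypothetical counts (these numbers are illustrative, not from the paper): a hazard rate is events divided by person-time at risk, and a hazard ratio compares two such rates.

```python
# Hypothetical counts for two groups of participants.
# Group A: 12 events over 480 person-months of follow-up.
# Group B:  6 events over 520 person-months of follow-up.
events_a, persontime_a = 12, 480.0
events_b, persontime_b = 6, 520.0

hazard_a = events_a / persontime_a  # 0.025 events per person-month
hazard_b = events_b / persontime_b  # ~0.0115 events per person-month

# Hazard ratio: Group A's event rate relative to Group B's.
hazard_ratio = hazard_a / hazard_b  # ~2.17, i.e. ~2.2x the risk in Group A

print(f"hazard A = {hazard_a:.4f}, hazard B = {hazard_b:.4f}, "
      f"HR = {hazard_ratio:.2f}")
```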

The 10-Step Approach to Time-to-Event Analysis:

The article provides a central illustration outlining 10 essential steps for conducting time-to-event analysis, ensuring researchers can set up and run an initial analysis effectively:

  1. Identify the Event(s) of Interest (Step 1): This crucial first step involves clearly defining events based on literature and clinical relevance. Events must be mutually exclusive (e.g., hospitalized vs. not hospitalized). For less clear events, a “threshold” must be established. The authors encourage defining multiple levels of outcomes (e.g., all-cause vs. cardiovascular hospitalization) and note the frequent use of composite events, where the first event is considered regardless of severity.
  2. Determine the Duration a Participant Is at Risk (Step 2): This step involves establishing the follow-up period during which a participant could experience an event. The duration depends on factors like event frequency and available resources, and is vital for power analysis (e.g., heart failure studies often show a 40–50% event rate over a year).
  3. Determine the Scale for the Event Data (Step 3): Researchers must decide between a granular/fine time scale (e.g., days) and a coarser time scale (e.g., months or years); with a coarse scale, the event is known only to fall somewhere within an interval, a situation known as interval censoring. Interval censoring leads to “ties,” where multiple participants experience an event within the same time frame.
  4. Organize Your Data (Step 4): For analysis, two primary variables are needed beyond participant ID: a failure variable (usually 0 for no event, 1 for event) and a follow-up time duration variable, representing the time from study start to the first event or the end of the study for censored participants.
  5. Declare Your Data as Survival Data (Step 5): In statistical software like Stata, this involves using a specific command (e.g., stset) to inform the program that the data are time-to-event. This declaration, using the failure and time variables, links all subsequent survival analysis commands.
  6. Examine Events and Life Tables (Step 6): This step mirrors standard descriptive statistics, using commands like stdescribe to summarize subjects, entry/exit times, total time at risk, and number of failures. Additionally, life tables, borrowed from actuarial science, can show the number of participants at risk and events occurring in specific time intervals.
  7. Examine Survival or Failure Rates Over Time (Kaplan-Meier Estimator) (Step 7): The Kaplan-Meier estimator is typically used to estimate the unadjusted probability of surviving beyond a specific time point. This estimation can generate survivor function graphs (showing the proportion of participants who have not yet experienced an event over time) or cumulative hazard function graphs (showing cumulative risk over time), illustrating event rates and censoring.
  8. Perform Simple Comparative Statistics (Log Rank Test) (Step 8): For comparing event probabilities between groups, time-independent variables (measured at baseline and not changing) are used. The log rank test is a simple method to compare survivor functions between two or more groups, testing the null hypothesis of no difference in event probability at any point.
  9. Examine Variables in a Multivariate Regression Model (Cox Proportional Hazards Model) (Step 9): While parametric approaches exist, the semi-parametric Cox proportional hazards model is most common for multivariate analysis. This model combines a baseline hazard function with a hazard function for other predictors, yielding hazard ratios. A critical assumption is that hazard ratios are proportional (constant) over time, which can be tested visually and numerically.
  10. Interpret and Report Findings (Step 10): The final step involves interpreting hazard ratios (e.g., HR of 1.5 means 50% greater risk) along with their 95% confidence intervals. The paper advises considering advanced methods like competing risks analysis for complex composite events and emphasizes reporting findings in multiple formats, including detailed Kaplan-Meier graphs with appropriate labels, scales, numbers at risk, confidence intervals, and significance results.
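Steps 4 and 7 can be sketched in a few lines of plain Python, without survival-analysis software. This is a minimal product-limit (Kaplan-Meier) implementation on hypothetical records, organized as the article recommends: one follow-up time and one failure indicator per participant. It is a teaching sketch, not a replacement for validated commands such as Stata's stset and sts.

```python
from collections import Counter

# Step 4: organize data as (follow-up time in days, failure indicator),
# where failure is 1 for an event and 0 for a censored participant.
# These records are hypothetical. Note the tie at day 12.
records = [
    (5, 1), (8, 0), (12, 1), (12, 1), (20, 0), (25, 1), (30, 0),
]

def kaplan_meier(records):
    """Step 7: return [(time, S(t))] at each distinct event time."""
    events = Counter(t for t, failed in records if failed)
    censored = Counter(t for t, failed in records if not failed)
    n_at_risk = len(records)
    survival = 1.0
    curve = []
    for t in sorted(set(events) | set(censored)):
        d = events.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= (n_at_risk - d) / n_at_risk
            curve.append((t, survival))
        # Both events and censored observations leave the risk set after t.
        n_at_risk -= d + censored.get(t, 0)
    return curve

for t, s in kaplan_meier(records):
    print(f"day {t:>2}: S(t) = {s:.3f}")
# day  5: S(t) = 0.857
# day 12: S(t) = 0.514
# day 25: S(t) = 0.257
```

Note how the censored participants (days 8, 20, and 30) never trigger a drop in the curve but still shrink the risk set, which is exactly the behavior a naive proportion-surviving calculation cannot reproduce.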

Illustrative Example: Cardiovascular Events in Heart Failure Patients:

To concretely demonstrate these steps, the authors analyzed combined follow-up data from three studies involving patients with heart failure. The research aimed to answer two questions: (i) the number of cardiovascular events over six months in heart failure patients, and (ii) the influence of age, gender, New York Heart Association (NYHA) functional classification, and comorbidity burden on event risk.

The study focused on time to the first cardiovascular event, defined as a composite outcome including all-cause death, cardiovascular hospitalization, or emergency department visit, over a 6-month risk period. Event data were recorded in days. The sample comprised 403 participants (59% male, mean age 58.7 ± 14.4 years), with a majority (58%) classified as NYHA Class III/IV and varying levels of comorbidity burden categorized by the Charlson Comorbidity Index. A total of 118 cardiovascular events (29%) were reported, including 10 deaths, 82 cardiovascular hospitalizations, and 26 emergency department visits. The median exit time for participants was 149.4 days.

The analysis revealed that Kaplan-Meier plots and log rank tests showed no significant difference in time-to-event between women and men. However, participants with NYHA Class III/IV had an increased event risk compared to those with NYHA Class I/II, and there was a significant difference in time-to-event across comorbidity categories. The subsequent Cox proportional hazards analysis confirmed that NYHA Class III/IV and high comorbidity burden were significant predictors of increased cardiovascular event risk after adjusting for other variables, while age and gender were not. Specifically, NYHA Class III/IV participants were about three times more likely (HR 3.03), and those with high comorbidity burden were about 75% more likely (HR 1.75) to experience an event compared to their referent groups. The proportional hazards assumption was not violated. The median time for a cardiovascular event in this cohort was estimated to be around 5 months.
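The log rank comparisons reported above can also be computed by hand. The sketch below is a minimal pure-Python version of the two-group log rank statistic on small hypothetical records (not the study's data): at each event time it compares the events observed in one group against the number expected under the null hypothesis of equal risk.

```python
import math

def logrank_chi2(group_a, group_b):
    """Two-group log rank chi-square; each group is a list of
    (follow-up time, failure indicator) records."""
    data = [(t, f, "A") for t, f in group_a] + [(t, f, "B") for t, f in group_b]
    event_times = sorted({t for t, f, _ in data if f})
    obs_a = exp_a = var = 0.0
    for t in event_times:
        n = sum(1 for time, _, _ in data if time >= t)                # at risk
        n_a = sum(1 for time, _, g in data if time >= t and g == "A")
        d = sum(f for time, f, _ in data if time == t)                # events at t
        d_a = sum(f for time, f, g in data if time == t and g == "A")
        obs_a += d_a
        exp_a += d * n_a / n                      # expected events in group A
        if n > 1:                                 # hypergeometric variance
            var += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    return (obs_a - exp_a) ** 2 / var

# Hypothetical records: (time, failure) per participant in each group.
chi2 = logrank_chi2([(3, 1), (6, 1), (9, 0)], [(10, 1), (14, 0), (18, 1)])
p = math.erfc(math.sqrt(chi2 / 2))  # p-value for chi-square with 1 df
print(f"log rank chi-square = {chi2:.3f}, p = {p:.3f}")
```

With only six toy participants the test is underpowered, but the mechanics mirror what sts test in Stata or survdiff in R report on real data.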

In conclusion, this article successfully provides a foundational understanding of survival analysis, positioning it as an invaluable and highly informative approach for researchers to gain critical insights into the risk and timing of events within specific timeframes, especially within cardiovascular nursing research.

Reference: Denfeld, Q. E., Burger, D., & Lee, C. S. (2023). Survival analysis 101: An easy start guide to analysing time-to-event data. European Journal of Cardiovascular Nursing, 22(3), 332–337. https://doi.org/10.1093/eurjcn/zvad023
