Sample Size for Survival Analysis
Comprehensive power analysis for clinical trials with time-to-event endpoints (e.g., overall survival, progression-free survival, time to recurrence).
1. When to Use This Method
Use this methodology when your primary endpoint is time-to-event, and the analysis will use either:
- Log-rank test — comparing two Kaplan-Meier survival curves
- Cox proportional hazards model — estimating hazard ratios with covariate adjustment
Common Applications
Do NOT Use When
- Fixed time-point binary outcomes (use proportions method)
- Continuous endpoints without time component
- Recurrent events (use different methods)
2. Mathematical Formulation
2.1 Schoenfeld Formula (Required Events)
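The Schoenfeld (1983) formula calculates the total number of events ($d$) required:
$$d = \frac{(z_{1-\alpha/2} + z_{1-\beta})^2}{p_1 \, p_2 \, (\ln \mathrm{HR})^2}$$
For equal allocation ($p_1 = p_2 = 0.5$), this simplifies to:
$$d = \frac{4\,(z_{1-\alpha/2} + z_{1-\beta})^2}{(\ln \mathrm{HR})^2}$$
| Symbol | Description |
|---|---|
| $d$ | Total number of events required |
| $z_{1-\alpha/2}$ | Critical value for Type I error (1.96 for two-sided α = 0.05) |
| $z_{1-\beta}$ | Critical value for power (0.84 for 80%; 1.28 for 90%) |
| $\mathrm{HR}$ | Hazard ratio (treatment/control) |
| $p_1, p_2$ | Allocation proportions to each group |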
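As a worked check, assuming the equal-allocation Schoenfeld form $d = 4(z_{1-\alpha/2} + z_{1-\beta})^2 / (\ln \mathrm{HR})^2$ and unrounded critical values, the case HR = 0.70 with two-sided α = 0.05 and 80% power works out roughly as follows:

```latex
d = \frac{4\,(1.95996 + 0.84162)^2}{(\ln 0.70)^2}
  = \frac{4 \times 7.8489}{0.12722}
  \approx 246.8
  \;\Rightarrow\; 247 \text{ events after rounding up}
```

Using critical values rounded to 1.96 and 0.84 gives about 246.5 instead, which is one reason published tables and software can differ by a single event.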
2.2 Freedman Formula (Alternative)
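The Freedman (1982) formula uses a slightly different parameterization, giving slightly more conservative (larger) event estimates. For equal allocation:
$$d = \frac{(z_{1-\alpha/2} + z_{1-\beta})^2 \, (1 + \mathrm{HR})^2}{(1 - \mathrm{HR})^2}$$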
2.3 Converting Events to Sample Size
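Once the required number of events is known, convert to total sample size:
$$N = \frac{d}{P(\text{event})}$$
The overall event probability depends on the survival function in each arm. Where $P_i(t) = 1 - S_i(t)$ is the probability of event by time $t$ in group $i$:
$$P(\text{event}) = p_1 \, P_1(t) + p_2 \, P_2(t)$$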
2.4 Lachin-Foulkes Method (With Accrual)
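For trials with a defined accrual period ($A$) and additional follow-up ($F$), the expected probability of event under exponential survival and uniform accrual is:
$$P_i = 1 - \frac{e^{-\lambda_i F} - e^{-\lambda_i (A + F)}}{\lambda_i A}$$
| Symbol | Description |
|---|---|
| $A$ | Accrual (enrollment) period |
| $F$ | Additional follow-up after accrual closes |
| $\lambda_i$ | Hazard rate in group $i$ = $\ln(2) / \text{median}_i$ |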
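The conversion from events to subjects can be sketched in R as follows. This is a minimal illustration that assumes uniform accrual and exponential survival in each arm; the function names `event_prob()` and `events_to_n()` and the argument names are illustrative choices, not part of any package.

```r
# Probability that a subject has an event by the analysis time, assuming
# uniform accrual over `accrual` months, `followup` further months after
# accrual closes, and a constant (exponential) hazard `lambda` per month.
event_prob <- function(lambda, accrual, followup) {
  1 - (exp(-lambda * followup) - exp(-lambda * (accrual + followup))) /
    (lambda * accrual)
}

# Convert a required event count into a total sample size (rounded up).
# `p_trt` is the proportion of subjects allocated to treatment.
events_to_n <- function(d, median_control, hr, accrual, followup,
                        p_trt = 0.5) {
  lambda_c <- log(2) / median_control  # control hazard from the median
  lambda_t <- hr * lambda_c            # treatment hazard under PH
  p_event <- p_trt * event_prob(lambda_t, accrual, followup) +
    (1 - p_trt) * event_prob(lambda_c, accrual, followup)
  ceiling(d / p_event)
}

# Illustrative inputs: 247 events, control median 12 months, HR = 0.70,
# 24 months of accrual plus 12 months of additional follow-up.
n_total <- events_to_n(d = 247, median_control = 12, hr = 0.70,
                       accrual = 24, followup = 12)
print(n_total)
```

The result is the number of subjects before any dropout inflation; longer follow-up raises the event probability and lowers the required enrollment.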
2.5 Unequal Allocation
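For allocation ratio $k$:1 (treatment:control), the required events scale relative to the 1:1 design as:
$$d_{k:1} = d_{1:1} \times \frac{(1 + k)^2}{4k}$$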
| Allocation | Multiplier vs 1:1 | Common Use Case |
|---|---|---|
| 1:1 | 1.00× | Most efficient, standard design |
| 2:1 | 1.13× | Improve treatment arm safety data |
| 3:1 | 1.33× | Rare disease, maximize treatment exposure |
Note: 1:1 allocation is most efficient. A 2:1 ratio increases total events needed by ~13%.
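The multipliers in the table can be reproduced with a one-line function. This sketch assumes the multiplier is the ratio of $1/(p_1 p_2)$ under $k$:1 allocation to its value under 1:1 allocation, which gives $(1 + k)^2 / (4k)$; the name `allocation_multiplier()` is illustrative.

```r
# Multiplier on the 1:1 event count for a k:1 allocation ratio.
allocation_multiplier <- function(k) (1 + k)^2 / (4 * k)

ratios <- c(1, 2, 3)
multipliers <- data.frame(
  allocation = paste0(ratios, ":1"),
  multiplier = round(allocation_multiplier(ratios), 3)
)
print(multipliers)
```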
2.6 Non-Inferiority Design
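For non-inferiority trials testing $H_0: \mathrm{HR} \ge M$ vs $H_1: \mathrm{HR} < M$, where $M$ is the non-inferiority margin, with an assumed true hazard ratio under H1 of $\mathrm{HR}_1$ (often $\mathrm{HR}_1 = 1$ if the new treatment is assumed exactly as good as the control, or something below 1 if a modest improvement is planned), the required events under equal allocation and a one-sided test are:
$$d = \frac{4\,(z_{1-\alpha} + z_{1-\beta})^2}{(\ln M - \ln \mathrm{HR}_1)^2}$$
When $\mathrm{HR}_1 = 1$, this reduces to the often-cited form $d = 4\,(z_{1-\alpha} + z_{1-\beta})^2 / (\ln M)^2$. Using that form when the planned true HR is below 1 ignores the additional separation between H1 and the NI boundary, and overstates the required events.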
| NI Margin (HR) | Description |
|---|---|
| 1.15 | Stricter margin (large trials) |
| 1.25 | Preserves 50% of the log(HR) effect when the historical control effect is HR ≈ 0.64 |
| 1.30 | More lenient margin |
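A short R sketch of the non-inferiority event calculation follows. It assumes a one-sided test and the Schoenfeld-type form in which the distance between the log margin and the log of the assumed true HR replaces the log hazard ratio; the function name `ni_events()` and its arguments are illustrative.

```r
# Required events for a non-inferiority log-rank comparison.
# margin  : non-inferiority margin on the HR scale (> 1)
# hr_true : hazard ratio assumed under the alternative (1 = truly equal)
# alpha   : one-sided Type I error
ni_events <- function(margin, hr_true = 1, alpha = 0.025, power = 0.80,
                      ratio = 1) {
  za <- qnorm(1 - alpha)          # one-sided critical value
  zb <- qnorm(power)
  p1 <- ratio / (1 + ratio)
  p2 <- 1 / (1 + ratio)
  ceiling((za + zb)^2 / (p1 * p2 * (log(margin) - log(hr_true))^2))
}

# Margin 1.25, treatments assumed equally effective
ni_events(margin = 1.25)                   # 631
# Same margin, assuming a modest true benefit (HR = 0.95): fewer events
ni_events(margin = 1.25, hr_true = 0.95)
```

Because the denominator is squared, even a small assumed benefit under the alternative changes the event count noticeably, so the planning value of the true HR deserves explicit justification.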
2.7 Dropout Adjustment
To account for loss to follow-up at rate $w$:
$$N_{\text{adj}} = \frac{N}{1 - w}$$
Note: Alternatively, incorporate dropout into the event probability calculation by treating dropout as an additional censoring mechanism.
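The two approaches to dropout can be compared with a small R sketch. It assumes uniform accrual, exponential event times, and an independent exponential dropout hazard; the function name `event_prob_dropout()`, the 10% annual dropout figure, and the other inputs are illustrative assumptions.

```r
# Event probability with uniform accrual, exponential event hazard `lambda`
# and an independent exponential dropout hazard `eta` (both per month).
# With eta = 0 this reduces to the no-dropout event probability.
event_prob_dropout <- function(lambda, eta, accrual, followup) {
  total <- lambda + eta
  (lambda / total) *
    (1 - (exp(-total * followup) - exp(-total * (accrual + followup))) /
       (total * accrual))
}

d <- 247                       # required events (illustrative)
lambda_c <- log(2) / 12        # control median 12 months
lambda_t <- 0.70 * lambda_c    # HR = 0.70
eta <- -log(1 - 0.10) / 12     # hazard giving 10% dropout per year (assumed)

# Average event probability across two equally sized arms
p_no_dropout <- mean(c(event_prob_dropout(lambda_c, 0, 24, 12),
                       event_prob_dropout(lambda_t, 0, 24, 12)))
p_dropout <- mean(c(event_prob_dropout(lambda_c, eta, 24, 12),
                    event_prob_dropout(lambda_t, eta, 24, 12)))

n_inflated <- ceiling(d / p_no_dropout / (1 - 0.10))  # simple inflation
n_censoring <- ceiling(d / p_dropout)                 # dropout as censoring
print(c(inflated = n_inflated, censoring = n_censoring))
```

The two answers generally differ because simple inflation treats every dropout as a fully lost subject, whereas the censoring approach credits the follow-up time observed before dropout.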
3. Assumptions
3.1 Core Assumptions
| Assumption | Testable Criterion | Violation Consequence |
|---|---|---|
| Proportional hazards | Schoenfeld residuals; log-log survival plots parallel | Use weighted log-rank, RMST, or piecewise methods |
| Non-informative censoring | Clinical review of dropout reasons | Sensitivity analyses; IPCW methods |
| Exponential survival | Compare KM curve to exponential fit | Simulation-based or Weibull assumptions |
| Uniform accrual | Review enrollment projections | Piecewise accrual models; simulation |
| Stable HR estimate | Review prior data; Phase II CIs | Power for range of HRs; adaptive designs |
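The proportional hazards check listed in the table can be sketched with the `survival` package. The data here are simulated purely for illustration (exponential event times with a constant HR of 0.70, so the assumption holds by construction); with real data, `time`, `status`, and `arm` would come from the trial dataset.

```r
library(survival)

set.seed(1)
n <- 400
arm <- rep(0:1, each = n / 2)

# Simulated exponential event times: control median 12 months, HR = 0.70
event_time <- rexp(n, rate = (log(2) / 12) * ifelse(arm == 1, 0.70, 1))
censor_time <- runif(n, min = 12, max = 36)  # administrative censoring
time <- pmin(event_time, censor_time)
status <- as.integer(event_time <= censor_time)

fit <- coxph(Surv(time, status) ~ arm)

# Test of proportional hazards based on scaled Schoenfeld residuals;
# a small p-value suggests the hazard ratio changes over time.
zph <- cox.zph(fit)
print(zph)

# A log-log survival plot can be drawn with:
# plot(survfit(Surv(time, status) ~ arm), fun = "cloglog")
```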
3.2 Understanding Hazard Ratios
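The hazard ratio is the primary effect size measure, defined as the ratio of the treatment-arm hazard to the control-arm hazard ($\mathrm{HR} = \lambda_T / \lambda_C$); values below 1 favor treatment. Event counts in the table are Schoenfeld estimates for 1:1 allocation and two-sided α = 0.05:
| Hazard Ratio | Risk Reduction | Clinical Interpretation | Events (80% power) |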
|---|---|---|---|
| 0.50 | 50% | Strong effect (rare in practice) | ~65 |
| 0.60 | 40% | Large effect (breakthrough therapies) | ~120 |
| 0.70 | 30% | Moderate-large effect | ~247 |
| 0.75 | 25% | Moderate effect (typical target) | ~380 |
| 0.80 | 20% | Small-moderate effect | ~630 |
| 0.85 | 15% | Small effect (large trials needed) | ~1,190 |
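Because the planning HR is rarely known precisely, it can help to see how power changes when the true HR differs from the design value. The sketch below inverts the Schoenfeld formula to give approximate power for a fixed number of events; the function name `logrank_power()` and the grid of HR values are illustrative.

```r
# Approximate log-rank power for a fixed number of events `d`,
# obtained by solving the Schoenfeld formula for power
# (two-sided alpha; the negligible opposite-tail probability is ignored).
logrank_power <- function(d, hr, alpha = 0.05, ratio = 1) {
  p1 <- ratio / (1 + ratio)
  p2 <- 1 / (1 + ratio)
  pnorm(sqrt(d * p1 * p2) * abs(log(hr)) - qnorm(1 - alpha / 2))
}

# Power of a 247-event design if the true HR is not exactly 0.70
hr_grid <- c(0.65, 0.70, 0.75, 0.80)
power_table <- data.frame(
  HR = hr_grid,
  power = round(logrank_power(247, hr_grid), 3)
)
print(power_table)
```

A design sized for HR = 0.70 has roughly 80% power at that value and loses power quickly as the true HR moves toward 1.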
3.3 Designing under non-proportional hazards (NPH)
The Schoenfeld formula assumes a constant hazard ratio over follow-up. Modern oncology and immunotherapy trials regularly violate this. Common NPH patterns and design responses:
| NPH pattern | Typical setting | Design / analysis response |
|---|---|---|
| Delayed effect | Immunotherapy needs time to engage; curves overlap for the first 3–6 months. | Weighted log-rank (Fleming-Harrington with $\rho = 0$, $\gamma = 1$); piecewise hazard sizing. |
| Crossing curves | Treatment harms early then benefits later (or vice versa). | RMST contrast over a pre-specified horizon; max-combo tests (Karrison 2016). |
| Cure fraction | A subset is functionally cured; the rest follows a parametric tail. | Mixture cure model or split-population sizing; report long-horizon milestone survival. |
| Long tail / late separation | Survival benefit accrues primarily after most events. | Milestone survival (e.g., difference in 24-month OS) as primary or co-primary; longer minimum follow-up. |
| Heterogeneous HR by subgroup | Biomarker-defined responders dominate the average effect. | Pre-specified stratified analysis; size each stratum on its own HR if possible. |
When to use each alternative summary
- Restricted Mean Survival Time (RMST). A difference (or ratio) in mean survival truncated at a pre-specified time $\tau$. Avoids the PH assumption entirely; interpretation is in months/years of life gained. Best when curves cross or effects are time-restricted.
- Weighted log-rank (Fleming-Harrington). Up-weights late events with $\rho = 0$, $\gamma = 1$ for delayed effects. The max-combo / Cox-MaxCombo test takes the maximum across a set of weights, controlling Type I error and giving strong power across a range of NPH shapes.
- Milestone survival. Difference in survival probability at a pre-specified landmark (e.g., 12 or 24 months). Direct clinical interpretation; less efficient than log-rank under PH but robust to crossing curves and tail behavior.
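To give a feel for the scale of these alternative summaries, the sketch below computes the RMST difference and a milestone survival difference under exponential survival, where both have closed forms. The exponential assumption, the 24-month horizon, and the function names are illustrative choices; under NPH the same quantities would come from the assumed survival curves or from simulation.

```r
# RMST up to `tau` for an exponential hazard: integral of exp(-lambda * t)
rmst_exp <- function(lambda, tau) (1 - exp(-lambda * tau)) / lambda

# Survival probability at a landmark time for an exponential hazard
milestone_exp <- function(lambda, t) exp(-lambda * t)

lambda_c <- log(2) / 12       # control median 12 months
lambda_t <- 0.70 * lambda_c   # HR = 0.70
tau <- 24                     # truncation time / landmark in months

rmst_diff <- rmst_exp(lambda_t, tau) - rmst_exp(lambda_c, tau)
milestone_diff <- milestone_exp(lambda_t, tau) - milestone_exp(lambda_c, tau)

print(round(c(rmst_diff_months = rmst_diff,
              milestone_diff = milestone_diff), 3))
```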
Sizing under NPH. The Schoenfeld-based event count under a single planning HR is illustrative at best when NPH is expected. For regulatory-facing trials, plan to:
- Simulate operating characteristics under the planning NPH pattern (Fleming-Harrington families, piecewise exponentials, mixture cure) at the assumed effect.
- Pre-specify the primary analysis (log-rank, max-combo, RMST, or milestone) and a small set of sensitivity analyses covering the other patterns.
- Report the simulated power across a grid of plausible NPH parameters so reviewers can see how brittle the design is to misspecification.
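A simulation along these lines can be sketched in R with the `survival` package. The example assumes a piecewise exponential delayed effect (no benefit before `delay` months, HR = 0.70 afterwards), uniform accrual, and an unweighted log-rank test; all function names, the sample size of 185 per arm, and the number of replicates are illustrative, and a real design would use many more replicates.

```r
library(survival)

# Simulate one two-arm trial and return the log-rank chi-square statistic.
simulate_trial <- function(n_per_arm, lambda_c, hr, delay, accrual, followup) {
  arm <- rep(0:1, each = n_per_arm)
  e <- rexp(2 * n_per_arm)  # unit-exponential draws (cumulative hazard scale)
  # Control: constant hazard. Treatment: control hazard until `delay`,
  # then lambda_c * hr (inverse of the piecewise cumulative hazard).
  t_control <- e / lambda_c
  t_treat <- ifelse(e < lambda_c * delay,
                    e / lambda_c,
                    delay + (e - lambda_c * delay) / (lambda_c * hr))
  event_time <- ifelse(arm == 1, t_treat, t_control)
  entry <- runif(2 * n_per_arm, 0, accrual)  # uniform accrual
  censor_time <- accrual + followup - entry  # administrative censoring
  time <- pmin(event_time, censor_time)
  status <- as.integer(event_time <= censor_time)
  survdiff(Surv(time, status) ~ arm)$chisq
}

# Proportion of simulated trials rejecting at two-sided alpha = 0.05.
estimate_power <- function(n_sim, delay, n_per_arm = 185,
                           lambda_c = log(2) / 12, hr = 0.70,
                           accrual = 24, followup = 12) {
  crit <- qchisq(0.95, df = 1)
  chisq <- vapply(seq_len(n_sim), function(i) {
    simulate_trial(n_per_arm, lambda_c, hr, delay, accrual, followup)
  }, numeric(1))
  mean(chisq > crit)
}

set.seed(2024)
power_ph <- estimate_power(500, delay = 0)       # proportional hazards
power_delayed <- estimate_power(500, delay = 4)  # 4-month delayed effect
print(c(ph = power_ph, delayed = power_delayed))
```

Repeating the last step over a grid of delays and post-delay hazard ratios gives the kind of sensitivity table described above; swapping the test statistic allows the same scaffold to compare log-rank with weighted or RMST-based analyses.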
4. Regulatory Guidance
FDA
ICH E9 (Statistical Principles for Clinical Trials)
"For survival analysis, the sample size is usually expressed in terms of the number of events (e.g., deaths) rather than the number of subjects to be randomized."
FDA Oncology Endpoints Guidance
FDA expects event-driven designs for OS and PFS endpoints. Sample size justification should include assumptions about median survival, accrual pattern, and expected hazard ratio with supporting evidence.
Non-Inferiority Trials
FDA's non-inferiority guidance (2016) bases the margin on the historical effect of the active control. Preserving at least 50% of that effect is a common choice for survival endpoints, and the retained fraction should be justified clinically.
EMA
EMA/CHMP/205/95 (Anticancer Medicinal Products, Rev. 6)
For time-to-event endpoints, the EMA anticancer guideline (Rev. 6, adopted November 2023) emphasizes that the number of events drives the precision of hazard-ratio estimation; trial duration and sample size should be planned to deliver the target number of events. The companion Appendix 1 covers PFS/DFS-specific methodological considerations for confirmatory trials.
Proportional Hazards Assessment
EMA expects assessment of proportional hazards assumption and pre-specified alternative analyses if violations are anticipated (e.g., immunotherapy trials with delayed effects).
Key Citations
- ICH E9: Statistical Principles for Clinical Trials (1998)
- FDA: Clinical Trial Endpoints for the Approval of Cancer Drugs and Biologics (December 2018; replaces the 2007 version)
- EMA/CHMP/205/95 (Rev. 6, November 2023): Guideline on the Evaluation of Anticancer Medicinal Products in Man
5. Validation Against Industry Standards
| Scenario | Parameters | PASS 2024 | nQuery 9.5 | Zetyra | Status |
|---|---|---|---|---|---|
| Log-rank (Schoenfeld) | α=0.05, power=0.80, HR=0.70 | 246 events | 246 events | 246 events | ✓ Match |
| Log-rank (Schoenfeld) | α=0.05, power=0.90, HR=0.70 | 329 events | 329 events | 329 events | ✓ Match |
| Log-rank (Schoenfeld) | α=0.05, power=0.80, HR=0.75 | 380 events | 380 events | 380 events | ✓ Match |
| Log-rank (Schoenfeld) | α=0.01, power=0.90, HR=0.70 | 436 events | 436 events | 436 events | ✓ Match |
| Log-rank (Freedman) | α=0.05, power=0.80, HR=0.70 | 253 events | 253 events | 253 events | ✓ Match |
| 2:1 allocation | α=0.05, power=0.80, HR=0.70 | 277 events | 277 events | 277 events | ✓ Match |
All comparisons use the Schoenfeld formula with equal allocation unless otherwise noted.
6. Example SAP Language
Overall Survival Primary Endpoint (Oncology)
The primary endpoint is overall survival (OS), defined as the time from randomization to death from any cause. The trial is designed to detect a hazard ratio of 0.70 (30% reduction in risk of death) favoring the experimental arm compared to control.
With a two-sided log-rank test at significance level α = 0.05 and 80% power, a total of 246 events are required (Schoenfeld formula). Assuming a median OS of 12 months in the control arm, an accrual period of 24 months with uniform enrollment, an additional follow-up period of 12 months, and a 10% dropout rate, approximately 410 subjects (205 per arm) are required to observe 246 events (overall event probability of about 0.67 under exponential survival, with enrollment inflated by 1/(1 − 0.10) for dropout).
The primary analysis will use the log-rank test stratified by [strata]. The hazard ratio and 95% confidence interval will be estimated using a Cox proportional hazards model stratified by the same factors.
PFS Co-Primary Endpoint
Progression-free survival (PFS) is a co-primary endpoint, defined as time from randomization to disease progression per RECIST 1.1 or death from any cause, whichever occurs first.
For PFS, the trial is powered to detect HR = 0.65 with 90% power at α = 0.025 (one-sided, with multiplicity adjustment). This requires 227 PFS events (Schoenfeld formula). Based on assumed median PFS of 6 months in the control arm and the accrual pattern above, these events are expected to accrue within 18 months of first patient enrolled.
Non-Inferiority Survival Trial
This non-inferiority trial is designed to demonstrate that the novel agent is not inferior to the active comparator with respect to overall survival. The non-inferiority margin is HR = 1.25, which preserves 50% of the historical treatment effect (HR = 0.64 vs. placebo). The margin was selected based on [justification per regulatory guidance].
With a one-sided test at α = 0.025 and 80% power, assuming the true HR is 1.0 (no difference), 631 events are required (Schoenfeld formula). To observe these events within 36 months, approximately 880 subjects are needed, accounting for median survival of 24 months in both arms and 15% dropout.
7. R Code
```r
library(gsDesign)

# Schoenfeld formula for required events.
# Named schoenfeld_events() so it does not mask gsDesign::nEvents().
schoenfeld_events <- function(hr, alpha = 0.05, power = 0.80,
                              sided = 2, ratio = 1) {
  za <- qnorm(1 - alpha / sided)  # critical value for Type I error
  zb <- qnorm(power)              # critical value for power
  p1 <- ratio / (1 + ratio)       # allocation proportion, treatment
  p2 <- 1 / (1 + ratio)           # allocation proportion, control
  d <- (za + zb)^2 / (p1 * p2 * log(hr)^2)
  ceiling(d)
}

# Example: HR = 0.70, 80% power, two-sided α = 0.05
schoenfeld_events(hr = 0.70)  # Returns 247

# Freedman (1982) formula (equal allocation), slightly more conservative
freedman_events <- function(hr, alpha = 0.05, power = 0.80) {
  za <- qnorm(1 - alpha / 2)
  zb <- qnorm(power)
  d <- (za + zb)^2 * (1 + hr)^2 / (hr - 1)^2
  ceiling(d)
}

hr_values <- c(0.50, 0.60, 0.70, 0.80)
comparison <- data.frame(
  HR = hr_values,
  Schoenfeld = sapply(hr_values, schoenfeld_events),
  Freedman = sapply(hr_values, freedman_events)
)
print(comparison)
#    HR Schoenfeld Freedman
# 1 0.5         66       71
# 2 0.6        121      126
# 3 0.7        247      253
# 4 0.8        631      636
# Note: the unrounded Schoenfeld value for HR = 0.70 is about 246.8;
# tools may report 246 or 247 depending on rounding conventions.

# Using gsDesign for comprehensive calculations
# with accrual and follow-up
x <- nSurv(
  lambdaC = log(2) / 12,      # Control median = 12 months
  hr = 0.70,
  eta = -log(1 - 0.05) / 12,  # Dropout hazard giving 5% annual dropout
  R = 24,                     # 24-month accrual
  T = 36,                     # 36-month total study duration
  minfup = 12,                # Minimum follow-up (T - R); the default is 6
  alpha = 0.025,              # One-sided
  beta = 0.20                 # 80% power
)
print(x)

# gsDesign with interim analyses
gsd <- gsDesign(
  k = 3,           # 2 interims + final
  test.type = 2,   # Two-sided symmetric
  alpha = 0.025,   # gsDesign's alpha is always one-sided (two-sided 0.05)
  beta = 0.20,
  n.fix = 247,     # Fixed-design events (Schoenfeld ceiling)
  sfu = sfLDOF     # Lan-DeMets O'Brien-Fleming-type spending
)
print(gsd)
```
8. References
- Schoenfeld DA. Sample-size formula for the proportional-hazards regression model. Biometrics. 1983;39(2):499-503.
- Freedman LS. Tables of the number of patients required in clinical trials using the logrank test. Statistics in Medicine. 1982;1(2):121-129.
- Lachin JM, Foulkes MA. Evaluation of sample size and power for analyses of survival with allowance for nonuniform patient entry, losses to follow-up, noncompliance, and stratification. Biometrics. 1986;42(3):507-519.
- Collett D. Modelling Survival Data in Medical Research. 3rd ed. CRC Press; 2015.
- International Council for Harmonisation (ICH). E9 Statistical Principles for Clinical Trials. February 1998.
- U.S. Food and Drug Administration. Clinical Trial Endpoints for the Approval of Cancer Drugs and Biologics: Guidance for Industry. 2018.
- Antonia SJ, et al. Overall survival with durvalumab after chemoradiotherapy in stage III NSCLC (PACIFIC). NEJM. 2018;379(24):2342-2350.
- McMurray JJV, et al. A trial to evaluate the effect of the sodium-glucose co-transporter 2 inhibitor dapagliflozin on morbidity and mortality in patients with heart failure (DAPA-HF). Eur J Heart Fail. 2019;21(5):665-675.
Last updated: May 2026