Sample Size for Continuous Outcomes

Comprehensive power analysis for clinical trials with continuous endpoints (e.g., blood pressure, cholesterol, weight).

1. When to Use This Method

Use this methodology when:

  • Your primary endpoint is measured on a continuous scale (interval or ratio data)
  • You are comparing means between two or more groups
  • You need to power a superiority, non-inferiority, or equivalence trial

Common Applications

  • Blood pressure reduction (mmHg)
  • HbA1c change from baseline
  • Pain scores (VAS)
  • Quality of life instruments
  • Biomarker concentrations

Do NOT Use When

  • Your outcome is binary (use proportions method)
  • Your outcome is time-to-event (use survival analysis method)
  • Your outcome is a count or rate (use Poisson method)

2. Mathematical Formulation

2.1 Two-Sample Parallel Design (Superiority)

For a randomized trial comparing treatment ($\mu_1$) to control ($\mu_2$), the required sample size per group to detect a clinically meaningful difference $\Delta = \mu_1 - \mu_2$ is:

$$n = \frac{2\sigma^2 (z_{1-\alpha/2} + z_{1-\beta})^2}{\Delta^2}$$

| Symbol | Description |
| --- | --- |
| $\sigma^2$ | Common variance (assumed equal across groups) |
| $z_{1-\alpha/2}$ | Critical value for Type I error (1.96 for α = 0.05, two-sided) |
| $z_{1-\beta}$ | Critical value for power (0.84 for 80%; 1.28 for 90%) |
| $\Delta$ | Minimum clinically important difference (MCID) |
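
As a worked check of the formula, the minimal R sketch below evaluates it for Δ = 5 and σ = 10, the values used in the validation table in Section 5. The helper name `n_per_group_z` is hypothetical and not part of any package. Because the formula uses normal quantiles, it lands slightly below the t-based answer of 64 per group reported elsewhere in this guide.

```r
# Normal-approximation sample size per group for a two-sample comparison.
# n_per_group_z is an illustrative helper, not part of any package.
n_per_group_z <- function(delta, sd, alpha = 0.05, power = 0.80) {
  z_alpha <- qnorm(1 - alpha / 2)  # two-sided critical value
  z_beta  <- qnorm(power)          # quantile for the target power
  2 * sd^2 * (z_alpha + z_beta)^2 / delta^2
}

n_raw <- n_per_group_z(delta = 5, sd = 10)
n_raw           # about 62.8
ceiling(n_raw)  # 63 per group
```

The one-subject gap to the t-test result comes from replacing the t distribution with the normal, which matters more as the per-group n gets smaller.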

2.2 Unequal Allocation

For allocation ratio $k = n_2/n_1$:

$$n_1 = \frac{(1 + 1/k)\,\sigma^2 (z_{1-\alpha/2} + z_{1-\beta})^2}{\Delta^2}$$

Note: 1:1 allocation is most efficient. A 2:1 ratio increases total N by about 12.5% before rounding.
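
The cost of unequal allocation can be checked directly from the formula. The sketch below is a normal-approximation illustration with a hypothetical helper `n_unequal`; it compares total N at 1:1 and 2:1 for Δ = 5 and σ = 10.

```r
# Per-arm sizes under allocation ratio k = n2/n1 (normal approximation).
# n_unequal is an illustrative helper name.
n_unequal <- function(k, delta, sd, alpha = 0.05, power = 0.80) {
  z_sum <- qnorm(1 - alpha / 2) + qnorm(power)
  n1 <- (1 + 1 / k) * sd^2 * z_sum^2 / delta^2
  c(n1 = n1, n2 = k * n1, total = (1 + k) * n1)
}

equal   <- n_unequal(k = 1, delta = 5, sd = 10)
two_one <- n_unequal(k = 2, delta = 5, sd = 10)

# Relative increase in total N from moving 1:1 to 2:1 (before rounding)
inflation <- unname(two_one["total"] / equal["total"] - 1)
inflation  # 0.125
```

The ratio depends only on k: total N scales with (1 + k)(1 + 1/k), which equals 4 at k = 1 and 4.5 at k = 2.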

2.3 Clustered Designs

When observations are nested within clusters (e.g., patients within clinics), apply the variance inflation factor (design effect):

$$n_{\text{clustered}} = n_{\text{simple}} \times [1 + (m-1)\rho]$$

| Symbol | Description |
| --- | --- |
| $m$ | Average cluster size |
| $\rho$ | Intraclass correlation coefficient (ICC) |
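
In practice the inflated per-arm size has to be converted into a number of clusters. The sketch below applies the design effect and then rounds up to whole clusters; that rounding step is an assumption of this example rather than something the guide prescribes, and `clusters_needed` is a hypothetical helper. The inputs reuse n = 64 per group with m = 20 and ICC = 0.05.

```r
# Inflate an individually randomized sample size by the design effect and
# convert it to whole clusters per arm. clusters_needed is illustrative.
clusters_needed <- function(n_simple, m, icc) {
  deff <- 1 + (m - 1) * icc           # design effect
  n_clustered <- n_simple * deff      # subjects per arm before rounding
  clusters <- ceiling(n_clustered / m)
  list(deff = deff, n_clustered = n_clustered,
       clusters_per_arm = clusters, subjects_per_arm = clusters * m)
}

res <- clusters_needed(n_simple = 64, m = 20, icc = 0.05)
res$deff              # 1.95
res$clusters_per_arm  # 7
res$subjects_per_arm  # 140
```

Rounding to whole clusters can add noticeably to the enrolled total when clusters are large, so it is worth reporting both the subject count and the cluster count.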

2.4 Repeated Measures / Longitudinal

For $m$ measurements per subject with compound-symmetric correlation $\rho$, the per-arm sample size relative to a single-measurement comparison is:

$$n_{\text{repeated}} = n_{\text{simple}} \times \frac{1 + (m-1)\rho}{m}$$

The $[1+(m-1)\rho]/m$ factor is the information-time ratio for an unweighted average of $m$ repeated measurements under compound symmetry. It collapses to $1/m$ when $\rho = 0$ (independent measurements) and to $1$ when $\rho = 1$ (no benefit from repeated measures).

What the calculator actually uses. The textbook factor above is illustrative. The Sample Size Calculator selects an exact effective-variance formula based on the analysis target you choose:

  • Slope (CS): $\sigma^2(1-\rho) / \sum_i (t_i - \bar{t})^2$, exact for OLS slope under compound-symmetric errors with equally-spaced times.
  • Endpoint, ANCOVA-adjusted: $\sigma^2(1 - \rho^2)$, the CUPED variance-reduction factor.
  • Change from baseline (CS): $2\sigma^2(1 - \rho)$.
  • An AR(1) variant is also available for slope and change-from-baseline targets.

See the Longitudinal Studies guide for the full derivations and the choice between targets.
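
To see how the analysis targets compare, the sketch below evaluates the compound-symmetry expressions listed above for a given σ and ρ. It is an illustration only, not the calculator's implementation; the function name, argument names, and the unadjusted `endpoint_only` reference value (σ²) are assumptions added for comparison.

```r
# Per-subject effective variances for three analysis targets under
# compound symmetry. Function and argument names are illustrative.
effective_variance <- function(sigma, rho, times = NULL) {
  out <- c(
    endpoint_only = sigma^2,                 # unadjusted reference value
    change        = 2 * sigma^2 * (1 - rho), # change from baseline
    ancova        = sigma^2 * (1 - rho^2)    # baseline-adjusted endpoint
  )
  if (!is.null(times)) {
    # OLS slope over the supplied measurement times
    out["slope"] <- sigma^2 * (1 - rho) / sum((times - mean(times))^2)
  }
  out
}

low_rho  <- effective_variance(sigma = 10, rho = 0.3)
high_rho <- effective_variance(sigma = 10, rho = 0.7, times = 0:4)
low_rho   # endpoint_only 100, change 140, ancova 91
high_rho  # endpoint_only 100, change 60, ancova 51, slope 3
```

Change from baseline has a larger variance than the unadjusted endpoint whenever ρ < 0.5, while the ANCOVA value never exceeds either of the other two.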

Note: Benefits plateau quickly. Under compound symmetry the factor approaches $\rho$ as $m$ grows, so increasing beyond 4-5 measurements yields diminishing returns.
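
The plateau is easy to see numerically. The sketch below tabulates the textbook factor [1 + (m − 1)ρ]/m for ρ = 0.5, a value chosen only for illustration; `rm_factor` is a hypothetical helper name.

```r
# Sample-size factor [1 + (m - 1) * rho] / m for the mean of m repeated
# measurements under compound symmetry.
rm_factor <- function(m, rho) (1 + (m - 1) * rho) / m

m <- 1:6
factors <- rm_factor(m, rho = 0.5)
round(factors, 3)
# 1.000 0.750 0.667 0.625 0.600 0.583
```

With ρ = 0.5 the factor can never fall below 0.5, so the first two or three extra measurements capture most of the available gain.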

2.5 Dropout Adjustment

Inflate sample size to account for anticipated dropout. For ordinary two-arm trials where each enrolled subject has probability $d$ of not contributing a valid analysis observation:

$$N^* = \frac{N}{1 - d}$$

where $d$ is the expected dropout rate.

When the squared form applies. A few designs require $(1-d)^2$ in the denominator, because a subject counts only if both of two measurements are present (assuming each is lost independently with probability $d$):

  • Paired or change-from-baseline analyses requiring both a baseline and an outcome measurement, where losing either one drops the subject.
  • Crossover trials where missing either period invalidates the within-subject comparison.

Dividing $N$ by $(1-d)$ or $(1-d)^2$ does not fix informative missingness; that requires an analysis-stage strategy (multiple imputation, tipping-point sensitivity, pattern mixture). Dropout inflation and missing-data handling are separate decisions.
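
The two inflation rules differ noticeably even at moderate dropout. The sketch below applies both to n = 64 per group with 15% dropout; `inflate_for_dropout` and its `both_required` flag are hypothetical names for this example.

```r
# Enrollment target after dropout inflation. Set both_required = TRUE for
# designs that need two measurements per subject (the squared form).
inflate_for_dropout <- function(n, d, both_required = FALSE) {
  retained <- if (both_required) (1 - d)^2 else (1 - d)
  ceiling(n / retained)
}

n_simple_form  <- inflate_for_dropout(64, d = 0.15)
n_squared_form <- inflate_for_dropout(64, d = 0.15, both_required = TRUE)
n_simple_form   # 76
n_squared_form  # 89
```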

3. Assumptions

3.1 Core Assumptions

| Assumption | Testable Criterion | Violation Consequence |
| --- | --- | --- |
| Normality | Shapiro-Wilk p > 0.05; Q-Q plot linearity | Moderate: CLT protects with n > 30/group |
| Equal variances | Levene's test p > 0.05; ratio of SDs < 2 | Use Welch's t-test or Satterthwaite df |
| Independence | Study design ensures no clustering | Severe: inflated Type I error if ignored |
| MCID validity | Literature/clinical consensus supports Δ | Underpowered if Δ too optimistic |
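
A few of these checks can be run on pilot data with base R. The sketch below uses simulated values as a stand-in for real pilot observations (the means, SDs, and sample sizes are hypothetical). Levene's test is not in base R (it is available as `car::leveneTest()`), so this example uses the SD-ratio rule of thumb instead.

```r
# Simulated pilot data stand in for real observations (hypothetical values).
set.seed(1)
treatment <- rnorm(40, mean = 5, sd = 10)
control   <- rnorm(40, mean = 0, sd = 12)

# Normality within each arm (Shapiro-Wilk); also inspect qqnorm() plots
p_normal <- c(
  treatment = shapiro.test(treatment)$p.value,
  control   = shapiro.test(control)$p.value
)

# Ratio of the larger SD to the smaller SD (rule of thumb above: < 2)
sd_ratio <- max(sd(treatment), sd(control)) / min(sd(treatment), sd(control))

# Welch's t-test does not assume equal variances
welch <- t.test(treatment, control, var.equal = FALSE)

p_normal
sd_ratio
welch$parameter  # Satterthwaite degrees of freedom
```

Formal tests on small pilot samples have little power to detect violations, so the Q-Q plot and the SD ratio are usually more informative than the p-values alone.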

3.2 Parameter Estimates

Variance ($\sigma^2$)

Should come from prior studies, pilot data, or published literature in similar populations. If uncertain, conduct a sensitivity analysis across the plausible range.

Effect size ($\Delta$)

Must be clinically meaningful, not just statistically detectable. Overly optimistic effect sizes are a leading cause of underpowered trials.
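
A sensitivity analysis over both parameters can be sketched with `power.t.test()` from base R. The grid values below are illustrative, not recommendations, and `n_for` is a hypothetical helper.

```r
# Per-group n from power.t.test() over a grid of plausible SDs and MCIDs.
# The grid values are illustrative, not recommendations.
n_for <- function(delta, sd) {
  ceiling(power.t.test(delta = delta, sd = sd,
                       sig.level = 0.05, power = 0.80)$n)
}

grid <- expand.grid(delta = c(4, 5, 6), sd = c(8, 10, 12))
grid$n_per_group <- mapply(n_for, grid$delta, grid$sd)
grid
```

Because n scales roughly with (σ/Δ)², modest changes in either input move the required sample size substantially, which is why both assumptions need explicit justification.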

4. Regulatory Guidance

FDA

ICH E9 (Statistical Principles for Clinical Trials)

"The number of subjects...should always be large enough to provide a reliable answer to the questions addressed." Requires justification of effect size and variance assumptions.

ICH E20 Adaptive Designs for Clinical Trials (2025)

Permits sample size re-estimation based on interim variance, but effect size must remain blinded.

EMA

CHMP Points to Consider on Adjustment for Baseline Covariates

Recommends ANCOVA for continuous outcomes, which can reduce variance and required sample size.

EMA Guideline on Missing Data (2010)

Requires sensitivity analyses for missing data; dropout adjustment should be pre-specified.

Key Citations

  1. ICH E9: Statistical Principles for Clinical Trials (1998)
  2. ICH E20: Adaptive Designs for Clinical Trials (2025)
  3. EMA: Guideline on Adjustment for Baseline Covariates in Clinical Trials (2015)

5. Validation Against Industry Standards

| Scenario | Parameters | PASS 2024 | nQuery 9.5 | Zetyra | Status |
| --- | --- | --- | --- | --- | --- |
| Two-sample t-test | α=0.05, power=0.80, Δ=5, σ=10 | 64/group | 64/group | 64/group | ✓ Match |
| Two-sample t-test | α=0.05, power=0.90, Δ=5, σ=10 | 86/group | 86/group | 86/group | ✓ Match |
| Unequal allocation (2:1) | α=0.05, power=0.80, Δ=5, σ=10 | 48/96 | 48/96 | 48/96 | ✓ Match |
| Cluster RCT | ICC=0.05, m=20, Δ=5, σ=10 | 127/group | 127/group | 127/group | ✓ Match |

Minor variations (±1 subject) may occur due to rounding conventions.

6. Example SAP Language

Sample Size Justification

The primary endpoint is change from baseline in [outcome] at Week [X]. Based on prior studies (Author et al., Year), we assume a standard deviation of [σ] units. A difference of [Δ] units is considered the minimum clinically important difference based on [justification].

Using a two-sample t-test with a two-sided significance level of 0.05 and 80% power, [n] subjects per group are required. To account for an anticipated dropout rate of [X]%, we will enroll [N*] subjects per group ([total] subjects total).

Calculations were performed using [Zetyra / gsDesign / PASS] and validated against [reference software].

7. R Code

```r
# Two-sample t-test sample size
power.t.test(
  delta = 5,        # MCID
  sd = 10,          # Standard deviation
  sig.level = 0.05, # Alpha (two-sided)
  power = 0.80,     # 1 - beta
  type = "two.sample",
  alternative = "two.sided"
)
# Result: n = 64 per group (after rounding up)

# With unequal allocation (2:1)
# pwr.t2n.test() needs exactly one NULL argument, so it cannot solve for
# n1 and n2 together. Fix the ratio k = n2/n1 and search for n1 instead.
library(pwr)
k <- 2  # allocation ratio n2/n1
power_gap <- function(n1) {
  pwr.t2n.test(
    n1 = n1,
    n2 = k * n1,
    d = 5/10,         # Cohen's d = delta/sd
    sig.level = 0.05,
    alternative = "two.sided"
  )$power - 0.80      # zero when the target power is reached
}
n1 <- ceiling(uniroot(power_gap, interval = c(2, 1000))$root)
n2 <- k * n1
# Result: n1 = 48, n2 = 96

# Cluster RCT adjustment
n_simple <- 64
m <- 20          # cluster size
icc <- 0.05      # intraclass correlation
deff <- 1 + (m - 1) * icc  # design effect = 1.95
n_cluster <- ceiling(n_simple * deff)
# Result: n = 125 per group (rounded up)
# Section 5 lists 127/group for the Cluster RCT row; this simple
# inflation is an approximation.

# Dropout adjustment (ordinary independent-subject dropout)
dropout_rate <- 0.15
n_adjusted <- ceiling(n_cluster / (1 - dropout_rate))
# Result: n = 148 per group
# Use (1 - dropout_rate)^2 only for paired/change-from-baseline designs
# where losing either measurement drops the subject.
```

8. References

  1. Chow SC, Shao J, Wang H, Lokhnygina Y. Sample Size Calculations in Clinical Research. 3rd ed. CRC Press; 2017.
  2. Julious SA. Sample Sizes for Clinical Trials. CRC Press; 2010.
  3. Cohen J. Statistical Power Analysis for the Behavioral Sciences. 2nd ed. Lawrence Erlbaum Associates; 1988.
  4. Vickers AJ, Altman DG. Analysing controlled trials with baseline and follow up measurements. BMJ. 2001;323(7321):1123-1124.
  5. Diggle PJ, Heagerty P, Liang KY, Zeger SL. Analysis of Longitudinal Data. 2nd ed. Oxford University Press; 2002.
  6. Donner A, Klar N. Design and Analysis of Cluster Randomization Trials in Health Research. Arnold; 2000.
  7. International Council for Harmonisation (ICH). E9 Statistical Principles for Clinical Trials. February 1998.

Last updated: May 2026
