
Sample Size for Continuous Outcomes

Author: Zetyra

Comprehensive power analysis for clinical trials with continuous endpoints (e.g., blood pressure, cholesterol, weight).

1. When to Use This Method

Use this methodology when:

• Your primary endpoint is measured on a continuous scale (interval or ratio data)
• You are comparing means between two or more groups
• You need to power a superiority, non-inferiority, or equivalence trial

Common Applications

• Blood pressure reduction (mmHg)
• HbA1c change from baseline
• Pain scores (VAS)
• Quality of life instruments
• Biomarker concentrations

Do NOT Use When

• Your outcome is binary (use proportions method)
• Your outcome is time-to-event (use survival analysis method)
• Your outcome is a count or rate (use Poisson method)

2. Mathematical Formulation

2.1 Two-Sample Parallel Design (Superiority)

For a randomized trial comparing treatment (mean $\mu_1$) to control (mean $\mu_2$), the required sample size per group to detect a clinically meaningful difference $\Delta = \mu_1 - \mu_2$ at two-sided significance level $\alpha$ with power $1-\beta$ is:

n = \frac{2\sigma^2(z_{1-\alpha/2} + z_{1-\beta})^2}{\Delta^2}

Symbol	Description
$\sigma^2$	Common variance (assumed equal across groups)
$z_{1-\alpha/2}$	Critical value for Type I error (1.96 for α = 0.05, two-sided)
$z_{1-\beta}$	Critical value for power (0.84 for 80%; 1.28 for 90%)
$\Delta$	Minimum clinically important difference (MCID)
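
As a quick check of the formula, the sketch below evaluates it in base R for Δ = 5 and σ = 10. The helper name n_per_group_z is hypothetical and introduced only for illustration. Because the formula uses normal quantiles, it lands about one subject per group below the exact t-test calculation.

```r
# Normal-approximation sample size per group (two-sided superiority test).
# n_per_group_z() is an illustrative helper, not a Zetyra or package function.
n_per_group_z <- function(delta, sd, alpha = 0.05, power = 0.80) {
  z_alpha <- qnorm(1 - alpha / 2)  # 1.96 for alpha = 0.05
  z_beta  <- qnorm(power)          # 0.84 for 80% power
  2 * sd^2 * (z_alpha + z_beta)^2 / delta^2
}

n_raw <- n_per_group_z(delta = 5, sd = 10)  # about 62.79
n_z   <- ceiling(n_raw)                     # 63 per group

# Exact noncentral-t calculation for comparison
n_t <- ceiling(power.t.test(delta = 5, sd = 10, sig.level = 0.05,
                            power = 0.80)$n)  # 64 per group
```

The one-subject gap is the usual small-sample correction for estimating σ from the data.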

2.2 Unequal Allocation

For allocation ratio $k = n_2/n_1$, the size of the first group is given below, and the second group has $n_2 = k\,n_1$:

n_1 = \frac{(1 + 1/k)\sigma^2(z_{1-\alpha/2} + z_{1-\beta})^2}{\Delta^2}

Note: 1:1 allocation is the most efficient. Relative to equal allocation, total N scales by $(1+k)^2/(4k)$, so a 2:1 ratio increases total N by 12.5%.
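
The cost of unequal allocation can be sketched in a few lines of base R. The helper n_unequal_z is a hypothetical name, and the inputs (Δ = 5, σ = 10, 80% power) are chosen only as an example; the calculation uses the normal approximation above rather than an exact t-test.

```r
# Group sizes under allocation ratio k = n2 / n1 (normal approximation).
# n_unequal_z() is an illustrative helper, not part of any package.
n_unequal_z <- function(delta, sd, k, alpha = 0.05, power = 0.80) {
  z_sum <- qnorm(1 - alpha / 2) + qnorm(power)
  n1 <- (1 + 1 / k) * sd^2 * z_sum^2 / delta^2
  c(n1 = n1, n2 = k * n1, total = (1 + k) * n1)
}

equal   <- n_unequal_z(delta = 5, sd = 10, k = 1)
two_one <- n_unequal_z(delta = 5, sd = 10, k = 2)

# Ratio of unrounded totals: (1 + k)^2 / (4 * k) = 1.125 for k = 2
inflation <- two_one[["total"]] / equal[["total"]]

# Round the first group up, then apply the ratio
n1 <- ceiling(two_one[["n1"]])  # 48
n2 <- 2 * n1                    # 96
```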

2.3 Clustered Designs

When observations are nested within clusters (e.g., patients within clinics), apply the variance inflation factor (design effect):

n_{\text{clustered}} = n_{\text{simple}} \times [1 + (m-1)\rho]

Symbol	Description
$m$	Average cluster size
$\rho$	Intraclass correlation coefficient (ICC)
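
Because the ICC is rarely known precisely, it can help to see how strongly the design effect depends on it. The sketch below assumes a per-group n of 64 from an individually randomized design and an illustrative ICC range of 0.01 to 0.10; both are example inputs, not recommendations.

```r
# Design effect and inflated per-group n for several plausible ICC values.
n_simple <- 64                    # per-group n without clustering (assumed)
m        <- 20                    # average cluster size
icc      <- c(0.01, 0.05, 0.10)   # illustrative ICC range (assumed)

deff      <- 1 + (m - 1) * icc          # 1.19, 1.95, 2.90
n_cluster <- ceiling(n_simple * deff)   # 77, 125, 186

data.frame(icc, deff, n_cluster)
```

Even a small ICC matters when clusters are large: moving from 0.01 to 0.10 more than doubles the required sample size here.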

2.4 Repeated Measures / Longitudinal

For $m$ measurements per subject with compound-symmetric correlation $\rho$ , the per-arm sample size relative to a single-measurement comparison is:

n_{\text{repeated}} = n_{\text{simple}} \times \frac{1 + (m-1)\rho}{m}

The $[1+(m-1)\rho]/m$ factor is the variance of an unweighted average of $m$ repeated measurements under compound symmetry, expressed relative to the variance of a single measurement. It collapses to $1/m$ when $\rho = 0$ (independent measurements) and to $1$ when $\rho = 1$ (no benefit from repeated measures).

What the calculator actually uses. The textbook factor above is illustrative. The Sample Size Calculator selects an exact effective-variance formula based on the analysis target you choose:

• Slope (CS): $\sigma^2(1-\rho) / \sum_i (t_i - \bar{t})^2$, exact for the OLS slope under compound-symmetric errors with equally spaced times.
• Endpoint, ANCOVA-adjusted: $\sigma^2(1 - \rho^2)$, that is, $\sigma^2$ scaled by the CUPED variance-reduction factor $1 - \rho^2$.
• Change from baseline (CS): $2\sigma^2(1 - \rho)$.
• An AR(1) variant is also available for slope and change-from-baseline targets.

See the Longitudinal Studies guide for the full derivations and the choice between targets.
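
To give a feel for how the targets compare, the sketch below evaluates the three compound-symmetric expressions for one set of inputs. The values σ = 10, ρ = 0.6, and four equally spaced visits are assumptions chosen for the example, not calculator defaults.

```r
# Effective variances for the three compound-symmetric analysis targets.
sigma <- 10            # SD of a single measurement (assumed)
rho   <- 0.6           # within-subject correlation (assumed)
times <- c(0, 1, 2, 3) # equally spaced visit times (assumed)

var_slope  <- sigma^2 * (1 - rho) / sum((times - mean(times))^2)  # 8
var_ancova <- sigma^2 * (1 - rho^2)                               # 64
var_change <- 2 * sigma^2 * (1 - rho)                             # 80

c(slope = var_slope, ancova = var_ancova, change = var_change)
```

With these inputs the ANCOVA-adjusted endpoint (64) and the change score (80) both improve on the unadjusted variance of 100, and ANCOVA is the smaller of the two. The slope variance is in squared units per unit time, so it is not directly comparable to the other two.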

Note: Benefits plateau quickly. As $m$ grows, the factor $[1+(m-1)\rho]/m$ approaches $\rho$, so increasing beyond 4-5 measurements yields diminishing returns.
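
The plateau is easy to see by tabulating the textbook factor. This is a minimal sketch with an assumed correlation of ρ = 0.5; rm_factor is a hypothetical helper name.

```r
# Textbook repeated-measures factor [1 + (m - 1) * rho] / m.
# rm_factor() is an illustrative helper, not a package function.
rm_factor <- function(m, rho) (1 + (m - 1) * rho) / m

m       <- 1:8
factors <- rm_factor(m, rho = 0.5)

# 1.000 0.750 0.667 0.625 0.600 0.583 0.571 0.562
round(factors, 3)
```

At ρ = 0.5 the factor drops from 1 to 0.625 with four measurements, but only to 0.5625 with eight, and it can never fall below ρ itself.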

2.5 Dropout Adjustment

Inflate sample size to account for anticipated dropout. For ordinary two-arm trials where each enrolled subject has probability $d$ of not contributing a valid analysis observation:

N^* = \frac{N}{1 - d}

Where $d$ = expected dropout rate.

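When the squared form applies. A few designs require $(1-d)^2$ in the denominator. Here $d$ is the probability of losing each individual measurement (or period), and the two losses are assumed independent, so a subject contributes with probability $(1-d)^2$:

• Paired or change-from-baseline analyses requiring both a baseline and an outcome measurement, where losing either one drops the subject.
• Crossover trials where missing either period invalidates the within-subject comparison.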

Inflating $N$ by either factor does not fix informative missingness; that requires an analysis-stage strategy (multiple imputation, tipping-point sensitivity, pattern mixture). Dropout inflation and missing-data handling are separate decisions.
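
The two inflation rules can be compared side by side. The sketch below assumes 64 evaluable subjects per group and a 15% loss rate; inflate_for_dropout is a hypothetical helper name.

```r
# Inflate an evaluable-subject target for anticipated dropout.
# inflate_for_dropout() is an illustrative helper, not a package function.
inflate_for_dropout <- function(n, d, squared = FALSE) {
  retained <- if (squared) (1 - d)^2 else (1 - d)
  ceiling(n / retained)
}

n_simple_rule  <- inflate_for_dropout(64, d = 0.15)                  # 64 / 0.85   -> 76
n_squared_rule <- inflate_for_dropout(64, d = 0.15, squared = TRUE)  # 64 / 0.7225 -> 89
```

The squared rule adds 13 subjects per group in this example, so it is worth confirming which definition of $d$ the dropout estimate actually reflects.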

3. Assumptions

3.1 Core Assumptions

Assumption	Testable Criterion	Violation Consequence
Normality	Shapiro-Wilk p > 0.05; Q-Q plot linearity	Moderate: CLT protects with n > 30/group
Equal variances	Levene's test p > 0.05; ratio of SDs < 2	Use Welch's t-test or Satterthwaite df
Independence	Study design ensures no clustering	Severe: inflated Type I error if ignored
MCID validity	Literature/clinical consensus supports Δ	Underpowered if Δ too optimistic

3.2 Parameter Estimates

Variance ( $\sigma^2$ )

Should come from prior studies, pilot data, or published literature in similar populations. If uncertain, conduct a sensitivity analysis across a plausible range of values.
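
One way to run such a sensitivity analysis is to repeat the calculation over a grid of standard deviations. The grid below (8, 10, 12) and Δ = 5 are assumed values for illustration only.

```r
# Per-group sample size across a plausible range of SDs (exact t-test).
sd_grid <- c(8, 10, 12)   # illustrative range around a planning value of 10

n_grid <- sapply(sd_grid, function(s) {
  ceiling(power.t.test(delta = 5, sd = s, sig.level = 0.05, power = 0.80)$n)
})

data.frame(sd = sd_grid, n_per_group = n_grid)
```

Because the required n scales with σ², a 20% error in the assumed SD changes the sample size by roughly 40%, which is why the planning value deserves explicit justification.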

Effect size ( $\Delta$ )

Must be clinically meaningful, not just statistically detectable. Overly optimistic effect sizes are a common cause of underpowered trials.

4. Regulatory Guidance

FDA

ICH E9 (Statistical Principles for Clinical Trials)

"The number of subjects...should always be large enough to provide a reliable answer to the questions addressed." Requires justification of effect size and variance assumptions.

ICH E20 Adaptive Designs for Clinical Trials (2025)

Permits sample size re-estimation based on interim variance, but effect size must remain blinded.

EMA

CHMP Points to Consider on Adjustment for Baseline Covariates

Recommends ANCOVA for continuous outcomes, which can reduce variance and required sample size.

EMA Guideline on Missing Data (2010)

Requires sensitivity analyses for missing data; dropout adjustment should be pre-specified.

Key Citations

ICH E9: Statistical Principles for Clinical Trials (1998)
ICH E20: Adaptive Designs for Clinical Trials (2025)
EMA: Guideline on Adjustment for Baseline Covariates in Clinical Trials (2015)

5. Validation Against Industry Standards

Scenario	Parameters	PASS 2024	nQuery 9.5	Zetyra	Status
Two-sample t-test	α=0.05, power=0.80, Δ=5, σ=10	64/group	64/group	64/group	✓ Match
Two-sample t-test	α=0.05, power=0.90, Δ=5, σ=10	86/group	86/group	86/group	✓ Match
Unequal allocation (2:1)	α=0.05, power=0.80, Δ=5, σ=10	48/96	48/96	48/96	✓ Match
Cluster RCT	ICC=0.05, m=20, Δ=5, σ=10	127/group	127/group	127/group	✓ Match

Minor variations (±1 subject) may occur due to rounding conventions.
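
The two-sample t-test rows can be checked independently with base R's power.t.test(). This sketch reproduces only those two rows; it does not call PASS, nQuery, or Zetyra.

```r
# Reproduce the two-sample t-test rows of the validation table.
n_80 <- ceiling(power.t.test(delta = 5, sd = 10, sig.level = 0.05,
                             power = 0.80)$n)   # 64 per group
n_90 <- ceiling(power.t.test(delta = 5, sd = 10, sig.level = 0.05,
                             power = 0.90)$n)   # 86 per group
```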

6. Example SAP Language

Sample Size Justification

The primary endpoint is change from baseline in [outcome] at Week [X]. Based on prior studies (Author et al., Year), we assume a standard deviation of [σ] units. A difference of [Δ] units is considered the minimum clinically important difference based on [justification].

Using a two-sample t-test with a two-sided significance level of 0.05 and 80% power, [n] subjects per group are required. To account for an anticipated dropout rate of [X]%, we will enroll [N*] subjects per group ([total] subjects total).

Calculations were performed using [Zetyra / gsDesign / PASS] and validated against [reference software].

7. R Code

# Two-sample t-test sample size
power.t.test(
  delta = 5,        # MCID
  sd = 10,          # Standard deviation
  sig.level = 0.05, # Alpha (two-sided)
  power = 0.80,     # 1 - beta
  type = "two.sample",
  alternative = "two.sided"
)
# Result: n = 63.77, rounded up to 64 per group

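# With unequal allocation (2:1)
# pwr.t2n.test() needs n1 supplied and solves for only one unknown,
# so search for the n1 that reaches the target power with n2 = k * n1.
library(pwr)
k <- 2              # allocation ratio n2/n1
power_gap <- function(n1) {
  pwr.t2n.test(
    n1 = n1,
    n2 = k * n1,
    d = 5/10,         # Cohen's d = delta/sd
    sig.level = 0.05,
    alternative = "two.sided"
  )$power - 0.80
}
n1 <- ceiling(uniroot(power_gap, interval = c(10, 500))$root)
n2 <- k * n1
# Result: n1 = 48, n2 = 96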

# Cluster RCT adjustment
n_simple <- 64
m <- 20          # cluster size
icc <- 0.05      # intraclass correlation
deff <- 1 + (m - 1) * icc  # design effect = 1.95
n_cluster <- ceiling(n_simple * deff)
# Result: n = 125 per group (rounded up)

# Dropout adjustment (ordinary independent-subject dropout)
dropout_rate <- 0.15
n_adjusted <- ceiling(n_cluster / (1 - dropout_rate))
# Result: n = 148 per group
# Use (1 - dropout_rate)^2 only for paired/change-from-baseline designs
# where losing either measurement drops the subject.

8. References

Chow SC, Shao J, Wang H, Lokhnygina Y. Sample Size Calculations in Clinical Research. 3rd ed. CRC Press; 2017.
Julious SA. Sample Sizes for Clinical Trials. CRC Press; 2010.
Cohen J. Statistical Power Analysis for the Behavioral Sciences. 2nd ed. Lawrence Erlbaum Associates; 1988.
Vickers AJ, Altman DG. Analysing controlled trials with baseline and follow up measurements. BMJ. 2001;323(7321):1123-1124.
Diggle PJ, Heagerty P, Liang KY, Zeger SL. Analysis of Longitudinal Data. 2nd ed. Oxford University Press; 2002.
Donner A, Klar N. Design and Analysis of Cluster Randomization Trials in Health Research. Arnold; 2000.
International Council for Harmonisation (ICH). E9 Statistical Principles for Clinical Trials. February 1998.

Last updated: May 2026
