Sample Size for Binary Outcomes
Comprehensive power analysis for clinical trials with dichotomous endpoints (e.g., response rates, event incidence, success/failure).
1. When to Use This Method
Use this methodology when:
- Your primary endpoint is a binary outcome (yes/no, success/failure, event/no event)
- You are comparing proportions between two or more groups
- You need to power a superiority, non-inferiority, or equivalence trial
Do NOT Use When
- Your outcome is continuous (use means comparison method)
- Your outcome is time-to-event with censoring (use survival analysis method)
- Your outcome is a count with no upper bound (use Poisson method)
- You have paired/matched binary data (use McNemar's test method)
2. Mathematical Formulation
2.1 Two-Sample Parallel Design (Superiority)
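For a randomized trial comparing intervention ($p_I$) to control ($p_C$), the sample size per group is:
$$n = \frac{\left[z_{1-\alpha/2}\sqrt{2\bar{p}(1-\bar{p})} + z_{1-\beta}\sqrt{p_I(1-p_I) + p_C(1-p_C)}\right]^2}{(p_I - p_C)^2}$$
| Symbol | Description |
|---|---|
| $\bar{p} = (p_I + p_C)/2$ | Pooled proportion under H₀ |
| $z_{1-\alpha/2}$ | Critical value for Type I error (1.96 for α = 0.05, two-sided) |
| $z_{1-\beta}$ | Critical value for power (0.84 for 80%; 1.28 for 90%) |
Simplified approximation (equal groups, unpooled variance):
$$n \approx \frac{(z_{1-\alpha/2} + z_{1-\beta})^2\,[p_I(1-p_I) + p_C(1-p_C)]}{(p_I - p_C)^2}$$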
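A minimal base-R sketch of both formulas, assuming equal allocation and a two-sided test. The function names `n_two_prop` and `n_two_prop_simple` are illustrative only, not part of any package; results are rounded up to whole subjects.

```r
# Per-group sample size, two-sided test, equal allocation.
# Pooled-variance normal approximation.
n_two_prop <- function(p_i, p_c, alpha = 0.05, power = 0.80) {
  z_a <- qnorm(1 - alpha / 2)  # critical value for Type I error
  z_b <- qnorm(power)          # critical value for power
  p_bar <- (p_i + p_c) / 2     # pooled proportion under H0
  num <- (z_a * sqrt(2 * p_bar * (1 - p_bar)) +
            z_b * sqrt(p_i * (1 - p_i) + p_c * (1 - p_c)))^2
  ceiling(num / (p_i - p_c)^2) # round up to whole subjects
}

# Simplified approximation: unpooled variance in both terms.
n_two_prop_simple <- function(p_i, p_c, alpha = 0.05, power = 0.80) {
  z_a <- qnorm(1 - alpha / 2)
  z_b <- qnorm(power)
  ceiling((z_a + z_b)^2 * (p_i * (1 - p_i) + p_c * (1 - p_c)) /
            (p_i - p_c)^2)
}

n_two_prop(0.30, 0.20)                # 294
n_two_prop(0.30, 0.20, power = 0.90)  # 392
n_two_prop_simple(0.30, 0.20)         # 291
```

The simplified form gives a slightly smaller n here because it also uses the unpooled variance in the Type I error term.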
2.2 Unequal Allocation
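For allocation ratio $k = n_I / n_C$, the control-group size is:
$$n_C = \frac{\left[z_{1-\alpha/2}\sqrt{\bar{p}(1-\bar{p})\left(1 + \frac{1}{k}\right)} + z_{1-\beta}\sqrt{\frac{p_I(1-p_I)}{k} + p_C(1-p_C)}\right]^2}{(p_I - p_C)^2}, \qquad n_I = k\,n_C$$
With $\bar{p} = (k\,p_I + p_C)/(1 + k)$.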
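The unequal-allocation formula can be sketched the same way. `n_unequal` is a hypothetical helper; it rounds $n_C$ up first and then sets $n_I = \lceil k\,n_C \rceil$, which is one rounding convention among several.

```r
# Unequal allocation, k = n_I / n_C.
n_unequal <- function(p_i, p_c, k = 1, alpha = 0.05, power = 0.80) {
  z_a <- qnorm(1 - alpha / 2)
  z_b <- qnorm(power)
  p_bar <- (k * p_i + p_c) / (1 + k)  # allocation-weighted pooled proportion
  num <- (z_a * sqrt(p_bar * (1 - p_bar) * (1 + 1 / k)) +
            z_b * sqrt(p_i * (1 - p_i) / k + p_c * (1 - p_c)))^2
  n_c <- ceiling(num / (p_i - p_c)^2)  # control arm, rounded up
  c(n_I = ceiling(k * n_c), n_C = n_c)
}

n_unequal(0.30, 0.20, k = 1)  # n_I = 294, n_C = 294
n_unequal(0.30, 0.20, k = 2)  # n_I = 448, n_C = 224
```

With $k = 2$ this example needs 672 subjects in total versus 588 under 1:1 allocation, which illustrates the efficiency cost of unequal randomization.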
2.3 Non-Inferiority Design
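For testing whether the new treatment is no worse than control by more than a margin $\delta > 0$ (assuming higher proportions are favorable, so $H_0: p_I - p_C \le -\delta$):
$$n = \frac{(z_{1-\alpha} + z_{1-\beta})^2\,[p_I(1-p_I) + p_C(1-p_C)]}{(p_I - p_C + \delta)^2}$$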
Note: One-sided α (typically 0.025) is standard for non-inferiority. Non-inferiority margins are typically small, resulting in substantially larger sample sizes than superiority trials.
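A sketch of the non-inferiority formula, assuming higher proportions are favorable and a one-sided α of 0.025. `n_noninf` is an illustrative name, not a library function.

```r
# Non-inferiority, one-sided test; higher proportions are favorable.
n_noninf <- function(p_i, p_c, delta, alpha = 0.025, power = 0.80) {
  z_a <- qnorm(1 - alpha)  # one-sided critical value
  z_b <- qnorm(power)
  ceiling((z_a + z_b)^2 * (p_i * (1 - p_i) + p_c * (1 - p_c)) /
            (p_i - p_c + delta)^2)
}

n_noninf(0.20, 0.20, delta = 0.10)  # 252
n_noninf(0.20, 0.20, delta = 0.05)  # 1005
```

Halving the margin roughly quadruples n, because δ enters the denominator squared.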
2.4 Equivalence Design
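For testing whether treatments differ by no more than $\pm\delta$ (two one-sided tests, each at level $\alpha$):
$$n = \frac{(z_{1-\alpha} + z_{1-\beta/2})^2\,[p_I(1-p_I) + p_C(1-p_C)]}{(\delta - |p_I - p_C|)^2}$$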
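A sketch of the equivalence formula under the two one-sided tests (TOST) framing. The default of α = 0.05 for each one-sided test (equivalent to a 90% confidence interval) is an assumption, not a value stated above; `n_equiv` is a hypothetical helper.

```r
# Equivalence via two one-sided tests.
# Assumption: each one-sided test at alpha = 0.05 (90% CI); change as needed.
n_equiv <- function(p_i, p_c, delta, alpha = 0.05, power = 0.80) {
  stopifnot(abs(p_i - p_c) < delta)  # true difference must lie inside the margin
  z_a <- qnorm(1 - alpha)
  z_b <- qnorm(1 - (1 - power) / 2)  # z_{1 - beta/2}
  ceiling((z_a + z_b)^2 * (p_i * (1 - p_i) + p_c * (1 - p_c)) /
            (delta - abs(p_i - p_c))^2)
}

n_equiv(0.20, 0.20, delta = 0.10)  # 275
```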
2.5 Clustered Designs
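When observations are nested within clusters of size $m$ with intraclass correlation $\rho$, multiply the individually randomized sample size by the design effect (variance inflation factor):
$$\text{DE} = 1 + (m - 1)\rho, \qquad n_{\text{cluster}} = n \times \text{DE}$$
For unequal cluster sizes with mean $\bar{m}$, adjust using the coefficient of variation (CV) of cluster size:
$$\text{DE} = 1 + \left[(\text{CV}^2 + 1)\,\bar{m} - 1\right]\rho$$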
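The design effect can be computed directly; with `cv = 0` the unequal-size formula reduces to $1 + (m - 1)\rho$. `design_effect` is an illustrative helper, and the CV of 0.5 in the example is an assumed value.

```r
# Design effect for cluster randomization.
# cv = 0 gives the equal-cluster-size formula 1 + (m - 1) * icc.
design_effect <- function(m_bar, icc, cv = 0) {
  1 + ((cv^2 + 1) * m_bar - 1) * icc
}

n_individual <- 294  # individually randomized n per group
ceiling(n_individual * design_effect(20, 0.05))            # 574 (DE = 1.95)
ceiling(n_individual * design_effect(20, 0.05, cv = 0.5))  # 647 (DE = 2.2)
```

Even moderate variability in cluster size raises the requirement noticeably in this example.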
2.6 Continuity Correction
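For small samples or proportions near 0 or 1, apply a continuity correction to the uncorrected per-group $n$:
$$n_{cc} = \frac{n}{4}\left[1 + \sqrt{1 + \frac{4}{n\,|p_I - p_C|}}\right]^2$$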
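A sketch that applies the correction to the unrounded pooled-variance n; applying it to an already-rounded n can shift the result by a subject. Both function names are hypothetical.

```r
# Unrounded pooled-variance n (two-sided test, equal allocation).
n_raw <- function(p_i, p_c, alpha = 0.05, power = 0.80) {
  z_a <- qnorm(1 - alpha / 2)
  z_b <- qnorm(power)
  p_bar <- (p_i + p_c) / 2
  (z_a * sqrt(2 * p_bar * (1 - p_bar)) +
     z_b * sqrt(p_i * (1 - p_i) + p_c * (1 - p_c)))^2 / (p_i - p_c)^2
}

# Continuity-corrected n, rounded up to whole subjects.
n_cc <- function(p_i, p_c, ...) {
  n <- n_raw(p_i, p_c, ...)
  ceiling(n / 4 * (1 + sqrt(1 + 4 / (n * abs(p_i - p_c))))^2)
}

n_cc(0.30, 0.20)  # 313, versus 294 without the correction
```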
2.7 Dropout Adjustment
Inflate the sample size to account for anticipated dropout. For ordinary two-arm trials where each enrolled subject has probability $d$ (the expected dropout rate) of not contributing a valid analysis observation:
$$n^* = \frac{n}{1 - d}$$
Use the squared form, $n^* = n/(1 - d)^2$, only when a subject can be lost at either of two points and both measurements are required: paired or change-from-baseline designs requiring both a baseline and an outcome measurement, or crossover trials where missing either period invalidates the within-subject comparison.
Inflating $n$ does not fix informative missingness; that requires an analysis-stage strategy (e.g., multiple imputation or tipping-point sensitivity analysis).
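Both inflation rules in a small sketch; `inflate_dropout` and its `paired` flag are illustrative names, and the 15% dropout rate is an assumed value.

```r
# Dropout inflation; d = expected dropout rate.
# paired = TRUE applies the squared form for designs needing two measurements.
inflate_dropout <- function(n, d, paired = FALSE) {
  retained <- if (paired) (1 - d)^2 else (1 - d)
  ceiling(n / retained)
}

inflate_dropout(294, 0.15)                 # 346
inflate_dropout(294, 0.15, paired = TRUE)  # 407
```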
3. Assumptions
3.1 Core Assumptions
| Assumption | Testable Criterion | Violation Consequence |
|---|---|---|
| Independence | Study design ensures no clustering | Severe: inflated Type I error if ignored |
| Fixed proportions | Event rates stable over enrollment period | Moderate: time-varying rates may require stratification |
| Large sample | Expected counts $n\,p \ge 5$ and $n(1-p) \ge 5$ in each group | Use exact methods (Fisher's exact test) if violated |
| No confounding | Randomization successful | Bias in effect estimate |
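A quick check of the large-sample criterion using expected counts at the planned n, with base R's `fisher.test` as the fallback. `large_sample_ok` is a hypothetical helper, and the 2×2 counts are made up for illustration.

```r
# Large-sample check: expected events and non-events >= 5 in every arm.
large_sample_ok <- function(n, p) all(n * p >= 5 & n * (1 - p) >= 5)

large_sample_ok(n = 294, p = c(0.30, 0.20))  # TRUE
large_sample_ok(n = 40, p = c(0.10, 0.05))   # FALSE: only 4 and 2 expected events

# Fallback when the check fails: Fisher's exact test on the observed 2x2 table.
# Hypothetical counts: rows = event / no event, columns = intervention / control.
observed <- matrix(c(4, 36, 1, 39), nrow = 2)
fisher.test(observed)$p.value
```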
3.2 Parameter Estimates
Control rate ($p_C$)
Should come from prior studies, pilot data, or published literature. Consider secular trends: rates may have changed since the historical studies were conducted.
Treatment effect
Can be specified as an absolute difference ($p_I - p_C$), relative risk ($p_I / p_C$), or odds ratio ($\frac{p_I/(1-p_I)}{p_C/(1-p_C)}$). Choose an effect that is clinically relevant, not merely statistically detectable.
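Relative measures have to be converted to an intervention proportion before the sample size formulas can be applied. A minimal sketch with illustrative function names:

```r
# Convert a control rate and a relative effect into the intervention rate p_I.
p_from_rr <- function(p_c, rr) p_c * rr                          # relative risk
p_from_or <- function(p_c, or) or * p_c / (1 - p_c + or * p_c)  # odds ratio

p_from_rr(0.20, 0.75)  # 0.15 (25% relative reduction)
p_from_or(0.20, 2)     # 0.3333333
```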
Event Rate Impact on Sample Size
| Control Rate | 25% Relative Reduction | Required n/group (80% power, two-sided α = 0.05) |
|---|---|---|
| 40% | 40% → 30% | 356 |
| 20% | 20% → 15% | 906 |
| 10% | 10% → 7.5% | 2,005 |
| 5% | 5% → 3.75% | 4,202 |
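The pooled-variance formula can be applied across these control rates in a few lines (two-sided α = 0.05, 80% power, 25% relative reduction). The helper is an illustrative name, redefined here so the block runs on its own.

```r
# Pooled-variance per-group n, two-sided test, equal allocation.
n_two_prop <- function(p_i, p_c, alpha = 0.05, power = 0.80) {
  z_a <- qnorm(1 - alpha / 2)
  z_b <- qnorm(power)
  p_bar <- (p_i + p_c) / 2
  num <- (z_a * sqrt(2 * p_bar * (1 - p_bar)) +
            z_b * sqrt(p_i * (1 - p_i) + p_c * (1 - p_c)))^2
  ceiling(num / (p_i - p_c)^2)
}

p_c <- c(0.40, 0.20, 0.10, 0.05)
p_i <- 0.75 * p_c              # 25% relative reduction
mapply(n_two_prop, p_i, p_c)   # 356 906 2005 4202
```

Halving the control rate roughly doubles the required n, because the absolute difference shrinks faster than the binomial variance.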
4. Regulatory Guidance
FDA
ICH E9 (Statistical Principles for Clinical Trials)
Requires prospective sample size justification with clearly stated assumptions for event rates and effect sizes.
FDA Guidance on Non-Inferiority Trials (2016)
Non-inferiority margin must preserve a clinically meaningful fraction of the active control effect. Discusses the fixed-margin approach (e.g., the 95-95 method) and the synthesis method for establishing the margin.
FDA Guidance on Multiple Endpoints (2022)
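When success can be claimed on any one of several primary endpoints, the family-wise Type I error must be controlled (e.g., Bonferroni: α/k per endpoint), which increases the required sample size. When endpoints are co-primary (all must succeed), no α adjustment is needed, but the Type II error increases, so the trial must be powered for joint success, which also increases the required sample size.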
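A rough sketch of both multiplicity situations for the p₁ = 0.30 vs p₂ = 0.20 example with two endpoints. The co-primary calculation assumes the two endpoints are independent, so each needs power $\sqrt{0.80} \approx 0.894$; positively correlated endpoints need fewer subjects, so treat this as conservative. The helper name is illustrative.

```r
# Pooled-variance per-group n, two-sided test, equal allocation.
n_two_prop <- function(p_i, p_c, alpha = 0.05, power = 0.80) {
  z_a <- qnorm(1 - alpha / 2)
  z_b <- qnorm(power)
  p_bar <- (p_i + p_c) / 2
  num <- (z_a * sqrt(2 * p_bar * (1 - p_bar)) +
            z_b * sqrt(p_i * (1 - p_i) + p_c * (1 - p_c)))^2
  ceiling(num / (p_i - p_c)^2)
}

n_two_prop(0.30, 0.20)                      # 294: single primary endpoint
# Success on either of k = 2 endpoints: Bonferroni alpha / k per endpoint.
n_two_prop(0.30, 0.20, alpha = 0.05 / 2)    # 356
# Co-primary (both must succeed), assuming independent endpoints:
# no alpha split, but each endpoint needs power sqrt(0.80).
n_two_prop(0.30, 0.20, power = sqrt(0.80))  # 385
```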
EMA
CHMP Guideline on Non-Inferiority (2005)
Margin selection must be justified based on historical evidence of active control efficacy vs. placebo.
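EMA Points to Consider on Switching between Superiority and Non-Inferiority
Switching from non-inferiority to superiority after the results are known is generally acceptable. Switching from superiority to non-inferiority requires a pre-specified non-inferiority margin; the margin cannot be chosen post hoc based on the results.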
Key Citations
- ICH E9: Statistical Principles for Clinical Trials (1998)
- FDA Guidance: Non-Inferiority Clinical Trials to Establish Effectiveness (2016)
- FDA Guidance: Multiple Endpoints in Clinical Trials (2022)
- CHMP: Guideline on the Choice of the Non-Inferiority Margin (2005)
5. Validation Against Industry Standards
| Scenario | Parameters | PASS 2024 | nQuery 9.5 | Zetyra | Status |
|---|---|---|---|---|---|
| Two-proportion (superiority) | p₁=0.30, p₂=0.20, α=0.05, power=0.80 | 294/group | 294/group | 294/group | ✓ Match |
| Two-proportion (superiority) | p₁=0.30, p₂=0.20, α=0.05, power=0.90 | 392/group | 393/group | 392/group | ✓ Match |
| Non-inferiority | p₁=p₂=0.20, δ=0.10, α=0.025, power=0.80 | 252/group | 252/group | 252/group | ✓ Match |
| Cluster RCT | p=0.25, ICC=0.05, m=20 | 582/group | 583/group | 582/group | ✓ Match |
Minor variations (±1 subject) may occur due to rounding conventions and continuity correction options.
6. Example SAP Language
Superiority Trial
The primary endpoint is the proportion of subjects achieving [response criterion] at Week [X]. Based on prior studies (Author et al., Year), the expected response rate in the control group is [p_C]%. We hypothesize that the intervention will achieve a response rate of [p_I]%, representing an absolute improvement of [difference]%.
Using a two-sided chi-square test with α = 0.05 and 80% power, [n] subjects per group are required. To account for an anticipated dropout rate of [X]%, we will enroll [N*] subjects per group ([total] subjects total).
Calculations were performed using [Zetyra / PASS / nQuery] and validated against published formulas (Fleiss et al., 2003).
Non-Inferiority Trial
The primary endpoint is the proportion of subjects achieving [outcome] at Week [X]. This is a non-inferiority trial comparing [new treatment] to [active control].
Based on historical trials (Author et al., Year), the active control achieves a response rate of approximately [p_C]%. We assume the new treatment will have a similar response rate. The non-inferiority margin is set at [δ]%, which preserves at least [X]% of the historical treatment effect over placebo, consistent with FDA guidance.
Using a one-sided test with α = 0.025 and 80% power, [n] subjects per group are required. To account for an anticipated dropout rate of [X]%, we will enroll [N*] subjects per group.
7. R Code
```r
# Two-proportion superiority test
library(pwr)

# Method 1: pwr package (arcsine transformation, Cohen's h)
p1 <- 0.30  # Intervention proportion
p2 <- 0.20  # Control proportion
h <- ES.h(p1, p2)  # Cohen's h effect size
pwr.2p.test(
  h = h,
  sig.level = 0.05,
  power = 0.80,
  alternative = "two.sided"
)
# Result: n ≈ 291.7, round up to 292 per group (arcsine approximation,
# slightly smaller than the pooled-variance formula)

# Method 2: power.prop.test (base R, pooled-variance formula)
power.prop.test(
  p1 = 0.30,
  p2 = 0.20,
  sig.level = 0.05,
  power = 0.80,
  alternative = "two.sided"
)
# Result: n ≈ 293.2, round up to 294 per group

# Non-inferiority test (manual normal-approximation calculation;
# no additional package required)
p_control <- 0.20
p_treatment <- 0.20  # Assume equal under H1
delta <- 0.10        # Non-inferiority margin
alpha <- 0.025       # One-sided
z_alpha <- qnorm(1 - alpha)
z_beta <- qnorm(0.80)
var_sum <- p_treatment * (1 - p_treatment) + p_control * (1 - p_control)
n_ni <- ((z_alpha + z_beta)^2 * var_sum) / (p_treatment - p_control + delta)^2
ceiling(n_ni)
# Result: n = 252 per group

# Cluster RCT adjustment
n_simple <- 294
m <- 20      # cluster size
icc <- 0.05  # intraclass correlation
deff <- 1 + (m - 1) * icc  # design effect = 1.95
n_cluster <- ceiling(n_simple * deff)
# Result: n = 574 per group

# Dropout adjustment (ordinary independent-subject dropout)
dropout_rate <- 0.15
n_adjusted <- ceiling(n_cluster / (1 - dropout_rate))
# Result: n = 676 per group
# Use (1 - dropout_rate)^2 only for paired/change-from-baseline designs
# where losing either measurement drops the subject.
```
8. References
- Fleiss JL, Levin B, Paik MC. Statistical Methods for Rates and Proportions. 3rd ed. Wiley; 2003.
- Chow SC, Shao J, Wang H, Lokhnygina Y. Sample Size Calculations in Clinical Research. 3rd ed. CRC Press; 2017.
- Cohen J. Statistical Power Analysis for the Behavioral Sciences. 2nd ed. Lawrence Erlbaum Associates; 1988.
- Yates F. Contingency tables involving small numbers and the χ² test. JRSS Supplement. 1934;1:217-235.
- Donner A, Klar N. Design and Analysis of Cluster Randomization Trials in Health Research. Arnold; 2000.
- U.S. Food and Drug Administration. Non-Inferiority Clinical Trials to Establish Effectiveness: Guidance for Industry. November 2016.
- International Council for Harmonisation (ICH). E9 Statistical Principles for Clinical Trials. February 1998.
Last updated: May 2026