A Complete Guide to Sample Size Re-estimation
Sample Size Re-estimation (SSR) is an adaptive design strategy that allows a clinical trial to adjust its sample size at an interim analysis—compensating for planning assumptions that turn out to be wrong. This guide covers when to use SSR, how to choose among the three main variants (blinded and unblinded for two-arm RCTs, and single-arm for Phase II oncology designs against a historical control), and how to integrate SSR into your protocol and Statistical Analysis Plan.
Analogy: Insurance Against Planning Uncertainty
Designing a clinical trial is like budgeting a construction project. You estimate material costs (variance, event rates) and scope (effect size) based on pilot data or literature. But once construction begins, lumber prices may spike or the foundation may need reinforcement. SSR is your change-order clause—a pre-specified mechanism to adjust the budget mid-project without starting over, while keeping the building code (Type I error) intact.
I. When to Use Sample Size Re-estimation
Every sample size calculation depends on planning assumptions: the expected treatment effect, the variance of the outcome, the response rate, or the event rate. When these assumptions come from small pilot studies, literature reviews, or expert opinion, they carry substantial uncertainty. An underpowered trial wastes patients and resources; an overpowered one is inefficient.
SSR addresses this by allowing a pre-specified interim look at the accumulating data to revise the sample size. It is appropriate when:
Nuisance parameter uncertainty
The variance, response rate, or event rate used in planning is based on limited data and could be substantially wrong. This is the most common and least-controversial use case.
Effect size uncertainty
The treatment effect may be smaller than hoped but still clinically meaningful. If the interim data shows a “promising” trend, you may want to increase the sample size to rescue the trial rather than fail due to optimistic planning.
Regulatory or operational flexibility
The protocol needs to accommodate real-world uncertainty while maintaining rigorous Type I error control—a requirement for regulatory submissions.
Phase II oncology ORR against a historical control
Single-arm Phase II trials test objective response rate (ORR) against a fixed historical control rate p0. The target rate p1 is often a hopeful planning assumption; SSR lets you extend enrollment toward an N_max budget when the interim ORR is promising but not definitive, without committing the full budget up front.
Key principle: SSR is not about “fishing” for significance. It is a pre-planned mechanism written into the protocol before the trial begins, with rules that preserve the validity of the final analysis.
II. Choosing Your Approach: Three Variants
Three variants dominate modern clinical trial SSR. The right choice depends on (a) whether the trial is randomized or single-arm, and (b) what source of uncertainty you are protecting against. Each variant has its own statistical machinery, operational requirements, and regulatory posture.
| Aspect | Blinded | Unblinded (two-arm) | Single-Arm (Phase II ORR) |
|---|---|---|---|
| Design setting | Randomized two-arm RCT | Randomized two-arm RCT | Single-arm vs fixed historical control |
| What you observe at interim | Pooled data only (no treatment labels) | Per-arm data (treatment unblinded, via DMC) | Responder count from the single arm (no blind to break) |
| What you re-estimate | Nuisance parameters only (variance, pooled rate, event rate) | Treatment effect and nuisance parameters | Posterior/predictive probability or CP under observed ORR |
| Decision rule | Plug-in with updated nuisance | Promising zone (conditional power) | Bayesian (PP promising zone) OR CP promising zone |
| Type I error control | Preserved by blinding—no special test | Inverse-normal combination test | Simulation-calibrated; not analytically guaranteed |
| DMC required? | No (sponsor can perform) | Yes (independent DMC) | No (single-arm, no unblinding risk) |
| Regulatory acceptance | Widely accepted (FDA, EMA) | Accepted with pre-specification; more scrutiny | FDA 2019, Project Optimus; Type I must be confirmed by simulation |
| Operational complexity | Low | High (DMC charter, firewalls) | Low (sponsor-run interim; no DMC firewalls required) |
| Best for | Uncertain variance or event rate | Uncertain effect size with “rescue” intent | Phase II oncology ORR with uncertain p1 vs historical p0 |
| Representative reference | Kieser & Friede (2003) | Mehta & Pocock (2011) | Lee & Liu (2008); Jung (2013) |
Rule of thumb:
- Randomized trial, main worry is variance/rate misspecification → Blinded SSR.
- Randomized trial, main worry is effect-size over-optimism and you have a DMC → Unblinded SSR.
- Phase II single-arm against historical control (ORR) → Single-Arm SSR, usually in Bayesian mode.
Can you combine them? In principle a trial could include a blinded SSR for nuisance parameters and an unblinded SSR for effect size at different interim fractions, but this is rare. Single-arm SSR does not combine with the two-arm variants—it is a different design. Choose the single approach that addresses your primary source of uncertainty.
III. Blinded SSR: How It Works
Blinded SSR re-estimates nuisance parameters (the quantities that affect power but are not the treatment effect itself) from pooled interim data, then recalculates the required sample size using the original effect size assumption.
The Algorithm
Compute the initial sample size
Using planning assumptions (effect size, variance/rate, alpha, power), calculate the target sample size with the standard formula for your endpoint type.
Enroll to the interim fraction
Collect data on N_interim = t × N0 subjects, where t is the pre-specified interim fraction (typically 0.5).
Estimate the nuisance parameter from pooled data
Without unblinding, compute the blinded estimate: pooled variance for continuous endpoints, pooled response rate for binary, or pooled event rate for survival.
Recalculate N using the updated nuisance parameter
Plug the observed nuisance parameter into the sample size formula while keeping the planned effect size fixed. The treatment effect assumption does not change—only the “noise” estimate.
Apply constraints
Enforce the protocol cap (N ≤ N_max = n_max_factor × N0) and the interim floor (N ≥ N_interim; you cannot un-enroll patients). For continuous/binary endpoints, enforce even parity; for survival, split by allocation ratio.
Why does blinding preserve Type I error?
Because the sponsor never sees treatment-specific outcomes, the sample size adjustment depends only on the overall variability of the data—not on whether one arm is doing better. The decision to increase N carries no information about the treatment effect, so the standard test statistic at the final analysis remains valid. This was formally established by Kieser & Friede (2003) and is explicitly acknowledged in FDA Guidance on Adaptive Designs (2019).
Note on bias: The blinded pooled variance includes a bias term of approximately δ²/4 (Kieser & Friede 2003) because it mixes two distributions whose means differ by δ. For small effects relative to the variance, this bias is negligible. The calculator uses the blinded estimate directly, which is conservative (it slightly overestimates N).
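A quick base-R check of the bias magnitude, using illustrative planning values (not calculator output):

```r
# Under equal allocation, pooling two normal arms whose means differ by
# delta inflates the one-sample variance by approximately delta^2 / 4
# (Kieser & Friede 2003).
sigma2 <- 100   # true within-group variance (illustrative)
delta  <- 5     # true mean difference (illustrative)

blinded_var   <- sigma2 + delta^2 / 4   # expected blinded pooled estimate
relative_bias <- (delta^2 / 4) / sigma2

blinded_var     # 106.25
relative_bias   # 0.0625 -> ~6% overestimate; conservative (slightly larger N)
```

For a typical effect-to-variance ratio like this one, the inflation is a few percent and simply buys a little extra power.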
IV. Unblinded SSR: The Promising Zone Approach
Unblinded SSR (Mehta & Pocock 2011) allows the sample size to increase when the interim treatment effect falls in a “promising zone”—large enough to suggest a real benefit, but not large enough for the trial to succeed at its originally planned N. The key challenge is maintaining Type I error control when the sample size depends on the unblinded treatment effect.
The Four Zones
At the interim analysis, compute the conditional power (CP)—the probability of achieving significance at the final analysis given the data observed so far. The CP determines which zone the trial is in:
Favorable Zone (CP ≥ 80%)
The trial is on track to succeed. Keep the original N.
Promising Zone (30% ≤ CP < 80%)
The effect is trending in the right direction but the trial is underpowered at the current N. Increase N using the fixed-design sample size formula under the observed effect at the target power (typically 90%).
Unfavorable Zone (10% ≤ CP < 30%)
The effect is weak. Increasing N would require an impractically large increase. Keep the original N.
Futility Zone (CP < 10%)
The data suggest the treatment is unlikely to work. Consider stopping for futility.
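The zone rules above reduce to a few comparisons; a minimal base-R sketch with the thresholds stated in the text:

```r
# Classify an interim conditional power into the four Mehta-Pocock zones
# (thresholds 10% / 30% / 80% as in the text).
classify_zone <- function(cp) {
  if (cp >= 0.80)      "favorable"    # keep original N
  else if (cp >= 0.30) "promising"    # increase N toward target power
  else if (cp >= 0.10) "unfavorable"  # keep original N
  else                 "futility"     # consider stopping
}

classify_zone(0.55)   # "promising"
```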
The Combination Test
When N depends on the observed treatment effect, the standard z-test at the final analysis is no longer valid—the distribution of the test statistic changes because the sample size was influenced by the interim data. The inverse-normal combination test resolves this by splitting the evidence into two independent stages:
Z_comb = w1 · Z1 + w2 · Z2
where w1 = √t and w2 = √(1 − t) are pre-specified weights based on the information fraction t, and Z1 and Z2 are the z-statistics from Stage 1 and Stage 2 data respectively. Reject H0 if Z_comb ≥ z_{1−α}.
Why it works: Because w1² + w2² = 1 and Stage 1 and Stage 2 data are independent, Z_comb follows a standard normal under H0 regardless of how N was modified between stages. The critical value remains z_{1−α}, the same as a fixed design.
For survival endpoints: The combination test uses the event-based information fraction rather than the sample-based fraction, since events (not patients) drive the statistical information.
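A short simulation sketch of this argument in base R. The stage sizes and the "promising" rule here are illustrative assumptions; the point is that the weighted combination stays N(0,1) under the null even though stage 2's size depends on stage 1:

```r
# Inverse-normal combination test under H0 with a data-dependent stage 2.
set.seed(1)
t  <- 0.5                        # information fraction at the interim
w1 <- sqrt(t); w2 <- sqrt(1 - t) # pre-specified weights, w1^2 + w2^2 = 1
B  <- 200000                     # simulated trials

z1 <- rnorm(B)                   # stage-1 z-statistics under H0
# Illustrative adaptation: "promising" interim results double stage 2.
n2 <- ifelse(z1 > 0.3 & z1 < 1.3, 200, 100)
s2 <- rnorm(B, mean = 0, sd = sqrt(n2))  # stage-2 sum of n2 N(0,1) obs
z2 <- s2 / sqrt(n2)              # standardized within stage 2 -> still N(0,1)

z_comb <- w1 * z1 + w2 * z2
mean(z_comb >= qnorm(0.975))     # ~0.025: Type I error preserved
```

Re-running the same simulation with a naive pooled z-statistic (weighting stages by their realized sample sizes) inflates the rejection rate, which is exactly why the pre-specified weights matter.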
V. Single-Arm SSR: Bayesian & CP for Phase II ORR
Single-arm Phase II oncology trials compare the objective response rate (ORR) of a new treatment against a fixed historical control rate p0. The planning target is some higher rate p1, but both p0 and p1 carry real uncertainty. Single-arm SSR lets you extend enrollment when the interim ORR is promising but not definitive, bounded by a pre-specified ceiling N_max = n_max_factor × N0. In the current engine the promising-zone extension itself is fixed at 1.5 × N0, with N_max acting as an upper ceiling if that value is lower than the extension.
The calculator supports two decision rules at the interim: a Bayesian rule based on posterior and predictive probabilities, and a Conditional Power (CP) rule adapted from the Mehta–Pocock promising zone. The Bayesian mode is generally preferred for single-arm binary endpoints (see the warning box below).
A. Bayesian mode (recommended)
Place a Beta(a, b) prior on the ORR (Jeffreys, a = b = 0.5, by default). After observing r1 responders in n1 interim patients, the posterior is conjugate: p | data ~ Beta(a + r1, b + n1 − r1).
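A minimal base-R sketch of the conjugate update, using illustrative interim data:

```r
# Conjugate Beta update for the single-arm ORR (Jeffreys prior a = b = 0.5).
a <- 0.5; b <- 0.5
r1 <- 6; n1 <- 18; p0 <- 0.20     # illustrative interim data

# Posterior p | data ~ Beta(a + r1, b + n1 - r1); superiority probability:
post_prob <- 1 - pbeta(p0, a + r1, b + n1 - r1)   # P(p > p0 | data)
round(post_prob, 2)                               # ~0.9 for these values
```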
Two distinct thresholds govern the design, and the calculator keeps them decoupled:
- `gamma_efficacy` — interim early-stop bar. Stop early for efficacy at the interim if P(p > p0 | data) ≥ gamma_efficacy. Typical values: 0.95–0.99.
- `gamma_final` — final-analysis success bar. At the final look, declare success if P(p > p0 | data) ≥ gamma_final. Defaults to 1 − α (e.g., 0.975 for α = 0.025).
Why decouple gamma_efficacy from gamma_final?
Using a single high threshold (e.g., 0.99) at both looks controls Type I error aggressively but requires a much stronger interim signal to declare final success, which depresses simulated power. Using a single low threshold boosts power but inflates Type I error via easy early stops. Separating them, a conservative gamma_efficacy for the interim and a moderate gamma_final for the final look, hits the design power target without giving up calibration.
The predictive probability of success (PPoS) is the probability the final posterior will clear gamma_final, computed by integrating over the Beta–Binomial distribution of the remaining outcomes. PPoS (rather than a raw posterior) is the right quantity for interim futility decisions because it directly answers “will the trial succeed if we continue?” (Saville et al., 2014):
PPoS = Σ_{y=0}^{N0 − n1} P(Y = y | r1, n1) · 1[P(p > p0 | r1 + y, N0) ≥ gamma_final], where Y ~ Beta-Binomial(N0 − n1, a + r1, b + n1 − r1).
The four interim decisions
Stop for early efficacy
P(p > p0 | data) ≥ `gamma_efficacy`. Final N equals the interim N; enrollment terminates at the interim look.
Stop for futility
PPoS ≤ `delta_futility` (default 0.05). Final N equals the interim N.
Extend N (promising)
`delta_futility` < PPoS < `pp_promising_upper` (default upper bound 0.50). Extend the final N to min(1.5 × N0, N_max). The engine currently uses a fixed 1.5× extension factor in the Bayesian promising zone; N_max acts as an upper ceiling on that extension, not as the extension target. Raising `pp_promising_upper` to 0.70 keeps more trials in the promising zone (more trials reach this extended N) but does not push the per-trial extension beyond 1.5 × N0.
Continue at planned N (favorable)
PPoS ≥ `pp_promising_upper` but the posterior has not crossed the early-stop bar. Enrollment continues to the originally planned N0. In all non-stop scenarios, the final N is not reduced below N0.
B. Conditional Power mode
The CP mode adapts the Mehta–Pocock promising-zone framework to the one-sample binomial setting. Conditional power under the observed interim ORR is classified into four zones (futility, unfavorable, promising, favorable) using the CP thresholds cp_futility, cp_promising_lower, cp_promising_upper; N is extended only in the promising zone, using the fixed-design formula at the observed ORR.
Warning: Type I error is non-monotonic in cp_promising_lower
The Mehta–Pocock Type I error preservation theorem (Chen, DeMets, Lan 2004; Gao, Ware, Mehta 2008) holds in the two-arm normal/z-test setting. It does not transfer analytically to single-arm binomial designs, because interim outcomes follow a discrete distribution. As a direct consequence, simulated Type I error is not a monotonic function of cp_promising_lower— raising the threshold can move T1E in either direction depending on whether the new cutoff falls between two adjacent attainable interim event counts. Do not assume tighter bounds produce lower Type I error; grid-search neighbouring thresholds via simulation and pick the one that balances calibration and power.
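To see the discreteness concretely, the sketch below tabulates the attainable one-sample CP values at an illustrative interim. The CP formula here is a generic normal-approximation form, assumed for illustration, not necessarily the calculator's exact implementation:

```r
# One-sample CP is a step function of the interim responder count, so a
# promising-zone boundary can only "bite" at attainable CP values.
p0 <- 0.20; n1 <- 18; N0 <- 36; alpha <- 0.025   # illustrative design
z_a <- qnorm(1 - alpha)
R   <- N0 / n1                     # information ratio, final / interim

cp_at <- function(r1) {            # assumed normal-approximation CP
  phat <- r1 / n1
  z1   <- (phat - p0) / sqrt(phat * (1 - phat) / n1)
  pnorm(z1 * sqrt(R) - z_a * sqrt(R - 1))
}

cps <- sapply(1:12, cp_at)         # attainable CPs in the decision range
round(cps, 3)
# Any cp_promising_lower strictly between two adjacent values classifies
# every possible interim identically, so simulated T1E moves in jumps.
```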
Recommendation: For single-arm binary endpoints, prefer the Bayesian mode. It is calibrated on the continuous predictive-probability scale, decouples gamma_efficacy from gamma_final, and behaves monotonically in gamma_efficacy—making calibration substantially more straightforward.
C. Type I error calibration
Neither mode gives an analytic guarantee that simulated Type I error equals nominal α. In practice:
- Run Tier 2 simulation at p = p0 with 10,000 replicates.
- If Bayesian T1E exceeds α, raise `gamma_efficacy` (typically to 0.97–0.99) so fewer null trials cross the interim bar. Raising `gamma_final` also reduces final-look false positives but costs power.
- If CP T1E exceeds α, do not blindly raise `cp_promising_lower`; search a small neighbourhood instead.
- If power at p = p1 is below target, lower `gamma_final`, raise `pp_promising_upper` (Bayesian), or raise `interim_n`/`n_max_factor`.
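A self-contained sketch of such a Tier 2 Type I error check for the Bayesian mode. Design values mirror the worked example in Section VI; the decision logic is a simplified re-implementation for illustration, not the calculator itself:

```r
# Tier 2 Monte Carlo Type I error check, Bayesian single-arm SSR (sketch).
set.seed(42)
p0 <- 0.20; n1 <- 18; N0 <- 36; N_ext <- 54     # illustrative design
a <- 0.5; b <- 0.5                               # Jeffreys prior
gamma_eff <- 0.99; gamma_final <- 0.975
delta_fut <- 0.05; pp_upper <- 0.50

post_gt <- function(r, n) 1 - pbeta(p0, a + r, b + n - r)

ppos <- function(r1) {         # PPoS of clearing gamma_final at N0
  y <- 0:(N0 - n1)
  w <- exp(lchoose(N0 - n1, y) +
           lbeta(a + r1 + y, b + N0 - r1 - y) -
           lbeta(a + r1, b + n1 - r1))           # Beta-Binomial pmf
  sum(w[post_gt(r1 + y, N0) >= gamma_final])
}

one_trial <- function(p) {
  r1 <- rbinom(1, n1, p)
  if (post_gt(r1, n1) >= gamma_eff) return(TRUE)   # early efficacy
  pp <- ppos(r1)
  if (pp <= delta_fut) return(FALSE)               # futility stop
  N <- if (pp < pp_upper) N_ext else N0            # promising -> extend
  r <- r1 + rbinom(1, N - n1, p)
  post_gt(r, N) >= gamma_final                     # final success
}

t1e <- mean(replicate(10000, one_trial(p0)))
t1e   # simulated Type I error at p = p0; compare against nominal alpha
```

The same harness run at p = p1 estimates power, and wrapping it in a loop over `gamma_eff` values gives the calibration grid described above.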
VI. Worked Examples
Blinded SSR: Continuous Endpoint
Scenario: A two-arm RCT targets a mean difference of δ = 5 units with planned variance σ² = 100, α = 0.025 (one-sided), and 90% power.
An SSR interim look is planned at 50% enrollment, with a maximum sample size cap of 2.0 × N0 = 340.
Step 1: Initial sample size
Per arm n = ⌈2(z_{0.975} + z_{0.90})² σ²/δ²⌉ = ⌈2 × 10.51 × 100/25⌉ = 85; total N0 = 170. Cap N_max = 340. Interim at N_interim = 85.
Step 2: Blinded variance estimate
At the interim, the pooled (blinded) variance across all 85 subjects is s² = 144, 44% higher than planned.
Step 3: Recalculate N
Raw N1 = 2 × ⌈2 × 10.51 × 144/25⌉ = 2 × 122 = 244. After constraints: floor (≥ 85) ✔, cap (≤ 340) ✔. Final N1 = 244 (inflation factor 1.44×).
Outcome: Without SSR, the trial would have been underpowered (actual power ~75% given the true variance). The SSR adjustment restores 90% power by increasing enrollment from 170 to 244—without unblinding anyone.
Unblinded SSR: Binary Endpoint
Scenario: A Phase III trial compares a new therapy (planned p_t = 0.45) vs. control (p_c = 0.30) with one-sided α = 0.025 and 90% power.
An unblinded SSR is planned at 50% enrollment. Zone thresholds: futility CP < 10%, promising 30–80%, favorable ≥ 80%.
Step 1: Initial sample size
Using the Fleiss, Levin & Paik pooled-variance formula: n = 217 per arm, N0 = 434. Cap N_max = 868 (2.0 × N0). Interim at 50% of planned enrollment; the calculator rounds up to preserve per-arm parity: total N_interim = 218 (109 per arm).
Step 2: Unblinded interim results
The DMC reports the observed arm-level event counts at interim:
- Treatment: p̂_t = 0.38
- Control: p̂_c = 0.28
Observed effect Δ̂ = 0.10, smaller than the planned 0.15.
Step 3: Conditional power and zone
CP under the observed effect at the originally planned N0 is approximately 55–60%. This falls in the promising zone (30% ≤ CP < 80%).
Step 4: Re-estimate N
Using the fixed-design formula under the observed effect (Δ̂ = 0.10), the calculator computes the N required for 90% power at roughly 926 subjects. After constraints: floor (≥ 218) ✔, cap (≤ 868) binds; final N1 = 868.
Step 5: Final analysis with combination test
At trial completion, Z1 and Z2 are computed from Stage 1 and Stage 2 data. The combination statistic w1 · Z1 + w2 · Z2 (with w1 = w2 = √0.5) is compared to z_{0.975} = 1.96.
Outcome: The trial was “rescued” by the SSR: the observed effect (~10 pp) was clinically meaningful but smaller than planned (15 pp). Without SSR, the trial would have been underpowered and likely failed. With SSR, the increase from 434 to 868 subjects (cap-bound at 2.0 × N0) restored adequate power while maintaining strict Type I error control via the combination test.
Single-Arm SSR: Phase II ORR (Bayesian)
Scenario: A Phase II oncology study tests a new agent against a historical control ORR p0 = 0.20 with a target ORR p1 = 0.40, α = 0.025 (one-sided), and 80% power.
Jeffreys prior Beta(0.5, 0.5). Interim at 50%. Thresholds: `gamma_efficacy` = 0.99, `gamma_final` = 0.975 (auto, 1 − α), `delta_futility` = 0.05, `pp_promising_upper` = 0.50. Budget cap n_max_factor = 1.5 (N_max = 54).
Step 1: Initial sample size
One-sample binomial normal approximation gives N0 = 36. Interim at n1 = 18; cap N_max = 54.
Step 2: Interim results
Observe r1 = 6 responders in n1 = 18 patients (observed ORR 33%, below p1 = 0.40 but well above p0 = 0.20).
Step 3: Posterior probability
Posterior: p | data ~ Beta(0.5 + 6, 0.5 + 12) = Beta(6.5, 12.5), giving P(p > 0.20 | data) ≈ 0.90.
Below the early-stop bar `gamma_efficacy` = 0.99, so no early efficacy stop.
Step 4: Predictive probability of success
Integrate over the remaining 18 outcomes using the Beta–Binomial mixture:
PPoS ≈ 0.42, which falls in the promising zone (0.05 < PPoS < 0.50). The trial neither stops for futility nor triggers early efficacy.
Step 5: Re-estimate N
Extend enrollment: final N = min(1.5 × N0, N_max) = min(54, 54) = 54. Inflation 1.5×.
Step 6: Final analysis
At N = 54, declare success if P(p > 0.20 | data) ≥ 0.975 under the Beta posterior from all 54 patients.
Outcome: The interim data did not definitively confirm p1 = 0.40, but ruled out p0 = 0.20 with moderate evidence. SSR extended enrollment to 54 patients, giving the Bayesian final test more information to distinguish between the null and alternative. Operating characteristics (Type I error, power, expected N) must be verified by Tier 2 simulation at p = p0 and p = p1 before locking the protocol.
VII. Planning Workflow
SSR must be written into the protocol and SAP before the trial begins. Here is the typical sequence of decisions during protocol development:
Identify the source of uncertainty
Is the concern primarily about the nuisance parameter (variance, rate) or the treatment effect? This determines blinded vs. unblinded.
Choose the interim fraction
Typical range: 25–75% of planned enrollment. Common choices are 50% (balanced information) or 33% (earlier look, but noisier estimate). For survival endpoints, this is the fraction of planned events, not patients.
Set the maximum cap
The protocol must specify the maximum allowed increase (e.g., n_max_factor = 2.0, i.e., N_max = 2 × N0). This limits operational and financial risk. Regulatory agencies expect this to be pre-specified.
Pre-specify decision-rule parameters
Unblinded two-arm: CP thresholds delineating four zones—futility (< 10%), unfavorable (10–30%), promising (30–80%), favorable (≥ 80%). Defaults from Mehta & Pocock (2011).
Single-Arm Bayesian: prior (Jeffreys or domain-informed Beta), gamma_efficacy (typ. 0.95–0.99), gamma_final (default 1 − α), delta_futility (typ. 0.05–0.10), and pp_promising_upper (default 0.50, raise to 0.70 for fuller budget utilization).
Single-Arm CP: same four-zone CP thresholds as the unblinded variant, but note non-monotonicity in cp_promising_lower (see Section V).
Run sensitivity analysis
Use the calculator's sensitivity table to explore how N changes across a range of plausible nuisance parameters (blinded) or effect sizes (unblinded). This informs the choice of cap and verifies feasibility.
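A minimal sketch of such a sensitivity table for the blinded continuous case, reusing the planning values from the Section VI example (assumed here for illustration):

```r
# Sensitivity of the re-estimated N to the observed blinded variance
# (planned delta = 5, alpha = 0.025 one-sided, 90% power, cap = 2 x 170).
z <- qnorm(1 - 0.025) + qnorm(0.90)

N_for <- function(s2) 2 * ceiling(2 * z^2 * s2 / 5^2)  # total N, even parity

s2_grid <- seq(80, 220, by = 20)
raw     <- sapply(s2_grid, N_for)
data.frame(sigma2_obs = s2_grid,
           N_raw      = raw,
           N_capped   = pmin(raw, 340))   # cap binds at large variances
```

Reading down the `N_capped` column shows directly which plausible variances exhaust the cap, which is exactly the feasibility question the cap choice must answer.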
Document in protocol and SAP
Write the SSR procedure, interim fraction, cap, and decision rules into the protocol and SAP using precise statistical language (see Section IX below).
Simulation validation: All three calculators support Monte Carlo simulation (10,000 trials by default). For blinded and unblinded two-arm SSR, simulation confirms the analytical Type I error guarantee. For single-arm SSR, simulation is not optional—neither Bayesian nor CP mode is analytically tied to α, so operating characteristics must be verified by Tier 2 simulation at p = p0 (Type I) and p = p1 (power) before locking the protocol.
VIII. When NOT to Use SSR
SSR is a powerful tool, but it is not appropriate in every situation:
Planning assumptions are well-established
If the variance, event rate, and effect size are well-characterized from large Phase II trials or meta-analyses, there is little to gain from SSR and the operational overhead is not justified.
The sample size is already capped by feasibility
If the maximum feasible enrollment is already reached (e.g., rare disease with a fixed patient pool), SSR cannot increase N beyond what is available.
GSD early stopping is the primary concern
If the goal is to stop early for efficacy or futility (not to increase N), use Group Sequential Design instead. SSR and GSD serve different purposes and can be combined, but SSR alone does not provide stopping boundaries.
Non-proportional hazards or complex censoring (survival)
The survival SSR assumes exponential event times and proportional hazards. If these assumptions are substantially violated (e.g., immunotherapy with delayed separation), external simulation tools may be needed.
No DMC and treatment effect is the concern
Unblinded SSR requires an independent DMC with appropriate firewalls. If your trial does not have a DMC (common in tech A/B tests or small academic trials), you are limited to blinded SSR.
You can afford a concurrent control (prefer two-arm)
Single-arm SSR is a design of convenience for Phase II oncology where a concurrent randomized control is infeasible or ethically difficult. If randomization is feasible, a two-arm design with blinded or unblinded SSR is strictly preferable: the historical control rate in single-arm designs introduces bias whenever the patient population or standard of care has drifted since the historical data were collected.
Single-arm CP mode when Bayesian is available
For single-arm binary endpoints, CP-mode Type I error is non-monotonic in cp_promising_lower due to binomial discreteness (Section V). Prefer the Bayesian mode of the single-arm calculator unless a frequentist framing is a hard requirement from a reviewing statistician.
IX. Example SAP Language
The following templates can be adapted for your protocol or Statistical Analysis Plan. Replace bracketed values with your trial-specific parameters.
Blinded SSR
“A blinded sample size re-estimation will be conducted after [50%] of the planned [168] subjects have been enrolled and have completed the primary endpoint assessment. The blinded pooled [variance / response rate / event rate] will be estimated from all available data without unblinding treatment assignment.
The sample size will be recalculated using the observed [nuisance parameter] and the originally planned treatment effect of [δ = 5 units / 15 percentage points / HR = 0.70], maintaining the [one-sided α = 0.025] significance level and [90%] power.
The recalculated sample size will be subject to a maximum cap of [2.0×] the initial sample size ([336] subjects) and a minimum of the number of subjects already enrolled. This procedure preserves the Type I error rate as the sample size adjustment is based only on blinded aggregate data (Kieser & Friede, 2003; FDA Guidance on Adaptive Designs, 2019, Section IV.B.1).”
Unblinded SSR
“An unblinded sample size re-estimation based on the promising zone approach (Mehta & Pocock, 2011) will be conducted after [50%] of the planned [324] subjects have been enrolled. The independent Data Monitoring Committee (DMC) will review unblinded treatment-arm data and compute the conditional power (CP) under the observed treatment effect.
The conditional power will be used to classify the interim result into one of four zones: futility (CP < [10%]), unfavorable ([10%] ≤ CP < [30%]), promising ([30%] ≤ CP < [80%]), or favorable (CP ≥ [80%]). If the result falls in the promising zone, the total sample size will be recalculated using the fixed-design formula under the observed effect at [90%] power, subject to a maximum of [2.0×] the initial sample size ([648] subjects). In all other zones, the original sample size will be maintained.
The final analysis will employ the inverse-normal combination test with pre-specified weights w1 = [√0.5] and w2 = [√0.5], rejecting the null hypothesis if w1·Z1 + w2·Z2 ≥ [1.96]. This procedure controls the familywise Type I error rate at [2.5%] one-sided regardless of the sample size modification (Müller & Schäfer, 2001; Mehta & Pocock, 2011).”
Single-Arm SSR (Bayesian, Phase II ORR)
“The planned sample size N0 = [36] is based on a one-sample test of p0 = [0.20] vs. p1 = [0.40] at one-sided α = [0.025] with 80% power. A Beta(0.5, 0.5) (Jeffreys) prior is placed on the ORR. An interim analysis will be conducted after [18] evaluable patients.
Let P(p > p0 | data) denote the posterior probability of superiority over the historical control. The trial will stop for early efficacy at the interim if P(p > p0 | data) ≥ [0.99]; the final sample size in this case equals the interim N. Let PPoS denote the predictive probability that the final posterior will satisfy P(p > p0 | data) ≥ gamma_final (defaulting to 1 − α = [0.975]). The trial will stop for futility if PPoS ≤ [0.05].
If [0.05] < PPoS < [0.50] (promising zone), the sample size will be extended to min(1.5 × N0, N_max) with N_max = [54]. If PPoS ≥ [0.50], enrollment continues to the originally planned N0 = [36]. In all non-stop scenarios the final sample size will not be reduced below N0.
Final success will be declared if P(p > p0 | data) ≥ [0.975] at the final analysis. Because the posterior-probability decision rule is not analytically tied to frequentist Type I error, the operating characteristics of this design (simulated Type I error at p = p0 and power at p = p1) are documented in Appendix [X] using a calibrated Monte Carlo study of [10,000] replicates (Lee & Liu, 2008; FDA Guidance on Adaptive Designs, 2019).”
X. R Code
Standalone R implementations for verifying the calculator results. These use base R only (no special packages required).
Blinded SSR: Continuous Endpoint
# Blinded SSR for continuous endpoint
blinded_ssr_continuous <- function(
delta, # Planned mean difference
sigma2_planned, # Planned variance
sigma2_obs, # Observed blinded pooled variance
alpha = 0.025, # One-sided alpha
power = 0.90, # Target power
interim_frac = 0.50,
n_max_factor = 2.0
) {
z_alpha <- qnorm(1 - alpha)
z_beta <- qnorm(power)
# Initial N
n_per_arm_0 <- ceiling(2 * (z_alpha + z_beta)^2 * sigma2_planned / delta^2)
N0 <- 2 * n_per_arm_0
N_interim <- ceiling(interim_frac * N0)
N_cap <- ceiling(N0 * n_max_factor)
# Recalculate with observed variance
n_per_arm_1 <- ceiling(2 * (z_alpha + z_beta)^2 * sigma2_obs / delta^2)
N1_raw <- 2 * n_per_arm_1
# Constrain: floor at interim, cap at N_max, even parity
# Backend logic: cap rounds DOWN (n_cap %/% 2), uncapped rounds UP
bounded <- min(max(N1_raw, N_interim), N_cap)
if (bounded < N_interim) {
n_per_arm <- ceiling(N_interim / 2)
} else if (N1_raw > N_cap) {
n_per_arm <- N_cap %/% 2 # cap binding: round DOWN
} else {
n_per_arm <- ceiling(N1_raw / 2) # uncapped: round UP
}
N1 <- 2 * n_per_arm
# Conditional power
n1_per_arm <- N_interim / 2
z_expected <- delta * sqrt(n1_per_arm / (2 * sigma2_obs))
R <- N1 / N_interim
CP <- pnorm(z_expected * sqrt(R) - z_alpha * sqrt(R - 1))
list(
initial_N = N0,
interim_N = N_interim,
new_N = N1,
inflation = N1 / N0,
cond_power = round(CP, 4),
cap_binding = N1_raw > N_cap
)
}
# Example: planned sigma2=100, observed sigma2=144
blinded_ssr_continuous(delta = 5, sigma2_planned = 100, sigma2_obs = 144)
# initial_N=170, new_N=244, inflation=1.44, cond_power~0.72

Unblinded SSR: Binary Endpoint (Promising Zone)
# Unblinded SSR for binary endpoint (Mehta & Pocock 2011)
unblinded_ssr_binary <- function(
p_c_planned, # Planned control rate
p_t_planned, # Planned treatment rate
p_c_obs, # Observed control rate at interim
p_t_obs, # Observed treatment rate at interim
alpha = 0.025,
power = 0.90,
interim_frac = 0.50,
n_max_factor = 2.0,
cp_futility = 0.10,
cp_promising_lower = 0.30,
cp_promising_upper = 0.80
) {
z_alpha <- qnorm(1 - alpha)
z_beta <- qnorm(power)
# Initial N (Fleiss-Levin-Paik)
delta_plan <- abs(p_t_planned - p_c_planned)
p_bar_plan <- (p_c_planned + p_t_planned) / 2
n0 <- ceiling(
((z_alpha * sqrt(2 * p_bar_plan * (1 - p_bar_plan))
+ z_beta * sqrt(p_c_planned*(1-p_c_planned) + p_t_planned*(1-p_t_planned)))
/ delta_plan)^2
)
N0 <- 2 * n0
N_interim <- 2 * ceiling(interim_frac * N0 / 2)  # round up to even total for per-arm parity
N_cap <- ceiling(N0 * n_max_factor)
# Stage 1 z-statistic
n1_per_arm <- N_interim / 2
p_bar_obs <- (p_c_obs + p_t_obs) / 2
delta_obs <- p_t_obs - p_c_obs
SE1 <- sqrt(p_bar_obs * (1 - p_bar_obs) * 2 / n1_per_arm)
z1 <- delta_obs / SE1
# Conditional power under observed effect
R <- N0 / N_interim
CP <- pnorm(z1 * sqrt(R) - z_alpha * sqrt(R - 1))
# Zone classification
zone <- if (CP >= cp_promising_upper) "favorable"
else if (CP >= cp_promising_lower) "promising"
else if (CP >= cp_futility) "unfavorable"
else "futility"
# Re-estimate N if promising
if (zone == "promising") {
n1_new <- ceiling(
((z_alpha * sqrt(2 * p_bar_obs * (1 - p_bar_obs))
+ z_beta * sqrt(p_c_obs*(1-p_c_obs) + p_t_obs*(1-p_t_obs)))
/ abs(delta_obs))^2
)
N1_raw <- max(2 * n1_new, N_interim)
} else {
N1_raw <- N0
}
# Constrain: cap rounds DOWN, uncapped rounds UP (matches backend)
bounded <- min(max(N1_raw, N_interim), N_cap)
if (bounded < N_interim) {
n_per_arm <- ceiling(N_interim / 2)
} else if (N1_raw > N_cap) {
n_per_arm <- N_cap %/% 2
} else {
n_per_arm <- ceiling(N1_raw / 2)
}
N1 <- 2 * n_per_arm
# Combination weights
w1 <- sqrt(interim_frac)
w2 <- sqrt(1 - interim_frac)
list(
initial_N = N0,
interim_N = N_interim,
z1 = round(z1, 4),
cp_obs = round(CP, 4),
zone = zone,
new_N = N1,
inflation = N1 / N0,
w1 = round(w1, 4),
w2 = round(w2, 4),
z_crit = round(z_alpha, 4)
)
}
# Example: planned 45% vs 30%, observed 38% vs 28%
unblinded_ssr_binary(
p_c_planned = 0.30, p_t_planned = 0.45,
p_c_obs = 0.28, p_t_obs = 0.38
)
# zone="promising", N0=434, N1=868 (cap-bound at 2x)

Survival SSR: Events and N Conversion
# Schoenfeld events and N conversion for survival SSR
survival_ssr_events <- function(
HR, # Planned hazard ratio
median_control, # Control median survival (months)
accrual_time, # Accrual period (months)
follow_up_time, # Follow-up after accrual (months)
alpha = 0.025,
power = 0.90,
alloc_ratio = 1.0,
dropout_rate = 0.0
) {
z_alpha <- qnorm(1 - alpha)
z_beta <- qnorm(power)
r <- alloc_ratio
# Required events (Schoenfeld)
d <- ceiling((z_alpha + z_beta)^2 * (1 + r)^2 / (r * log(HR)^2))
# Event probability (exponential model, uniform accrual)
lambda_c <- log(2) / median_control
lambda_t <- lambda_c * HR
total_time <- accrual_time + follow_up_time
p_c <- 1 - exp(-lambda_c * (total_time - accrual_time / 2))
p_t <- 1 - exp(-lambda_t * (total_time - accrual_time / 2))
# Apply dropout
if (dropout_rate > 0) {
years <- total_time / 12
retention <- (1 - dropout_rate)^years
p_c <- p_c * retention
p_t <- p_t * retention
}
p_avg <- (p_c + r * p_t) / (1 + r)
p_avg <- max(p_avg, 0.01)
N <- ceiling(d / p_avg)
n_control <- ceiling(N / (1 + r))
n_treatment <- N - n_control
list(
events_required = d,
p_event_control = round(p_c, 4),
p_event_treatment = round(p_t, 4),
p_event_avg = round(p_avg, 4),
N_total = N,
n_control = n_control,
n_treatment = n_treatment
)
}
# Example: HR=0.7, median control=12 months
survival_ssr_events(HR = 0.7, median_control = 12,
accrual_time = 24, follow_up_time = 12)
# events_required=331, p_avg~0.69, N_total=483

Single-Arm SSR: Bayesian Decision Rule
# Single-Arm SSR (Bayesian) — interim decision at n_1 responders observed.
# Reproduces the analytical posterior / predictive-probability logic used
# by the Zetyra single-arm SSR calculator. Base R only.
single_arm_bayesian_decision <- function(
  p0,                        # historical control ORR
  p1,                        # planned alternative ORR
  alpha = 0.025,             # one-sided alpha
  power = 0.80,              # target power
  prior_alpha = 0.5,         # Beta prior a (Jeffreys default)
  prior_beta = 0.5,          # Beta prior b
  interim_frac = 0.5,
  n_max_factor = 1.5,
  gamma_efficacy = 0.99,
  gamma_final = NULL,        # NULL -> auto (1 - alpha)
  delta_futility = 0.05,
  pp_promising_upper = 0.50,
  r_1 = NULL                 # observed interim responders
) {
  if (is.null(gamma_final)) gamma_final <- 1 - alpha
  # Initial N (one-sample binomial, normal approximation)
  z_a <- qnorm(1 - alpha); z_b <- qnorm(power)
  N0 <- ceiling(
    ((z_a * sqrt(p0 * (1 - p0))
      + z_b * sqrt(p1 * (1 - p1))) / (p1 - p0))^2
  )
  n1 <- max(10L, as.integer(round(interim_frac * N0)))
  N_max <- max(N0, ceiling(N0 * n_max_factor))
  # Expected interim responders under H1 if the user did not pass r_1
  if (is.null(r_1)) r_1 <- as.integer(round(p1 * n1))
  # Posterior P(p > p0 | data) under the Beta-Binomial conjugate update
  a_post <- prior_alpha + r_1
  b_post <- prior_beta + (n1 - r_1)
  post_prob <- 1 - pbeta(p0, a_post, b_post)
  # Predictive probability of success at the FINAL look: integrate over
  # the remaining n_rem outcomes via Beta-Binomial (log-space for stability).
  n_rem <- N0 - n1
  ppos <- 0
  if (n_rem > 0) {
    for (y in 0:n_rem) {
      log_c <- (lgamma(n_rem + 1) - lgamma(y + 1)
                - lgamma(n_rem - y + 1)
                + lgamma(a_post + y) + lgamma(b_post + n_rem - y)
                - lgamma(a_post + b_post + n_rem)
                + lgamma(a_post + b_post)
                - lgamma(a_post) - lgamma(b_post))
      p_y <- exp(log_c)
      final_post <- 1 - pbeta(p0, a_post + y, b_post + n_rem - y)
      if (final_post >= gamma_final) ppos <- ppos + p_y
    }
  } else {
    # Interim is the full trial — PPoS collapses to the indicator.
    ppos <- as.numeric(post_prob >= gamma_final)
  }
  # Decision rule
  if (post_prob >= gamma_efficacy) {
    decision <- "stop_efficacy"; final_n <- n1
  } else if (ppos <= delta_futility) {
    decision <- "stop_futility"; final_n <- n1
  } else if (ppos < pp_promising_upper) {
    # Promising zone: increase the sample size by 50%, capped at N_max
    decision <- "continue_ssr"
    final_n <- min(N_max, as.integer(ceiling(N0 * 1.5)))
  } else {
    decision <- "continue_favorable"; final_n <- N0
  }
  list(
    N0 = N0, n1 = n1, N_max = N_max,
    posterior_prob = round(post_prob, 4),
    predictive_prob = round(ppos, 4),
    gamma_final = gamma_final,
    decision = decision,
    final_n = final_n
  )
}
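The interim decision hinges on the conjugate posterior Beta(prior_alpha + r_1, prior_beta + n1 − r_1). A standalone sweep, assuming the Jeffreys Beta(0.5, 0.5) prior and the p0 = 0.20, n1 = 18 setting of the worked example that follows, shows how P(p > p0 | data) climbs with the interim responder count:

```r
# Posterior P(p > p0 | data) as a function of interim responders,
# assuming a Jeffreys Beta(0.5, 0.5) prior with p0 = 0.20 and n1 = 18.
p0 <- 0.20
n1 <- 18
for (r in 3:9) {
  post <- 1 - pbeta(p0, 0.5 + r, 0.5 + (n1 - r))
  cat(sprintf("r_1 = %d  posterior P(p > p0) = %.3f\n", r, post))
}
```

With 6 of 18 responders the posterior sits near 0.90, consistent with the example call below; the gamma_efficacy = 0.99 early-stop threshold is crossed only at substantially higher counts, which is what keeps early efficacy stopping rare.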
# Example: p0=0.20, p1=0.40, observe 6 responders in n1=18
single_arm_bayesian_decision(p0 = 0.20, p1 = 0.40, r_1 = 6)
# N0=36, n1=18, posterior~0.90, PPoS~0.42, decision="continue_ssr", final_n=54

XI. References
Mehta CR, Pocock SJ. Adaptive increase in sample size when interim results are promising: A practical guide with examples. Statistics in Medicine. 2011;30(28):3267–3284.
Kieser M, Friede T. Simple procedures for blinded sample size adjustment that do not affect the type I error rate. Statistics in Medicine. 2003;22(23):3571–3581.
Müller HH, Schäfer H. Adaptive group sequential designs for clinical trials: combining the advantages of adaptive and of classical group sequential approaches. Biometrics. 2001;57(3):886–891.
Schoenfeld D. The asymptotic properties of nonparametric tests for comparing survival distributions. Biometrika. 1981;68(1):316–319.
Gould AL. Interim analyses for monitoring clinical trials that do not materially affect the type I error rate. Statistics in Medicine. 1992;11(1):55–66.
Friede T, et al. Blinded sample size re-estimation in event-driven clinical trials. Pharmaceutical Statistics. 2019;18(5):578–588.
Chen YHJ, DeMets DL, Lan KKG. Increasing the sample size when the unblinded interim result is promising. Statistics in Medicine. 2004;23(7):1023–1038.
Lee JJ, Liu DD. A predictive probability design for phase II cancer clinical trials. Clinical Trials. 2008;5(2):93–106.
Saville BR, Connor JT, Ayers GD, Alvarez J. The utility of Bayesian predictive probabilities for interim monitoring of clinical trials. Clinical Trials. 2014;11(4):485–493.
Jung SH. Randomized Phase II Cancer Clinical Trials. Chapman & Hall/CRC; 2013. (Single-arm and selection designs with predictive-probability stopping.)
FDA. Project Optimus: Optimizing the Dosage of Human Prescription Drugs and Biological Products for the Treatment of Oncologic Diseases. 2023.
FDA. Adaptive Designs for Clinical Trials of Drugs and Biologics: Guidance for Industry. 2019.
Ready to Calculate?
Use the SSR calculators to compute recalculated sample sizes, conditional power, zone classification, and sensitivity tables.
Related Documentation
Blinded SSR Technical Reference
Full mathematical derivations, API specification, and validation benchmarks.
Unblinded SSR Technical Reference
Combination test theory, zone classification, and API specification.
Single-Arm SSR Technical Reference
Bayesian posterior/predictive-probability framework, CP non-monotonicity, and API specification for Phase II ORR designs.
Complete Guide to GSD
Stopping boundaries and interim analysis planning—complementary to SSR.