
A Complete Guide to Sample Size Re-estimation

Sample Size Re-estimation (SSR) is an adaptive design strategy that allows a clinical trial to adjust its sample size at an interim analysis—compensating for planning assumptions that turn out to be wrong. This guide covers when to use SSR, how to choose among the three main variants (blinded and unblinded for two-arm RCTs, and single-arm for Phase II oncology designs against a historical control), and how to integrate SSR into your protocol and Statistical Analysis Plan.

Analogy: Insurance Against Planning Uncertainty

Designing a clinical trial is like budgeting a construction project. You estimate material costs (variance, event rates) and scope (effect size) based on pilot data or literature. But once construction begins, lumber prices may spike or the foundation may need reinforcement. SSR is your change-order clause—a pre-specified mechanism to adjust the budget mid-project without starting over, while keeping the building code (Type I error) intact.

I. When to Use Sample Size Re-estimation

Every sample size calculation depends on planning assumptions: the expected treatment effect, the variance of the outcome, the response rate, or the event rate. When these assumptions come from small pilot studies, literature reviews, or expert opinion, they carry substantial uncertainty. An underpowered trial wastes patients and resources; an overpowered one is inefficient.

SSR addresses this by allowing a pre-specified interim look at the accumulating data to revise the sample size. It is appropriate when:

Nuisance parameter uncertainty

The variance, response rate, or event rate used in planning is based on limited data and could be substantially wrong. This is the most common and least-controversial use case.

Effect size uncertainty

The treatment effect may be smaller than hoped but still clinically meaningful. If the interim data shows a “promising” trend, you may want to increase the sample size to rescue the trial rather than fail due to optimistic planning.

Regulatory or operational flexibility

The protocol needs to accommodate real-world uncertainty while maintaining rigorous Type I error control—a requirement for regulatory submissions.

Phase II oncology ORR against a historical control

Single-arm Phase II trials test objective response rate (ORR) against a fixed historical control rate $p_0$. The target rate $p_1$ is often a hopeful planning assumption; SSR lets you extend enrollment toward an $N_\text{max}$ budget when the interim ORR is promising but not definitive, without committing the full budget up front.

Key principle: SSR is not about “fishing” for significance. It is a pre-planned mechanism written into the protocol before the trial begins, with rules that preserve the validity of the final analysis.

II. Choosing Your Approach: Three Variants

Three variants dominate modern clinical trial SSR. The right choice depends on (a) whether the trial is randomized or single-arm, and (b) what source of uncertainty you are protecting against. Each variant has its own statistical machinery, operational requirements, and regulatory posture.

| Aspect | Blinded | Unblinded (two-arm) | Single-Arm (Phase II ORR) |
| --- | --- | --- | --- |
| Design setting | Randomized two-arm RCT | Randomized two-arm RCT | Single-arm vs fixed historical control $p_0$ |
| What you observe at interim | Pooled data only (no treatment labels) | Per-arm data (treatment unblinded, via DMC) | Responder count from the single arm (no blind to break) |
| What you re-estimate | Nuisance parameters only (variance, pooled rate, event rate) | Treatment effect and nuisance parameters | Posterior/predictive probability or CP under observed ORR |
| Decision rule | Plug-in with updated nuisance | Promising zone (conditional power) | Bayesian (PP promising zone) or CP promising zone |
| Type I error control | Preserved by blinding; no special test | Inverse-normal combination test | Simulation-calibrated; not analytically guaranteed |
| DMC required? | No (sponsor can perform) | Yes (independent DMC) | No (single-arm, no unblinding risk) |
| Regulatory acceptance | Widely accepted (FDA, EMA) | Accepted with pre-specification; more scrutiny | FDA 2019, Project Optimus; Type I must be confirmed by simulation |
| Operational complexity | Low | High (DMC charter, firewalls) | Low (sponsor-run interim; no DMC firewalls required) |
| Best for | Uncertain variance or event rate | Uncertain effect size with “rescue” intent | Phase II oncology ORR with uncertain $p_1$ vs historical $p_0$ |
| Representative reference | Kieser & Friede (2003) | Mehta & Pocock (2011) | Lee & Liu (2008); Jung (2013) |

Rule of thumb:

  • Randomized trial, main worry is variance/rate misspecification → Blinded SSR.
  • Randomized trial, main worry is effect-size over-optimism and you have a DMC → Unblinded SSR.
  • Phase II single-arm against historical control (ORR) → Single-Arm SSR, usually in Bayesian mode.

Can you combine them? In principle a trial could include a blinded SSR for nuisance parameters and an unblinded SSR for effect size at different interim fractions, but this is rare. Single-arm SSR does not combine with the two-arm variants—it is a different design. Choose the single approach that addresses your primary source of uncertainty.

III. Blinded SSR: How It Works

Blinded SSR re-estimates nuisance parameters (the quantities that affect power but are not the treatment effect itself) from pooled interim data, then recalculates the required sample size using the original effect size assumption.

The Algorithm

1

Compute the initial sample size

Using planning assumptions (effect size, variance/rate, alpha, power), calculate the target sample size $N_0$ with the standard formula for your endpoint type.

2

Enroll to the interim fraction

Collect data on $N_\text{interim} = \lceil t \cdot N_0 \rceil$ subjects, where $t$ is the pre-specified interim fraction (typically 0.5).

3

Estimate the nuisance parameter from pooled data

Without unblinding, compute the blinded estimate: pooled variance $\hat{\sigma}^2$ for continuous endpoints, pooled response rate $\hat{p}_\text{pooled}$ for binary, or pooled event rate $\hat{P}$ for survival.

4

Recalculate N using the updated nuisance parameter

Plug the observed nuisance parameter into the sample size formula while keeping the planned effect size fixed. The treatment effect assumption does not change—only the “noise” estimate.

5

Apply constraints

Enforce the protocol cap ($N_1 \le N_\text{max}$) and the interim floor ($N_1 \ge N_\text{interim}$; you cannot un-enroll patients). For continuous/binary endpoints, enforce even parity; for survival, split by allocation ratio.

Why does blinding preserve Type I error?

Because the sponsor never sees treatment-specific outcomes, the sample size adjustment depends only on the overall variability of the data—not on whether one arm is doing better. The decision to increase N carries no information about the treatment effect, so the standard test statistic at the final analysis remains valid. This was formally established by Kieser & Friede (2003) and is explicitly acknowledged in FDA Guidance on Adaptive Designs (2019).

Note on bias: The blinded pooled variance includes a bias term $\delta^2/4$ (Kieser & Friede 2003) because it mixes two distributions. For small effects relative to the variance, this bias is negligible. The calculator uses the blinded estimate directly, which is conservative (slightly overestimates N).
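The bias term can be seen directly in a quick simulation. This is an illustrative sketch (not part of the calculator), assuming normal outcomes and equal allocation:

```r
# Illustration of the delta^2/4 bias in the blinded pooled variance
# (simulation sketch; values match the Section VI worked example)
set.seed(1)
delta  <- 5       # true mean difference
sigma2 <- 100     # true within-arm variance
n      <- 1e6     # large n so the bias is visible above sampling noise
pooled <- c(rnorm(n, mean = 0,     sd = sqrt(sigma2)),
            rnorm(n, mean = delta, sd = sqrt(sigma2)))
var(pooled)               # close to sigma2 + delta^2/4 = 106.25
sigma2 + delta^2 / 4      # theoretical blinded variance
```

With $\delta = 5$ and $\sigma^2 = 100$ the inflation is about 6%, which is why plugging the blinded estimate straight into the formula errs slightly on the side of a larger N.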

IV. Unblinded SSR: The Promising Zone Approach

Unblinded SSR (Mehta & Pocock 2011) allows the sample size to increase when the interim treatment effect falls in a “promising zone”—large enough to suggest a real benefit, but not large enough for the trial to succeed at its originally planned N. The key challenge is maintaining Type I error control when the sample size depends on the unblinded treatment effect.

The Four Zones

At the interim analysis, compute the conditional power (CP)—the probability of achieving significance at the final analysis given the data observed so far. The CP determines which zone the trial is in:

Favorable Zone (CP ≥ 80%)

The trial is on track to succeed. Keep the original N.

Promising Zone (30% ≤ CP < 80%)

The effect is trending in the right direction but the trial is underpowered at the current N. Increase N using the fixed-design sample size formula under the observed effect at the target power (typically 90%).

Unfavorable Zone (10% ≤ CP < 30%)

The effect is weak. Increasing N would require an impractically large increase. Keep the original N.

Futility Zone (CP < 10%)

The data suggest the treatment is unlikely to work. Consider stopping for futility.
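The zone logic above can be sketched as a small helper in base R; the parameter names follow the calculator's cp_futility / cp_promising_lower / cp_promising_upper thresholds, with the defaults quoted in this section:

```r
# Classify an interim conditional power value into the four
# Mehta-Pocock zones (default thresholds from this guide)
classify_zone <- function(cp,
                          cp_futility = 0.10,
                          cp_promising_lower = 0.30,
                          cp_promising_upper = 0.80) {
  if (cp < cp_futility)             "futility"
  else if (cp < cp_promising_lower) "unfavorable"
  else if (cp < cp_promising_upper) "promising"
  else                              "favorable"
}

classify_zone(0.55)  # "promising"
classify_zone(0.85)  # "favorable"
```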

The Combination Test

When N depends on the observed treatment effect, the standard z-test at the final analysis is no longer valid—the distribution of the test statistic changes because the sample size was influenced by the interim data. The inverse-normal combination test resolves this by splitting the evidence into two independent stages:

$$Z_\text{comb} = w_1 Z_1 + w_2 Z_2$$

where $w_1 = \sqrt{t}$ and $w_2 = \sqrt{1-t}$ are pre-specified weights based on the information fraction $t$, and $Z_1, Z_2$ are the z-statistics from Stage 1 and Stage 2 data respectively. Reject $H_0$ if $Z_\text{comb} > z_\alpha$.

Why it works: Because $w_1^2 + w_2^2 = 1$ and Stage 1 and Stage 2 data are independent, $Z_\text{comb}$ follows a standard normal under $H_0$ regardless of how N was modified between stages. The critical value remains $z_\alpha$, the same as a fixed design.

For survival endpoints: The combination test uses the event-based information fraction $t = d_\text{interim} / d$ rather than the sample-based fraction, since events (not patients) drive the statistical information.
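The two-stage test itself is a few lines of base R. This sketch assumes the stage z-statistics have already been computed from the respective data:

```r
# Inverse-normal combination test (sketch; the weights are fixed at
# design time from the information fraction t, never recomputed)
combination_test <- function(z1, z2, t, alpha = 0.025) {
  w1 <- sqrt(t)
  w2 <- sqrt(1 - t)                    # w1^2 + w2^2 = 1
  z_comb <- w1 * z1 + w2 * z2
  list(z_comb = z_comb, reject = z_comb > qnorm(1 - alpha))
}

combination_test(z1 = 1.5, z2 = 1.8, t = 0.5)
# z_comb = (1.5 + 1.8) / sqrt(2) ~ 2.33 > 1.96, so reject = TRUE
```

Note that neither stage statistic alone crosses 1.96, yet the combined evidence does; that is the combination test working as intended.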

V. Single-Arm SSR: Bayesian & CP for Phase II ORR

Single-arm Phase II oncology trials compare the objective response rate (ORR) of a new treatment against a fixed historical control rate $p_0$. The planning target is some higher rate $p_1$, but both $p_0$ and $p_1$ carry real uncertainty. Single-arm SSR lets you extend enrollment when the interim ORR is promising but not definitive, bounded by a pre-specified ceiling $N_\text{max}$. In the current engine the promising-zone extension itself is fixed at $\lceil 1.5 \cdot N_0 \rceil$, with $N_\text{max}$ acting as an upper ceiling whenever it is lower than the $1.5\times$ extension.

The calculator supports two decision rules at the interim: a Bayesian rule based on posterior and predictive probabilities, and a Conditional Power (CP) rule adapted from the Mehta–Pocock promising zone. The Bayesian mode is generally preferred for single-arm binary endpoints (see the warning box below).

A. Bayesian mode (recommended)

Place a Beta prior $\text{Beta}(a_0, b_0)$ on the ORR (Jeffreys $\text{Beta}(\tfrac{1}{2}, \tfrac{1}{2})$ by default). After observing $r_1$ responders in $n_1$ interim patients, the posterior is conjugate:

$$p \mid \text{data} \sim \text{Beta}(a_0 + r_1,\, b_0 + n_1 - r_1)$$

Two distinct thresholds govern the design, and the calculator keeps them decoupled:

  • gamma_efficacy: interim early-stop bar. Stop early for efficacy at the interim if $P(p > p_0 \mid \text{data}) \ge \gamma_\text{efficacy}$. Typical values: 0.95–0.99.
  • gamma_final: final-analysis success bar. At the final look, declare success if $P(p > p_0 \mid \text{data}) \ge \gamma_\text{final}$. Defaults to $1 - \alpha$ (e.g., 0.975 for $\alpha = 0.025$).
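With a conjugate Beta prior, the interim posterior check is a single pbeta call. The helper name below is illustrative:

```r
# Conjugate Beta posterior update and posterior probability of
# superiority over the historical control (illustrative helper)
posterior_prob_superiority <- function(r1, n1, p0, a0 = 0.5, b0 = 0.5) {
  a_post <- a0 + r1              # Beta(a0 + r1, b0 + n1 - r1)
  b_post <- b0 + n1 - r1
  1 - pbeta(p0, a_post, b_post)  # P(p > p0 | data)
}

posterior_prob_superiority(r1 = 6, n1 = 18, p0 = 0.20)  # ~0.9
```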

Why decouple gamma_efficacy from gamma_final?

Using a single high threshold (e.g., $\gamma = 0.99$) at both looks controls Type I error aggressively but requires a much stronger interim signal to declare final success, which depresses simulated power. Using a single low threshold boosts power but inflates Type I error via easy early stops. Separating them (a conservative gamma_efficacy at the interim, a moderate gamma_final at the final look) hits the design power target without giving up calibration.

The predictive probability of success (PPoS) is the probability that the final posterior will clear gamma_final, computed by integrating over the Beta–Binomial distribution of the remaining $n_\text{total} - n_1$ outcomes. PPoS (rather than a raw posterior) is the right quantity for interim futility decisions because it directly answers “will the trial succeed if we continue?” (Saville et al., 2014):

$$\text{PPoS} = \sum_{y=0}^{n_\text{rem}} P(Y = y \mid \text{data})\,\mathbb{1}\bigl[P(p > p_0 \mid \text{data},\,y) \ge \gamma_\text{final}\bigr]$$
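The sum can be evaluated exactly with the Beta–Binomial pmf in base R. This is a sketch of the formula above (the function name is illustrative):

```r
# Predictive probability of success: exact sum over the Beta-Binomial
# distribution of the remaining outcomes (sketch of the PPoS formula)
ppos <- function(r1, n1, n_total, p0, gamma_final = 0.975,
                 a0 = 0.5, b0 = 0.5) {
  a <- a0 + r1                      # interim posterior parameters
  b <- b0 + n1 - r1
  n_rem <- n_total - n1
  y <- 0:n_rem
  # Beta-Binomial pmf for y future responders
  pmf <- choose(n_rem, y) * beta(a + y, b + n_rem - y) / beta(a, b)
  # Final-look success indicator for each possible y
  success <- 1 - pbeta(p0, a + y, b + n_rem - y) >= gamma_final
  sum(pmf[success])
}

ppos(r1 = 6, n1 = 18, n_total = 36, p0 = 0.20)  # ~0.42 (Section VI example)
```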

The four interim decisions

Stop for early efficacy

$P(p > p_0 \mid \text{data}) \ge \gamma_\text{efficacy}$. Final N equals the interim N; enrollment terminates at the interim look.

Stop for futility

$\text{PPoS} \le \delta_\text{futility}$ (default 0.05). Final N equals the interim N.

Extend N (promising)

$\delta_\text{futility} < \text{PPoS} < \text{PP}_\text{upper}$ (default upper bound 0.50). Extend the final N to $\min(N_\text{max},\, \lceil 1.5 \cdot N_0 \rceil)$. The engine currently uses a fixed $1.5\times$ extension factor in the Bayesian promising zone; $N_\text{max}$ acts as an upper ceiling on that extension, not as the extension target. Raising pp_promising_upper to 0.70 keeps more trials in the promising zone (more trials reach this extended N) but does not push the per-trial extension beyond $1.5 \cdot N_0$.

Continue at planned N (favorable)

$\text{PPoS} \ge \text{PP}_\text{upper}$ but the posterior has not crossed the early-stop bar. Enrollment continues to the originally planned $N_0$. In all non-stop scenarios, final N is not reduced below $N_0$.
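The four-way rule can be sketched as a dispatch on the two interim quantities (posterior probability of superiority and PPoS); thresholds are the defaults quoted above:

```r
# Interim decision rule for the Bayesian single-arm design (sketch;
# assumes the posterior probability and PPoS are computed upstream)
interim_decision <- function(post_prob, ppos,
                             gamma_efficacy = 0.99,
                             delta_futility = 0.05,
                             pp_upper = 0.50) {
  if (post_prob >= gamma_efficacy) "stop_efficacy"
  else if (ppos <= delta_futility) "stop_futility"
  else if (ppos < pp_upper)        "extend"     # promising zone
  else                             "continue"   # favorable
}

interim_decision(post_prob = 0.90, ppos = 0.42)  # "extend"
```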

B. Conditional Power mode

The CP mode adapts the Mehta–Pocock promising-zone framework to the one-sample binomial setting. Conditional power under the observed interim ORR is classified into four zones (futility, unfavorable, promising, favorable) using the CP thresholds cp_futility, cp_promising_lower, cp_promising_upper; N is extended only in the promising zone, using the fixed-design formula at the observed ORR.

Warning: Type I error is non-monotonic in cp_promising_lower

The Mehta–Pocock Type I error preservation theorem (Chen, DeMets, Lan 2004; Gao, Ware, Mehta 2008) holds in the two-arm normal/z-test setting. It does not transfer analytically to single-arm binomial designs, because interim outcomes follow a discrete $\text{Binomial}(n_1, p_0)$ distribution. As a direct consequence, simulated Type I error is not a monotonic function of cp_promising_lower: raising the threshold can move T1E in either direction, depending on whether the new cutoff falls between two adjacent attainable interim event counts. Do not assume tighter bounds produce lower Type I error; grid-search neighbouring thresholds via simulation and pick the one that balances calibration and power.

Recommendation: For single-arm binary endpoints, prefer the Bayesian mode. It is calibrated on the continuous predictive-probability scale, decouples gamma_efficacy from gamma_final, and behaves monotonically in gamma_efficacy—making calibration substantially more straightforward.

C. Type I error calibration

Neither mode gives an analytic guarantee that simulated Type I error equals nominal $\alpha$. In practice:

  • Run Tier 2 simulation at $p = p_0$ with at least 5,000 replicates.
  • If Bayesian T1E exceeds $\alpha$, raise gamma_efficacy (typically to 0.97–0.99) so fewer null trials cross the interim bar. Raising gamma_final also reduces final-look false positives but costs power.
  • If CP T1E exceeds $\alpha$, do not blindly raise cp_promising_lower; search a small neighbourhood instead.
  • If power at $p = p_1$ is below target, lower gamma_final toward $1 - \alpha$, raise pp_promising_upper (Bayesian), or raise interim_n / n_max_factor.
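A minimal Monte Carlo check of Type I error at $p = p_0$ might look like the following. This is a simplified sketch of the Tier 2 simulation under the guide's default design (the real engine's bookkeeping may differ):

```r
# Monte Carlo Type I error check for the Bayesian single-arm design
# at p = p0 (simplified sketch; helpers inlined for self-containment)
simulate_t1e <- function(p0 = 0.20, n0 = 36, n1 = 18, n_max = 54,
                         gamma_eff = 0.99, gamma_final = 0.975,
                         delta_fut = 0.05, pp_upper = 0.50,
                         a0 = 0.5, b0 = 0.5, reps = 5000, seed = 42) {
  set.seed(seed)
  post_sup <- function(r, n) 1 - pbeta(p0, a0 + r, b0 + n - r)
  ppos_fn <- function(r, n, n_tot) {
    a <- a0 + r; b <- b0 + n - r; m <- n_tot - n; y <- 0:m
    pmf <- choose(m, y) * beta(a + y, b + m - y) / beta(a, b)
    sum(pmf[1 - pbeta(p0, a + y, b + m - y) >= gamma_final])
  }
  rejections <- 0
  for (i in seq_len(reps)) {
    r1 <- rbinom(1, n1, p0)                     # null interim data
    if (post_sup(r1, n1) >= gamma_eff) {        # early efficacy stop
      rejections <- rejections + 1; next
    }
    pp <- ppos_fn(r1, n1, n0)
    if (pp <= delta_fut) next                   # futility stop: no rejection
    n_final <- if (pp < pp_upper) min(n_max, ceiling(1.5 * n0)) else n0
    r_final <- r1 + rbinom(1, n_final - n1, p0) # complete the trial
    if (post_sup(r_final, n_final) >= gamma_final) rejections <- rejections + 1
  }
  rejections / reps
}

simulate_t1e()  # simulated Type I error at p = p0; compare to alpha = 0.025
```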

VI. Worked Examples

Blinded SSR: Continuous Endpoint

Scenario: A two-arm RCT targets a mean difference of $\delta = 5$ units with planned variance $\sigma^2 = 100$, $\alpha = 0.025$ (one-sided), and 90% power.

An SSR interim look is planned at 50% enrollment, with a maximum sample size cap of $N_\text{max} = 2 \times N_0$.

Step 1: Initial sample size

$$n = \left\lceil \frac{2(z_{0.025} + z_{0.10})^2 \cdot 100}{5^2} \right\rceil = \left\lceil \frac{2(1.960 + 1.282)^2 \cdot 100}{25} \right\rceil = \lceil 84.06 \rceil = 85 \text{ per arm}$$

Total $N_0 = 170$. Cap $N_\text{max} = 340$. Interim at $N_\text{interim} = 85$.

Step 2: Blinded variance estimate

At the interim, the pooled (blinded) variance across all 85 subjects is $\hat{\sigma}^2 = 144$, which is 44% higher than planned.

Step 3: Recalculate N

$$n_1 = \left\lceil \frac{2(1.960 + 1.282)^2 \cdot 144}{25} \right\rceil = \lceil 121.05 \rceil = 122 \text{ per arm}$$

Raw $N_1 = 244$. After constraints: floor $85 \le 244$ ✔, cap $244 \le 340$ ✔. Final $N_1 = 244$ (inflation factor 1.44×).

Outcome: Without SSR, the trial would have been underpowered (actual power ~75% given the true variance). The SSR adjustment restores 90% power by increasing enrollment from 170 to 244—without unblinding anyone.

Unblinded SSR: Binary Endpoint

Scenario: A Phase III trial compares a new therapy (planned $p_T = 0.45$) vs. control ($p_C = 0.30$) with $\alpha = 0.025$ and 90% power.

An unblinded SSR is planned at 50% enrollment. Zone thresholds: futility CP < 10%, promising 30–80%, favorable ≥ 80%.

Step 1: Initial sample size

Using the Fleiss, Levin & Paik pooled-variance formula: $n_\text{per arm} = 217$, $N_0 = 434$. Cap $N_\text{max} = 868$ ($2 \times N_0$). Interim at 50% of planned enrollment; the calculator rounds up to preserve per-arm parity: $N_\text{interim} = 218$ total (109 per arm).

Step 2: Unblinded interim results

The DMC reports the observed arm-level event counts at interim:

  • Treatment: $41/109 \approx 0.376$ ($\hat{p}_T \approx 0.38$)
  • Control: $31/109 \approx 0.284$ ($\hat{p}_C \approx 0.28$)

Observed effect $\hat{\delta} \approx 0.09$ is smaller than the planned 0.15.

Step 3: Conditional power and zone

CP under the observed effect at the current $N_0 = 434$ is approximately 55–60%. This falls in the promising zone (30% ≤ CP < 80%).

Step 4: Re-estimate N

Using the fixed-design formula under the observed effect ($\hat{\delta} \approx 0.09$), the calculator computes the N required for 90% power at roughly $N_1 \approx 1{,}050$. After constraints: floor $218 \le 1{,}050$ ✔; cap $1{,}050 > 868$, so the cap binds and $N_1 = 868$.

Step 5: Final analysis with combination test

At trial completion, $Z_1$ and $Z_2$ are computed from Stage 1 and Stage 2 data. The combination statistic $Z_\text{comb} = w_1 Z_1 + w_2 Z_2$ is compared to $z_{0.025} = 1.96$.

Outcome: The trial was “rescued” by the SSR: the observed effect (~9 pp) was clinically meaningful but smaller than planned (15 pp). Without SSR, the trial would have been underpowered and likely failed. With SSR, the increase from 434 to 868 subjects (cap-bound at $2\times$) restored adequate power while maintaining strict Type I error control via the combination test.

Single-Arm SSR: Phase II ORR (Bayesian)

Scenario: A Phase II oncology study tests a new agent against a historical control ORR $p_0 = 0.20$ with a target ORR $p_1 = 0.40$, $\alpha = 0.025$ (one-sided), and 80% power.

Jeffreys prior $\text{Beta}(0.5, 0.5)$. Interim at 50%. Thresholds: $\gamma_\text{efficacy} = 0.99$, $\gamma_\text{final} = 0.975$ (auto, $= 1 - \alpha$), $\delta_\text{futility} = 0.05$, $\text{PP}_\text{upper} = 0.50$. Budget cap $N_\text{max} = 54$ ($1.5\times$).

Step 1: Initial sample size

$$N_0 = \left\lceil \left( \frac{z_{0.025}\sqrt{p_0(1-p_0)} + z_{0.20}\sqrt{p_1(1-p_1)}}{p_1 - p_0} \right)^2 \right\rceil = \lceil 35.8 \rceil = 36$$

One-sample binomial normal approximation. Interim at $n_1 = 18$; cap $N_\text{max} = 54$.
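Step 1 can be reproduced in base R with the same normal-approximation formula:

```r
# One-sample binomial sample size (normal approximation), as in Step 1
n0_one_sample <- function(p0, p1, alpha = 0.025, power = 0.80) {
  za <- qnorm(1 - alpha)
  zb <- qnorm(power)
  ceiling(((za * sqrt(p0 * (1 - p0)) + zb * sqrt(p1 * (1 - p1))) /
           (p1 - p0))^2)
}

n0_one_sample(0.20, 0.40)  # 36
```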

Step 2: Interim results

Observe $r_1 = 6$ responders in $n_1 = 18$ patients (observed ORR 33%, below $p_1 = 0.40$ but well above $p_0 = 0.20$).

Step 3: Posterior probability

Posterior: $p \mid \text{data} \sim \text{Beta}(6.5, 12.5)$.

$$P(p > 0.20 \mid \text{data}) = 1 - F_{\text{Beta}(6.5,\,12.5)}(0.20) \approx 0.90$$

Below the early-stop bar $\gamma_\text{efficacy} = 0.99$, so there is no early efficacy stop.
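The posterior probability above can be checked directly with base R's pbeta:

```r
# Posterior probability of superiority for the Step 3 interim data:
# Beta(0.5 + 6, 0.5 + 18 - 6) = Beta(6.5, 12.5), evaluated above p0 = 0.20
1 - pbeta(0.20, shape1 = 6.5, shape2 = 12.5)  # matches the ~0.90 above
```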

Step 4: Predictive probability of success

Integrate over the remaining 18 outcomes using the Beta–Binomial mixture:

$$\text{PPoS} = \sum_{y=0}^{18} P(Y = y \mid \text{data}) \cdot \mathbb{1}\bigl[P(p > 0.20 \mid \text{data},\,y) \ge 0.975\bigr] \approx 0.42$$

$0.05 < 0.42 < 0.50$: this falls in the promising zone. Neither stop for futility nor trigger early efficacy.

Step 5: Re-estimate N

Extend enrollment: $N_1 = \min(N_\text{max},\, \lceil 1.5 \cdot N_0 \rceil) = \min(54, 54) = 54$. Inflation $1.5\times$.

Step 6: Final analysis

At $N_1 = 54$, declare success if $P(p > 0.20 \mid \text{data}) \ge 0.975$ under the Beta posterior from all 54 patients.

Outcome: The interim data did not definitively confirm $p_1 = 0.40$, but ruled out $p_0 = 0.20$ with moderate evidence. SSR extended enrollment to 54 patients, giving the Bayesian final test more information to distinguish between the null and alternative. Operating characteristics (Type I error, power, expected N) must be verified by Tier 2 simulation at $p = p_0$ and $p = p_1$ before locking the protocol.

VII. Planning Workflow

SSR must be written into the protocol and SAP before the trial begins. Here is the typical sequence of decisions during protocol development:

1

Identify the source of uncertainty

Is the concern primarily about the nuisance parameter (variance, rate) or the treatment effect? This determines blinded vs. unblinded.

2

Choose the interim fraction

Typical range: 25–75% of planned enrollment. Common choices are 50% (balanced information) or 33% (earlier look, but noisier estimate). For survival endpoints, this is the fraction of planned events, not patients.

3

Set the maximum cap

The protocol must specify the maximum allowed increase (e.g., $N_\text{max} = 2 \times N_0$). This limits operational and financial risk. Regulatory agencies expect this to be pre-specified.

4

Pre-specify decision-rule parameters

Unblinded two-arm: CP thresholds delineating four zones—futility (< 10%), unfavorable (10–30%), promising (30–80%), favorable (≥ 80%). Defaults from Mehta & Pocock (2011).

Single-Arm Bayesian: prior (Jeffreys or domain-informed Beta), gamma_efficacy (typ. 0.95–0.99), gamma_final (default $1 - \alpha$), delta_futility (typ. 0.05–0.10), and pp_promising_upper (default 0.50, raise to 0.70 for fuller budget utilization).

Single-Arm CP: same four-zone CP thresholds as the unblinded variant, but note non-monotonicity in cp_promising_lower (see Section V).

5

Run sensitivity analysis

Use the calculator's sensitivity table to explore how N changes across a range of plausible nuisance parameters (blinded) or effect sizes (unblinded). This informs the choice of cap and verifies feasibility.
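For the blinded continuous case, a sensitivity table is a short loop over a variance grid. This sketch reuses the Section VI planning values ($\delta = 5$, planned $\sigma^2 = 100$):

```r
# Sensitivity table sketch for blinded SSR: how the re-estimated total N
# responds to a range of plausible observed pooled variances
n_reestimated <- function(sigma2, delta = 5, alpha = 0.025, power = 0.90) {
  # per-arm n, doubled for total N (equal allocation)
  2 * ceiling(2 * (qnorm(1 - alpha) + qnorm(power))^2 * sigma2 / delta^2)
}

sigma2_grid <- seq(80, 160, by = 20)
data.frame(sigma2 = sigma2_grid,
           total_N = sapply(sigma2_grid, n_reestimated))
```

A table like this makes it easy to see whether a plausible worst-case variance would blow through the $N_\text{max}$ cap, informing the choice of cap before the protocol is locked.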

6

Document in protocol and SAP

Write the SSR procedure, interim fraction, cap, and decision rules into the protocol and SAP using precise statistical language (see Section IX below).

Simulation validation: All three calculators support Monte Carlo simulation (10,000 trials by default). For blinded and unblinded two-arm SSR, simulation confirms the analytical Type I error guarantee. For single-arm SSR, simulation is not optional: neither Bayesian nor CP mode is analytically tied to $\alpha$, so operating characteristics must be verified by Tier 2 simulation at $p = p_0$ (Type I) and $p = p_1$ (power) before locking the protocol.

VIII. When NOT to Use SSR

SSR is a powerful tool, but it is not appropriate in every situation:

Planning assumptions are well-established

If the variance, event rate, and effect size are well-characterized from large Phase II trials or meta-analyses, there is little to gain from SSR and the operational overhead is not justified.

The sample size is already capped by feasibility

If the maximum feasible enrollment is already reached (e.g., rare disease with a fixed patient pool), SSR cannot increase N beyond what is available.

GSD early stopping is the primary concern

If the goal is to stop early for efficacy or futility (not to increase N), use Group Sequential Design instead. SSR and GSD serve different purposes and can be combined, but SSR alone does not provide stopping boundaries.

Non-proportional hazards or complex censoring (survival)

The survival SSR assumes exponential event times and proportional hazards. If these assumptions are substantially violated (e.g., immunotherapy with delayed separation), external simulation tools may be needed.

No DMC and treatment effect is the concern

Unblinded SSR requires an independent DMC with appropriate firewalls. If your trial does not have a DMC (common in tech A/B tests or small academic trials), you are limited to blinded SSR.

You can afford a concurrent control (prefer two-arm)

Single-arm SSR is a design of convenience for Phase II oncology where a concurrent randomized control is infeasible or ethically difficult. If randomization is feasible, a two-arm design with blinded or unblinded SSR is strictly preferable: the historical control rate $p_0$ in single-arm designs introduces bias whenever the patient population or standard of care has drifted since the historical data were collected.

Single-arm CP mode when Bayesian is available

For single-arm binary endpoints, CP-mode Type I error is non-monotonic in cp_promising_lower due to binomial discreteness (Section V). Prefer the Bayesian mode of the single-arm calculator unless a frequentist framing is a hard requirement from a reviewing statistician.

IX. Example SAP Language

The following templates can be adapted for your protocol or Statistical Analysis Plan. Replace bracketed values with your trial-specific parameters.

Blinded SSR

“A blinded sample size re-estimation will be conducted after [50%] of the planned [168] subjects have been enrolled and have completed the primary endpoint assessment. The blinded pooled [variance / response rate / event rate] will be estimated from all available data without unblinding treatment assignment.

The sample size will be recalculated using the observed [nuisance parameter] and the originally planned treatment effect of [δ = 5 units / 15 percentage points / HR = 0.70], maintaining the [one-sided α = 0.025] significance level and [90%] power.

The recalculated sample size will be subject to a maximum cap of [2.0×] the initial sample size ([336] subjects) and a minimum of the number of subjects already enrolled. This procedure preserves the Type I error rate as the sample size adjustment is based only on blinded aggregate data (Kieser & Friede, 2003; FDA Guidance on Adaptive Designs, 2019, Section IV.B.1).”

Unblinded SSR

“An unblinded sample size re-estimation based on the promising zone approach (Mehta & Pocock, 2011) will be conducted after [50%] of the planned [324] subjects have been enrolled. The independent Data Monitoring Committee (DMC) will review unblinded treatment-arm data and compute the conditional power (CP) under the observed treatment effect.

The conditional power will be used to classify the interim result into one of four zones: futility (CP < [10%]), unfavorable ([10%] ≤ CP < [30%]), promising ([30%] ≤ CP < [80%]), or favorable (CP ≥ [80%]). If the result falls in the promising zone, the total sample size will be recalculated using the fixed-design formula under the observed effect at [90%] power, subject to a maximum of [2.0×] the initial sample size ([648] subjects). In all other zones, the original sample size will be maintained.

The final analysis will employ the inverse-normal combination test with pre-specified weights $w_1 = \sqrt{0.5}$ and $w_2 = \sqrt{0.5}$, rejecting the null hypothesis if $Z_\text{comb} = w_1 Z_1 + w_2 Z_2 > z_{0.025}$. This procedure controls the familywise Type I error rate at [2.5%] one-sided regardless of the sample size modification (Müller & Schäfer, 2001; Mehta & Pocock, 2011).”

Single-Arm SSR (Bayesian, Phase II ORR)

“The planned sample size is $N_0 = [36]$ based on a one-sample test of $H_0: \text{ORR} \le [p_0 = 0.20]$ vs. $H_1: \text{ORR} = [p_1 = 0.40]$ at one-sided $\alpha = [0.025]$ with 80% power. A $\text{Beta}([0.5, 0.5])$ (Jeffreys) prior is placed on the ORR. An interim analysis will be conducted after $n_1 = [18]$ evaluable patients.

Let $\text{PP}_\text{post} = P(\text{ORR} > p_0 \mid \text{data})$ denote the posterior probability of superiority over the historical control. The trial will stop for early efficacy at the interim if $\text{PP}_\text{post} \ge \gamma_\text{efficacy} = [0.99]$; final sample size in this case equals the interim N. Let PPoS denote the predictive probability that the final posterior will satisfy $\text{PP}_\text{post} \ge \gamma_\text{final} = [0.975]$ (defaulting to $1 - \alpha$). The trial will stop for futility if $\text{PPoS} \le \delta_\text{futility} = [0.05]$.

If $\delta_\text{futility} < \text{PPoS} < \text{PP}_\text{upper} = [0.50]$ (promising zone), the sample size will be extended to $\min(N_\text{max},\, \lceil 1.5 \cdot N_0 \rceil)$ with $N_\text{max} = [54]$. If $\text{PPoS} \ge \text{PP}_\text{upper}$, enrollment continues to the originally planned $N_0$. In all non-stop scenarios the final sample size will not be reduced below $N_0$.

Final success will be declared if $\text{PP}_\text{post} \ge [0.975]$ at the final analysis. Because the posterior-probability decision rule is not analytically tied to frequentist Type I error, the operating characteristics of this design (simulated Type I error at $p = p_0$ and power at $p = p_1$) are documented in Appendix [X] using a calibrated Monte Carlo study of $[10{,}000]$ replicates (Lee & Liu, 2008; FDA Guidance on Adaptive Designs, 2019).”

X. R Code

Standalone R implementations for verifying the calculator results. These use base R only (no special packages required).

Blinded SSR: Continuous Endpoint

R
# Blinded SSR for continuous endpoint
blinded_ssr_continuous <- function(
  delta,          # Planned mean difference
  sigma2_planned, # Planned variance
  sigma2_obs,     # Observed blinded pooled variance
  alpha = 0.025,  # One-sided alpha
  power = 0.90,   # Target power
  interim_frac = 0.50,
  n_max_factor = 2.0
) {
  z_alpha <- qnorm(1 - alpha)
  z_beta  <- qnorm(power)

  # Initial N
  n_per_arm_0 <- ceiling(2 * (z_alpha + z_beta)^2 * sigma2_planned / delta^2)
  N0 <- 2 * n_per_arm_0
  N_interim <- ceiling(interim_frac * N0)
  N_cap <- ceiling(N0 * n_max_factor)

  # Recalculate with observed variance
  n_per_arm_1 <- ceiling(2 * (z_alpha + z_beta)^2 * sigma2_obs / delta^2)
  N1_raw <- 2 * n_per_arm_1

  # Constrain: floor at interim, cap at N_max, even parity
  # Backend logic: cap rounds DOWN (n_cap %/% 2), uncapped rounds UP
  bounded <- min(max(N1_raw, N_interim), N_cap)
  if (bounded < N_interim) {
    n_per_arm <- ceiling(N_interim / 2)
  } else if (N1_raw > N_cap) {
    n_per_arm <- N_cap %/% 2        # cap binding: round DOWN
  } else {
    n_per_arm <- ceiling(N1_raw / 2) # uncapped: round UP
  }
  N1 <- 2 * n_per_arm

  # Conditional power
  n1_per_arm <- N_interim / 2
  z_expected <- delta * sqrt(n1_per_arm / (2 * sigma2_obs))
  R <- N1 / N_interim
  CP <- pnorm(z_expected * sqrt(R) - z_alpha * sqrt(R - 1))

  list(
    initial_N   = N0,
    interim_N   = N_interim,
    new_N       = N1,
    inflation   = N1 / N0,
    cond_power  = round(CP, 4),
    cap_binding = N1_raw > N_cap
  )
}

# Example: planned sigma2=100, observed sigma2=144
blinded_ssr_continuous(delta = 5, sigma2_planned = 100, sigma2_obs = 144)
# initial_N=170, new_N=244, inflation=1.44, cond_power~0.72
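Because the blinded recalculation (Kieser & Friede, 2003) is linear in the variance, a quick sensitivity sweep shows how the total N scales with the blinded variance estimate. This is a standalone sketch of the uncapped formula used inside the function above, not a call into the backend:

```r
# Uncapped blinded re-estimate: N1 = 2 * ceiling(2 * (z_a + z_b)^2 * s2 / delta^2)
k <- 2 * (qnorm(1 - 0.025) + qnorm(0.90))^2 / 5^2   # delta = 5
sapply(c(81, 100, 121, 144, 169), function(s2) 2 * ceiling(k * s2))
# 138 170 204 244 286 -- a 44% variance overrun inflates N by ~44%
```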

Unblinded SSR: Binary Endpoint (Promising Zone)

R
# Unblinded SSR for binary endpoint (Mehta & Pocock 2011)
unblinded_ssr_binary <- function(
  p_c_planned,     # Planned control rate
  p_t_planned,     # Planned treatment rate
  p_c_obs,         # Observed control rate at interim
  p_t_obs,         # Observed treatment rate at interim
  alpha = 0.025,
  power = 0.90,
  interim_frac = 0.50,
  n_max_factor = 2.0,
  cp_futility = 0.10,
  cp_promising_lower = 0.30,
  cp_promising_upper = 0.80
) {
  z_alpha <- qnorm(1 - alpha)
  z_beta  <- qnorm(power)

  # Initial N (Fleiss-Levin-Paik)
  delta_plan <- abs(p_t_planned - p_c_planned)
  p_bar_plan <- (p_c_planned + p_t_planned) / 2
  n0 <- ceiling(
    ((z_alpha * sqrt(2 * p_bar_plan * (1 - p_bar_plan))
      + z_beta * sqrt(p_c_planned*(1-p_c_planned) + p_t_planned*(1-p_t_planned)))
     / delta_plan)^2
  )
  N0 <- 2 * n0
  N_interim <- ceiling(interim_frac * N0)
  N_cap <- ceiling(N0 * n_max_factor)

  # Stage 1 z-statistic
  n1_per_arm <- N_interim / 2
  p_bar_obs <- (p_c_obs + p_t_obs) / 2
  delta_obs <- p_t_obs - p_c_obs
  SE1 <- sqrt(p_bar_obs * (1 - p_bar_obs) * 2 / n1_per_arm)
  z1 <- delta_obs / SE1

  # Conditional power under observed effect
  R <- N0 / N_interim
  CP <- pnorm(z1 * sqrt(R) - z_alpha * sqrt(R - 1))

  # Zone classification
  zone <- if (CP >= cp_promising_upper) "favorable"
          else if (CP >= cp_promising_lower) "promising"
          else if (CP >= cp_futility) "unfavorable"
          else "futility"

  # Re-estimate N if promising
  if (zone == "promising") {
    n1_new <- ceiling(
      ((z_alpha * sqrt(2 * p_bar_obs * (1 - p_bar_obs))
        + z_beta * sqrt(p_c_obs*(1-p_c_obs) + p_t_obs*(1-p_t_obs)))
       / abs(delta_obs))^2
    )
    N1_raw <- max(2 * n1_new, N_interim)
  } else {
    N1_raw <- N0
  }
  # Constrain: cap rounds DOWN, uncapped rounds UP (matches backend)
  bounded <- min(max(N1_raw, N_interim), N_cap)
  if (bounded < N_interim) {
    n_per_arm <- ceiling(N_interim / 2)
  } else if (N1_raw > N_cap) {
    n_per_arm <- N_cap %/% 2
  } else {
    n_per_arm <- ceiling(N1_raw / 2)
  }
  N1 <- 2 * n_per_arm

  # Combination weights
  w1 <- sqrt(interim_frac)
  w2 <- sqrt(1 - interim_frac)

  list(
    initial_N  = N0,
    interim_N  = N_interim,
    z1         = round(z1, 4),
    cp_obs     = round(CP, 4),
    zone       = zone,
    new_N      = N1,
    inflation  = N1 / N0,
    w1         = round(w1, 4),
    w2         = round(w2, 4),
    z_crit     = round(z_alpha, 4)
  )
}

# Example: planned 45% vs 30%, observed 38% vs 28%
unblinded_ssr_binary(
  p_c_planned = 0.30, p_t_planned = 0.45,
  p_c_obs = 0.28, p_t_obs = 0.38
)
# zone="promising", N0=434, N1=868 (cap-bound at 2x)
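The returned weights feed the inverse-normal combination test at the final analysis (Müller & Schäfer, 2001): the stage z-statistics are combined with the pre-specified weights regardless of how large the second stage actually became, which is what preserves Type I error after the increase. A minimal sketch follows; the `z2` value is illustrative, and computing it from the post-interim patients only is not shown here:

```r
# Inverse-normal combination: weights fixed at the ORIGINAL information
# fraction, so the critical value needs no adjustment after SSR.
combine_stages <- function(z1, z2, interim_frac = 0.50, alpha = 0.025) {
  w1 <- sqrt(interim_frac); w2 <- sqrt(1 - interim_frac)
  z_comb <- w1 * z1 + w2 * z2
  list(z_comb = z_comb, reject = z_comb >= qnorm(1 - alpha))
}

combine_stages(z1 = 1.57, z2 = 1.80)
# z_comb = (1.57 + 1.80) / sqrt(2) ~ 2.38 >= 1.96 -> reject H0
```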

Survival SSR: Events and N Conversion

R
# Schoenfeld events and N conversion for survival SSR
survival_ssr_events <- function(
  HR,              # Planned hazard ratio
  median_control,  # Control median survival (months)
  accrual_time,    # Accrual period (months)
  follow_up_time,  # Follow-up after accrual (months)
  alpha = 0.025,
  power = 0.90,
  alloc_ratio = 1.0,
  dropout_rate = 0.0
) {
  z_alpha <- qnorm(1 - alpha)
  z_beta  <- qnorm(power)
  r <- alloc_ratio

  # Required events (Schoenfeld)
  d <- ceiling((z_alpha + z_beta)^2 * (1 + r)^2 / (r * log(HR)^2))

  # Event probability (exponential model, uniform accrual)
  lambda_c <- log(2) / median_control
  lambda_t <- lambda_c * HR
  total_time <- accrual_time + follow_up_time

  p_c <- 1 - exp(-lambda_c * (total_time - accrual_time / 2))
  p_t <- 1 - exp(-lambda_t * (total_time - accrual_time / 2))

  # Apply dropout
  if (dropout_rate > 0) {
    years <- total_time / 12
    retention <- (1 - dropout_rate)^years
    p_c <- p_c * retention
    p_t <- p_t * retention
  }

  p_avg <- (p_c + r * p_t) / (1 + r)
  p_avg <- max(p_avg, 0.01)

  N <- ceiling(d / p_avg)
  n_control <- ceiling(N / (1 + r))
  n_treatment <- N - n_control

  list(
    events_required = d,
    p_event_control = round(p_c, 4),
    p_event_treatment = round(p_t, 4),
    p_event_avg = round(p_avg, 4),
    N_total = N,
    n_control = n_control,
    n_treatment = n_treatment
  )
}

# Example: HR=0.7, median control=12 months
survival_ssr_events(HR = 0.7, median_control = 12,
                    accrual_time = 24, follow_up_time = 12)
# events_required=331, p_avg~0.69, N_total=483
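If a blinded interim review suggests the control median is longer than planned, say 16 months instead of 12 (an illustrative number), the Schoenfeld event target is unchanged but the event probability drops, so N must grow to bank the same number of events. A standalone sketch of that update, with `p_event` a hypothetical helper mirroring the exponential model above (1:1 allocation, no dropout):

```r
# Event target depends only on HR, alpha, power -- it does not move at interim.
d <- ceiling((qnorm(0.975) + qnorm(0.90))^2 * 4 / log(0.7)^2)   # 331 events

# Average event probability (exponential model, uniform accrual, 1:1).
p_event <- function(median_c, HR, accrual = 24, follow_up = 12) {
  lam_c <- log(2) / median_c
  t_eff <- follow_up + accrual / 2   # mean exposure at the analysis time
  mean(c(1 - exp(-lam_c * t_eff), 1 - exp(-lam_c * HR * t_eff)))
}

ceiling(d / p_event(12, 0.7))   # planned:  N = 483
ceiling(d / p_event(16, 0.7))   # observed: N = 569 (+18%)
```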

Single-Arm SSR: Bayesian Decision Rule

R
# Single-Arm SSR (Bayesian) — interim decision at n_1 responders observed.
# Reproduces the analytical posterior / predictive-probability logic used
# by the Zetyra single-arm SSR calculator. Base R only.
single_arm_bayesian_decision <- function(
  p0,                        # historical control ORR
  p1,                        # planned alternative ORR
  alpha = 0.025,             # one-sided alpha
  power = 0.80,              # target power
  prior_alpha = 0.5,         # Beta prior a (Jeffreys default)
  prior_beta = 0.5,          # Beta prior b
  interim_frac = 0.5,
  n_max_factor = 1.5,
  gamma_efficacy = 0.99,
  gamma_final = NULL,        # NULL -> auto (1 - alpha)
  delta_futility = 0.05,
  pp_promising_upper = 0.50,
  r_1 = NULL                 # observed interim responders
) {
  if (is.null(gamma_final)) gamma_final <- 1 - alpha

  # Initial N (one-sample binomial normal approximation)
  z_a <- qnorm(1 - alpha); z_b <- qnorm(power)
  N0 <- ceiling(
    ((z_a * sqrt(p0 * (1 - p0))
      + z_b * sqrt(p1 * (1 - p1))) / (p1 - p0))^2
  )
  n1 <- max(10L, as.integer(round(interim_frac * N0)))
  N_max <- max(N0, ceiling(N0 * n_max_factor))

  # Expected interim events under H1 if the user did not pass r_1
  if (is.null(r_1)) r_1 <- as.integer(round(p1 * n1))

  # Posterior P(p > p0 | data) under Beta-Binomial conjugate update
  a_post <- prior_alpha + r_1
  b_post <- prior_beta  + (n1 - r_1)
  post_prob <- 1 - pbeta(p0, a_post, b_post)

  # Predictive probability of success at the FINAL look. Integrate over
  # the remaining n_rem outcomes via Beta-Binomial (log-space for stability).
  n_rem <- N0 - n1
  ppos <- 0
  if (n_rem > 0) {
    for (y in 0:n_rem) {
      log_c <- (lgamma(n_rem + 1) - lgamma(y + 1)
                - lgamma(n_rem - y + 1)
                + lgamma(a_post + y) + lgamma(b_post + n_rem - y)
                - lgamma(a_post + b_post + n_rem)
                + lgamma(a_post + b_post)
                - lgamma(a_post) - lgamma(b_post))
      p_y <- exp(log_c)
      final_post <- 1 - pbeta(p0, a_post + y, b_post + n_rem - y)
      if (final_post >= gamma_final) ppos <- ppos + p_y
    }
  } else {
    # Interim is the full trial — PPoS collapses to the indicator.
    ppos <- as.numeric(post_prob >= gamma_final)
  }

  # Decision rule
  if (post_prob >= gamma_efficacy) {
    decision <- "stop_efficacy";    final_n <- n1
  } else if (ppos <= delta_futility) {
    decision <- "stop_futility";    final_n <- n1
  } else if (ppos < pp_promising_upper) {
    decision <- "continue_ssr"
    final_n <- min(N_max, as.integer(ceiling(N0 * 1.5)))
  } else {
    decision <- "continue_favorable"; final_n <- N0
  }

  list(
    N0 = N0, n1 = n1, N_max = N_max,
    posterior_prob  = round(post_prob, 4),
    predictive_prob = round(ppos, 4),
    gamma_final     = gamma_final,
    decision        = decision,
    final_n         = final_n
  )
}

# Example: p0=0.20, p1=0.40, observe 6 responders in n1=18
single_arm_bayesian_decision(p0 = 0.20, p1 = 0.40, r_1 = 6)
# N0=36, n1=18, posterior~0.90, PPoS~0.42, decision="continue_ssr", final_n=54
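The protocol template in the previous section commits to simulated operating characteristics. The Monte Carlo sketch below checks only the non-adaptive component: it simulates the fixed-N final success rule at $p = p_0$ and deliberately ignores the interim adaptation (early stopping and the promising-zone extension), so it is a sanity check rather than the full appendix simulation:

```r
# Simulated Type I error of the final Bayesian success rule at p = p0:
# success iff P(p > p0 | r, N) >= 0.975 under a Jeffreys Beta(0.5, 0.5) prior.
set.seed(42)
p0 <- 0.20; N <- 36; reps <- 10000
r <- rbinom(reps, N, p0)                                # trials under H0
success <- (1 - pbeta(p0, 0.5 + r, 0.5 + N - r)) >= 0.975
mean(success)   # estimated one-sided Type I error; compare to nominal 0.025
```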

XI. References

Mehta CR, Pocock SJ. Adaptive increase in sample size when interim results are promising: A practical guide with examples. Statistics in Medicine. 2011;30(28):3267–3284.

Kieser M, Friede T. Simple procedures for blinded sample size adjustment that do not affect the type I error rate. Statistics in Medicine. 2003;22(23):3571–3581.

Müller HH, Schäfer H. Adaptive group sequential designs for clinical trials: combining the advantages of adaptive and of classical group sequential approaches. Biometrics. 2001;57(3):886–891.

Schoenfeld D. The asymptotic properties of nonparametric tests for comparing survival distributions. Biometrika. 1981;68(1):316–319.

Gould AL. Interim analyses for monitoring clinical trials that do not materially affect the type I error rate. Statistics in Medicine. 1992;11(1):55–66.

Friede T, et al. Blinded sample size re-estimation in event-driven clinical trials. Pharmaceutical Statistics. 2019;18(5):578–588.

Chen YHJ, DeMets DL, Lan KKG. Increasing the sample size when the unblinded interim result is promising. Statistics in Medicine. 2004;23(7):1023–1038.

Lee JJ, Liu DD. A predictive probability design for phase II cancer clinical trials. Clinical Trials. 2008;5(2):93–106.

Saville BR, Connor JT, Ayers GD, Alvarez J. The utility of Bayesian predictive probabilities for interim monitoring of clinical trials. Clinical Trials. 2014;11(4):485–493.

Jung SH. Randomized Phase II Cancer Clinical Trials. Chapman & Hall/CRC; 2013. (Single-arm and selection designs with predictive-probability stopping.)

FDA. Project Optimus: Optimizing the Dosage of Human Prescription Drugs and Biological Products for the Treatment of Oncologic Diseases. 2023.

FDA. Adaptive Designs for Clinical Trials of Drugs and Biologics: Guidance for Industry. 2019.

Ready to Calculate?

Use the SSR calculators to compute recalculated sample sizes, conditional power, zone classification, and sensitivity tables.

Related Documentation