Single-Arm Sample Size Re-estimation (SSR)
Technical documentation for adaptive single-arm Phase II designs comparing a binary response rate against a fixed historical control. Covers the Bayesian posterior/predictive framework, Mehta–Pocock promising-zone conditional power, prior specification, operating characteristics, and FDA regulatory considerations (2019 Adaptive Designs guidance, Project Optimus 2023).
Contents
1. Overview & Motivation
Single-arm Phase II trials are the dominant design in early oncology drug development. Enrolling all patients onto the experimental arm accelerates evidence generation when a randomized comparison is ethically or practically infeasible, and the observed objective response rate (ORR) is compared against a historical control rate drawn from prior trials of standard-of-care.
The FDA accelerated approval pathway permits drugs that demonstrate a meaningful improvement over existing therapies on a surrogate endpoint (frequently ORR) to reach patients while confirmatory Phase III trials are ongoing. Single-arm designs powered against a well-characterized historical control rate are the workhorse of this pathway.
Why adaptive SSR? The initial sample size depends on a minimally-clinically-important alternative that sponsors often specify with significant uncertainty. If interim data suggest the true effect is smaller than the target alternative but still clinically meaningful, a modest expansion can preserve power. Conversely, a very large observed effect supports an efficacy interim stop, and a very small effect supports futility termination—both sparing patients and resources.
When adaptive SSR helps: (a) the clinically meaningful effect size is uncertain, (b) operational flexibility is valued (stop early for efficacy or futility), (c) the historical control rate is well-established, and (d) the trial is exploratory (Phase II), not confirmatory.
2. Design Framework
Let $X_1, \dots, X_N$ be i.i.d. Bernoulli($p$) responses with unknown true rate $p$. We test:

$$H_0\colon p \le p_0 \quad \text{vs.} \quad H_1\colon p > p_0,$$

where $p_0$ is the historical control rate (null) and $p_1 > p_0$ is the target alternative. Under a normal approximation to the one-sample binomial, the required fixed-design sample size is:

$$N = \left( \frac{z_{1-\alpha}\sqrt{p_0(1-p_0)} + z_{1-\beta}\sqrt{p_1(1-p_1)}}{p_1 - p_0} \right)^{2},$$

with $z_{1-\alpha} = \Phi^{-1}(1-\alpha)$ and $z_{1-\beta} = \Phi^{-1}(1-\beta)$. The standard test at the final analysis rejects $H_0$ if the observed number of responses exceeds a critical value derived from the binomial distribution (or its normal approximation).
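As a quick check of the sample-size formula above, here is a minimal stdlib-Python sketch (the function name is illustrative, not the engine's API):

```python
from math import ceil, sqrt
from statistics import NormalDist

def fixed_design_n(p0: float, p1: float, alpha: float = 0.025, power: float = 0.80) -> int:
    """One-sample binomial sample size under the normal approximation."""
    z = NormalDist().inv_cdf
    za, zb = z(1 - alpha), z(power)  # z_{1-alpha}, z_{1-beta}
    numerator = za * sqrt(p0 * (1 - p0)) + zb * sqrt(p1 * (1 - p1))
    return ceil((numerator / (p1 - p0)) ** 2)

# The documentation's running example (p0 = 0.20, p1 = 0.40, alpha = 0.025, power = 0.80):
print(fixed_design_n(0.20, 0.40))  # → 36, matching initial_n in the example response below
```

Rounding up to the next integer is the conservative convention; an exact-binomial design would then pick the critical value at this N.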
At the interim look, with $n_1$ patients enrolled and $r_1$ observed responses, the design chooses between (i) an early efficacy stop, (ii) an early futility stop, or (iii) continuation—optionally with a re-estimated target sample size bounded by a pre-specified cap $N_{\max}$.
3. Bayesian Mode
The Bayesian mode uses a conjugate Beta–Binomial framework. With prior $p \sim \mathrm{Beta}(a, b)$ and interim data of $r_1$ responses in $n_1$ patients, the posterior is:

$$p \mid \text{data} \sim \mathrm{Beta}(a + r_1,\; b + n_1 - r_1).$$
Posterior efficacy stopping. Stop early for efficacy at the interim if the posterior probability that $p$ exceeds the null rate $p_0$ clears the interim bar:

$$\Pr(p > p_0 \mid r_1, n_1) \ge \gamma_{\text{efficacy}}.$$
Two thresholds, not one. The design uses two distinct posterior-probability bars: gamma_efficacy at the interim (typically high, e.g., 0.97–0.99) and gamma_final at the final analysis (defaults to $1 - \alpha$, e.g., 0.975 for $\alpha = 0.025$). The interim bar is the stop-early gate; the final bar is the success criterion. Conflating the two depresses simulated power because predictive probability then projects to an inflated final bar.
Predictive futility stopping. Compute the Bayesian predictive probability of success (PPoS) that the trial will clear gamma_final at the final analysis given current data:

$$\text{PPoS} = \sum_{x=0}^{N - n_1} \Pr(X_{\text{future}} = x \mid r_1, n_1)\; \mathbf{1}\!\left\{ \Pr\left(p > p_0 \mid r_1 + x,\, N\right) \ge \gamma_{\text{final}} \right\},$$

where $X_{\text{future}}$ follows the Beta–Binomial posterior predictive distribution over the remaining $N - n_1$ patients.
Stop for futility if $\text{PPoS} < \delta_{\text{futility}}$ (typically around 0.05). Otherwise, continue—optionally recalculating the final $N$ up to $N_{\max}$.
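The posterior and predictive probabilities can be sketched with stdlib Monte Carlo; the production engine presumably uses exact Beta tail probabilities, and the draw counts and example interim data (9 responders of 18, Jeffreys prior) here are illustrative:

```python
import random

def posterior_prob(a, b, r, n, p0, draws=4000, rng=None):
    """Monte Carlo estimate of Pr(p > p0 | r responses in n) under Beta(a+r, b+n-r)."""
    rng = rng or random.Random(1)
    return sum(rng.betavariate(a + r, b + n - r) > p0 for _ in range(draws)) / draws

def ppos(a, b, r1, n1, n_final, p0, gamma_final, draws=200, rng=None):
    """Predictive probability that the final posterior clears gamma_final."""
    rng = rng or random.Random(1)
    hits = 0
    for _ in range(draws):
        p = rng.betavariate(a + r1, b + n1 - r1)                # draw from interim posterior
        x = sum(rng.random() < p for _ in range(n_final - n1))  # simulate future responses
        if posterior_prob(a, b, r1 + x, n_final, p0, draws=1000, rng=rng) >= gamma_final:
            hits += 1
    return hits / draws

# Jeffreys prior, 9/18 interim responders, p0 = 0.20, N = 36, gamma_final = 0.975:
print(posterior_prob(0.5, 0.5, 9, 18, 0.20))   # close to 1: strong interim signal
print(ppos(0.5, 0.5, 9, 18, 36, 0.20, 0.975))  # high PPoS, so no futility stop
```

With a weak interim trend (e.g., 2/18 responders) the same PPoS call drops well below a typical delta_futility-adjacent range, triggering the futility rule.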
Threshold calibration. Neither gamma_efficacy nor gamma_final is analytically tied to frequentist Type I error; verify by Monte Carlo simulation at $p = p_0$. If Type I error is inflated, raise gamma_efficacy first (interim early stops are counted as rejections); raising gamma_final also helps but costs power. If power is below target, lower gamma_final toward $1 - \alpha$ or raise the interim/final N. Zetyra's engine reports both rates in the OC table.
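The calibration loop described above can be sketched as follows (stdlib only; futility stopping and re-estimation are omitted for brevity, so this upper-bounds the rejection rate of the full design, and all numbers are illustrative). Precomputing the decision per attainable response count also makes the discreteness of the problem explicit:

```python
import random

def simulate_type1_error(p0, n1, n_final, a, b, gamma_eff, gamma_final,
                         n_sims=2000, post_draws=3000, seed=42):
    """Monte Carlo Type I error of the Bayesian stopping rules at p = p0."""
    rng = random.Random(seed)

    def post_prob(r, n):  # Pr(p > p0 | r responses in n), Monte Carlo
        return sum(rng.betavariate(a + r, b + n - r) > p0
                   for _ in range(post_draws)) / post_draws

    # Decision lookup per attainable interim/final count: the discrete grid
    # behind the calibration caveats in this section.
    eff_stop = [post_prob(r, n1) >= gamma_eff for r in range(n1 + 1)]
    final_win = [post_prob(r, n_final) >= gamma_final for r in range(n_final + 1)]

    rejections = 0
    for _ in range(n_sims):
        r1 = sum(rng.random() < p0 for _ in range(n1))
        if eff_stop[r1]:  # interim efficacy stops count as rejections
            rejections += 1
            continue
        r_total = r1 + sum(rng.random() < p0 for _ in range(n_final - n1))
        rejections += final_win[r_total]
    return rejections / n_sims

# Raising gamma_efficacy tightens the interim bar and reduces Type I error:
print(simulate_type1_error(0.20, 18, 36, 0.5, 0.5, 0.95, 0.975))
print(simulate_type1_error(0.20, 18, 36, 0.5, 0.5, 0.98, 0.975))
```

In practice this loop is re-run across a grid of gamma_efficacy values until the simulated rate sits at or below the nominal α.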
4. Conditional Power Mode
The conditional power (CP) mode adapts the Mehta–Pocock (2011) promising-zone framework from two-arm to single-arm designs. Given the interim statistic

$$z_1 = \frac{\hat{p}_1 - p_0}{\sqrt{p_0(1-p_0)/n_1}}, \qquad \hat{p}_1 = r_1 / n_1,$$

computed under the one-sample binomial, the conditional power under the observed current trend (or under the target alternative, per SAP) is:

$$CP = 1 - \Phi\!\left( \frac{z_{1-\alpha}\sqrt{N} - z_1\sqrt{n_1}}{\sqrt{N - n_1}} - \frac{\hat{p}_1 - p_0}{\sqrt{\hat{p}_1(1-\hat{p}_1)}}\,\sqrt{N - n_1} \right).$$
Zones are defined by CP thresholds:
- Favorable (CP > cp_promising_upper): large effect; no re-estimation needed (or consider an efficacy stop).
- Promising (cp_promising_lower ≤ CP ≤ cp_promising_upper): re-estimate $N$ to restore planned CP, capped at $N_{\max}$.
- Unfavorable (cp_futility ≤ CP < cp_promising_lower): continue with the planned sample size; do not inflate.
- Futility (CP < cp_futility): consider stopping for futility.
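The zone logic can be sketched in Python, pairing a standard one-sample conditional-power formula under the observed trend (normal approximation; the engine's exact computation may differ) with the default thresholds from the API table:

```python
from math import sqrt
from statistics import NormalDist

_N = NormalDist()

def conditional_power(r1, n1, n_total, p0, alpha=0.025):
    """CP under the observed trend; requires 0 < r1 < n1 (normal approximation)."""
    p_hat = r1 / n1
    z1 = (p_hat - p0) / sqrt(p0 * (1 - p0) / n1)
    b = (_N.inv_cdf(1 - alpha) * sqrt(n_total) - z1 * sqrt(n1)) / sqrt(n_total - n1)
    drift = (p_hat - p0) / sqrt(p_hat * (1 - p_hat)) * sqrt(n_total - n1)
    return 1 - _N.cdf(b - drift)

def zone(cp, futility=0.10, promising_lower=0.30, promising_upper=0.80):
    """Map a CP value to its promising-zone label (defaults from the API table)."""
    if cp < futility:
        return "futility"
    if cp < promising_lower:
        return "unfavorable"
    if cp <= promising_upper:
        return "promising"
    return "favorable"

# 9/18 interim responders against p0 = 0.20 is a strong trend:
print(zone(conditional_power(9, 18, 36, 0.20)))  # → favorable
```

Only the "promising" label triggers re-estimation; the re-estimated N itself is solved by inverting the CP formula for the target power, then capped at $N_{\max}$.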
The original Mehta–Pocock theorem (Chen, DeMets, Lan 2004; Gao, Ware, Mehta 2008) preserves Type I error in the two-arm normal/z-test setting when re-estimation is confined to the promising zone. For single-arm binomial designs this guarantee does not transfer analytically — the discrete sample space and exact-binomial final test mean Type I error must be confirmed via simulation (Tier 2 OC table) before fixing cp_promising_lower / cp_promising_upper for the protocol.
Warning: Type I error is non-monotonic in cp_promising_lower
Because interim outcomes follow a discrete distribution, simulated Type I error is not a monotonic function of cp_promising_lower. Raising the threshold can move T1E in either direction depending on whether the new threshold falls between two adjacent attainable interim event counts. The practical consequence: do not assume tighter bounds produce lower Type I error. Grid-search a small neighbourhood of cp_promising_lower values via Tier 2 simulation and pick the one that best balances calibration and power.
Recommendation: For single-arm binary endpoints, the Bayesian mode of this calculator is generally preferred. It decouples the interim early-stop bar (gamma_efficacy) from the final-analysis bar (gamma_final), is calibrated on the predictive-probability scale rather than discrete CP, and behaves monotonically in gamma_efficacy, making Type I error calibration substantially easier in practice.
5. Prior Specification
The choice of prior materially affects interim decisions, particularly when $n_1$ is small. Zetyra offers three presets:
- Jeffreys Beta(0.5, 0.5) — default. The Jeffreys prior is the invariant reference prior for a Bernoulli parameter, derived from the square root of the Fisher information. It is objective in the sense that it is invariant under reparameterization and has a prior effective sample size (ESS) of 1.
- Flat Beta(1, 1). The uniform prior on $[0, 1]$. Often preferred by sponsors for its intuitive interpretation; ESS of 2. Slightly more informative than Jeffreys in the tails.
- Custom informative priors. Derived from prior trials via the MAP prior / Bayesian-borrowing workflow or elicited from experts via prior elicitation. Use with caution: regulators scrutinize informative priors that favor efficacy claims.
Prior ESS consideration. Prior ESS $= a + b$. If ESS approaches $n_1$, the posterior is heavily influenced by the prior. Report prior ESS and run sensitivity analyses (Jeffreys vs. flat vs. custom) before finalizing thresholds.
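The sensitivity check follows directly from the conjugate update. A small sketch (the informative Beta(4, 16) prior here is a hypothetical example, not a recommendation):

```python
def posterior_summary(a, b, r, n):
    """Prior ESS and posterior mean for a Beta(a, b) prior after r responses in n."""
    return {"prior_ess": a + b, "posterior_mean": (a + r) / (a + b + n)}

# 5/18 interim responders under three priors:
priors = {"Jeffreys": (0.5, 0.5), "Flat": (1.0, 1.0), "Informative": (4.0, 16.0)}
for name, (a, b) in priors.items():
    print(name, posterior_summary(a, b, r=5, n=18))
```

With $n_1 = 18$, the informative prior's ESS of 20 exceeds the interim data, so its posterior mean is pulled sharply from the raw rate 5/18 ≈ 0.278 toward the prior mean 0.20 — exactly the dominance the ESS check is meant to flag.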
6. Operating Characteristics
For both modes, simulated operating characteristics are mandatory before fixing thresholds for the protocol. Bayesian stopping rules are not analytically tied to frequentist Type I error, and the two-arm Mehta–Pocock promising-zone theorem does not transfer analytically to single-arm binomial CP designs (FDA Adaptive Designs Guidance 2019, Section V).
Zetyra's OC table reports, for a grid of true rates $p$:
- Type I error at $p = p_0$: must be $\le \alpha$. If inflated in Bayesian mode, raise gamma_efficacy (typically toward 0.97–0.99) and re-simulate. If inflated in CP mode, do not assume tighter promising-zone bounds will help: Type I error is non-monotonic in cp_promising_lower for this single-arm binomial design (see the warning in Section 4). Grid-search a few neighbouring values and re-simulate, or switch to Bayesian mode.
- Simulated power at $p = p_1$: should match the planned power target.
- Expected sample size $E[N]$: shows the adaptive design's efficiency gain over the fixed-$N$ design under each true rate, together with quantiles (e.g., the 90th percentile, N_p90).
- Stopping probabilities: Pr(efficacy stop), Pr(futility stop), Pr(N hits cap) at each true rate.
Interpret the table jointly: a design with 5% Type I error, 82% power at $p_1$, and $E[N]$ substantially below the fixed-design $N$ is well-tuned. An 8% Type I error means the thresholds are too liberal.
7. Regulatory Considerations
- FDA Adaptive Designs Guidance (2019), Section IV.B. Sample size re-estimation is a well-characterized adaptation provided the rule, timing, and caps are pre-specified and Type I error is verified by simulation.
- FDA Accelerated Approval. Single-arm ORR trials supporting accelerated approval must enroll a pre-specified population, use a locked analysis plan, and demonstrate a meaningful effect over historical control.
- Project Optimus (2023). FDA oncology dose-optimization initiative emphasizes adequate sample sizes for dose selection and characterization of tolerability in Phase II, which SSR directly supports by expanding cohorts under promising interim trends.
- Pre-specification requirements. The SAP must fix $N$, the interim timing ($n_1$ or the interim fraction), the prior (if Bayesian), the thresholds or CP zones, the cap $N_{\max}$, and include simulation-based OC evidence.
- SAP text generation. The Zetyra report exports an SAP-ready decision rule description plus the OC table and sensitivity scenarios directly suitable for inclusion in a protocol and SAP submission.
8. Assumptions & Limitations
- Historical control stability. The entire design rests on $p_0$ being a stable, well-characterized historical rate. Drift in $p_0$ (e.g., supportive-care improvements, population shifts, selection bias in the historical source) inflates Type I error without detection.
- Binary endpoint only. The v1 engine supports binary (response/no response) endpoints. Continuous and time-to-event single-arm designs are not implemented.
- Historical control misspecification. Even modest (2–5 pp) drift in $p_0$ can materially shift achieved Type I error. Sensitivity scenarios in the report show how the recalculated N and CP change under plausible alternative $p_0$ values.
- Not for confirmatory Phase III. Single-arm designs are exploratory; efficacy claims for full approval require randomized confirmatory evidence except in narrow accelerated-approval settings.
- One interim look. The v1 engine supports a single interim analysis. Multi-look GSD-style boundaries for single-arm trials should use the group-sequential calculator instead.
9. API Reference
Endpoint: POST /api/v1/calculators/ssr-single-arm
Request parameters
| Field | Type | Default | Description |
|---|---|---|---|
| ssr_method | string | — | "bayesian" or "conditional_power" |
| p0 | float | — | Null/historical response rate (0, 1) |
| p1 | float | — | Target alternative rate, p1 > p0 |
| alpha | float | 0.025 | One-sided Type I error |
| power | float | 0.80 | Target power at p1 |
| interim_fraction | float | 0.5 | Fraction of planned N at interim look |
| interim_n | int? | null | Absolute interim N (overrides fraction) |
| n_max_factor | float | 1.5 | Cap as multiple of initial N (must be >1, ≤5) |
| n_max_absolute | int? | null | Absolute N cap (overrides n_max_factor); must be ≥10 |
| prior_alpha | float | 0.5 | Beta prior α (Bayesian mode) |
| prior_beta | float | 0.5 | Beta prior β (Bayesian mode) |
| gamma_efficacy | float | 0.95 | Interim early-stop threshold. Posterior P(p>p0 | data) ≥ this triggers efficacy stop at the interim look. Calibrate via simulation. |
| gamma_final | float? | 1−α | Final-analysis success threshold. The eventual posterior must clear this for the trial to be a positive result. Predictive probability is computed under this threshold. Default is 1−α (e.g., 0.975 for α=0.025), which keeps simulated power near the design target. |
| delta_futility | float | 0.05 | Predictive probability threshold for futility |
| pp_promising_upper | float | 0.50 | Predictive-probability upper bound for the SSR promising zone (Bayesian mode). Trials with delta_futility < PP < this extend N up to N_max; PP ≥ this continues at the originally planned N. Must be greater than delta_futility. Raise to 0.70–0.80 to keep more trials in the SSR zone and push N_p90 toward the N_max budget. |
| cp_futility | float | 0.10 | CP lower bound for futility (CP mode) |
| cp_promising_lower | float | 0.30 | CP lower bound for promising zone |
| cp_promising_upper | float | 0.80 | CP upper bound for promising zone |
| simulate | bool | false | Run Monte Carlo OC validation |
| simulation_seed | int? | null | Random seed for reproducibility (auto-generated if null) |
| n_simulations | int | 10000 | Simulation replicates (1,000–100,000) |
Example Request
```json
{
  "ssr_method": "bayesian",
  "p0": 0.20,
  "p1": 0.40,
  "alpha": 0.025,
  "power": 0.80,
  "interim_fraction": 0.5,
  "n_max_factor": 1.5,
  "prior_alpha": 0.5,
  "prior_beta": 0.5,
  "gamma_efficacy": 0.95,
  "gamma_final": null,
  "delta_futility": 0.05,
  "pp_promising_upper": 0.50,
  "simulate": true,
  "simulation_seed": 42,
  "n_simulations": 10000
}
```

gamma_final: null defaults to 1 − alpha (e.g., 0.975 for alpha = 0.025). Raise pp_promising_upper toward 0.70 to keep more trials in the SSR promising zone.
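A minimal client sketch using only the standard library; the base URL is a placeholder and any authentication headers are omitted, since neither is specified in this document:

```python
import json
from urllib import request

API_PATH = "/api/v1/calculators/ssr-single-arm"

def build_request(payload: dict, base_url: str = "https://api.example.com"):
    """Construct the POST request for the SSR endpoint (base_url is hypothetical)."""
    return request.Request(
        base_url + API_PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

payload = {"ssr_method": "bayesian", "p0": 0.20, "p1": 0.40, "alpha": 0.025,
           "power": 0.80, "interim_fraction": 0.5, "simulate": True}
req = build_request(payload)
# with request.urlopen(req) as resp:   # uncomment against a live deployment
#     result = json.loads(resp.read())
```

Fields omitted from the payload take the defaults listed in the request-parameters table.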
Response Schema (abridged)
```json
{
  "calculation_id": "...",
  "tier": "analytical+simulation",
  "analytical_results": {
    "initial_n": 36,
    "interim_n": 18,
    "interim_fraction": 0.5,
    "ssr_method": "bayesian",
    "posterior_probability": 0.97,
    "predictive_probability": 0.81,
    "conditional_power": 0.82,
    "conditional_power_planned": 0.82,
    "zone": "",
    "z1": 1.96,
    "efficacy_stop": true,
    "futility_stop": false,
    "recalculated_n": 18,
    "inflation_factor": 0.5,
    "n_capped": false,
    "n_max_used": 54,
    "gamma_final_used": 0.975,
    "prior_description": "Jeffreys Beta(0.5, 0.5)",
    "decision_rule_description": "...",
    "recalculation_scenarios": [
      {
        "label": "Planned effect",
        "assumed_nuisance": 0.40,
        "recalculated_n_per_arm": 36,
        "recalculated_n_total": 36,
        "inflation_factor": 1.0,
        "conditional_power": 0.82,
        "decision": "continue_favorable"
      }
    ],
    "regulatory_notes": [...]
  },
  "metadata": {...},
  "simulation": {...},
  "warnings": [],
  "regulatory_citations": [...]
}
```

decision enum values: stop_efficacy, stop_futility, continue_ssr, continue_favorable, continue_unfavorable. Five sensitivity rows are returned by default (50%, 75%, 100%, 125%, 150% of planned effect).
10. References
- Simon R. (1989). Optimal two-stage designs for phase II clinical trials. Controlled Clinical Trials, 10(1), 1–10.
- Lee JJ, Liu DD. (2008). A predictive probability design for phase II cancer clinical trials. Clinical Trials, 5(2), 93–106.
- Mehta CR, Pocock SJ. (2011). Adaptive increase in sample size when interim results are promising: A practical guide with examples. Statistics in Medicine, 30(28), 3267–3284.
- FDA. (2019). Adaptive Designs for Clinical Trials of Drugs and Biologics: Guidance for Industry. U.S. Food and Drug Administration.
- FDA. (2023). Project Optimus: Optimizing the Dosage of Human Prescription Drugs and Biological Products for the Treatment of Oncologic Diseases. U.S. Food and Drug Administration.
- Jeffreys H. (1961). Theory of Probability (3rd ed.). Oxford University Press.
- Thall PF, Simon R. (1994). Practical Bayesian guidelines for phase IIB clinical trials. Biometrics, 50(2), 337–349.