
Single-Arm Sample Size Re-estimation (SSR)

Technical documentation for adaptive single-arm Phase II designs comparing a binary response rate against a fixed historical control. Covers the Bayesian posterior/predictive framework, Mehta–Pocock promising-zone conditional power, prior specification, operating characteristics, and FDA regulatory considerations (2019 Adaptive Designs guidance, Project Optimus 2023).

1. Overview & Motivation

Single-arm Phase II trials are the dominant design in early oncology drug development. Enrolling all patients onto the experimental arm accelerates evidence generation when a randomized comparison is ethically or practically infeasible, and the observed objective response rate (ORR) is compared against a historical control rate $p_0$ drawn from prior trials of standard-of-care.

The FDA accelerated approval pathway permits drugs that demonstrate a meaningful improvement over existing therapies on a surrogate endpoint (frequently ORR) to reach patients while confirmatory Phase III trials are ongoing. Single-arm designs powered against a well-characterized $p_0$ are the workhorse of this pathway.

Why adaptive SSR? The initial sample size depends on a minimally-clinically-important alternative $p_1$ that sponsors often specify with substantial uncertainty. If interim data suggest the true effect is smaller than $p_1$ but still clinically meaningful, a modest expansion can preserve power. Conversely, a very large observed effect supports an efficacy interim stop, and a very small effect supports futility termination, both sparing patients and resources.

When adaptive SSR helps: (a) the clinically meaningful effect size is uncertain, (b) operational flexibility is valued (stop early for efficacy or futility), (c) the historical control rate $p_0$ is well-established, and (d) the trial is exploratory (Phase II), not confirmatory.

2. Design Framework

Let $X_1, \ldots, X_n$ be i.i.d. Bernoulli responses with unknown true rate $p$. We test:

$$H_0: p \leq p_0 \quad \text{vs.} \quad H_1: p > p_0$$

where $p_0$ is the historical control rate (null) and $p_1$ is the target alternative. Under a normal approximation to the one-sample binomial, the required fixed-design sample size is:

$$n = \left\lceil \left( \frac{z_\alpha \sqrt{p_0(1-p_0)} + z_\beta \sqrt{p_1(1-p_1)}}{p_1 - p_0} \right)^2 \right\rceil$$

with $z_\alpha = \Phi^{-1}(1-\alpha)$ and $z_\beta = \Phi^{-1}(\text{power})$. The standard test at the final analysis rejects $H_0$ if the observed $\hat{p}$ exceeds a critical value derived from the binomial (or its normal approximation).
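The fixed-design formula above can be evaluated directly with the standard library; a minimal Python sketch (the function name is illustrative):

```python
import math
from statistics import NormalDist

def fixed_design_n(p0: float, p1: float, alpha: float = 0.025,
                   power: float = 0.80) -> int:
    """Fixed-design N for H0: p <= p0 vs H1: p > p0 under the
    one-sample normal approximation, rounded up to the next integer."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)   # Phi^{-1}(1 - alpha)
    z_beta = NormalDist().inv_cdf(power)        # Phi^{-1}(power)
    numerator = (z_alpha * math.sqrt(p0 * (1 - p0))
                 + z_beta * math.sqrt(p1 * (1 - p1)))
    return math.ceil((numerator / (p1 - p0)) ** 2)

# The API's worked example (p0 = 0.20, p1 = 0.40, alpha = 0.025, power = 0.80)
print(fixed_design_n(0.20, 0.40))  # 36, matching initial_n in the response schema
```

Smaller effects require larger N; e.g., shrinking the alternative to $p_1 = 0.35$ roughly doubles the requirement.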

At the interim look with $n_1$ patients enrolled and $x_1$ responses, the design chooses between (i) early efficacy stop, (ii) early futility stop, or (iii) continuation, optionally with a re-estimated target sample size $n^*$ bounded by a pre-specified cap $n_{\max}$.

3. Bayesian Mode

The Bayesian mode uses a conjugate Beta–Binomial framework. With prior $p \sim \text{Beta}(\alpha_0, \beta_0)$ and interim data $x_1$ responses in $n_1$ patients, the posterior is:

$$p \mid x_1, n_1 \sim \text{Beta}(\alpha_0 + x_1,\; \beta_0 + n_1 - x_1)$$

Posterior efficacy stopping. Stop early for efficacy at the interim if the posterior probability that $p$ exceeds the null rate clears the interim bar:

$$\Pr(p > p_0 \mid x_1, n_1) \geq \gamma_\text{efficacy}$$

Two thresholds, not one. The design uses two distinct posterior-probability bars: gamma_efficacy at the interim (typically high, e.g., 0.97–0.99) and gamma_final at the final analysis (defaults to $1 - \alpha$, e.g., 0.975 for $\alpha = 0.025$). The interim bar is the stop-early gate; the final bar is the success criterion. Conflating the two depresses simulated power because predictive probability then projects to an inflated final bar.

Predictive futility stopping. Compute the Bayesian predictive probability (PPoS) that the trial will clear gamma_final at the final analysis given current data:

PPoS=Pr ⁣[Pr(p>p0final data)γfinal|x1,n1]\text{PPoS} = \Pr\!\left[\Pr(p > p_0 \mid \text{final data}) \geq \gamma_\text{final} \,\middle|\, x_1, n_1\right]

Stop for futility if $\text{PPoS} \leq \delta_\text{futility}$ (typically around 0.05). Otherwise, continue, optionally recalculating the final $n^*$ up to $n_{\max}$.
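Both quantities can be computed exactly: the posterior tail is a regularized incomplete Beta integral, and PPoS is a finite sum over future responses under the Beta-Binomial predictive distribution. A self-contained Python sketch (function names are illustrative; the tail is evaluated by simple numerical integration rather than a special-function library):

```python
import math

def log_beta(a: float, b: float) -> float:
    """log of the Beta function B(a, b) via log-gamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def beta_tail(p0: float, a: float, b: float, steps: int = 20_000) -> float:
    """Pr(p > p0) for p ~ Beta(a, b), by trapezoidal integration of the pdf."""
    lB = log_beta(a, b)
    h = (1.0 - p0) / steps
    total = 0.0
    for i in range(steps + 1):
        x = p0 + i * h
        f = 0.0 if x <= 0.0 or x >= 1.0 else math.exp(
            (a - 1) * math.log(x) + (b - 1) * math.log(1 - x) - lB)
        total += f / 2 if i in (0, steps) else f
    return total * h

def betabinom_pmf(y: int, m: int, a: float, b: float) -> float:
    """P(Y = y) for Y ~ Beta-Binomial(m, a, b): future responses given the posterior."""
    return math.exp(math.lgamma(m + 1) - math.lgamma(y + 1) - math.lgamma(m - y + 1)
                    + log_beta(a + y, b + m - y) - log_beta(a, b))

def ppos(x1: int, n1: int, n: int, p0: float, a0: float = 0.5, b0: float = 0.5,
         gamma_final: float = 0.975) -> float:
    """Predictive probability that the final posterior clears gamma_final."""
    a, b = a0 + x1, b0 + n1 - x1          # interim posterior Beta(a, b)
    m = n - n1                            # patients still to be enrolled
    return sum(betabinom_pmf(y, m, a, b)
               for y in range(m + 1)
               if beta_tail(p0, a + y, b + m - y) >= gamma_final)
```

For an interim with, say, 8 responses in 18 patients against $p_0 = 0.20$ under the Jeffreys prior, `beta_tail(0.20, 8.5, 10.5)` gives the interim posterior probability and `ppos(8, 18, 36, 0.20)` the PPoS.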

Threshold calibration. Neither gamma_efficacy nor gamma_final is analytically tied to frequentist Type I error; verify by Monte Carlo at $p = p_0$. If Type I error is inflated, raise gamma_efficacy first (interim early stops are counted as rejections); raising gamma_final also helps but costs power. If power is below target, lower gamma_final toward $1 - \alpha$ or raise the interim/final N. Zetyra's engine reports both rates in the OC table.
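The calibration loop can be sketched end-to-end. The stdlib-only sketch below (illustrative, not the engine's implementation) precomputes the discrete critical response counts implied by the two posterior bars, then estimates Type I error at $p = p_0$; the futility rule is omitted for brevity, and since futility stops can only prevent rejections, the sketch yields an upper bound on the true rate:

```python
import math
import random

def beta_tail(p0: float, a: float, b: float, steps: int = 4_000) -> float:
    """Pr(p > p0) for p ~ Beta(a, b), by trapezoidal integration of the pdf."""
    log_B = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    h = (1.0 - p0) / steps
    total = 0.0
    for i in range(steps + 1):
        x = p0 + i * h
        f = 0.0 if x <= 0.0 or x >= 1.0 else math.exp(
            (a - 1) * math.log(x) + (b - 1) * math.log(1 - x) - log_B)
        total += f / 2 if i in (0, steps) else f
    return total * h

def critical_count(n: int, p0: float, a0: float, b0: float, gamma: float) -> int:
    """Smallest response count whose posterior Pr(p > p0) clears gamma."""
    for x in range(n + 1):
        if beta_tail(p0, a0 + x, b0 + n - x) >= gamma:
            return x
    return n + 1  # bar unattainable at this n

def simulated_type1(p0: float, n1: int, n: int, gamma_eff: float,
                    gamma_final: float, a0: float = 0.5, b0: float = 0.5,
                    n_sims: int = 20_000, seed: int = 42) -> float:
    """Monte Carlo Type I error at p = p0. Interim efficacy stops count as
    rejections; the futility rule is omitted (it can only lower the rate)."""
    c_int = critical_count(n1, p0, a0, b0, gamma_eff)
    c_fin = critical_count(n, p0, a0, b0, gamma_final)
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        draws = [rng.random() < p0 for _ in range(n)]
        x1 = sum(draws[:n1])
        rejections += x1 >= c_int or sum(draws) >= c_fin
    return rejections / n_sims
```

Because an interim efficacy stop is counted as a rejection, raising gamma_eff (with the same seed and draws) can only remove rejections, never add them: this is the monotonicity in gamma_efficacy referenced above.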

4. Conditional Power Mode

The conditional power (CP) mode adapts the Mehta–Pocock (2011) promising zone framework from two-arm to single-arm designs. Given interim statistic $z_1$ computed under the one-sample binomial:

$$z_1 = \frac{\hat{p}_1 - p_0}{\sqrt{p_0(1-p_0)/n_1}}$$

the conditional power under the observed current trend (or under the target alternative, per SAP) is:

CP(z1)=Φ ⁣(z1n1+(z1/n1)(nn1)zαnnn1)CP(z_1) = \Phi\!\left( \frac{z_1 \sqrt{n_1} + (z_1/\sqrt{n_1})(n - n_1) - z_\alpha \sqrt{n}}{\sqrt{n - n_1}} \right)

Zones are defined by CP thresholds:

  • Favorable (CP > promising upper): large effect; no re-estimation needed (or consider efficacy stop).
  • Promising (promising lower ≤ CP ≤ promising upper): re-estimate $n^*$ to restore planned CP, capped at $n_{\max}$.
  • Unfavorable (futility ≤ CP < promising lower): continue with planned sample size; do not inflate.
  • Futility (CP < futility threshold): consider stopping for futility.
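The interim statistic, CP under the current trend, zone classification, and promising-zone re-estimation can be sketched in Python (stdlib only; function names are illustrative, and the n_max = 54 default mirrors the worked example's 1.5× cap on N = 36):

```python
import math
from statistics import NormalDist

def conditional_power(z1: float, n1: int, n: int, alpha: float = 0.025) -> float:
    """CP under the observed current trend, with theta_hat = z1 / sqrt(n1)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    theta_hat = z1 / math.sqrt(n1)
    numerator = z1 * math.sqrt(n1) + theta_hat * (n - n1) - z_alpha * math.sqrt(n)
    return NormalDist().cdf(numerator / math.sqrt(n - n1))

def classify_zone(cp: float, futility: float = 0.10,
                  lower: float = 0.30, upper: float = 0.80) -> str:
    """Map CP to a promising-zone decision (default thresholds as in the API)."""
    if cp < futility:
        return "futility"
    if cp < lower:
        return "unfavorable"
    if cp <= upper:
        return "promising"
    return "favorable"

def reestimate_n(z1: float, n1: int, n: int, target_cp: float = 0.80,
                 n_max: int = 54, alpha: float = 0.025) -> int:
    """Smallest n* in [n, n_max] restoring CP to the target; capped at n_max."""
    for n_star in range(n, n_max + 1):
        if conditional_power(z1, n1, n_star, alpha) >= target_cp:
            return n_star
    return n_max

# Example: 35% observed ORR at the interim (p0 = 0.20, n1 = 18, planned n = 36)
z1 = (0.35 - 0.20) / math.sqrt(0.20 * 0.80 / 18)
cp = conditional_power(z1, 18, 36)   # ~0.66: lands in the promising zone
```

In this example the design would extend N toward the cap to pull CP back to 0.80.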

The original Mehta–Pocock theorem (Chen, DeMets, Lan 2004; Gao, Ware, Mehta 2008) preserves Type I error in the two-arm normal/z-test setting when re-estimation is confined to the promising zone. For single-arm binomial designs this guarantee does not transfer analytically — the discrete sample space and exact-binomial final test mean Type I error must be confirmed via simulation (Tier 2 OC table) before fixing cp_promising_lower / cp_promising_upper for the protocol.

Warning: Type I error is non-monotonic in cp_promising_lower

Because interim outcomes follow a discrete $\text{Binomial}(n_1, p_0)$ distribution, simulated Type I error is not a monotonic function of cp_promising_lower. Raising the threshold can move T1E in either direction depending on whether the new threshold falls between two adjacent attainable interim event counts. The practical consequence: do not assume tighter bounds produce lower Type I error. Grid-search a small neighbourhood of cp_promising_lower values via Tier 2 simulation and pick the one that best balances calibration and power.

Recommendation: For single-arm binary endpoints, the Bayesian mode of this calculator is generally preferred. It decouples the interim early-stop bar (gamma_efficacy) from the final-analysis bar (gamma_final), is calibrated on the predictive-probability scale rather than discrete CP, and behaves monotonically in gamma_efficacy, making Type I error calibration substantially easier in practice.

5. Prior Specification

The choice of prior $\text{Beta}(\alpha_0, \beta_0)$ materially affects interim decisions, particularly when $n_1$ is small. Zetyra offers three presets:

  • Jeffreys Beta(0.5, 0.5) — default. The Jeffreys prior is the invariant reference prior for a Bernoulli parameter, derived from the square root of the Fisher information. It is objective in the sense that it is invariant under reparameterization and has prior effective sample size (ESS) of 1.
  • Flat Beta(1, 1). The uniform prior on $[0, 1]$. Often preferred by sponsors for its intuitive interpretation; ESS of 2. Slightly more informative than Jeffreys in the tails.
  • Custom informative priors. Derived from prior trials via the MAP prior / Bayesian-borrowing workflow or elicited from experts via prior elicitation. Use with caution: regulators scrutinize informative priors that favor efficacy claims.

Prior ESS consideration. Prior ESS $= \alpha_0 + \beta_0$. If ESS approaches $n_1$, the posterior is heavily influenced by the prior. Report prior ESS and run sensitivity analyses (Jeffreys vs. flat vs. custom) before finalizing thresholds.
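A quick prior-sensitivity check falls straight out of the conjugate update. A minimal sketch (the interim counts are illustrative) comparing Jeffreys and flat priors at a small interim:

```python
def posterior(a0: float, b0: float, x1: int, n1: int) -> tuple:
    """Conjugate Beta update: returns (a0 + x1, b0 + n1 - x1)."""
    return a0 + x1, b0 + n1 - x1

# Sensitivity at a small interim: x1 = 3 responses in n1 = 10 patients
priors = {"Jeffreys": (0.5, 0.5), "Flat": (1.0, 1.0)}
for name, (a0, b0) in priors.items():
    a, b = posterior(a0, b0, 3, 10)
    ess = a0 + b0            # prior effective sample size
    mean = a / (a + b)       # posterior mean
    print(f"{name}: prior ESS = {ess:g}, posterior mean = {mean:.3f}")
# Jeffreys: posterior mean 0.318; Flat: posterior mean 0.333
```

Even these two weak priors shift the posterior mean by about 1.5 percentage points at $n_1 = 10$; the gap widens with a custom informative prior, which is why the sensitivity run belongs in the SAP.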

6. Operating Characteristics

For both modes, simulated operating characteristics are mandatory before fixing thresholds for the protocol. Bayesian stopping rules are not analytically tied to frequentist Type I error, and the two-arm Mehta–Pocock promising-zone theorem does not transfer analytically to single-arm binomial CP designs (FDA Adaptive Designs Guidance 2019, Section V).

Zetyra's OC table reports, for a grid of true rates $p \in \{p_0, \ldots, p_1, \ldots\}$:

  • Type I error at $p = p_0$: must be $\leq \alpha$. If inflated in Bayesian mode, raise gamma_efficacy (typically toward 0.97–0.99) and re-simulate. If inflated in CP mode, do not assume tighter promising-zone bounds will help; Type I error is non-monotonic in cp_promising_lower for this single-arm binomial design (see the warning in Section 4). Grid-search a few neighbouring values and re-simulate, or switch to Bayesian mode.
  • Simulated power at $p = p_1$: should match the planned power target.
  • Expected sample size $\mathbb{E}[N \mid p]$: shows the adaptive design's efficiency gain over fixed-N under each true rate, together with quantiles $N_{10}, N_{50}, N_{90}$.
  • Stopping probabilities: Pr(efficacy stop), Pr(futility stop), Pr(N hits cap) at each true rate.

Interpret the table jointly: a design with 5% Type I error, 82% power at $p_1$, and $\mathbb{E}[N \mid p_0]$ substantially below the fixed-N is well-tuned. An 8% Type I error means the thresholds are too liberal.

7. Regulatory Considerations

  • FDA Adaptive Designs Guidance (2019), Section IV.B. Sample size re-estimation is a well-characterized adaptation provided the rule, timing, and caps are pre-specified and Type I error is verified by simulation.
  • FDA Accelerated Approval. Single-arm ORR trials supporting accelerated approval must enroll a pre-specified population, use a locked analysis plan, and demonstrate a meaningful effect over historical control.
  • Project Optimus (2023). FDA oncology dose-optimization initiative emphasizes adequate sample sizes for dose selection and characterization of tolerability in Phase II, which SSR directly supports by expanding cohorts under promising interim trends.
  • Pre-specification requirements. The SAP must fix $p_0$, $p_1$, $\alpha$, power, the interim timing $n_1$, the prior (if Bayesian), the thresholds $(\gamma, \delta)$ or CP zones, the cap $n_{\max}$, and include simulation-based OC evidence.
  • SAP text generation. The Zetyra report exports an SAP-ready decision rule description plus the OC table and sensitivity scenarios directly suitable for inclusion in a protocol and SAP submission.

8. Assumptions & Limitations

  • Historical control stability. The entire design rests on $p_0$ being a stable, well-characterized historical rate. Drift in $p_0$ (e.g., supportive-care improvements, population shifts, selection bias in the historical source) inflates Type I error without detection.
  • Binary endpoint only. The v1 engine supports binary (response/no response) endpoints. Continuous and time-to-event single-arm designs are not implemented.
  • Historical control misspecification. Even modest (2–5 pp) drift in $p_0$ can materially shift achieved Type I error. Sensitivity scenarios in the report show how the recalculated N and CP change under plausible alternative $p_0$.
  • Not for confirmatory Phase III. Single-arm designs are exploratory; efficacy claims for full approval require randomized confirmatory evidence except in narrow accelerated-approval settings.
  • One interim look. The v1 engine supports a single interim analysis. Multi-look GSD-style boundaries for single-arm trials should use the group-sequential calculator instead.

9. API Reference

Endpoint: POST /api/v1/calculators/ssr-single-arm

Request parameters

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| ssr_method | string | — | "bayesian" or "conditional_power" |
| p0 | float | — | Null/historical response rate, in (0, 1) |
| p1 | float | — | Target alternative rate; p1 > p0 |
| alpha | float | 0.025 | One-sided Type I error |
| power | float | 0.80 | Target power at p1 |
| interim_fraction | float | 0.5 | Fraction of planned N at interim look |
| interim_n | int? | null | Absolute interim N (overrides fraction) |
| n_max_factor | float | 1.5 | Cap as multiple of initial N (must be >1, ≤5) |
| n_max_absolute | int? | null | Absolute N cap (overrides n_max_factor); must be ≥10 |
| prior_alpha | float | 0.5 | Beta prior α (Bayesian mode) |
| prior_beta | float | 0.5 | Beta prior β (Bayesian mode) |
| gamma_efficacy | float | 0.95 | Interim early-stop threshold. Posterior Pr(p > p0) ≥ this triggers an efficacy stop at the interim look. Calibrate via simulation. |
| gamma_final | float? | 1−α | Final-analysis success threshold. The eventual posterior must clear this for the trial to be a positive result. Predictive probability is computed under this threshold. Default is 1−α (e.g., 0.975 for α=0.025), which keeps simulated power near the design target. |
| delta_futility | float | 0.05 | Predictive probability threshold for futility |
| pp_promising_upper | float | 0.50 | Predictive-probability upper bound for the SSR promising zone (Bayesian mode). Trials with delta_futility < PP < this extend N up to N_max; PP ≥ this continues at the originally planned N. Must be greater than delta_futility. Raise to 0.70–0.80 to keep more trials in the SSR zone and push N_p90 toward the N_max budget. |
| cp_futility | float | 0.10 | CP lower bound for futility (CP mode) |
| cp_promising_lower | float | 0.30 | CP lower bound for promising zone |
| cp_promising_upper | float | 0.80 | CP upper bound for promising zone |
| simulate | bool | false | Run Monte Carlo OC validation |
| simulation_seed | int? | null | Random seed for reproducibility (auto-generated if null) |
| n_simulations | int | 10000 | Simulation replicates (1,000–100,000) |
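The documented constraints lend themselves to a client-side pre-check before posting. A hypothetical sketch (the function is illustrative, not part of the API; the server remains the authoritative validator):

```python
def validate_request(params: dict) -> list:
    """Hypothetical client-side pre-check mirroring the documented
    parameter constraints; returns a list of human-readable errors."""
    errors = []
    p0, p1 = params["p0"], params["p1"]
    if not 0.0 < p0 < 1.0:
        errors.append("p0 must lie in (0, 1)")
    if not p1 > p0:
        errors.append("p1 must exceed p0")
    factor = params.get("n_max_factor", 1.5)
    if not 1.0 < factor <= 5.0:
        errors.append("n_max_factor must be > 1 and <= 5")
    n_abs = params.get("n_max_absolute")
    if n_abs is not None and n_abs < 10:
        errors.append("n_max_absolute must be >= 10")
    if params.get("pp_promising_upper", 0.50) <= params.get("delta_futility", 0.05):
        errors.append("pp_promising_upper must exceed delta_futility")
    n_sims = params.get("n_simulations", 10_000)
    if not 1_000 <= n_sims <= 100_000:
        errors.append("n_simulations must be between 1,000 and 100,000")
    return errors
```

An empty list means the payload passes every documented range check; anything else should be fixed before submission.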

Example Request

{
  "ssr_method": "bayesian",
  "p0": 0.20,
  "p1": 0.40,
  "alpha": 0.025,
  "power": 0.80,
  "interim_fraction": 0.5,
  "n_max_factor": 1.5,
  "prior_alpha": 0.5,
  "prior_beta": 0.5,
  "gamma_efficacy": 0.95,
  "gamma_final": null,
  "delta_futility": 0.05,
  "pp_promising_upper": 0.50,
  "simulate": true,
  "simulation_seed": 42,
  "n_simulations": 10000
}

gamma_final: null defaults to 1 - alpha (e.g., 0.975 for alpha 0.025). Raise pp_promising_upper toward 0.70 to keep more trials in the SSR promising zone.

Response Schema (abridged)

{
  "calculation_id": "...",
  "tier": "analytical+simulation",
  "analytical_results": {
    "initial_n": 36,
    "interim_n": 18,
    "interim_fraction": 0.5,
    "ssr_method": "bayesian",
    "posterior_probability": 0.97,
    "predictive_probability": 0.81,
    "conditional_power": 0.82,
    "conditional_power_planned": 0.82,
    "zone": "",
    "z1": 1.96,
    "efficacy_stop": true,
    "futility_stop": false,
    "recalculated_n": 18,
    "inflation_factor": 0.5,
    "n_capped": false,
    "n_max_used": 54,
    "gamma_final_used": 0.975,
    "prior_description": "Jeffreys Beta(0.5, 0.5)",
    "decision_rule_description": "...",
    "recalculation_scenarios": [
      {
        "label": "Planned effect",
        "assumed_nuisance": 0.40,
        "recalculated_n_per_arm": 36,
        "recalculated_n_total": 36,
        "inflation_factor": 1.0,
        "conditional_power": 0.82,
        "decision": "continue_favorable"
      }
    ],
    "regulatory_notes": [...]
  },
  "metadata": {...},
  "simulation": {...},
  "warnings": [],
  "regulatory_citations": [...]
}

decision enum values: stop_efficacy, stop_futility, continue_ssr, continue_favorable, continue_unfavorable. Five sensitivity rows are returned by default (50%, 75%, 100%, 125%, 150% of planned effect).

10. References

  • Simon R. (1989). Optimal two-stage designs for phase II clinical trials. Controlled Clinical Trials, 10(1), 1–10.
  • Lee JJ, Liu DD. (2008). A predictive probability design for phase II cancer clinical trials. Clinical Trials, 5(2), 93–106.
  • Mehta CR, Pocock SJ. (2011). Adaptive increase in sample size when interim results are promising: A practical guide with examples. Statistics in Medicine, 30(28), 3267–3284.
  • FDA. (2019). Adaptive Designs for Clinical Trials of Drugs and Biologics: Guidance for Industry. U.S. Food and Drug Administration.
  • FDA. (2023). Project Optimus: Optimizing the Dosage of Human Prescription Drugs and Biological Products for the Treatment of Oncologic Diseases. U.S. Food and Drug Administration.
  • Jeffreys H. (1961). Theory of Probability (3rd ed.). Oxford University Press.
  • Thall PF, Simon R. (1994). Practical Bayesian guidelines for phase IIB clinical trials. Biometrics, 50(2), 337–349.