
Choosing the Right Trial Design Calculator

A method-selection map across the Zetyra calculator suite. First identify the design problem; then choose the calculator. Most calculators are not substitutes — they operate at different layers of the design.

When to use this guide. You have a trial-design question and need to map it to the right Zetyra calculator. The interactive picker at /docs/which-calculator is faster for a single decision; this page lays out the full method-selection map for readers comparing multiple options or designing a multi-layered protocol.

1. Start here: what decision are you making?

Map your design question to the calculator that owns that decision:

Design question | Calculator(s)
How many subjects or events do I need for my endpoint? | Sample Size (continuous / binary / survival / NI / cluster / longitudinal) · Chi-Square
Can baseline covariates reduce my variance? | CUPED
Can I stop early for efficacy or futility? | GSD or Bayesian Sequential
Do I need to re-estimate sample size mid-trial? | Blinded SSR · Unblinded SSR · Single-Arm SSR
Is allocation adaptive, stratified, or fixed? | Randomization guide · RAR
Is it a single-arm Phase II / rare-disease trial? | Single-Arm guide · Bayesian Sample Size · Single-Arm SSR
Am I borrowing historical or external data? | Prior Elicitation · Bayesian Borrowing · Bayesian Toolkit
Is it a basket / umbrella / platform trial? | Master Protocols · Basket · Umbrella · Platform
Am I combining multiple adaptive mechanisms? | Composed Pipeline T1E

2. The design problem in layers

The calculators don't form a flat list of substitutes. They live at different layers of the trial design, and a given trial often touches several layers at once.

Foundation sizing

How big must the trial be to detect the planned effect at the target power? Every design starts here. The Sample Size Calculator covers continuous, binary, and survival endpoints, with extensions for non-inferiority, cluster-randomized, and longitudinal designs; Chi-Square covers planning for categorical endpoints.

Calculators: Sample Size · Chi-Square
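As a rough illustration of foundation sizing, the textbook normal-approximation formula for comparing two means with 1:1 allocation is n per arm = 2(z₁₋α/₂ + z₁₋β)²σ²/δ². The sketch below codes that formula with Python's standard library. It is an approximation for illustration, not the Sample Size Calculator's implementation (the function name is hypothetical), and a t-test-based calculation will usually add a subject or two per arm.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm_continuous(delta, sigma, alpha=0.05, power=0.80):
    """Illustrative helper (hypothetical name): per-arm n for a two-sided,
    two-sample comparison of means with 1:1 allocation, normal approximation."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)  # critical value for two-sided alpha
    z_beta = z(power)           # quantile matching the target power
    n = 2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2
    return ceil(n)              # round up to whole subjects


print(n_per_arm_continuous(delta=5, sigma=10))  # 63 per arm
```

Halving δ quadruples n, which is why the planned effect size is usually the most consequential input.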

Precision (variance reduction)

Given a planned effect, can we shrink the variance of the estimator using prognostic baseline information? This is an analysis-stage choice that flows back into the sample size. CUPED is the canonical tool; covariate adjustment via ANCOVA is the underlying mechanism (FDA-recommended when the covariates are prespecified and prognostic).

Calculators: CUPED

Interim monitoring

Can the trial stop early for efficacy or futility? This is a design-stage choice about how to allocate α, or how to calibrate posterior-probability thresholds, across pre-planned looks. GSD spends α via a spending function with strong frequentist guarantees; Bayesian Sequential uses a posterior-probability threshold that is calibrated by simulation to the desired operating characteristics (there is no direct analogue of α-spending); Bayesian PP is usually decision support rather than a formal stopping rule.

Calculators: GSD · Bayesian Sequential · Bayesian PP
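The α-allocation idea can be made concrete with the two Lan-DeMets spending functions most often used in practice: O'Brien-Fleming-type α(t) = 2 − 2Φ(z₁₋α/₂/√t) and Pocock-type α(t) = α·ln(1 + (e − 1)t), where t is the information fraction and α is the one-sided level. The sketch below only tabulates cumulative and incremental α spent at three equally spaced looks (an assumed schedule); converting those increments into z-boundaries requires recursive multivariate-normal integration, which it does not attempt. It is not the GSD calculator's code.

```python
from math import e, log, sqrt
from statistics import NormalDist

N = NormalDist()


def obf_spending(t, alpha=0.025):
    """Lan-DeMets O'Brien-Fleming-type cumulative alpha spent at information fraction t."""
    return 2 * (1 - N.cdf(N.inv_cdf(1 - alpha / 2) / sqrt(t)))


def pocock_spending(t, alpha=0.025):
    """Lan-DeMets Pocock-type cumulative alpha spent at information fraction t."""
    return alpha * log(1 + (e - 1) * t)


looks = [1 / 3, 2 / 3, 1.0]  # assumed: three equally spaced looks
prev_obf = prev_poc = 0.0
for t in looks:
    obf, poc = obf_spending(t), pocock_spending(t)
    # cumulative alpha spent, with the increment available at this look in parentheses
    print(f"t={t:.2f}  OBF {obf:.5f} (+{obf - prev_obf:.5f})  "
          f"Pocock {poc:.5f} (+{poc - prev_poc:.5f})")
    prev_obf, prev_poc = obf, poc
```

O'Brien-Fleming-type spending reserves almost all of α for the final look, which keeps the final boundary close to the fixed-sample critical value; Pocock-type spending is more aggressive early, buying a larger chance of early stopping at the cost of a larger maximum N.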

Adaptation

Can the trial use prespecified rules to modify sample size or allocation in response to accumulating data? Blinded SSR re-estimates nuisance parameters without unblinding; Unblinded SSR uses the Mehta-Pocock promising-zone approach with the inverse-normal combination test; Single-Arm SSR adds Bayesian PPoS and conditional-power (CP) promising-zone rules with calibrated thresholds. RAR shifts allocation toward the better-performing arm as outcomes accrue. The adaptation rule itself is prespecified: changing it after seeing interim data is not part of the design but a major protocol amendment that requires revalidation.

Calculators: Blinded SSR · Unblinded SSR · Single-Arm SSR · RAR
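The inverse-normal combination test mentioned above can be sketched in a few lines. Stage-wise one-sided p-values are converted to z-scores and combined with weights fixed from the planned stage sizes; keeping those weights fixed even when stage 2 is enlarged is what preserves Type I error. The conditional-power helper assumes the current-trend effect estimate and a z-statistic scaled as θ√n. Function names and numbers are illustrative, and the promising-zone cut-offs that would decide whether to enlarge stage 2 are deliberately left out.

```python
from math import sqrt
from statistics import NormalDist

N = NormalDist()
CRIT = N.inv_cdf(0.975)  # one-sided 2.5% final critical value (assumed)


def stage_weights(n1_planned, n2_planned):
    """Prespecified inverse-normal weights from the PLANNED stage sizes."""
    total = n1_planned + n2_planned
    return sqrt(n1_planned / total), sqrt(n2_planned / total)


def inverse_normal_combination(p1, p2, w1, w2):
    """Combine one-sided stage-wise p-values; returns (combined z, combined p)."""
    z = w1 * N.inv_cdf(1 - p1) + w2 * N.inv_cdf(1 - p2)
    return z, 1 - N.cdf(z)


def conditional_power(z1, n1, n2_new, w1, w2, crit=CRIT):
    """Conditional power under the current trend for a stage-2 size n2_new.

    Assumes Z = theta * sqrt(n), so the stage-2 statistic has mean z1 * sqrt(n2_new / n1).
    """
    z2_needed = (crit - w1 * z1) / w2
    drift = z1 * sqrt(n2_new / n1)
    return 1 - N.cdf(z2_needed - drift)


w1, w2 = stage_weights(50, 50)  # weights stay fixed even if stage 2 is enlarged
z1 = 1.5                         # hypothetical interim z-statistic
for n2 in (50, 100):
    print(f"stage-2 n={n2}: CP={conditional_power(z1, 50, n2, w1, w2):.3f}")

z, p = inverse_normal_combination(p1=0.03, p2=0.04, w1=w1, w2=w2)
print(f"combined z={z:.3f}, p={p:.4f}, reject={z > CRIT}")
```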

Evidence borrowing

Can external information (prior trials, registries, literature) substitute for some concurrent control? Prior Elicitation formalizes expert beliefs and historical evidence into a prior; Bayesian Borrowing combines historical data with the current trial via power priors, commensurate priors, or MAP priors; Bayesian Sample Size and Two-Arm Bayesian use the resulting prior to size designs that take advantage of borrowing. Exchangeability of the external source with the trial population is always the binding assumption.

Calculators: Prior Elicitation · Bayesian Borrowing · Bayesian Sample Size · Two-Arm Bayesian
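For a binary endpoint, a fixed-weight power prior has a closed form that makes the borrowing mechanics visible: historical data with x₀ responders out of n₀ are discounted by a weight a₀ in [0, 1] and added to a Beta(a, b) initial prior, so the prior carries roughly a + b + a₀n₀ effective subjects. The sketch below assumes a fixed a₀ chosen in advance; commensurate and MAP priors, and dynamic choices of a₀ driven by prior-data conflict, are not shown. Function names and data are hypothetical.

```python
def power_prior_beta(x0, n0, a0, a=1.0, b=1.0):
    """Hypothetical helper: Beta prior from historical binomial data x0/n0
    discounted by a fixed power-prior weight a0 in [0, 1]."""
    if not 0.0 <= a0 <= 1.0:
        raise ValueError("a0 must lie in [0, 1]")
    return a + a0 * x0, b + a0 * (n0 - x0)


def posterior_beta(prior, x, n):
    """Conjugate update of a Beta prior with x responders out of n current patients."""
    a, b = prior
    return a + x, b + (n - x)


# Assumed data: 30/100 historical responders, 12/40 in the current trial
for a0 in (0.0, 0.5, 1.0):
    prior = power_prior_beta(x0=30, n0=100, a0=a0)
    a_post, b_post = posterior_beta(prior, x=12, n=40)
    ess = prior[0] + prior[1]  # approximate prior effective sample size
    mean = a_post / (a_post + b_post)
    print(f"a0={a0:.1f}: prior ESS~{ess:.0f}, posterior mean={mean:.3f}")
```

Here the historical and current response rates are both 30% by construction, so borrowing only adds information. When the sources disagree, a fixed a₀ pulls the estimate toward the historical value, which is the bias risk that the exchangeability assumption guards against.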

Master-protocol structure

Is the trial actually multiple sub-studies under one protocol? Basket: one therapy across many disease subsets (information borrowing between baskets via BHM or EXNEX); Umbrella: many therapies in one disease, biomarker-stratified sub-studies with a shared control; Platform: a perpetual multi-arm infrastructure with staggered arm entry / exit and multi-arm multi-stage decision rules.

Calculators: Master Protocols · Basket · Umbrella · Platform

Composition (multiple adaptive mechanisms stacked)

Some adaptive mechanisms have analytic Type I error control in simple settings (e.g., classical GSD with prespecified spending functions; blinded SSR for variance/nuisance parameters). Others — RAR, Bayesian sequential thresholds, historical borrowing under potential prior-data conflict, and several SSR settings — are simulation-calibrated even in isolation. When mechanisms are combined into one design (SSR plus interim monitoring plus RAR plus borrowing), the pipeline-level operating characteristics cannot be read off any individual component; they must be simulated end-to-end. The Composed Pipeline T1E calculator runs that simulation across plausible scenarios, including prior-data conflict and time-trend drift.

Calculators: Composed Pipeline T1E
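Even a single mechanism shows why pipeline-level OCs cannot be assumed. The toy simulation below, which is not the Composed Pipeline T1E calculator's engine, tracks a standardized test statistic under H₀ across three equally spaced looks and rejects whenever it exceeds the unadjusted one-sided critical value 1.96. Each look alone has a 2.5% false-positive rate, yet the design as a whole rejects roughly twice as often; stacking SSR, RAR, or borrowing on top compounds the problem in ways only end-to-end simulation reveals. The look schedule, seed, and simulation count are assumptions chosen for illustration.

```python
import random
from math import sqrt


def naive_repeated_looks_t1e(looks=(1 / 3, 2 / 3, 1.0), crit=1.96,
                             n_sims=100_000, seed=2026):
    """Monte Carlo Type I error when every look uses the unadjusted critical value.

    Under H0 the score process is standard Brownian motion in information time t,
    so Z at look t is S(t) / sqrt(t).
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        s, t_prev = 0.0, 0.0
        for t in looks:
            s += rng.gauss(0.0, sqrt(t - t_prev))  # independent increment under H0
            t_prev = t
            if s / sqrt(t) > crit:                 # one-sided efficacy crossing
                rejections += 1
                break
    return rejections / n_sims


print(f"single final look: {naive_repeated_looks_t1e(looks=(1.0,)):.4f}")
print(f"three unadjusted looks: {naive_repeated_looks_t1e():.4f}")
```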

3. Commonly confused choices

  • GSD vs. Bayesian Sequential

    Both monitor a trial across pre-planned looks. GSD has direct α-spending control via a spending function; Bayesian Sequential uses a posterior probability threshold whose frequentist operating characteristics must be confirmed by simulation. See the dedicated GSD vs. Bayesian Sequential guide.

  • Bayesian PP vs. Bayesian Sequential

    Bayesian PPoS (the predictive-probability calculator) is usually a decision-support output at a single interim look: “given what we've seen, what's the probability the trial succeeds at its final analysis?” Bayesian Sequential is a formal stopping rule with simulation-calibrated thresholds across multiple pre-planned looks.

  • Blinded SSR vs. Unblinded SSR

    Blinded SSR re-estimates a nuisance parameter (variance, event rate, response rate) from pooled blinded data — the treatment effect is fixed. Unblinded SSR re-estimates based on the observed treatment effect at interim using the promising-zone rule. Blinded is simpler and easier to defend; unblinded is more powerful when the effect-size assumption is the uncertain input.

  • GSD vs. SSR

    GSD stops the trial early when the data have spoken. SSR resizes the trial when the data suggest the original N was wrong. They can be combined in one design, but only with a prespecified combination strategy (inverse-normal weights, conditional-error function) and simulation-verified operating characteristics.

  • RAR vs. blocked or stratified randomization

    Blocked and stratified randomization are fixed-balance protections — they don't adapt to outcomes. RAR shifts the allocation probability toward the better-performing arm as outcomes accrue. RAR buys ethical benefit in multi-arm settings but adds operational complexity and demands explicit Type I error control via simulation. See the Randomization Schemes guide.

  • Basket vs. Umbrella vs. Platform

    Basket: one therapy, many diseases (oncology tissue-agnostic trials are the canonical example). Umbrella: many therapies, one disease, with biomarker-defined sub-studies. Platform: perpetual multi-arm infrastructure (REMAP-CAP, STAMPEDE) where arms enter and exit over time.

  • Bayesian Borrowing vs. CUPED

    Both can improve precision, but through different mechanisms and under different assumptions. CUPED uses internal baseline covariates measured on the trial population to reduce the variance of the treatment-effect estimator through covariate adjustment; under randomization the adjusted estimator remains unbiased. Bayesian Borrowing uses external evidence (prior trials, registries) via a power, commensurate, or MAP prior to add effective information and shrink posterior uncertainty, but it can introduce bias if the historical source is not exchangeable with the current trial. CUPED's validity rests on the baseline being prognostic and measured pre-randomization; Borrowing's rests on the exchangeability argument plus a prior-data conflict diagnostic.
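The Blinded SSR idea in the list above can be sketched for a continuous endpoint with 1:1 allocation. The lumped one-sample variance of pooled, unlabeled interim data overestimates σ² by about δ²/4 when the true difference is δ, so one common adjusted estimator subtracts δ²/4 using the planned δ before re-applying the standard sample-size formula. The sketch assumes two-sided α = 0.05, 80% power, and simulated interim data whose σ is larger than planned; the helper name and all numbers are hypothetical.

```python
import random
from math import ceil
from statistics import NormalDist, variance


def blinded_ssr(pooled, delta_planned, alpha=0.05, power=0.80, adjust=True):
    """Hypothetical helper: re-estimate per-arm n from pooled, unlabeled interim data.

    The lumped one-sample variance exceeds sigma^2 by about delta^2 / 4 under
    1:1 allocation; adjust=True subtracts that using the PLANNED delta.
    """
    s2 = variance(pooled)  # ignores arm labels entirely
    if adjust:
        s2 = max(s2 - delta_planned ** 2 / 4, 1e-12)
    z = NormalDist().inv_cdf
    n = 2 * (z(1 - alpha / 2) + z(power)) ** 2 * s2 / delta_planned ** 2
    return ceil(n), s2


# Assumed scenario: planning used sigma = 10 (63 per arm), but the true sigma is 12
rng = random.Random(11)
arm_a = [rng.gauss(0.0, 12.0) for _ in range(40)]
arm_b = [rng.gauss(5.0, 12.0) for _ in range(40)]
pooled = arm_a + arm_b
rng.shuffle(pooled)  # the blinded analyst never sees which arm a value came from

n_adj, s2_adj = blinded_ssr(pooled, delta_planned=5.0)
n_raw, s2_raw = blinded_ssr(pooled, delta_planned=5.0, adjust=False)
print(f"adjusted: sigma^2~{s2_adj:.1f}, n per arm={n_adj}")
print(f"unadjusted: sigma^2~{s2_raw:.1f}, n per arm={n_raw}")
```

Only the nuisance parameter is re-estimated; the planned δ stays fixed, which is what keeps the blinded procedure simple to defend.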

4. The classic trio: CUPED, GSD, Bayesian PP

Older Zetyra content framed these three methods as alternatives. They are not — they live at different layers and answer different questions, and a single trial can use all three together if each is prespecified and the combined operating characteristics are simulated.

Method | Layer | What it changes
CUPED | Precision | Reduces the variance of the treatment-effect estimate using prespecified, prognostic, pre-randomization covariates. Scales the required N by roughly (1 − ρ²), where ρ is the correlation between the covariate and the outcome.
GSD | Monitoring | Allows early stopping at pre-planned interim looks while controlling overall Type I error via an α-spending function. Reduces expected N under H₁ at the cost of a small inflation in the maximum N.
Bayesian PP | Monitoring (decision support) | Computes the predictive probability that the trial will succeed at its final analysis, given the interim data and the prior. Usually used for Go/No-Go decisions rather than as a formal stopping rule; the frequentist OCs of any rule built on it must be simulation-calibrated.

For deeper treatment of each: CUPED guide, GSD guide, Bayesian PPoS guide.
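The difference between a posterior probability and the predictive probability of success can be illustrated for a single-arm binary trial with a Beta(a, b) prior. Given x responders among n interim patients, the remaining m = n_max − n outcomes follow a beta-binomial predictive distribution; PPoS sums the predictive probability of every future outcome for which the final posterior P(p > p₀ | data) would clear the success threshold. The sketch assumes integer Beta parameters so the Beta tail can be computed exactly through the binomial identity, and uses only the standard library. It is a hypothetical example, not the Bayesian PP calculator's implementation.

```python
from math import comb, exp, lgamma


def log_beta(a, b):
    """Log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def post_prob_exceeds(p0, a, b):
    """P(p > p0) for p ~ Beta(a, b) with integer a, b.

    Uses the identity P(Beta(a, b) > p0) = P(Binomial(a + b - 1, p0) <= a - 1).
    """
    n = a + b - 1
    return sum(comb(n, k) * p0 ** k * (1 - p0) ** (n - k) for k in range(a))


def predictive_prob_success(x, n, n_max, p0, threshold, a=1, b=1):
    """Hypothetical helper: PPoS for a single-arm binary trial with a Beta(a, b) prior."""
    m = n_max - n                    # patients still to be enrolled
    a_post, b_post = a + x, b + n - x
    ppos = 0.0
    for y in range(m + 1):           # y future responders out of m
        # beta-binomial predictive probability of observing y
        weight = comb(m, y) * exp(log_beta(a_post + y, b_post + m - y)
                                  - log_beta(a_post, b_post))
        # would the FINAL posterior clear the success threshold?
        if post_prob_exceeds(p0, a_post + y, b_post + m - y) > threshold:
            ppos += weight
    return ppos


# Assumed design: p0 = 0.3, 40 patients in total, success if P(p > 0.3 | data) > 0.95
for x in (6, 9, 12):
    pp = predictive_prob_success(x=x, n=20, n_max=40, p0=0.3, threshold=0.95)
    print(f"{x}/20 responders at interim: PPoS={pp:.3f}")
```

A Go/No-Go rule might act on PPoS, but as the table notes, any such rule still needs its frequentist OCs simulated.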

5. What these methods do not guarantee

  • CUPED is not a free lunch

    Variance reduction is expected when the baseline covariate is prespecified, pre-randomization, prognostic for the outcome, and analyzed with a proper variance estimator. CUPED can hurt precision when the covariate is weak, has substantial missingness, was influenced by treatment (post-randomization or contaminated), or was selected adaptively from the data.

  • GSD does not auto-guarantee Type I error

    Strong α control holds only if the boundary family, look schedule, estimand, and adaptations match the prespecified design. Off-schedule looks, ambiguity about whether the futility boundary is binding or non-binding, and estimand changes all break the guarantee. Document each of these choices in the SAP.

  • Bayesian methods do not inherently control Type I error

    A posterior-probability rule is a Bayesian decision criterion; its frequentist OCs (Type I error rate, power, expected N) are emergent properties that must be calibrated by simulation against the full monitoring rule (prior + threshold + look schedule). FDA's January 2026 Bayesian draft guidance accepts Bayesian primary analyses with justified priors and reported simulated frequentist OCs; the older “regulators want the frequentist analysis” framing is out of date.

  • Combining methods is possible but not automatic

    A single trial can use CUPED at analysis, GSD for monitoring, blinded SSR for re-sizing, and Bayesian borrowing for the control arm — but the combined design's operating characteristics cannot be read off the individual components. Each combined design requires prespecification and simulation-verified pipeline-level OCs. The Composed Pipeline T1E calculator is purpose-built for that simulation.
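The calibration requirement for Bayesian rules can be sketched for a single-arm binary design that stops for efficacy when P(p > p₀ | data) exceeds a threshold at any of three looks. Data are simulated with the true response rate equal to p₀, so the rejection rate estimates the rule's frequentist Type I error. The uniform prior, look schedule, p₀ = 0.2, and candidate thresholds are assumptions for illustration; a real calibration would search thresholds until the simulated rate meets the target and would also simulate power and expected N.

```python
import random
from math import comb


def post_prob_exceeds(p0, a, b):
    """P(p > p0) for p ~ Beta(a, b), integer a, b, via the binomial identity."""
    n = a + b - 1
    return sum(comb(n, k) * p0 ** k * (1 - p0) ** (n - k) for k in range(a))


def bayes_rule_t1e(threshold, p0=0.2, looks=(20, 30, 40), a=1, b=1,
                   n_sims=10_000, seed=7):
    """Simulated Type I error of 'stop for efficacy if P(p > p0 | data) > threshold'.

    Data are generated with the true response rate equal to p0 (the null).
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        outcomes = [rng.random() < p0 for _ in range(looks[-1])]
        for n in looks:
            x = sum(outcomes[:n])  # responders among the first n patients
            if post_prob_exceeds(p0, a + x, b + n - x) > threshold:
                rejections += 1
                break
    return rejections / n_sims


for thr in (0.95, 0.975, 0.99):
    print(f"threshold {thr}: simulated Type I error {bayes_rule_t1e(thr):.4f}")
```

Because the same seed reproduces the same simulated trials, raising the threshold can only remove rejections, so the rates never increase as the threshold rises; the calibrated threshold is the least conservative one whose simulated rate meets the target.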

6. References & further reading

For methodology and worked examples, follow the per-topic guides:

Last updated: May 2026
