Case study: Checkout redesign¶
Retailer testing two checkout variants against control to lift conversion from a 4.2% baseline.
Setup - 80k daily visitors, 50% share into test, 3 variants (34/33/33), alpha = 0.05, power = 0.80, two-sided, relative MDE = 10%.
Sizing (from POST /api/v1/calculate).
| Metric | Value |
|---|---|
| Per-variant sample | 45,429 users |
| Total sample | 136,287 users |
| Required duration | 4 days |
| Bonferroni adjustment | 2 treatment-vs-control comparisons, adjusted alpha 0.025 |
Design guidance (from POST /api/v1/design).
- Primary risk: More than two variants trigger a Bonferroni alpha correction. This is conservative and may overstate the required sample size.
- Key recommendation: Validate tracking and assignment before exposing live traffic.
- Guardrail to monitor: Payment error rate
Interim check. An early snapshot came in after 1.2 test-days, 48,000 visitors, and 3,812 conversions (35.2% of the planned per-variant sample): - P(variant A > control) = 93.4% - P(variant B > control) = 99.8% Variant A is still ambiguous; variant B is the only treatment with a decisive early signal.
Decision. Stop spending exposure on variant A, keep variant B against control until the planned read is complete, and ship B only if payment error rate and refund value stay in range. The value here is that sizing, multivariant correction, design risks, and the Bayesian interim view all come from the same backend run.
Full inputs and outputs: docs/case-studies/checkout-redesign.json. Rerun with python scripts/generate_case_study_numbers.py.