- Primary risk: Testing more than two variants triggers a Bonferroni correction of alpha, which is conservative and may overstate the required sample size.
- Key recommendation: Validate tracking and assignment before exposing live traffic.
- Guardrail to monitor: payment error rate.
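To show why the Bonferroni correction inflates sizing, here is a minimal sketch using the textbook per-arm formula for a two-sided, two-proportion z-test. This is an assumption for illustration, not the repository's sizing code, and the 8% baseline and 9% target rates are hypothetical. Dividing alpha by the number of treatment-vs-control comparisons raises the critical z value, which raises the required sample.

```python
from math import ceil, sqrt
from statistics import NormalDist


def per_arm_sample_size(p_control: float, p_variant: float, alpha: float = 0.05,
                        power: float = 0.8, n_comparisons: int = 1) -> int:
    """Per-arm n for a two-sided two-proportion z-test (textbook formula, assumed)."""
    adj_alpha = alpha / n_comparisons                  # Bonferroni: split alpha across comparisons
    z_alpha = NormalDist().inv_cdf(1 - adj_alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)               # quantile for the desired power
    p_bar = (p_control + p_variant) / 2                # pooled rate under the null
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p_control * (1 - p_control)
                                 + p_variant * (1 - p_variant))) ** 2
    return ceil(numerator / (p_control - p_variant) ** 2)


# Hypothetical rates: 8% baseline, 9% target.
base = per_arm_sample_size(0.08, 0.09)                        # a single comparison
corrected = per_arm_sample_size(0.08, 0.09, n_comparisons=2)  # A vs control and B vs control
```

With two comparisons each test runs at alpha = 0.025, so `corrected` comes out roughly 20% larger than `base` under these assumed rates.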
**Interim check.**
An early snapshot was taken after 1.2 days of testing, covering 48,000 visitors and 3,812 conversions (35.2% of the planned per-variant sample):
- P(variant A > control) = 93.4%
- P(variant B > control) = 99.8%
Variant A is still ambiguous; variant B is the only treatment with a decisive early signal.
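Probabilities such as P(variant > control) can be estimated by sampling from each arm's Beta posterior and counting how often the variant's draw exceeds the control's. The sketch below assumes the Beta(1,1) prior named in this case study; the function name and the per-arm counts in the example are hypothetical, because the per-arm conversions are not listed here.

```python
import random


def prob_beats_control(conv_c: int, n_c: int, conv_v: int, n_v: int,
                       draws: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(theta_variant > theta_control) under Beta(1,1) priors."""
    rng = random.Random(seed)  # seeded so the estimate is reproducible
    wins = 0
    for _ in range(draws):
        # Posterior for each arm: Beta(1 + conversions, 1 + non-conversions)
        theta_c = rng.betavariate(1 + conv_c, 1 + n_c - conv_c)
        theta_v = rng.betavariate(1 + conv_v, 1 + n_v - conv_v)
        wins += theta_v > theta_c  # True counts as 1
    return wins / draws


# Hypothetical counts for 16,000 users per arm.
p = prob_beats_control(1_220, 16_000, 1_300, 16_000)
```

Each treatment is compared with control independently, which is why A and B can sit on opposite sides of a decision threshold.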
**Decision.**
Stop spending exposure on variant A, keep variant B running against control until the planned read is complete, and ship B only if payment error rate and refund value stay within their acceptable ranges. The advantage of this workflow is that sizing, the multivariant correction, the design risks, and the Bayesian interim view all come from the same backend run.
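The decision above can be read as a gate: drop arms without a decisive signal, and ship a kept arm only once the planned read is done and every guardrail is in range. The sketch below encodes that reading; the 0.99 cut-off, the `Guardrail` class, the `next_action` function, and the guardrail limits are hypothetical and are not published in this case study.

```python
from dataclasses import dataclass


@dataclass
class Guardrail:
    name: str
    observed: float
    upper_limit: float  # hypothetical threshold; not specified in the case study

    def in_range(self) -> bool:
        return self.observed <= self.upper_limit


def next_action(p_beats_control: dict[str, float], guardrails: list[Guardrail],
                planned_read_done: bool, decisive: float = 0.99) -> dict:
    # Keep only arms whose posterior probability clears the (hypothetical) cut-off
    keep = sorted(arm for arm, p in p_beats_control.items() if p >= decisive)
    drop = sorted(set(p_beats_control) - set(keep))
    ship = None
    # Ship only after the planned read completes and all guardrails hold
    if keep and planned_read_done and all(g.in_range() for g in guardrails):
        ship = max(keep, key=p_beats_control.get)
    return {"keep": keep, "drop": drop, "ship": ship}


probs = {"A": 0.934, "B": 0.998}
rails = [Guardrail("payment_error_rate", observed=0.010, upper_limit=0.020)]
interim = next_action(probs, rails, planned_read_done=False)
```

At the interim point this keeps B, drops A, and ships nothing until the planned read is complete.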
Full inputs and outputs: [docs/case-studies/checkout-redesign.json](https://github.com/brownjuly2003-code/ab-test-research-designer/blob/main/docs/case-studies/checkout-redesign.json). Rerun with `python scripts/generate_case_study_numbers.py`.
The Bayesian posterior uses an explicit Beta(1,1) uniform prior and records the posterior parameters in `docs/case-studies/checkout-redesign.json`.
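For reference, the standard conjugate update behind those posterior parameters: with a Beta(1,1) prior and $x$ conversions out of $n$ visitors in an arm, the posterior for that arm's conversion rate $\theta$ is

$$
\theta \mid x \sim \mathrm{Beta}(1 + x,\; 1 + n - x), \qquad
\mathbb{E}[\theta \mid x] = \frac{1 + x}{2 + n},
$$

and P(variant > control) is $P(\theta_{\text{variant}} > \theta_{\text{control}})$ under the two independent posteriors.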
The interim snapshot covers 16,000 users per arm, which is 35.2% of the planned per-variant sample under the current sizing output. It is therefore described as an early interim snapshot rather than a literal 50% read.
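A quick consistency check on the interim figures, assuming three arms (control, A, B): the per-arm count reproduces the 48,000 visitors, gives the overall conversion rate, and implies the approximate planned per-variant sample. The implied figure is a derived approximation, since 35.2% is rounded; the exact value lives in the sizing output.

```python
arms = 3                 # control, variant A, variant B (assumed from the case study)
users_per_arm = 16_000
conversions = 3_812
share_of_plan = 0.352    # reported share, rounded to 0.1 percentage points

total_visitors = arms * users_per_arm                  # should match the 48,000 reported
overall_rate = conversions / total_visitors            # pooled conversion rate across arms
implied_planned_per_arm = users_per_arm / share_of_plan  # approximate, due to rounding
```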