Statistics methods¶

The product combines fixed-horizon sizing, optional Bayesian precision planning, and operational safeguards that help teams decide whether an experiment is feasible before launch.

Method summary¶

Method	Where it appears	What it does
Binary fixed-horizon sizing	`POST /api/v1/calculate` for binary metrics	Uses a two-sided normal approximation for difference in proportions and returns per-variant sample size, total sample, and duration.
Continuous fixed-horizon sizing	`POST /api/v1/calculate` for continuous metrics	Uses a two-sample mean comparison with equal-sized variants and relative MDE over the baseline mean.
Binary post-test analysis	`POST /api/v1/results` for binary observed counts	Two-proportion z-test on pooled SE; returns p-value, Wald CI on the difference, observed power, and a verdict at the supplied alpha.
Continuous post-test analysis	`POST /api/v1/results` for continuous observed means/std/n	Welch's t-test with Welch–Satterthwaite degrees of freedom; p-value and confidence interval are both Student-t (not the normal approximation). Returns observed power as a two-sided expression `(1 − F_t(t_crit − \\|t\\|, df)) + F_t(−t_crit − \\|t\\|, df)`, which equals α at zero observed effect.
Bonferroni correction	Multi-variant plans	Adjusts alpha across treatment-vs-control comparisons so larger variant sets do not silently understate required sample size.
Bayesian precision sizing	`analysis_mode=bayesian` with `desired_precision`	Estimates per-variant sample size needed to hit a target credible-interval half-width.
Group sequential boundaries	`n_looks > 1`	Adds O'Brien-Fleming style boundaries and a sample-size inflation factor for planned interim looks.
SRM check	`POST /api/v1/srm-check` and design warnings	Uses a chi-square imbalance check to flag suspicious traffic allocation.
CUPED adjustment	Continuous metrics with pre-experiment covariates	Reduces the effective standard deviation based on pre-period correlation and shows the resulting sample-size and duration savings.

Practical interpretation¶

Frequentist sizing answers "How much traffic do we need to detect the planned effect?"
Bayesian sizing answers "How much traffic do we need for a precise posterior estimate?"
Sequential planning answers "What changes if we plan interim looks before the end?"
SRM and warning rules answer "Is the setup trustworthy enough to launch or read out?"

Notes on implementation¶

Binary and continuous sizing both treat MDE as a relative uplift over the baseline.
Multi-variant plans apply a conservative Bonferroni adjustment rather than a looser multiple-testing correction.
SRM is flagged when the chi-square p-value drops below 0.001.
CUPED is surfaced as an alternative planning scenario, not as a hidden override of the main result.
The Student-t CDF and quantile used by post-test continuous analysis are implemented in stdlib via the regularized incomplete beta (app/backend/app/stats/student_t.py) — there is no scipy runtime dependency. Accuracy vs scipy.stats.t is tracked in unit tests at ≤ 1e-7 for CDF and ≤ 1e-6 for the quantile.