Added Checkout-redesign case study section to README with reproducible numbers from backend calculation and Bayesian interim check.
Regenerated demo screenshots to match v1.1.0 UI (comparison dashboard, webhook manager).
Seeded the Hugging Face Space demo workspace on startup with an idempotent backend hook so the public demo loads with pre-populated projects.
Added GitHub Actions workflow docker-publish.yml to publish multi-arch Docker images to ghcr.io/brownjuly2003-code/ab-test-research-designer on every v* tag push.
Added dynamic shields.io badges (tests count, backend coverage, Lighthouse performance) in README, refreshed by a new CI job update-metrics-badges that commits badges/*.json back to main after green verify + lighthouse runs.
Added scripts/collect_badge_metrics.py plus --with-coverage and --artifacts-dir flags on scripts/verify_all.{py,cmd} so the same collector can run locally and in CI.
Full de/es UI translation (1157 lines, 887 leaf keys per locale, terminology anchors for A/B-Test / Variante / Referenz and test A/B / variante / línea base).
Hugging Face Dataset snapshot service (SnapshotService) pushing SQLite to a private HF Dataset with sha256 verification, atomic rename, startup restore, and opt-in through AB_HF_SNAPSHOT_REPO / AB_HF_TOKEN / AB_HF_SNAPSHOT_INTERVAL_SECONDS.
Expanded the template library to 10 industry presets (email_campaign, push_notification_reactivation, app_onboarding_drop_off, search_ranking_ctr, trial_to_paid added on top of the original 5), with a TemplateGallery UI entry point from the sidebar.
Optional OpenAI / Anthropic LLM adapter with browser-session token routing (X-AB-LLM-Provider + X-AB-LLM-Token headers, CORS-aware), token masking in logs, and a Settings panel to configure credentials client-side.
Monte-Carlo distribution view in the comparison dashboard: simulate_comparison service (parametric Beta-Bernoulli for binary, Normal bootstrap for continuous, deterministic with seed=42), POST /api/v1/projects/compare?include_monte_carlo=true&monte_carlo_simulations=<1000..50000> opt-in query, 50-bucket histogram + interactive probability-above-threshold slider.
French / Simplified-Chinese / Arabic locales with RTL layout support for Arabic (document.documentElement.dir = "rtl", ~10 CSS modules switched to inset-inline-* / margin-inline-* / border-inline-* logical properties), 913 frontend leaf keys and 235 backend keys per locale, 7-button language switcher with aria-pressed.
Extended Hypothesis property-test coverage for numerical stability (degenerate conversions, zero variance, ultra-strict alpha), Bayesian prior edge cases, SRM imbalance, sequential boundary monotonicity, and Monte-Carlo determinism + cap boundaries.
Optional Postgres backend via AB_DATABASE_URL with pluggable DatabaseBackend protocol (SQLite default, Postgres via psycopg[binary] when URL scheme is postgresql://), connection pooling through AB_DB_POOL_SIZE, /healthz and /readyz probes backend-aware, dedicated verify-postgres CI matrix job spinning postgres:16-alpine via testcontainers.
Slack App integration bundled alongside Postgres: slack/app-manifest.yml for manifest install, OAuth flow with CSRF state, HMAC SHA256 request signature verification with 5-minute replay guard, /ab-test projects | status <id> | run <id> slash commands returning Blocks-formatted responses, interactive approve/request-review buttons.
scripts/cleanup_test_artifacts.py — --dry-run aware sweeper for pytest temp roots, .coverage, the cxkm sandbox, and the local mkdocs site/ build. Documented in docs/RUNBOOK.md under "Local cleanup".
scripts/sync_doc_screenshots.py — mirrors docs/demo/*.png (the smoke source of truth, referenced by README via raw.githubusercontent.com) into docs-site/assets/screenshots/ (where mkdocs needs them to be inside docs_dir). Compares SHA-256 to skip unchanged pixels. Wired into .github/workflows/docs.yml before mkdocs gh-deploy. Documented in docs/RUNBOOK.md under "Screenshots".
docs/RUNBOOK.md Slack section now documents the token-at-rest posture: bot/user tokens are stored plaintext in SQLite/Postgres under the local-first threat model, with explicit guidance for hosted setups (filesystem permissions, sandbox workspaces, rotation via re-running /slack/install).
Lazy-loaded locale JSONs via i18next-http-backend (moved to app/frontend/public/locales/); main JS chunk dropped from 247.88 KB to 122.18 KB gzip (-50%). Vendor libs split into vendor-react / vendor-i18n / vendor-state chunks for long-term caching.
Centralized the noop ResizeObserver jsdom stub in app/frontend/src/test/setup.ts; removed the per-file vi.stubGlobal('ResizeObserver', …) boilerplate that was duplicated across 10 chart/a11y test files.
Postgres CI was extended from a project-creation smoke into a contract suite covering workspace import/export, audit log, API key lifecycle, webhook subscription CRUD, Slack installation upsert, and query-filter pagination. A shared postgres_repository module-scoped fixture amortizes the testcontainer pull across the suite.
_betacf (the regularized-beta continued fraction backing Student-t) now emits StudentTConvergenceWarning if it ever exhausts its 200-iteration budget; this never fires for df ≥ 1 and x ∈ [0, 1] in practice but guards future numerical regressions instead of returning a silent best-estimate.
Continuous post-test math:analyze_results for continuous metrics now uses Welch Student-t for both the p-value and the confidence interval (was returning a normal-approximation p-value with a z-critical CI regardless of df). New app/backend/app/stats/student_t.py implements t_cdf/t_ppf via stdlib regularized incomplete beta (zero scipy dep); 24 unit cases assert ≤1e-7 / ≤1e-4 vs scipy. Continuous power_achieved is now computed (was hardcoded 0.0) using a two-sided expression (upper + lower tail) so it equals α at zero observed effect instead of α/2.
Workspace import atomicity:import_workspace() now opens a BEGIN IMMEDIATE transaction explicitly. Default Python sqlite3 deferred-mode transactions could race across concurrent imports.
Snapshot loop resilience: the periodic HF snapshot push is now wrapped in try/except so a transient failure logs and continues instead of killing the background task silently.
Rate limiter memory:SlidingWindowRateLimiter periodically prunes buckets whose last event is outside the window. Long-uptime deployments no longer accumulate state for rotated client IPs/API keys.
Pytest on Windows:pytest.ini now sets --basetemp=.pytest_basetemp so the default python -m pytest command works without manual flags. Legacy app/backend/tests/.tmp/ (~990 MB on long-lived checkouts) removable via new scripts/cleanup_test_artifacts.py.
Locale parity: ar/de/es/fr/zh receive the missing sidebarPanel.slackApp.* block (12 keys); locale leaf-key counts now match en for all shipped locales.
OpenAPI metadata:license_info is now MIT (was UNLICENSED); the placeholder contact email was removed, eliminating the email-validator not installed warning during contract/API-doc generation.
Cross-backend audit log:log_audit_entry now uses INSERT … RETURNING id instead of cursor.lastrowid, which _PostgresCursorResult always returns as None; audit events were silently dropped from API responses when running on Postgres.
Rate limiter prune correctness:_prune_locked now respects the per-call window_seconds override stored on each bucket. Previously a bucket with an API-key-specific longer window would be evicted at the global window boundary; the next call with the override saw an empty bucket and silently bypassed its limit.
.env.example: AB_DB_PATH and AB_FRONTEND_DIST_PATH are now commented templates (the backend already derives absolute defaults from the package location). The earlier relative-path defaults broke through Path('./...').as_posix() → sqlite:///app/... resolving to absolute /app/.... AB_ADMIN_TOKEN= added so secure self-hosted setup works from copy-paste.
mkdocs --strict no longer warns: docs-site/features/database.md is added to the Features nav.
Accessibility tests (PosteriorPlot, a11y-results full-panel, a11y-comparison-dashboard) no longer time out: a flat recharts mock in app/frontend/src/test/recharts-stub.tsx removes 1000+ SVG nodes per chart from axe's scan, reducing full-panel axe duration from 15-30s to ~3s. The deleted visual .recharts-area-area assertion was preserved in a new PosteriorPlot.integration.test.tsx using a ResponsiveContainer-only clone-element mock so real recharts renders in jsdom.
ProjectRepository now handles unix-absolute SQLite URLs (sqlite:////home/user/db.sqlite3) without doubling the leading slash, fixing test_diagnostics_endpoint path assertion on Linux CI.
Postgres integration tests skip on Windows CI where Docker Linux containers are unavailable (testcontainers-ryuk 404 on container create).
Hypothesis property tests uncovered and fixed degenerate-input guards in calculations_service: zero variance continuous, conversion in {0, 1}, mde = 0, ultra-strict alpha.
multi-project comparison dashboard with lazy-loaded React chunk, power curves, sensitivity grid, forest-plot observed effects, and shared/unique insight panels, plus POST /api/v1/projects/compare and POST /api/v1/export/comparison (Markdown and PDF)
outbound webhook subscriptions (Slack and generic JSON) with admin-guarded CRUD, delivery history, retry/dead-letter tracking, and HMAC-signed X-AB-Signature headers for generic consumers
property-based statistical test suite (hypothesis==6.152.1) covering monotonicity and round-trip invariants for binary, continuous, SRM, group-sequential, and Bayesian calculators
German (de) and Spanish (es) UI and report locales with Accept-Language regional fallback on the backend; header switcher now ships all four languages
removed build-time frontend token injection and switched the UI to browser-session API tokens
made the frontend fail closed for write actions until backend diagnostics explicitly confirm write-capable access
split soft archive from permanent delete so POST /api/v1/projects/{id}/archive preserves history while DELETE /api/v1/projects/{id} hard-deletes it
added optional HMAC-signed workspace backups via AB_WORKSPACE_SIGNING_KEY, plus signed import/validate enforcement and runtime diagnostics for backup-signing mode
extended local verify wrappers, Docker verification, and CI wiring to cover signed workspace backup flows and removed the obsolete VITE_API_TOKEN path from CI
made scripts/verify_all.ps1 a thin delegation wrapper to the canonical batch verify path so Windows verification no longer drifts
fixed workspace export/import round-trip regressions so exported bundles now validate and reimport cleanly with matching checksums
normalized repository boolean fields for project list responses and closed the broken workspace import SQL insert path
regenerated frontend API contracts and API docs to match the current backend/archive schema
stabilized the smoke flow around free-port backend startup, browser draft persistence checks, and refreshed demo screenshots
replaced the Playwright E2E launch path with a self-contained runner that builds the frontend when needed, starts a temporary backend on a free port, and cleans it up after the run
aligned local verify scripts and CI Playwright installation syntax with the hardened E2E path
re-ran full local verification, including python scripts/verify_all.py --with-e2e
redesigned the frontend into a dashboard-style interface with metric cards, accordion sections, timeline history, live backend status, progress bar, tooltips, and loading spinners
added a workspace status board that summarizes saved-project coverage, snapshot depth, export reach, revision depth, and current draft sync state
made the frontend auth-aware so read-only API sessions disable save, analysis, report export, workspace import, and delete actions instead of failing at runtime
upgraded typography to Inter + JetBrains Mono and added dark-mode support
surfaced browser draft storage issues as dismissible UI toasts
added a quota-specific autosave warning for QuotaExceededError while keeping generic storage failure details for other browser-local errors
added explicit bonferroni_note to calculation responses for multivariant designs
regenerated frontend API contracts from FastAPI OpenAPI
kept deterministic calculations, warnings, saved-project history, and comparison flows aligned with the new UI
added backend performance regression coverage with a <100ms p95 guard for deterministic calculations
added GET /api/v1/diagnostics with storage/frontend/LLM runtime summary
added X-Request-ID and X-Process-Time-Ms headers for lightweight request tracing
expanded saved-project comparison contracts with executive summaries, warning severity, overlap sections, and comparison highlights
added GET /readyz for runtime readiness checks with 503 on degraded dependencies
added workspace export/import APIs and UI actions for project/history backup and restore
added saved-project revision history across create, update, and workspace import flows
added GET /api/v1/projects/{project_id}/revisions plus frontend restore of older payload revisions
added SQLite schema version reporting plus configurable journal mode, synchronous mode, and busy-timeout diagnostics
added structured backend logging with configurable plain/json output
added config validation for invalid ports and broken LLM retry/backoff settings
expanded CI to also verify the repo on Windows and to check generated API docs
added workspace backup roundtrip verification to the local/CI verify path
added optional API token auth for /api/v1/*, /readyz, and local API docs
added frontend bearer-token support through VITE_API_TOKEN at that stage; this path was later superseded by browser-session tokens
added optional read-only API token support for safe runtime requests while keeping mutations behind the write token
hardened Docker packaging with build-time frontend token injection, runtime defaults, container healthchecks, and secure compose verification; the build-time token path was later removed
added workspace backup integrity manifests with entity counts and SHA-256 checksum validation on import
added POST /api/v1/workspace/validate so workspace bundles can be preflight-checked before SQLite writes begin
added structured API error payloads with error_code, status_code, request_id, and X-Error-Code
added in-memory runtime request/error counters to diagnostics for lightweight observability
extended diagnostics and readiness with SQLite write-probe, db-size, parent-path, and free-disk reporting