P4.2 — Late / out-of-order events (Phase 4 · 2026-06-26)

Builds on P4.1 (occurred_at on exposures+conversions, schema 12). First consumer of event-time. Additive, informational (no decision impact — like F5 holdout / CUPED).

Design decision — attribution relative to the user’s own exposure (not a global window)

There is no stored experiment start/end (only a design-time estimated_duration_days). A global window derived from analysis JSON is fragile. Instead use a per-user attribution horizon, which is the standard A/B attribution model and uses occurred_at on both tables (why P4.1 added it to exposures too):

For each (exposed user, conversion on the metric):

  • out-of-order: conversion.occurred_at < exposure.occurred_at. Causally impossible (the user converted before being exposed); a clock-skew / ingestion-order artifact. Excluded from attribution.
  • late: conversion.occurred_at > exposure.occurred_at + horizon_days. Happened after the attribution horizon closed. Excluded from attribution.
  • in-window: exposure.occurred_at <= conversion.occurred_at <= exposure.occurred_at + horizon_days. Both bounds are inclusive.

horizon_days defaults to the constant ATTRIBUTION_HORIZON_DAYS = 14 (configurable later). It is documented as a conservative default and is not a per-experiment field in this slice.
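The three-way rule above can be sketched in a few lines of Python. This is a minimal illustration, not the project's code: the function name classify_conversion and its string return values are assumptions, and it works on timezone-aware datetime objects rather than the stored text values.

```python
from datetime import datetime, timedelta, timezone

ATTRIBUTION_HORIZON_DAYS = 14  # default named in this note


def classify_conversion(
    exposed_at: datetime,
    converted_at: datetime,
    horizon_days: int = ATTRIBUTION_HORIZON_DAYS,
) -> str:
    """Classify one (exposure, conversion) pair (hypothetical helper)."""
    if converted_at < exposed_at:
        # Conversion precedes the user's own exposure: clock skew or ingestion order.
        return "out_of_order"
    if converted_at > exposed_at + timedelta(days=horizon_days):
        # Conversion arrived after the per-user horizon closed.
        return "late"
    # Both bounds are inclusive, so the exact horizon boundary is still in-window.
    return "in_window"


exposure = datetime(2026, 6, 1, 12, 0, tzinfo=timezone.utc)
print(classify_conversion(exposure, exposure - timedelta(seconds=1)))         # out_of_order
print(classify_conversion(exposure, exposure + timedelta(days=14)))           # in_window
print(classify_conversion(exposure, exposure + timedelta(days=14, seconds=1)))  # late
```

The boundary case matters for the horizon-boundary test listed later: a conversion exactly horizon_days after exposure counts as in-window.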

Scope (additive, no behaviour change to existing reads)

  • Existing get_experiment_analysis_aggregates (primary rollup) is NOT changed — it stays occurred_at-agnostic for backward compatibility (every existing live/decision test depends on it).
  • New diagnostic read get_event_timing_summary(experiment_id, metric, horizon_days) counts, over exposed users’ conversions on the metric: in_window, late, out_of_order. Dual-backend SQL.
  • New live-stats block event_timing (counts + horizon) surfaced as an indicator (“N late · N out-of-order of M conversions”). Informational; decision_service untouched.
  • constants: ATTRIBUTION_HORIZON_DAYS = 14.
  • repository: get_event_timing_summary. A CTE joins conversions to exposures on (exp, user), classifies each conversion row by comparing c.occurred_at with e.occurred_at (+ horizon in seconds), and returns counts. Portable dual-SQL (? / %s placeholders; datetimes compared as ISO-8601 text, which orders lexicographically for UTC +00:00 strings as produced by _normalize_occurred_at). Exclude holdout (vi<0).
  • live_stats_service: _build_event_timing_block, wired into build_live_stats (param event_timing_summary, field event_timing).
  • routes/execution: _compute_live_stats collects the summary for the primary metric.
  • schemas: LiveEventTimingBlock {metric, horizon_days, in_window, late, out_of_order, total}; LiveStatsResponse += event_timing. Regenerate api-contract.ts + docs/API.md.
  • frontend: LiveStatsSection EventTimingBlock (indicator) + lib/api re-export.
  • i18n×7: results.liveStats.eventTiming* (terms “late” / “out-of-order” rendered per locale).
  • tests: test_execution_live_stats (+ in-window/late/out-of-order classification, horizon boundary, empty, endpoint e2e) · test_postgres_backend (+ event-timing round-trip, skip on Win) · vitest (+ EventTimingBlock render).
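As a rough illustration of the repository read described in the list above, the sketch below runs a counting CTE against an in-memory SQLite database. The table layout, the variation_index column (standing in for the note's vi), the conn parameter, and format_indicator are assumptions made for the example. It uses SQLite's strftime('%s', ...) for epoch seconds; a Postgres variant would need %s placeholders and an equivalent epoch expression. It also assumes one exposure row per (experiment, user) and whole-second timestamps.

```python
import sqlite3

ATTRIBUTION_HORIZON_DAYS = 14

# Hypothetical minimal schema; the real tables and column names may differ.
SCHEMA = """
CREATE TABLE exposures (
    experiment_id TEXT, user_id TEXT, variation_index INTEGER, occurred_at TEXT
);
CREATE TABLE conversions (
    experiment_id TEXT, user_id TEXT, metric TEXT, occurred_at TEXT
);
"""

# delta_s = conversion time minus exposure time, in whole seconds (SQLite flavour).
SUMMARY_SQL = """
WITH paired AS (
    SELECT CAST(strftime('%s', c.occurred_at) AS INTEGER)
         - CAST(strftime('%s', e.occurred_at) AS INTEGER) AS delta_s
    FROM conversions AS c
    JOIN exposures AS e
      ON e.experiment_id = c.experiment_id AND e.user_id = c.user_id
    WHERE c.experiment_id = ? AND c.metric = ?
      AND e.variation_index >= 0  -- exclude holdout
)
SELECT
    COALESCE(SUM(CASE WHEN delta_s BETWEEN 0 AND ? THEN 1 ELSE 0 END), 0),
    COALESCE(SUM(CASE WHEN delta_s > ? THEN 1 ELSE 0 END), 0),
    COALESCE(SUM(CASE WHEN delta_s < 0 THEN 1 ELSE 0 END), 0)
FROM paired
"""


def get_event_timing_summary(conn, experiment_id, metric,
                             horizon_days=ATTRIBUTION_HORIZON_DAYS):
    """Count in-window / late / out-of-order conversions (sketch only)."""
    horizon_s = horizon_days * 86400
    in_window, late, out_of_order = conn.execute(
        SUMMARY_SQL, (experiment_id, metric, horizon_s, horizon_s)
    ).fetchone()
    return {
        "metric": metric, "horizon_days": horizon_days,
        "in_window": in_window, "late": late, "out_of_order": out_of_order,
        "total": in_window + late + out_of_order,
    }


def format_indicator(block):
    """Hypothetical rendering of the indicator text."""
    return (f"{block['late']} late · {block['out_of_order']} out-of-order "
            f"of {block['total']} conversions")


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
t0 = "2026-06-01T00:00:00+00:00"
conn.executemany("INSERT INTO exposures VALUES (?, ?, ?, ?)", [
    ("exp1", "u1", 0, t0), ("exp1", "u2", 1, t0),
    ("exp1", "u3", 0, t0), ("exp1", "u4", -1, t0),  # u4 is holdout
])
conn.executemany("INSERT INTO conversions VALUES (?, ?, ?, ?)", [
    ("exp1", "u1", "purchase", "2026-06-15T00:00:00+00:00"),  # exactly 14 days
    ("exp1", "u2", "purchase", "2026-06-15T00:00:01+00:00"),  # one second past
    ("exp1", "u3", "purchase", "2026-05-31T23:59:59+00:00"),  # before exposure
    ("exp1", "u4", "purchase", "2026-06-02T00:00:00+00:00"),  # holdout, ignored
])
summary = get_event_timing_summary(conn, "exp1", "purchase")
print(format_indicator(summary))  # 1 late · 1 out-of-order of 3 conversions
```

Doing the arithmetic in integer seconds avoids floating-point drift at the horizon boundary, but it truncates fractional seconds, so sub-second skew would not be classified as out-of-order in this version.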

Serial Windows gate (ruff, mypy --strict, backend pytest + coverage≥88, tsc, vitest, build<500, contract --check, locale). The PG timing round-trip is validated by CI verify-postgres. Then push → PR → CI → merge under the standing “decide yourself” mandate. Deploy stays gated on an explicit “deploy” command.

  • ISO-8601 UTC strings from _normalize_occurred_at are lexicographically comparable (fixed-width, +00:00), so text < / > comparison in SQL orders them correctly and no DB datetime type is needed (this matches the project’s created_at TEXT convention). The horizon add has two options: push a portable interval compare into SQL (julianday / EXTRACT), or pull per-conversion (e.occurred_at, c.occurred_at) pairs and classify in Python with a per-row cutoff. Decide at implementation: prefer SQL counts with a portable expression; fall back to Python classification over rows if the dual-SQL interval math diverges.
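The text-ordering claim and the Python-side cutoff option can be checked with a short sketch. Here to_utc_text is a hypothetical stand-in for _normalize_occurred_at (whose real behaviour is not shown in this note), and cutoff_text is an illustrative helper for the per-row horizon cutoff.

```python
from datetime import datetime, timedelta, timezone


def to_utc_text(dt: datetime) -> str:
    """Hypothetical stand-in for _normalize_occurred_at: fixed-width UTC text."""
    return dt.astimezone(timezone.utc).isoformat(timespec="seconds")


def cutoff_text(exposure_text: str, horizon_days: int) -> str:
    """Per-row horizon cutoff, computed in Python and returned as comparable text."""
    exposed_at = datetime.fromisoformat(exposure_text)
    return to_utc_text(exposed_at + timedelta(days=horizon_days))


# Un-normalized strings with mixed offsets do NOT sort chronologically as text.
raw_local = "2026-06-01T14:00:00+03:00"  # 11:00 UTC
raw_utc = "2026-06-01T12:00:00+00:00"
print(raw_local > raw_utc)  # True as text, although raw_local is the earlier instant

# After normalization, text order and time order agree.
norm_local = to_utc_text(datetime.fromisoformat(raw_local))
print(norm_local)            # 2026-06-01T11:00:00+00:00
print(norm_local < raw_utc)  # True

# The cutoff is itself normalized text, so the in-window test is a plain string compare.
cutoff = cutoff_text(raw_utc, 14)
print(cutoff)  # 2026-06-15T12:00:00+00:00
print(raw_utc <= "2026-06-15T12:00:00+00:00" <= cutoff)  # True (boundary is in-window)
```

The mixed-offset case is why the text comparison is only safe on values that have already passed through normalization.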