P4.2 — Late / out-of-order events (Phase 4 · 2026-06-26)
Builds on P4.1 (occurred_at on exposures+conversions, schema 12). First consumer of
event-time. Additive, informational (no decision impact — like F5 holdout / CUPED).
Design decision — attribution relative to the user’s own exposure (not a global window)
There is no stored experiment start/end (only a design-time estimated_duration_days). A global
window derived from analysis JSON is fragile. Instead use a per-user attribution horizon, which is
the standard A/B attribution model and uses occurred_at on both tables (why P4.1 added it to
exposures too):
For each (exposed user, conversion on the metric):
- out-of-order ⟺ `conversion.occurred_at < exposure.occurred_at`: causally impossible (the user converted before being exposed); a clock-skew / ingestion-order artifact. Excluded from attribution.
- late ⟺ `conversion.occurred_at > exposure.occurred_at + horizon_days`: happened after the attribution horizon closed. Excluded from attribution.
- in-window ⟺ `exposure.occurred_at <= conversion.occurred_at <= exposure.occurred_at + horizon_days`.

horizon_days = constant default ATTRIBUTION_HORIZON_DAYS = 14 (configurable later). Documented as a
conservative default; not a per-experiment field in this slice.
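
The three rules above can be sketched as a pure function. This is a minimal illustration, not the project's implementation: the function name `classify_conversion` is hypothetical, and it assumes both timestamps are already timezone-aware `datetime` values.

```python
from datetime import datetime, timedelta, timezone

ATTRIBUTION_HORIZON_DAYS = 14  # default named in this plan


def classify_conversion(exposed_at, converted_at, horizon_days=ATTRIBUTION_HORIZON_DAYS):
    """Classify one (exposure, conversion) pair (hypothetical helper).

    Returns 'out_of_order', 'late', or 'in_window'. Both window bounds
    are inclusive, matching the in-window rule above.
    """
    if converted_at < exposed_at:
        return "out_of_order"  # converted before being exposed
    if converted_at > exposed_at + timedelta(days=horizon_days):
        return "late"  # after the attribution horizon closed
    return "in_window"


exposed = datetime(2026, 6, 1, 12, 0, tzinfo=timezone.utc)
for delta in (timedelta(seconds=-1), timedelta(days=14), timedelta(days=14, seconds=1)):
    print(delta, classify_conversion(exposed, exposed + delta))
```

The middle case sits exactly on the horizon boundary and is still counted as in-window; only a conversion strictly after `exposure.occurred_at + horizon_days` is late.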
Scope (additive, no behaviour change to existing reads)
- Existing `get_experiment_analysis_aggregates` (primary rollup) is NOT changed: it stays occurred_at-agnostic for backward compatibility (every existing live/decision test depends on it).
- New diagnostic read `get_event_timing_summary(experiment_id, metric, horizon_days)` counts, over exposed users’ conversions on the metric: `in_window`, `late`, `out_of_order`. Dual-backend SQL.
- New live-stats block `event_timing` (counts + horizon) surfaced as an indicator (“N late · N out-of-order of M conversions”). Informational; `decision_service` untouched.
- constants `ATTRIBUTION_HORIZON_DAYS = 14`.
- repository `get_event_timing_summary`: CTE joins conversions to exposures on (exp, user), classifies each conversion row by comparing `c.occurred_at` vs `e.occurred_at` (+ horizon in seconds), returns counts. Portable dual-SQL (`?` → `%s`; datetime compared as ISO-8601 text, lexicographically ordered for UTC `+00:00` strings, verified by `_normalize_occurred_at`). Exclude holdout (`vi<0`).
- live_stats_service `_build_event_timing_block` + wire into `build_live_stats` (param `event_timing_summary`, field `event_timing`).
- routes/execution `_compute_live_stats` collects the summary for the primary metric.
- schemas `LiveEventTimingBlock {metric, horizon_days, in_window, late, out_of_order, total}` + `LiveStatsResponse += event_timing`. Regenerate `api-contract.ts` + `docs/API.md`.
- frontend `LiveStatsSection` EventTimingBlock (indicator) + `lib/api` re-export.
- i18n×7 `results.liveStats.eventTiming*` (term “late”/“out-of-order” rendered per locale).
- tests: `test_execution_live_stats` (+ in-window/late/out-of-order classification, horizon boundary, empty, endpoint e2e) · `test_postgres_backend` (+ event-timing round-trip, skip on Win) · vitest (+ EventTimingBlock render).
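
As a rough illustration of how the repository counts could become the `event_timing` block, the sketch below uses a plain dataclass as a stand-in for the real schema. The field names come from the list above; everything else is an assumption: the shape of the summary dict, the choice to return `None` when no summary is available, and `total` being the sum of the three counts.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class LiveEventTimingBlock:
    # Field names follow the schema listed in this plan.
    metric: str
    horizon_days: int
    in_window: int
    late: int
    out_of_order: int
    total: int


def build_event_timing_block(
    metric: str, horizon_days: int, summary: Optional[Mapping[str, int]]
) -> Optional[LiveEventTimingBlock]:
    """Hypothetical builder: turn raw counts into the live-stats block."""
    if summary is None:
        return None  # assumption: no summary means no block
    in_window = int(summary.get("in_window", 0))
    late = int(summary.get("late", 0))
    out_of_order = int(summary.get("out_of_order", 0))
    return LiveEventTimingBlock(
        metric=metric,
        horizon_days=horizon_days,
        in_window=in_window,
        late=late,
        out_of_order=out_of_order,
        total=in_window + late + out_of_order,  # assumption: total is the sum
    )


def indicator_text(block: LiveEventTimingBlock) -> str:
    """Render the indicator in the form "N late · N out-of-order of M conversions"."""
    return f"{block.late} late · {block.out_of_order} out-of-order of {block.total} conversions"


block = build_event_timing_block("purchase", 14, {"in_window": 8, "late": 1, "out_of_order": 1})
print(indicator_text(block))
```

Because the block is informational only, nothing in this sketch feeds back into a decision; it only reshapes counts for display.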
Verify / gate
Serial Windows gate (ruff, `mypy --strict`, backend pytest + coverage ≥ 88, tsc, vitest, build < 500,
`contract --check`, locale). PG timing round-trip validated by CI verify-postgres. Then push → PR →
CI → merge under the standing “decide for yourself” mandate. Deploy stays gated on an explicit “deploy” instruction.
- ISO-8601 UTC strings from `_normalize_occurred_at` are lexicographically comparable (fixed-width, `+00:00`), so text `<` / `>` comparison in SQL orders them correctly and no DB datetime type is needed (matches the project’s `created_at TEXT` convention).
- Adding the horizon is the non-portable part. It can be done in Python (compute the cutoff per row) or in SQL via `julianday` (SQLite) / `EXTRACT` (PostgreSQL). To stay portable, either classify in Python after pulling per-conversion `(e.occurred_at, c.occurred_at)` pairs, or push down a portable interval comparison. Decide at implementation: prefer SQL counts with a portable expression; fall back to Python classification over rows if the dual-SQL interval math diverges.
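
The Python fallback path can be sketched as follows: pull `(e.occurred_at, c.occurred_at)` text pairs with a plain join, compute the cutoff per row in Python, and compare everything as fixed-width ISO-8601 text. This is a sketch under stated assumptions, not the project's repository code: the table and column names (`exposures`, `conversions`, `experiment_id`, `user_id`, `metric`, `variant_index`), the `normalize` helper standing in for `_normalize_occurred_at`, reading `vi<0` as a negative variant index, and one exposure row per (experiment, user) are all hypothetical.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def normalize(ts: datetime) -> str:
    """Hypothetical stand-in for _normalize_occurred_at: fixed-width UTC text."""
    return ts.astimezone(timezone.utc).isoformat(timespec="seconds")


def cutoff_text(exposed_text: str, horizon_days: int) -> str:
    """Exposure time plus the horizon, rendered in the same text format."""
    return normalize(datetime.fromisoformat(exposed_text) + timedelta(days=horizon_days))


def event_timing_summary(conn, experiment_id, metric, horizon_days=14):
    """Count in_window / late / out_of_order conversions of exposed users."""
    rows = conn.execute(
        """
        SELECT e.occurred_at, c.occurred_at
        FROM conversions AS c
        JOIN exposures AS e
          ON e.experiment_id = c.experiment_id AND e.user_id = c.user_id
        WHERE c.experiment_id = ? AND c.metric = ?
          AND e.variant_index >= 0  -- assumption: negative index marks holdout
        """,
        (experiment_id, metric),
    ).fetchall()
    counts = {"in_window": 0, "late": 0, "out_of_order": 0}
    for exposed_text, converted_text in rows:
        # Plain string comparison is safe only because both values share
        # one fixed-width UTC format.
        if converted_text < exposed_text:
            counts["out_of_order"] += 1
        elif converted_text > cutoff_text(exposed_text, horizon_days):
            counts["late"] += 1
        else:
            counts["in_window"] += 1
    return counts


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE exposures (experiment_id TEXT, user_id TEXT, variant_index INTEGER, occurred_at TEXT);
    CREATE TABLE conversions (experiment_id TEXT, user_id TEXT, metric TEXT, occurred_at TEXT);
    """
)
t0 = datetime(2026, 6, 1, 12, 0, tzinfo=timezone.utc)
conn.executemany(
    "INSERT INTO exposures VALUES (?, ?, ?, ?)",
    [("exp1", u, vi, normalize(t0)) for u, vi in [("u1", 0), ("u2", 1), ("u3", 1), ("u4", -1)]],
)
conn.executemany(
    "INSERT INTO conversions VALUES (?, ?, ?, ?)",
    [
        ("exp1", "u1", "purchase", normalize(t0 + timedelta(days=3))),   # in window
        ("exp1", "u2", "purchase", normalize(t0 + timedelta(days=20))),  # late
        ("exp1", "u3", "purchase", normalize(t0 - timedelta(hours=1))),  # out of order
        ("exp1", "u4", "purchase", normalize(t0 + timedelta(days=1))),   # holdout user
    ],
)
summary = event_timing_summary(conn, "exp1", "purchase")
print(summary)
```

The trade-off is that every matching pair crosses the wire, so this fallback suits a diagnostic read better than a hot path; a portable SQL expression that returns only the three counts would avoid that cost.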