Next Session: Current Backlog Handoff
⚠️ OUTDATED (early line of work). “No backlog remains” here does NOT reflect the current state. The current handoff is next-session-fable-hardening.md; the general starting point is AGENT_STATE.md (“START HERE”). Do NOT use this page for onboarding.
Continue from the merged Colab remote benchmark handoff. No non-live autopilot-safe product backlog remains; do not start GraceKelly, Docker, Ollama, GraceKelly browser orchestration, local model downloads, or live external-provider benchmark paths unless the user explicitly opts in during that session.
Current Baseline
- Latest completed work:
- Colab remote benchmark runbook and notebook were added and merged to master through PR #1.
- Windows laptop and iMac are documented as thin clients for this benchmark lane; benchmark compute must happen in Colab/cloud.
- Notebook clone target is now master.
- .pytest-tmp*/ local pytest basetemp directories are ignored.
- PR #1 is merged: https://github.com/brownjuly2003-code/RAG_Support_Assistant/pull/1.
- Master CI and Pages deploy passed on merge commit 415d4c8 after the notebook lint fix, ChromaDB locked-audit update, CI security config test alignment, Claude trace audit fixes, and the Python 3.11 smoke-report compatibility fix.
- Post-merge handoff commit f8ffb0f is on origin/master.
- GitHub Actions action-major refresh 52d16c4 and Weekly Report import fix a86b44c are on origin/master.
- 2026-05-30 Codex audit and remediation series is recorded in audit_codex_30_05_26.md and AGENT_STATE.md. Completed local fixes: Agent UI API-data XSS hardening, docs-site devalue lock update, docs-site npm audit workflow guard, production security headers and production-only FastAPI docs/OpenAPI disabling, local-dev-only default Compose bindings, production auto-migration fail-closed behavior with explicit fail-open override, safe tar.extractall(..., filter="data") restore extraction, and docs-site 404 route warning cleanup.
- 2026-05-30 Claude audit is recorded in audit_claude_30_05_26.md. It focuses on RAG implementation quality and flags R7/R1/R2/R3/R4/R5: unmeasured RAG quality, English default reranker on RU content, LLM fan-out, and deferred deprecation/security follow-up.
  - R2 is closed by 5c7f3b1: RRF no longer deduplicates solely by a 200-character content prefix and has shared-context-prefix regression tests.
  - R5’s baseline tokenizer fix is closed by e91c1f1: BM25 uses Unicode word tokens plus casefold() for both index and query tokenization; deeper RU lemmatization remains optional future tuning.
  - R7 has a partial live baseline: commit 7b0d9ee makes incompatible Chroma collections fail closed, and a separate ignored eval collection built from the three tracked demo KB docs passed a 3-case live Mistral regression (ministral-3b-latest vs mistral-small-latest, 100%/100%, 0 regressions). The default local rag_docs_default collection is still stale/incompatible until rebuilt. Commit 517ec57 fixed live regression latency accounting; a 1-case live follow-up reported non-zero latency instead of 0.0 ms.
  - R3/R4 per-doc grade fan-out is partially closed by 71367a7: multi-document grade_docs now uses one structured batch LLM call with JSON/text parsing fallback, and the old per-doc path is retained when batch grading is unavailable. Master CI run 26679982808 and Pages run 26679982810 passed.
  - R4 fact-verification observability is closed by c0b6d24: extract-claims and per-claim verification calls now emit trace events with durations (verify_facts.extract_claims, verify_facts.verify_claim). Master CI run 26680293620 and Pages run 26680293609 passed.
  - R7 local seed coverage was raised by c964211: the checked-in curated dataset now has 35 unique RU cases over the tracked warranty/returns/error KB docs, with a guard test preventing regression below that floor. Local mock regression passed 35/35 cases; master CI run 26680554552 passed. The final CI guard also updates the PR regression-eval paths-filter to include evaluation/curated_cases.jsonl.
  - Local follow-up 676b3e0 adds the ADR 0001 adaptive retrieval seam: RAG_RETRIEVAL_STRATEGY, GLOBAL classification, vector-only retrieval for simple routed queries, and simple-query bypass of grade_docs and verify_facts. Local focused graph/retriever/settings tests passed.
  - Local follow-ups 32e841f, 6b7417d, and 325d63c expand evaluation/curated_cases_aircargo.jsonl from 31 to 100 grounded RU aircargo cases; mock regression passed 100/100.
- 2026-05-30 Claude CLI follow-up: read-only full-project claude -p review prompts were blocked by Anthropic cyber safeguards, and claude ultrareview --timeout 30 returned “Ultrareview is currently unavailable.” The actual Claude audit file above was supplied separately.
- 2026-05-30 non-local check: the stale scheduled Weekly Report failures were caused by ModuleNotFoundError: No module named 'config' when Actions ran python scripts/weekly_report.py --dry-run. Commit a86b44c keeps the repository root on PYTHONPATH; master CI run 26671830370 and manual Weekly Report dispatch 26671836799 both passed.
- 2026-05-30 readiness/runtime note: local .env contains MISTRAL_API_KEY and Mistral /v1/models returned 200; after explicit user opt-in, GraceKelly was started locally on http://127.0.0.1:8011, /healthz/ready returned ok, /api/v1/models returned a model catalog, and a minimal /api/v1/orchestrate smoke succeeded. No secret value was printed or copied.
- 2026-05-31 local autonomous audit follow-ups are recorded in AGENT_STATE.md: L1 import-time deprecation reduction, lazy SemanticChunker/CrossEncoder loading, M4 focused tests for graph/API helpers, agent/tools.py, auth/oidc.py, and admin_review, plus L2 wording that marks old gate hashes/counts as historical ledger evidence. These are local commits only; no push, deploy, live benchmark, model download, or cache deletion was performed.
- Agent Copilot semantic context UI and zero-overlap similar-ticket filtering.
- Mock-safe benchmark Quickstart example and guardrail test.
- static/widget.html a11y landmark coverage and color-contrast fix.
- Axe/Lighthouse verification: tests/test_a11y.py with axe CLI (38 passed); Lighthouse mobile /static/chat.html scored performance 99.
- Local gate and Windows pytest workflow docs.
- Provider API-key guard tests for missing/placeholder direct-provider keys.
- Autopilot runner protocol tests for PAUSE, BLOCKED, and allowed-path enforcement without invoking real pi or codex.
- Active benchmark-doc guardrail: any --allow-paid-apis example in active benchmark docs must be explicitly labeled live and opt-in/manual.
- Top-level AP housekeeping tasks are closed: “test: guard historical backlog pointers” and “docs: refresh autopilot state snapshot”.
- Active source of truth: docs/plans/2026-05-01-backlog.md. Current branch state is summarized in AGENT_STATE.md.
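The R2 note above says RRF no longer deduplicates solely by a 200-character content prefix. A minimal sketch of that idea follows; it is not the repository's implementation. The function names, the SHA-256 full-text key, and the constant k=60 are assumptions chosen for illustration.

```python
import hashlib
from collections import defaultdict


def content_key(text: str) -> str:
    """Identity key over the whole document text (assumed scheme, not a fixed-length prefix)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: each ranking adds 1 / (k + rank) to a document's score.

    `rankings` is a list of ranked lists of document texts. Documents are merged
    only when their full text is identical, so two chunks that share a long
    context prefix but differ later stay separate.
    """
    scores = defaultdict(float)
    texts = {}
    for ranking in rankings:
        for rank, text in enumerate(ranking, start=1):
            key = content_key(text)
            texts[key] = text
            scores[key] += 1.0 / (k + rank)
    ordered = sorted(scores, key=lambda key: scores[key], reverse=True)
    return [(texts[key], scores[key]) for key in ordered]
```

With a prefix-only key, two chunks that begin with the same 200 characters of shared context would collapse into one result; keying on the full text keeps both.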
Remaining Work
- Live Batch N benchmark decision: closed 2026-05-07; the mock-provider benchmark run is the canonical regression signal (see Recently Closed in docs/plans/2026-05-01-backlog.md). A live GraceKelly+Mistral run remains a discretionary experiment for specific business reasons, not a backlog item.
- Colab remote benchmark setup: notebook and runbook are committed on master for manual Colab use.
- Colab remote benchmark PR: PR #1 merged to master as 415d4c8. Master CI and Pages deploy passed.
- Weekly Report scheduled workflow import failure: closed 2026-05-30 by a86b44c; manual workflow_dispatch run 26671836799 passed on master.
- Chroma embedding compatibility guard: closed 2026-05-30 by 7b0d9ee; CI run 26679263174 and Pages run 26679263187 passed on master.
- Live regression latency accounting: closed 2026-05-30 by 517ec57; CI run 26679564874 passed on master.
- Batch document grading: closed 2026-05-30 by 71367a7; CI run 26679982808 and Pages run 26679982810 passed on master.
- Fact-verification LLM trace coverage: closed 2026-05-30 by c0b6d24; CI run 26680293620 and Pages run 26680293609 passed on master.
- Curated RU seed expansion: closed 2026-05-30 by c964211; dataset is now 35 cases, local mock regression passed 35/35, and master CI run 26680554552 passed.
- Regression-eval dataset path guard: final local change adds evaluation/curated_cases.jsonl to the PR paths-filter and covers it in tests/test_github_workflows.py.
- Adaptive retrieval seam: closed locally by 676b3e0; simple routed queries use vector-only retrieval when available and skip grade/verify, while GLOBAL classification is ready for a future graph retriever.
- Aircargo curated seed expansion: closed locally by 32e841f, 6b7417d, and 325d63c; dataset is now 100 cases and local mock regression passed 100/100.
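The curated seed items above rely on a guard that keeps the dataset from shrinking below its floor. The sketch below shows one way such a guard could count unique cases in JSONL text. The "question" field name and the helper names are hypothetical, since the handoff does not describe the dataset schema; only the floor of 35 comes from the text.

```python
import json

MIN_CASES = 35  # floor stated in the handoff for the default curated dataset


def count_unique_cases(jsonl_text, key_field="question"):
    """Count distinct cases in JSONL text; `key_field` is a hypothetical schema field."""
    seen = set()
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        case = json.loads(line)
        # casefold() so that case-only duplicates are not counted twice
        seen.add(case[key_field].strip().casefold())
    return len(seen)


def check_floor(jsonl_text, floor=MIN_CASES):
    """Raise if the dataset has fewer unique cases than the floor."""
    count = count_unique_cases(jsonl_text)
    if count < floor:
        raise AssertionError(
            f"curated dataset has {count} unique cases, expected at least {floor}"
        )
    return count
```

A real guard test would read the checked-in file and call check_floor on its contents; the aircargo dataset would use a floor of 100 under the same assumption.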
Compact Resume Plan
- Close current dirty WIP first, if still dirty.
- If master is clean at a86b44c or later, do not repeat the Weekly Report fix or the action-major refresh merely to update handoff prose.
- With no new failing remote run, open PR/issue, or explicit live opt-in, the only default non-destructive local work is branch hygiene or focused follow-up from a new failing check. Do not repeat the audit-remediation family merely to refresh timestamps or handoff prose.
- If continuing R7 locally, do not assume the default local Chroma store is usable. Either rebuild rag_docs_default deliberately from the intended KB corpus, or keep using an explicit eval prefix such as VECTORDB_COLLECTION_PREFIX=rag_eval_20260530t0835 for non-destructive regression runs.
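The last item above recommends an explicit eval prefix for non-destructive runs. As a rough sketch, a small helper can build the environment for a child process and refuse anything that does not look like an eval prefix. The helper name and the rule that eval prefixes start with rag_eval_ are assumptions based only on the example value in the text.

```python
import os

PREFIX_VAR = "VECTORDB_COLLECTION_PREFIX"
EVAL_PREFIX_START = "rag_eval_"  # assumed naming rule, inferred from the example prefix


def eval_env(prefix, base_env=None):
    """Return a copy of the environment with an explicit eval collection prefix set.

    Rejects prefixes that do not start with EVAL_PREFIX_START so that a regression
    run cannot silently target the default collection.
    """
    if not prefix.startswith(EVAL_PREFIX_START):
        raise ValueError(f"{prefix!r} does not look like an eval prefix")
    env = dict(os.environ if base_env is None else base_env)
    env[PREFIX_VAR] = prefix
    return env
```

The returned mapping can be passed as the env argument of subprocess.run when launching a regression command, which leaves the parent shell's environment untouched.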
Suggested Subagents
None for default non-live work. Use subagents only if the session explicitly opts into planning or running a live benchmark.
Next Session Plan
- Start with git status --short --branch and confirm any dirty state is expected before changing files.
- Read AGENT_STATE.md, docs/operations/colab-remote-benchmark.md, docs/plans/2026-05-01-backlog.md, and this handoff before changing files.
- If opening the notebook manually, use: https://colab.research.google.com/github/brownjuly2003-code/RAG_Support_Assistant/blob/master/notebooks/rag_support_colab_remote_benchmark.ipynb
- If no explicit live opt-in is given, keep work read-only or docs-only and do not reopen closed AP housekeeping tasks or already-closed Codex audit fixes.
- Do not create new deployment/release/scheduler work without an explicit instruction.
- If making changes, keep scope to docs/tests unless a focused failing test proves runtime code needs a small fix.
- Do not run GraceKelly, Mistral benchmark calls, scheduler installation, deploy, or production data commands without explicit user opt-in in that session.
- Verify with focused tests first, then git diff --check; run full pytest if source or test files changed.
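The first step of the plan above can be scripted. The following sketch parses the output of git status --short --branch into a branch name and a list of dirty entries; the function names are illustrative and not part of the repository.

```python
import subprocess


def parse_status(output):
    """Split `git status --short --branch` output into branch name and dirty entries.

    The first line looks like "## master...origin/master [ahead 1]"; every other
    non-empty line is one changed or untracked path.
    """
    lines = output.splitlines()
    header = lines[0] if lines and lines[0].startswith("## ") else ""
    branch = header[3:].split("...")[0].split(" ")[0] if header else None
    dirty = [line for line in lines if line and not line.startswith("## ")]
    return {"branch": branch, "dirty": dirty}


def read_status(repo="."):
    """Run git in `repo` and return the parsed status (requires git on PATH)."""
    result = subprocess.run(
        ["git", "status", "--short", "--branch"],
        cwd=repo,
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_status(result.stdout)
```

An empty dirty list means the working tree is clean; anything else should be confirmed as expected before files are changed.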
Current Session Checks
- No live external-provider benchmark has run without explicit opt-in.
- Active handoff docs point future sessions at the Colab runbook on master, not the closed Batch N lane.
- git status --short --branch was reviewed before this post-merge handoff refresh; branch was clean against origin/master at 415d4c8.
- PR #1 merged to master as 415d4c8.
- Master CI passed on 415d4c8.
- Pages docs build and deploy passed on 415d4c8.
- Weekly Report workflow import failure fixed by a86b44c.
- Master CI passed on a86b44c (26671830370).
- Manual Weekly Report dispatch passed on a86b44c (26671836799).
- Codex audit remediation focused local checks passed; see AGENT_STATE.md for command-level evidence.
- Explicit live opt-in was received for GraceKelly/Mistral runtime.
- Live R7 partial baseline ran on a separate eval collection and passed 3/3 cases with 0 regressions.
- Live latency verification passed on the same eval collection with non-zero baseline/candidate latency in the report.
- Batch grade_docs fan-out reduction committed and verified on CI (71367a7, CI 26679982808).
- verify_facts extract/claim LLM calls now have trace events for R4 latency analysis (c0b6d24, CI 26680293620).
- R7 local default curated seed set expanded to 35 RU cases (c964211, CI 26680554552), and the separate aircargo seed now has 100 grounded RU cases (325d63c). Full R7 now requires a staged Colab/RAGAS baseline, not another default local seed refresh.
- Local adaptive retrieval seam committed as 676b3e0; focused pytest, Ruff, py_compile, mypy --follow-imports=skip, and git diff --check passed.
- Local aircargo seed set expanded to 100 RU cases (32e841f, 6b7417d, 325d63c); full tests/test_curated_dataset.py passed and mock regression passed 100/100.
- Focused ahead-series verification passed after db61488: 95 pytest cases, Ruff, py_compile, mypy with vectordb/manager.py, and git diff --check.
- Docs/config ahead gate passed after 8c70cf9: tests/test_docs_quality.py, tests/test_quickstart_docs.py, tests/test_backlog_docs.py, Ruff on those tests, and git diff --check origin/master..HEAD.
- Meta/CI ahead gate passed after f6efe4f: tests/test_precommit_config.py, tests/test_github_workflows.py, and workflow/pre-commit YAML parsing.
- Regression-tooling ahead gate passed after f6efe4f: tests/test_regression_runner.py, tests/test_provider_benchmark.py, tests/test_detect_stale_curated_cases.py, Ruff, and py_compile.
- Pre-commit ahead gate passed with PRE_COMMIT_HOME=.tmp/pre-commit-cache pre-commit run --from-ref origin/master --to-ref HEAD. The default global pre-commit cache hit a Windows PermissionError before hooks ran; use the isolated ignored cache for repeat checks.
- Settings/env ahead gate passed after 0f2a2be: tests/test_provider_settings.py, tests/test_settings_production_secrets.py, tests/test_production_entrypoint.py, tests/test_magic_numbers_settings.py, tests/test_experiment_registry.py, Ruff, and py_compile.
- Eval-tooling ahead gate passed after e24d270: tests/test_ragas_eval.py, tests/test_online_evaluators.py, tests/test_regression_eval_profile_target.py, tests/test_experiment_comparison.py, Ruff, and py_compile.
- JavaScript/docs-site gate added and verified in d09405c: astro check is now backed by checked-in dev dependencies, npm audit reports 0 vulnerabilities, node --check covers static/admin/widget and docs-site scripts, Astro build produces 33 pages, and pre-commit passes with the isolated .tmp/pre-commit-cache.
- Docs-site CI now runs npm run check before build (67a067f), with a red/green guard in tests/test_github_workflows.py.
- Checked-in JS/MJS syntax is guarded by tests/test_static_js_quality.py (fd6c864), covering static/admin.js, static/widget.js, docs-site/astro.config.mjs, and docs-site/scripts/*.mjs.
- Ripgrep search hygiene committed as bd4c25a: repo-local .ignore skips pytest-cache-files-*, and the broad JavaScript/docs-site rg search now avoids Windows “Access is denied” noise from pytest temp dirs.
- Widget static assets now have FastAPI smoke coverage (6a0469d) for /static/widget.js and /static/widget.html; focused pytest passed with PYTEST_DISABLE_PLUGIN_AUTOLOAD=1 because local global schemathesis plugin autoload fails before collection on missing _pytest.subtests.
- Static HTML entrypoints now have FastAPI smoke coverage (31996d1) for admin/agent/analytics/chat/help/login/metrics/widget pages; focused UI/static JS verification passed 12 tests, Ruff, py_compile, and git diff --check.
- Analytics Chart.js CDN dependency is pinned and SRI-protected (d9227e2); a red/green JS quality guard now fails on unversioned jsDelivr npm scripts or missing integrity/crossorigin.
- 2026-05-31 local autonomous audit follow-ups are committed locally: L1 lazy imports/deprecation reduction, M4 focused helper and router tests, and L2 historical-ledger wording in durable state docs. Focused pytest/Ruff/py_compile/mypy gates and git diff --check passed for the changed files; see AGENT_STATE.md for command-level evidence.
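The Chart.js item above describes a guard that fails on unversioned jsDelivr npm scripts or missing integrity/crossorigin attributes. A minimal sketch of such a check is shown below. It is not the code in tests/test_static_js_quality.py; the regexes, the function name, and the choice to require an exact x.y.z version are assumptions.

```python
import re

SCRIPT_TAG = re.compile(r"<script\b[^>]*>", re.IGNORECASE)
SRC_ATTR = re.compile(r"""\bsrc\s*=\s*["']([^"']+)["']""", re.IGNORECASE)
JSDELIVR_NPM = re.compile(r"^https://cdn\.jsdelivr\.net/npm/(.+)$")
# Assumed rule: "pinned" means an exact x.y.z version after the package name,
# with an optional @scope/ in front of it.
PINNED = re.compile(r"^(?:@[^/]+/)?[^/@]+@\d+\.\d+\.\d+")


def cdn_script_problems(html):
    """Return one message per problem found on jsDelivr npm <script> tags."""
    problems = []
    for tag in SCRIPT_TAG.findall(html):
        src_match = SRC_ATTR.search(tag)
        if not src_match:
            continue  # inline script, nothing to pin
        src = src_match.group(1)
        npm = JSDELIVR_NPM.match(src)
        if not npm:
            continue  # local or non-jsDelivr script, out of scope here
        if not PINNED.match(npm.group(1)):
            problems.append(f"unversioned jsDelivr script: {src}")
        lowered = tag.lower()
        if "integrity=" not in lowered:
            problems.append(f"missing integrity attribute: {src}")
        if "crossorigin" not in lowered:
            problems.append(f"missing crossorigin attribute: {src}")
    return problems
```

A guard test would read each checked-in HTML file and assert that cdn_script_problems returns an empty list. A regex scan like this is only a sketch; an HTML parser would be more robust against unusual attribute formatting.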