Project overview
RAG Support Assistant
Answers support questions against a knowledge base and decides whether a request can be resolved automatically or should be escalated to a human.
Public HTTP endpoints are documented below; runtime configuration lives in docs/CONFIGURATION.md and the metric / monitoring inventory in docs/OPERATIONS.md.
Stack: FastAPI · LangGraph · ChromaDB · GraceKelly/Ollama provider routing · SQLite for trace snapshots · Postgres for sessions/audit/copilot/analytics · Redis for cache · OpenTelemetry · email/Bitrix escalation channels.
Architecture
    User / Email / Widget
            |
            v
    FastAPI + Auth (JWT / OIDC / RBAC)
            |
            v
    LangGraph pipeline / agent graph
      classify -> retrieve -> rerank -> generate -> verify -> evaluate
            \-> tool calls (search_kb / check_order_status / create_ticket)
            |
            +-> ChromaDB + BM25 + category metadata
            +-> GraceKelly orchestrator / Ollama fallback
            +-> Postgres (sessions, audit, copilot, analytics)
            +-> Redis cache
            +-> SQLite traces + OTel spans

- Retrieval: ChromaDB (vector) + BM25 hybrid search, Reciprocal Rank Fusion, cross-encoder reranking, contextual headers, and optional document category metadata.
- Generation: GraceKelly is the default local orchestrator, with explicit local-first Ollama/Qwen2.5 7B routing for offline-only setups. Responses can include inline citations [N] backed by retrieved documents.
- Agent layer: Feature-flagged tool use supports multi-step reasoning, confirmation-gated irreversible actions, and agent-side ticket creation.
- Routing: Requests resolve to auto, human, retry, or error depending on quality, factuality, and downstream tool outcomes.
- Channels: The same backend powers the web chat, agent copilot, and email ingestion/reply paths.
- Security: JWT auth, Google/Microsoft OIDC SSO, tenant isolation, and pgcrypto column encryption protect enterprise deployments.
- Observability: Request traces are written to SQLite, optional OpenTelemetry exports spans to Jaeger/Tempo, and Prometheus exposes operational metrics.
- Knowledge loops: Nightly eval drift checks, KB gap clustering, KB draft generation, freshness alerts, and weekly reports keep the knowledge base improving over time.
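The Retrieval bullet names Reciprocal Rank Fusion as the step that merges the vector and BM25 result lists. A minimal sketch of that technique follows. The function name, the sample document IDs, and the constant k=60 (a commonly used default) are assumptions for illustration, not values taken from this project.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a common default, not a project setting.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


vector_hits = ["doc_a", "doc_b", "doc_c"]  # hypothetical vector-search order
bm25_hits = ["doc_b", "doc_d", "doc_a"]    # hypothetical BM25 order
fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
print(fused)
```

Documents that rank well in both lists rise to the top, while documents found by only one retriever still survive into the fused list for the reranker to judge.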
Module layout (high level)
- api/app.py + api/routers/: FastAPI app construction and the extracted sub-routers (system, agent, admin_*, analytics, auth_sso, conversation, feedback, upload, …). New endpoint groups land in api/routers/, not app.py.
- api/_shared.py, api/correlation.py, api/rate_limit.py: lazy app_module() accessor, request-correlation context, and rate-limit primitives.
- agent/: LangGraph pipeline + state + prompts.
- auth/: JWT, X-API-Key, OIDC, RBAC.
- db/: SQLAlchemy models, async engine, audit log, pgcrypto field.
- llm/providers/: Ollama / Mistral / GraceKelly providers + cost guard.
- vectordb/: tenant-aware vector store factory + base implementation.
- evaluation/: RAGAS metrics, online evaluators, regression framework.
- monitoring/: Prometheus metrics (~50); tracing/: Langfuse + OTel + SQLite trace store.
- ingestion/: loaders, pipeline, categorizer, contextual headers.
- scripts/: operational CLIs (regression eval, KB builders, chunking eval, nightly tasks).
All production packages are mypy --strict clean (CI-enforced).
For a complete audit and an implementation log of recent hardening work, see docs/audits/audit_opus_2026-04-26.md (especially section 12) and DEPRECATIONS.md. Quick handover for new sessions: docs/SESSION-NOTES-2026-04-26-audit.md.
Features
- Inline citations and source panel: Answers can embed [N] markers that resolve to retrieved documents, excerpts, and a dedicated source panel.
- Mobile-first UI: chat, help, metrics, admin, agent, analytics, and login pages ship responsive layouts for phone, tablet, and desktop.
- WCAG 2.1 AA improvements: Static pages include accessible labels, visible focus states, keyboard-friendly dialogs, and stronger screen-reader support.
- Chat polish: Upload progress, retry flows, onboarding prompts, skeleton states, and clearer error handling reduce dead-end interactions.
- Agent copilot: /agent and /static/agent.html expose escalated ticket queues, conversation context, AI drafts, and similar resolved tickets.
- Agentic tool use: The graph can call KB search, order-status, and ticket-creation tools, with confirmation required for irreversible actions.
- Nightly evaluation: scripts/nightly_eval.py runs RAGAS-style checks on recent traces and stores drift against a rolling baseline.
- Online evaluators: Seven lightweight per-trace checks score citation coverage, answer-length anomalies, retrieval hit rate, tool efficiency, refusals, PII suspicion, and language mismatch without judge LLM calls.
- Knowledge-gap detection: scripts/kb_gap_detector.py clusters unresolved questions into admin-visible KB gap records.
- Contextual ingestion: New uploads can prepend contextual headers before embedding, and scripts/reindex.py reprocesses existing documents.
- OpenTelemetry tracing: FastAPI, httpx, SQLAlchemy, Redis, and graph nodes emit distributed traces to OTLP collectors.
- OIDC SSO: Google and Microsoft sign-in flows issue the same application JWTs used by password login.
- Encryption at rest: Sensitive Postgres columns are encrypted with pgcrypto and an external DB_ENCRYPTION_KEY.
- Knowledge Builder: scripts/kb_builder.py clusters resolved tickets into reviewable KB drafts that admins can publish back into the vector store.
- Review queue: scripts/build_review_queue.py collects weak, escalated, slow, or thumbs-down traces into a human review backlog with admin actions.
- Freshness monitoring: Citation counts plus document age highlight stale-but-important documents for review.
- Auto-categorization: Uploads are classified into categories from config/categories.yml, and those categories are stored in document metadata.
- Analytics dashboard: /static/analytics.html visualizes top topics, resolution rates, quality trends, and LLM cost summaries.
- Weekly reports: Markdown digests can be pushed through Slack or email on a weekly schedule.
- Improvement backlog: A weekly generator combines confirmed bad reviews, KB gaps, slow endpoints, stale docs, evaluator drift, and thumbs-down trends into a prioritized improvement backlog.
- Email channel: Incoming support mail can be processed through IMAP polling or an inbound webhook, then routed through the same RAG flow.
- Canonical module layout: Core agent modules live under agent/*; the old root-level graph.py, prompts.py, and state.py shims were removed.
- Centralized tuning: Retrieval thresholds and operational constants are concentrated in config/settings.py instead of scattered literals.
- Integration test suite: tests/integration/ covers ingestion, conversation, streaming, concurrency, escalation, and async upload paths.
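The feature list mentions inline [N] citation markers and an online evaluator that scores citation coverage without a judge LLM. The sketch below shows one way such a check could work. The function name, the returned fields, and the assumption that markers are 1-based indexes into the retrieved documents are all hypothetical; the project's own evaluator may define coverage differently.

```python
import re

CITATION_RE = re.compile(r"\[(\d+)\]")


def check_citations(answer: str, num_documents: int) -> dict:
    """Classify [N] markers as valid or dangling (assumes 1-based indexes)."""
    cited = [int(n) for n in CITATION_RE.findall(answer)]
    valid = sorted({n for n in cited if 1 <= n <= num_documents})
    dangling = sorted({n for n in cited if not 1 <= n <= num_documents})
    return {
        "valid": valid,
        "dangling": dangling,
        # Share of retrieved documents that the answer actually cites.
        "documents_cited_ratio": len(valid) / num_documents if num_documents else 0.0,
    }


report = check_citations(
    "Returns are accepted within 30 days [1]. Refunds take 5 days [3][4].",
    num_documents=3,
)
print(report)
```

A dangling marker such as [4] against three retrieved documents is the kind of signal that could feed a review queue, since it points at a source the reader cannot open.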
Quick Start
A full step-by-step guide covering the GraceKelly-primary, explicit local-only Ollama, Mistral, and mixed-routing scenarios is in docs/QUICKSTART.md.
Prerequisites: Python 3.11+ and a local GraceKelly instance at http://127.0.0.1:8011 for the default gracekelly-primary profile.
    # 1. Dependencies: pinned hashes for reproducibility (Python 3.11+, Linux x86_64)
    pip install --require-hashes -r requirements.lock
    # Or for development (adds pytest/ruff/pre-commit):
    # pip install --require-hashes -r requirements-dev.lock

    # 2. Start the default GraceKelly orchestrator
    cd ../GraceKelly  # path to your local GraceKelly checkout
    uvicorn gracekelly.main:create_app --factory --host 127.0.0.1 --port 8011

    # 3. Run RAG Support Assistant
    cd RAG_Support_Assistant
    python main.py

Explicit Ollama-only mode is still available:

    ollama serve
    ollama pull qwen2.5:7b
    LLM_PROVIDER_PROFILE=local-first python main.py

Alternative routing profiles (see LLM_PROVIDER_PROFILE in docs/CONFIGURATION.md): local-first, external-mistral, gracekelly-mixed. More detail is in config/providers.yml and in sections 5-6 of docs/QUICKSTART.md.
Open:
- http://localhost:8000/static/login.html - password + SSO login page
- http://localhost:8000/static/chat.html - chat UI
- http://localhost:8000/static/admin.html - admin UI for traces, audit, review queue, providers, KB gaps, KB drafts, stale docs, and breaker controls
- http://localhost:8000/agent - agent copilot dashboard
- http://localhost:8000/static/analytics.html - analytics dashboard
- http://localhost:8000/static/metrics.html - system metrics dashboard
2026-04-27: the legacy unauthenticated /index page and the /ask, /escalations, /traces, /escalations-ui, and /traces-ui* endpoints were removed from main.py (Codex audit P0). Production entrypoint: uvicorn api.app:app. python main.py delegates to api.app:app.
Core chat and ingestion
| Method | Path | Auth | Description |
|---|---|---|---|
| POST | /api/ask | user | Ask a question synchronously; returns answer, documents, and citations |
| POST | /api/ask/stream | user | Ask a question over SSE streaming |
| POST | /api/upload | agent/admin | Upload a document for indexing; returns assigned categories |
| GET | /api/tasks/{task_id} | agent/admin | Check background upload task state |
| POST | /api/feedback | user | Submit thumbs up/down feedback |
| POST | /api/escalate | user | Escalate the current request to a human operator |
| GET | /api/sessions | agent/admin | List active sessions |
| GET | /api/sessions/{session_id}/history | agent/admin | Return session history |
| DELETE | /api/sessions/{session_id} | agent/admin | Delete a session |
| GET | /api/feedback/stats | agent/admin | Aggregate feedback statistics |
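For callers that prefer Python over curl, the POST /api/ask row can be exercised with the standard library alone. This is a sketch: the question and session_id body fields and the Bearer header mirror the curl example in this README, while the helper name and the placeholder token are made up for illustration. The request is only built here; the commented lines show how it could be sent to a running instance.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # local default used elsewhere in this README


def build_ask_request(question: str, session_id: str, access_token: str) -> urllib.request.Request:
    """Build (but do not send) a POST /api/ask request."""
    payload = json.dumps({"question": question, "session_id": session_id}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/ask",
        data=payload,
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_ask_request(
    "How do returns work?",
    "11111111-1111-1111-1111-111111111111",
    "ACCESS_TOKEN_PLACEHOLDER",  # hypothetical; use a token from /api/auth/login
)
# To send it against a running instance:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```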
Agent and admin workflows
| Method | Path | Auth | Description |
|---|---|---|---|
| GET | /api/agent/tickets | agent/admin | List escalated tickets, optionally filtered by status |
| GET | /api/agent/tickets/{ticket_id} | agent/admin | Return ticket details, session messages, and similar tickets |
| POST | /api/agent/tickets/{ticket_id}/respond | agent/admin | Save an operator response and resolve a ticket |
| GET | /api/agent/similar | agent/admin | Return similar resolved tickets for a given ticket |
| GET | /api/admin/audit | agent/admin | List audit-log entries with filters |
| GET | /api/admin/traces | agent/admin | List recent traces |
| GET | /api/admin/traces/{trace_id} | agent/admin | Return one trace with steps and feedback |
| DELETE | /api/admin/traces?older_than_days=N | admin | Purge old traces |
| DELETE | /api/admin/audit-log?older_than_days=N | admin | Purge old audit-log entries |
| POST | /api/admin/circuit-breaker/reset | admin | Force-reset the Ollama circuit breaker |
| GET | /api/admin/kb-gaps | admin | List detected knowledge gaps |
| GET | /api/admin/evaluations/trends?evaluator=<name>&days=30 | admin | Return daily mean-score trends for one online evaluator |
| GET | /api/admin/evaluations/worst?evaluator=<name>&limit=20 | admin | Return the worst recent traces for one online evaluator |
| GET | /api/admin/categories | admin | Return the active category taxonomy |
| GET | /api/admin/kb-drafts | admin | List Knowledge Builder drafts |
| PATCH | /api/admin/kb-drafts/{draft_id} | admin | Edit a pending KB draft |
| POST | /api/admin/kb-drafts/{draft_id}/reject | admin | Reject a pending KB draft |
| POST | /api/admin/kb-drafts/{draft_id}/publish | admin | Publish a KB draft into the vector store |
| GET | /api/admin/improvement-backlog/current | admin | Return the latest improvement backlog as JSON |
| GET | /api/admin/improvement-backlog/archive?year=2026 | admin | List archived improvement backlog weeks |
| GET | /api/admin/stale-docs | admin | List stale but highly cited documents |
| POST | /api/admin/stale-docs/{doc_id}/review | admin | Mark a stale document as reviewed |
| GET | /api/admin/curated-dataset/stats | admin | Aggregate curated dataset counts by verdict, tenant, and channel |
| POST | /api/admin/curated-dataset/rebuild | admin | Trigger an async rebuild of evaluation/curated_cases.jsonl |
| GET | /api/admin/providers | admin | Return provider registry metadata, active profile, recent usage, and 24h cost |
Analytics and channels
| Method | Path | Auth | Description |
|---|---|---|---|
| GET | /api/analytics/top-topics | agent/admin | Top categories/topics for a time window |
| GET | /api/analytics/resolution-rate | agent/admin | Resolution-rate breakdown by category |
| GET | /api/analytics/cost-summary | agent/admin | Total and per-category LLM cost summaries |
| GET | /api/analytics/trends | agent/admin | Time-series analytics for quality/cost metrics |
| POST | /webhook/email | webhook secret | Preferred inbound email webhook receiver |
| POST | /api/channels/email/inbound | webhook secret | Backward-compatible inbound email webhook alias |
Auth, health, and metrics
| Method | Path | Auth | Description |
|---|---|---|---|
| POST | /api/auth/login | - | Password login; returns JWT pair |
| POST | /api/auth/refresh | - | Exchange a refresh token for a new token pair |
| GET | /api/auth/sso/providers | - | List enabled SSO providers |
| GET | /api/auth/sso/{provider}/login | - | Start Google or Azure AD OIDC login |
| GET | /api/auth/sso/{provider}/callback | - | Finish OIDC login and set JWT cookies |
| GET | /api/health | - | Readiness alias |
| GET | /api/health/live | - | Liveness probe |
| GET | /api/health/ready | - | Dependency-aware readiness probe |
| GET | /api/metrics | admin | JSON metrics snapshot for the admin dashboard |
| GET | /metrics | - | Prometheus exposition endpoint |
Rate limits: /api/ask is limited to 60 req/min, /api/upload to
10 req/min, and /api/auth/login to 5 req/min per client IP.
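The README states the per-IP limits but not the algorithm behind them. As an illustration only, a sliding-window limiter keyed by client IP could enforce the same numbers. The class name, the injectable clock, and the choice of a sliding window are assumptions; the primitives in api/rate_limit.py may work differently.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per client within `window_seconds`."""

    def __init__(self, limit: int, window_seconds: float = 60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock  # injectable so tests can control time
        self._hits = defaultdict(deque)

    def allow(self, client_ip: str) -> bool:
        now = self.clock()
        hits = self._hits[client_ip]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True


login_limiter = SlidingWindowLimiter(limit=5)  # mirrors the 5 req/min login limit
results = [login_limiter.allow("203.0.113.7") for _ in range(6)]
print(results)
```

The sixth call inside the same minute is rejected, and a rejected request is not recorded, so a client that keeps hammering the endpoint does not extend its own lockout.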
Correlation ID: Clients may send X-Request-Id matching
^[A-Za-z0-9_\-:.]{1,128}$. The server echoes it back, stores it in
SQLite traces, and also propagates it into logs and spans. If it is not
supplied, the API generates a UUID4-derived identifier.
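The correlation-ID rule can be sketched as a small helper that accepts a well-formed client value and otherwise generates one. The helper name is hypothetical, and two behaviours here are assumptions the README does not confirm: a malformed header is treated like a missing one, and the generated value is the 32-character hex form of a UUID4. The pattern is written with the hyphen escaped inside the character class.

```python
import re
import uuid

REQUEST_ID_RE = re.compile(r"^[A-Za-z0-9_\-:.]{1,128}$")


def resolve_request_id(header_value):
    """Return the client's X-Request-Id if valid, else a fresh identifier."""
    if header_value and REQUEST_ID_RE.fullmatch(header_value):
        return header_value
    # Assumption: invalid values are replaced rather than rejected.
    return uuid.uuid4().hex


print(resolve_request_id("req-123:abc.def_1"))   # echoed back unchanged
print(resolve_request_id("bad id with spaces"))  # replaced by a generated ID
```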
Example requests
    curl -s http://localhost:8000/api/auth/login \
      -H "Content-Type: application/json" \
      -d '{"username":"admin","password":"admin"}'

    curl -s http://localhost:8000/api/ask \
      -H "Authorization: Bearer $ACCESS_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{"question":"How do returns work?","session_id":"11111111-1111-1111-1111-111111111111"}'

    curl -s "http://localhost:8000/api/analytics/top-topics?days=7" \
      -H "Authorization: Bearer $ACCESS_TOKEN"

Configuration
Runtime configuration (every environment variable, LLM/provider profiles, RAG-pipeline knobs, resilience/capacity limits, auth, storage, and channel settings) is documented in docs/CONFIGURATION.md.
Operations & evaluation
Monitoring (Prometheus + JSON metrics), the regression-eval and provider-benchmark harnesses, online evaluators, experiments, the review queue, improvement backlog, curated dataset, offline-review workflow, and threshold tuning are documented in docs/OPERATIONS.md.
Deployment
Docker, deployment topology, database migrations, and the pinned dependency-lock workflow are documented in docs/DEPLOYMENT.md.
Multi-tenancy
Tenant isolation is built into the application:
- JWT access tokens carry a tenant claim.
- Traces, audit-log queries, analytics, KB drafts, freshness data, and admin read/purge endpoints are filtered by tenant_id.
- ChromaDB uses per-tenant collections named rag_docs_{tenant_id}.
- Response cache keys are tenant-scoped.
- OIDC and email flows can resolve tenants from TENANT_EMAIL_DOMAINS, for example acme.com:acme,*:default.
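To make the TENANT_EMAIL_DOMAINS format concrete, the sketch below parses the example value and maps an email address to a tenant and its ChromaDB collection name. The helper names are hypothetical, and the matching rules (exact, case-insensitive domain match with * as the fallback) are an assumption; the project's resolver may handle subdomains or malformed entries differently.

```python
def parse_tenant_domains(raw: str) -> dict:
    """Parse 'domain:tenant' pairs separated by commas into a dict."""
    mapping = {}
    for entry in raw.split(","):
        entry = entry.strip()
        if not entry:
            continue
        domain, _, tenant = entry.partition(":")
        mapping[domain.strip().lower()] = tenant.strip()
    return mapping


def resolve_tenant(email: str, mapping: dict):
    """Return the tenant for an email, falling back to the '*' entry."""
    domain = email.rsplit("@", 1)[-1].lower()
    return mapping.get(domain, mapping.get("*"))


def collection_name(tenant_id: str) -> str:
    """Per-tenant collection name, following the rag_docs_{tenant_id} pattern."""
    return f"rag_docs_{tenant_id}"


mapping = parse_tenant_domains("acme.com:acme,*:default")
print(resolve_tenant("ann@acme.com", mapping))     # acme
print(resolve_tenant("bob@example.org", mapping))  # default
print(collection_name(resolve_tenant("ann@acme.com", mapping)))  # rag_docs_acme
```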
For an existing legacy collection named rag_docs, use the one-time migration:

    python scripts/migrate_default_collection.py

Web UI
- /static/chat.html - main chat UI with inline citations, upload progress, onboarding, responsive layouts, and SSE streaming
- /static/login.html - login page with password, Google, and Microsoft SSO
- /static/help.html - end-user help page
- /static/metrics.html - system metrics dashboard with auto-refresh
- /static/admin.html - admin UI for breaker control, traces, audit logs, review queue, KB gaps, categories, KB drafts, and stale docs
- /agent and /static/agent.html - agent copilot dashboard
- /static/analytics.html - product analytics dashboard
- /static/widget.html - embeddable support widget
Accessibility
- Latest axe audit: docs/a11y/axe-audit-2026-04-21.md
- Current blocking status: PASS for scanned UI pages and rendered Jinja templates (0 critical, 0 serious)
- Axe/Lighthouse verification 2026-05-03: tests/test_a11y.py ran with @axe-core/cli 4.11.3 installed and completed with 38 passed.
- Lighthouse mobile /static/chat.html: performance 99, accessibility 100, best-practices 96, SEO 90.
- Local unit gate runs tests/test_a11y.py through @axe-core/cli when the CLI is installed; axe subprocesses use an explicit timeout budget so the full unit suite does not hang under pytest --timeout=60.
- Source status: static coverage now includes /static/widget.html; source updates have landed for explicit <main> landmarks, the admin analytics <nav> landmark, and the rendered ask_result heading order.
Run the full test suite:

    pytest tests/ -v

On Windows, use the repository-local temp directory and disable the broken auto-loaded Schemathesis plugin. The full workflow is documented in docs/windows-test-workflow.md:

    python -m pytest -p no:schemathesis --basetemp=.tmp/pytest

Run only the integration suite added in arc 122:

    pytest tests/integration/ -v
    pytest -m integration -q
    pytest -m "not integration" -q

tests/integration/ covers full ingestion, multi-turn conversation, SSE streaming, concurrent sessions, escalation, and async upload flows. Browser accessibility/mobile smoke tests may skip automatically when optional Playwright dependencies are not installed.
GitHub Actions runs lint, test-unit, test-integration, and pre-commit
on every push and pull request.
Install local tooling with pip install --require-hashes -r requirements-dev.lock.
Run pre-commit run --all-files, pytest tests/ -q --ignore=tests/integration -p no:cacheprovider, and pytest tests/integration -q.
Workflow history and logs are available on the repository Actions -> CI page.
Project structure
    api/                 FastAPI app, REST endpoints, middleware, correlation IDs
    agent/               Canonical graph, prompts, state, and tool modules
    auth/                JWT helpers, RBAC dependencies, OIDC integration
    cache/               Redis cache with in-memory fallback
    channels/            Telegram and email channel integrations
    config/              settings.py, categories.yml, logging
    db/                  SQLAlchemy models, engine, audit helpers, pgcrypto helpers
    deploy/              docker-compose and Helm chart artifacts
    evaluation/          RAG evaluation, drift detection, benchmarks
    ingestion/           Loaders, pipeline, categorizer, contextual headers
    integrations/        Bitrix and local support inbox integrations
    monitoring/          Prometheus collectors and alert rules
    reports/             Weekly-report renderer
    scripts/             Ops jobs: eval, review queue, reindex, KB builder/gap detection, chunking eval, email poller
    static/              chat, admin, agent, analytics, login, metrics, widget UIs
    tests/               Unit and integration test suites
    tests/integration/   End-to-end coverage for critical user flows
    tracing/             SQLite tracing base/wrapper and OpenTelemetry setup
    vectordb/            ChromaDB manager, BM25 fusion, reranking
    alembic/             Migrations `001` through `017`
    demo/                Demo docs and seed helpers
    codex-tasks/         Task backlog and archived implementation specs
    docs/research/       Research archive

License
MIT. See LICENSE.