
Architecture overview

A single user question travels through this path before a grounded answer is returned. The diagram covers the synchronous /api/ask path; ingest and admin paths share the same surface but skip the LangGraph stage.

Diagram (request flow): User question → FastAPI (api.app + routers; JWT auth + tenant filter) → LangGraph (agent.graph nodes) → Grounded answer (+ citations + follow-ups).

Labelled edges in the diagram:

  • retrieve → ChromaDB (per-tenant collection)
  • generate → Provider router (Ollama / GraceKelly / Mistral)
  • verify_facts + evaluate → Online evaluators (+ trace state)
  • persist trace → SQLite (tracing.sqlite_trace)
  • metrics → Prometheus
  • spans → OpenTelemetry / Langfuse

The path is one-way; the LangGraph node loop (retry, rewrite_query) lives inside the LangGraph box and is detailed in the LangGraph state machine page. Each node, store, and provider maps to one of the directories below.
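The node loop inside the LangGraph box can be pictured with a plain-Python sketch. It does not use LangGraph and is not the project's graph: the state keys, the word-overlap retrieval, the "first chunk" generation, and the `max_attempts` parameter are all assumptions made for illustration. Only the node names retrieve, generate, verify_facts, and rewrite_query come from the text above.

```python
def retrieve(state: dict, kb: list[str]) -> dict:
    # Stand-in for the vector lookup: keep chunks that share a word with the query.
    words = set(state["query"].lower().split())
    chunks = [c for c in kb if words & set(c.lower().split())]
    return {**state, "chunks": chunks}


def generate(state: dict) -> dict:
    # Stand-in for the LLM call: answer with the first retrieved chunk, if any.
    answer = state["chunks"][0] if state["chunks"] else ""
    return {**state, "answer": answer}


def verify_facts(state: dict) -> dict:
    # An answer counts as grounded only if it is backed by a retrieved chunk.
    grounded = bool(state["answer"]) and state["answer"] in state["chunks"]
    return {**state, "grounded": grounded}


def rewrite_query(state: dict) -> dict:
    # Hypothetical rewrite: retry with the longest word of the original question.
    longest = max(state["question"].split(), key=len).strip("?.!,")
    return {**state, "query": longest, "attempts": state["attempts"] + 1}


def ask(question: str, kb: list[str], max_attempts: int = 2) -> dict:
    state = {"question": question, "query": question, "chunks": [],
             "answer": "", "grounded": False, "attempts": 0}
    while True:
        state = verify_facts(generate(retrieve(state, kb)))
        if state["grounded"] or state["attempts"] >= max_attempts:
            return state
        state = rewrite_query(state)
```

From the caller's side the path stays one-way: `ask` returns once, and any retries happen inside the loop.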

api/

Thin FastAPI app shell + routers split per concern (auth, sessions, agent, admin, analytics, feedback, conversation, upload). Late-binding through api._shared.app_module() keeps monkeypatch.setattr(api.app, …) tests working.
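Why late binding matters for monkeypatched tests can be shown with a small stand-alone sketch. The module name `demo_app`, the attribute `get_provider`, and the body of `app_module()` are assumptions for illustration; only the idea of resolving the app module at call time comes from the description above.

```python
import sys
import types

# Stand-in for api.app (hypothetical contents).
app = types.ModuleType("demo_app")
app.get_provider = lambda: "real-provider"
sys.modules["demo_app"] = app


def app_module() -> types.ModuleType:
    """Resolve the app module at call time rather than at import time."""
    return sys.modules["demo_app"]


# Early binding: the reference is captured once, as `from demo_app import get_provider` would do.
early_get_provider = app.get_provider


def route_early() -> str:
    return early_get_provider()


def route_late() -> str:
    # Late binding: the attribute is looked up on every call.
    return app_module().get_provider()


# Equivalent to monkeypatch.setattr(app, "get_provider", ...) in a pytest test.
app.get_provider = lambda: "fake-provider"
```

After the patch, `route_early()` still calls the original function, while `route_late()` sees the replacement. A router that imported the name directly would silently ignore the test double.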

agent/

LangGraph state machine: state.py (TypedDict shape), graph.py (nodes + edges), prompts.py, prompt_registry.py (experiment-aware sticky-rollout), tools.py. See the auto-generated LangGraph state machine.
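A sticky rollout is commonly built on deterministic hashing, so that the same subject always sees the same prompt variant. The sketch below assumes that approach; the function name, the variant labels, and the bucketing scheme are hypothetical and are not taken from prompt_registry.py.

```python
import hashlib


def assign_variant(experiment: str, subject_id: str, rollout_percent: int) -> str:
    """Deterministically map a subject to 'candidate' or 'control'."""
    # Hash experiment + subject so buckets differ between experiments.
    digest = hashlib.sha256(f"{experiment}:{subject_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) % 100  # stable bucket in 0..99
    return "candidate" if bucket < rollout_percent else "control"
```

Because the bucket depends only on the experiment and subject, raising `rollout_percent` from 10 to 50 keeps every subject that was already on the candidate prompt there; it only adds new ones.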

llm/providers/

Pluggable provider abstraction: base.py interface, ollama.py, gracekelly.py (browser-proxy to Perplexity Pro), mistral.py (OpenAI-compatible). Failover to local via routing profiles. See the provider routing matrix.
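The provider abstraction and the failover to a local provider can be sketched as follows. The class names, the `complete` method, and the idea that a routing profile is an ordered list of providers are assumptions; the real interface lives in base.py and may look different.

```python
from abc import ABC, abstractmethod


class ProviderError(RuntimeError):
    """Raised when a provider cannot serve a request."""


class Provider(ABC):
    name: str

    @abstractmethod
    def complete(self, prompt: str) -> str:
        ...


class FlakyRemote(Provider):
    # Hypothetical remote provider that is currently down.
    name = "remote"

    def complete(self, prompt: str) -> str:
        raise ProviderError("remote unavailable")


class LocalEcho(Provider):
    # Hypothetical local fallback (stands in for a local model).
    name = "local"

    def complete(self, prompt: str) -> str:
        return f"[local] {prompt}"


def complete_with_failover(prompt: str, chain: list[Provider]) -> tuple[str, str]:
    """Try providers in profile order; return (provider name, completion)."""
    errors = []
    for provider in chain:
        try:
            return provider.name, provider.complete(prompt)
        except ProviderError as exc:
            errors.append(f"{provider.name}: {exc}")
    raise ProviderError("all providers failed: " + "; ".join(errors))
```

With the chain `[FlakyRemote(), LocalEcho()]` the call falls through to the local provider and reports which provider answered, which is useful for the trace.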

vectordb/

Hybrid retriever (BM25 + dense + cross-encoder rerank). Tenant-aware vectordb.manager wraps the base ChromaDB engine in _base_manager.py; each tenant lands in a separate collection.
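The text does not say how the BM25 and dense rankings are combined before reranking. One common choice is reciprocal rank fusion (RRF), which the sketch below uses purely as an assumption. The `score` callable stands in for a cross-encoder, and all function names are hypothetical.

```python
from typing import Callable


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document ids into one list."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) to the document's score.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def rerank(query: str, candidates: list[str],
           score: Callable[[str, str], float], top_n: int) -> list[str]:
    """Reorder fused candidates with a pairwise (query, doc) scorer."""
    return sorted(candidates, key=lambda d: score(query, d), reverse=True)[:top_n]
```

A call such as `rerank(query, reciprocal_rank_fusion([bm25_ids, dense_ids]), score, top_n=5)` would then give the final candidate list for one tenant's collection.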

evaluation/

Online + offline evaluators, RAGAS-style metrics without the ragas package, regression runner with mock-by-default paid-API gate, experiment registry, rollback watcher, weekly improvement backlog, threshold recommendations.
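A mock-by-default gate for paid APIs can be as small as the sketch below. The environment variable `ALLOW_PAID_EVAL`, the function names, and the token-overlap mock are assumptions for illustration, not the project's actual switch or metric.

```python
import os
from typing import Callable, Mapping

Judge = Callable[[str, str], float]


def mock_judge(answer: str, context: str) -> float:
    """Free, deterministic stand-in: share of answer tokens found in the context."""
    answer_tokens = set(answer.lower().split())
    context_tokens = set(context.lower().split())
    if not answer_tokens:
        return 0.0
    return len(answer_tokens & context_tokens) / len(answer_tokens)


def select_judge(paid_judge: Judge, env: Mapping[str, str] = os.environ) -> Judge:
    """Return the paid judge only on explicit opt-in; otherwise the mock."""
    # Hypothetical flag name; anything other than "1" keeps the mock.
    if env.get("ALLOW_PAID_EVAL") == "1":
        return paid_judge
    return mock_judge
```

Defaulting to the mock means a regression run in CI cannot spend money unless someone sets the flag on purpose.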

tracing/

tracing._base_trace is the canonical SQLite store; tracing.sqlite_trace is the public API that wraps it and adds PII redaction on log_step (production code imports from tracing.sqlite_trace). Langfuse and OpenTelemetry adapters export the same span data when configured.
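The wrapper relationship between the two tracing modules can be illustrated with a minimal sketch. The class names, the table schema, and the email-only redaction rule are assumptions; the real modules may redact more kinds of PII and expose a different signature for log_step.

```python
import re
import sqlite3

# Simplified email pattern, used only for this illustration.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


class BaseTrace:
    """Stand-in for the canonical store: writes steps to SQLite as given."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS steps (trace_id TEXT, node TEXT, payload TEXT)"
        )

    def log_step(self, trace_id: str, node: str, payload: str) -> None:
        self.conn.execute("INSERT INTO steps VALUES (?, ?, ?)", (trace_id, node, payload))
        self.conn.commit()

    def steps(self, trace_id: str) -> list[tuple[str, str]]:
        cur = self.conn.execute(
            "SELECT node, payload FROM steps WHERE trace_id = ? ORDER BY rowid",
            (trace_id,),
        )
        return cur.fetchall()


class RedactingTrace:
    """Stand-in for the public API: redacts PII, then delegates to the base store."""

    def __init__(self, base: BaseTrace) -> None:
        self._base = base

    def log_step(self, trace_id: str, node: str, payload: str) -> None:
        self._base.log_step(trace_id, node, EMAIL_RE.sub("[REDACTED_EMAIL]", payload))

    def steps(self, trace_id: str) -> list[tuple[str, str]]:
        return self._base.steps(trace_id)
```

Because redaction happens before the write, the raw value never reaches disk, which is why production code should import the wrapper rather than the base store.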

| Store | Purpose | Notes |
|---|---|---|
| ChromaDB | Vector store for KB chunks. | Per-tenant collection. Persistent on disk. |
| SQLite | Canonical LangGraph trace store (tracing.sqlite_trace over tracing._base_trace). | WAL mode; runtime store, not a dev-only fallback. |
| Postgres | Sessions, feedback, escalated tickets, experiments. | Alembic migrations 001–017. Round-trip CI gate. Tracing does not go here. |
| Redis | Rate-limit counters, JWT refresh sessions, ephemeral cache. | Optional in dev (in-memory fallback). |

  • Multi-tenancy is enforced at four layers: schema (tenant_id columns), propagation (api/middleware/tenant.py), query enforcement (per-router filters), and per-tenant ChromaDB collections.
  • Resilience: provider calls are wrapped in a configurable chain of timeout, retry, circuit-breaker, semaphore, and wall-time budget layers; the exact composition lives in llm/providers/runtime.py.
  • Observability ships 24+ Prometheus metrics, alert rules in deploy/helm/, and OpenTelemetry → Langfuse generation tracing.
  • Security: startup fails fast when RAG_ENV=production and any JWT, SESSION, or admin secret is missing. Tenant isolation is tested cross-tenant for every surface.
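Two of the resilience layers named in the list, retry and circuit breaker, can be composed as plain wrappers. This is a sketch under stated assumptions: the names, the ordering (breaker around retry), and the thresholds are hypothetical, and the timeout, semaphore, and wall-time budget layers are left out. The breaker here never resets on its own; a real one would normally add a cool-down.

```python
from typing import Callable

Call = Callable[[str], str]


class CircuitOpenError(RuntimeError):
    """Raised when the breaker refuses to call the provider."""


def with_retry(fn: Call, attempts: int) -> Call:
    """Call fn up to `attempts` times, re-raising the last error."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")

    def wrapped(prompt: str) -> str:
        last_exc: Exception | None = None
        for _ in range(attempts):
            try:
                return fn(prompt)
            except Exception as exc:  # broad on purpose in this sketch
                last_exc = exc
        assert last_exc is not None
        raise last_exc

    return wrapped


class CircuitBreaker:
    """Stop calling fn after `failure_threshold` consecutive failures."""

    def __init__(self, fn: Call, failure_threshold: int) -> None:
        self.fn = fn
        self.failure_threshold = failure_threshold
        self.failures = 0

    def __call__(self, prompt: str) -> str:
        if self.failures >= self.failure_threshold:
            raise CircuitOpenError("circuit open")
        try:
            result = self.fn(prompt)
        except Exception:
            self.failures += 1
            raise
        self.failures = 0  # any success closes the circuit again
        return result
```

With this ordering the breaker counts one failure per exhausted retry cycle, so `CircuitBreaker(with_retry(call, 2), failure_threshold=2)` gives a failing provider four attempts in total before the circuit opens.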