Quickstart — RAG Support Assistant

The minimum steps to run the service locally and verify it works.

0. Requirements

Python 3.11+ (tested on 3.13)
Docker Desktop (for Postgres + Redis in dev and for regression eval)
~8 GB disk space for embeddings/reranker/cache; explicit Ollama mode requires additional space for models.

Per selected profile:

GraceKelly at D:\GraceKelly\ (port 8011) — default local orchestrator for Claude Sonnet 4.6 / GPT-5 / Gemini via Perplexity Pro.
Ollama (https://ollama.com/download) — for explicit local-first scenario or fallback.
Mistral API key (MISTRAL_API_KEY) — for direct Mistral fast-tier.
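
The baseline requirements above can be checked before installing anything. The following is a minimal sketch, not part of the project: the function names are hypothetical, the 8 GB threshold is the approximate figure from the list above, and the Docker check only looks for a `docker` executable on PATH (it does not verify that the daemon is running).

```python
import shutil
import sys

MIN_PYTHON = (3, 11)   # minimum version from the requirements list
MIN_FREE_GB = 8        # approximate disk figure from the requirements list


def python_ok(version=None, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version meets the minimum."""
    version = sys.version_info if version is None else version
    return tuple(version[:2]) >= minimum


def free_gb(path="."):
    """Free disk space at `path`, in GiB."""
    return shutil.disk_usage(path).free / 1024**3


def docker_on_path():
    """True if a `docker` executable is on PATH (daemon state is not checked)."""
    return shutil.which("docker") is not None


def check_prerequisites(path="."):
    """Collect human-readable problems; an empty list means every check passed."""
    problems = []
    if not python_ok():
        problems.append(f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required")
    if free_gb(path) < MIN_FREE_GB:
        problems.append(f"less than {MIN_FREE_GB} GB free at {path!r}")
    if not docker_on_path():
        problems.append("docker not found on PATH")
    return problems
```

Calling `check_prerequisites("D:\\RAG_Support_Assistant")` and printing the returned list gives a quick go/no-go before step 1.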

1. Dependencies

cd D:\RAG_Support_Assistant
python -m venv .venv
. .venv/Scripts/activate          # Windows PowerShell: . .venv\Scripts\Activate.ps1
pip install --require-hashes -r requirements.lock

2. Configuration

cp .env.example .env              # Windows: copy .env.example .env

Open .env and fill in the required values. Minimal scenarios:

| Scenario | Required variables |
|---|---|
| GraceKelly primary (default) | `GRACEKELLY_BASE_URL=http://127.0.0.1:8011`; `LLM_PROVIDER_PROFILE=gracekelly-primary` is the default and can be omitted |
| Local-only Ollama | `LLM_PROVIDER_PROFILE=local-first` |
| + Mistral fast tier | `MISTRAL_API_KEY=<key>` + `LLM_PROVIDER_PROFILE=external-mistral` |
| GraceKelly mixed routing (Claude Sonnet 4.6 reasoning) | `MISTRAL_API_KEY=<key>` + `LLM_PROVIDER_PROFILE=gracekelly-mixed` + `GRACEKELLY_REQUEST_TIMEOUT_SEC=120` |

Full list of variables — see README.md section Environment Variables.
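
A quick way to catch a half-filled .env is to compare it against the scenario table. The sketch below is illustrative only: the mapping is copied from the four scenarios above, the parser handles plain KEY=VALUE lines (not the full dotenv syntax), and the helper names are hypothetical. Profiles not listed in the table are simply not checked.

```python
# Required variables per profile, copied from the scenario table above.
REQUIRED_BY_PROFILE = {
    "gracekelly-primary": ["GRACEKELLY_BASE_URL"],
    "local-first": [],
    "external-mistral": ["MISTRAL_API_KEY"],
    "gracekelly-mixed": ["MISTRAL_API_KEY", "GRACEKELLY_REQUEST_TIMEOUT_SEC"],
}
DEFAULT_PROFILE = "gracekelly-primary"  # the table marks this profile as the default


def parse_env(text):
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_variables(env):
    """Return the required variables that are absent or empty for the active profile."""
    profile = env.get("LLM_PROVIDER_PROFILE") or DEFAULT_PROFILE
    required = REQUIRED_BY_PROFILE.get(profile, [])  # unknown profiles: nothing to check
    return [name for name in required if not env.get(name)]
```

For example, `missing_variables(parse_env(open(".env").read()))` returns an empty list when the selected scenario is fully configured.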

3. Infrastructure (Postgres + Redis)

For dev — spin up disposable containers:

docker run -d --name rag-postgres -p 5432:5432 \
    -e POSTGRES_USER=rag -e POSTGRES_PASSWORD=rag_dev_password -e POSTGRES_DB=rag_assistant \
    postgres:16-alpine

docker run -d --name rag-redis -p 6379:6379 redis:7-alpine

Then run migrations:

alembic upgrade head
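
Containers started with `docker run -d` return before the services inside accept connections, so running migrations immediately can fail. A minimal sketch of a wait step, assuming the default ports published above (5432 and 6379); the helper names are hypothetical, and an open TCP port is only a rough proxy for Postgres being ready to serve queries.

```python
import socket
import time


def port_open(host, port, timeout=1.0):
    """True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_ports(targets, deadline_sec=30.0, interval_sec=0.5):
    """Poll until every (host, port) accepts connections.

    Returns the targets that are still closed when the deadline passes
    (an empty list means all of them are reachable).
    """
    end = time.monotonic() + deadline_sec
    pending = list(targets)
    while pending:
        pending = [t for t in pending if not port_open(*t)]
        if not pending or time.monotonic() >= end:
            break
        time.sleep(interval_sec)
    return pending
```

Calling `wait_for_ports([("127.0.0.1", 5432), ("127.0.0.1", 6379)])` before `alembic upgrade head` and checking for an empty result is usually enough for a dev setup.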

4. Scenario A — GraceKelly primary (default)

# Start GraceKelly in a separate terminal
cd D:\GraceKelly
uvicorn gracekelly.main:create_app --factory --host 127.0.0.1 --port 8011

# Launch RAG Support Assistant
cd D:\RAG_Support_Assistant
python main.py

Open http://localhost:8000/static/login.html (password + SSO) or http://localhost:8000/static/chat.html (chat UI). After login, /agent serves the agent copilot dashboard. The legacy / index UI was removed on 2026-04-27 because it was unauthenticated; see SESSION-NOTES-2026-04-27.

The gracekelly-primary profile routes both the fast and strong tiers through the local GraceKelly orchestrator. /api/health/ready checks GraceKelly readiness and does not require Ollama unless the active profile uses it.
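
Because readiness depends on GraceKelly being up, a script can poll /api/health/ready before sending traffic. This is a hedged sketch: it assumes the endpoint answers HTTP 200 when ready and a non-2xx status (or no connection) otherwise, which this guide does not spell out. The function names and retry values are illustrative.

```python
import time
import urllib.error
import urllib.request


def is_ready(url, timeout=5.0):
    """True if GET `url` returns HTTP 200; any error or other status counts as not ready."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # HTTPError (4xx/5xx) is a subclass of URLError, so it lands here too.
        return False


def wait_until_ready(url, attempts=30, delay_sec=2.0):
    """Poll `url` up to `attempts` times; return True as soon as it reports ready."""
    for _ in range(attempts):
        if is_ready(url):
            return True
        time.sleep(delay_sec)
    return False
```

Usage: `wait_until_ready("http://localhost:8000/api/health/ready")` after starting `python main.py`.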

5. Scenario B — explicit Local-only Ollama

# Start Ollama and pull models
ollama serve &
ollama pull qwen2.5:7b

# Launch with explicit local-first profile (Bash)
LLM_PROVIDER_PROFILE=local-first python main.py
# Windows PowerShell: $env:LLM_PROVIDER_PROFILE = "local-first"; python main.py

6. Scenario C — GraceKelly mixed routing

Useful when you need reasoning quality (Claude Sonnet 4.6) for final answers but want background steps (classification, grade_docs, verify_facts) handled by the fast Mistral API.

Start GraceKelly (separate project):

cd D:\GraceKelly
$env:GRACEKELLY_EXECUTION_PROFILE = "hybrid"
uvicorn gracekelly.main:create_app --factory --host 127.0.0.1 --port 8011

In D:\RAG_Support_Assistant\.env:

MISTRAL_API_KEY=<your-key>
LLM_PROVIDER_PROFILE=gracekelly-mixed
GRACEKELLY_REQUEST_TIMEOUT_SEC=120

Launch RAG:
```
python main.py
```

The gracekelly-mixed profile routes the fast tier through the Mistral API (~1-3 s/call) and the strong tier (final answer) through GraceKelly browser → Perplexity Pro (Claude Sonnet 4.6, ~30-60 s/call).
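
The per-call figures above give a rough end-to-end latency estimate. The arithmetic below is a back-of-the-envelope sketch that assumes the three background steps named earlier (classification, grade_docs, verify_facts) each make one sequential fast-tier call, followed by one strong-tier call; the real pipeline may batch, parallelize, or repeat steps.

```python
FAST_CALL_SEC = (1.0, 3.0)      # Mistral fast tier, per call (range quoted above)
STRONG_CALL_SEC = (30.0, 60.0)  # GraceKelly strong tier, per call (range quoted above)
STRONG_TIMEOUT_SEC = 120.0      # GRACEKELLY_REQUEST_TIMEOUT_SEC from this scenario


def latency_range(fast_calls, strong_calls=1):
    """Best-case and worst-case total seconds, assuming strictly sequential calls."""
    low = fast_calls * FAST_CALL_SEC[0] + strong_calls * STRONG_CALL_SEC[0]
    high = fast_calls * FAST_CALL_SEC[1] + strong_calls * STRONG_CALL_SEC[1]
    return low, high


def timeout_headroom():
    """Ratio of the configured timeout to the slowest quoted strong-tier call."""
    return STRONG_TIMEOUT_SEC / STRONG_CALL_SEC[1]
```

Under these assumptions, `latency_range(3)` gives 33 to 69 seconds per question, and the 120-second timeout leaves a factor of two over the slowest quoted strong-tier call.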

7. Document ingestion and first query

# Upload a document for ingestion (PDF, MD, TXT)
# PowerShell (Windows) — note: curl.exe, not curl (which is the Invoke-WebRequest alias)
curl.exe -X POST http://localhost:8000/api/upload `
    -H "Authorization: Bearer <admin-jwt>" `
    -F "file=@docs/warranty.md"

# Bash (Linux/macOS)
curl -X POST http://localhost:8000/api/upload \
    -H "Authorization: Bearer <admin-jwt>" \
    -F "file=@docs/warranty.md"

# First query
# PowerShell (Windows)
curl.exe -X POST http://localhost:8000/api/ask `
    -H "Authorization: Bearer <admin-jwt>" `
    -H "Content-Type: application/json" `
    -d '{"question":"What is the warranty period?"}'

# Bash (Linux/macOS)
curl -X POST http://localhost:8000/api/ask \
    -H "Authorization: Bearer <admin-jwt>" \
    -H "Content-Type: application/json" \
    -d '{"question":"What is the warranty period?"}'

To get an admin JWT in dev, POST to /api/auth/login with the credentials admin/admin (these defaults apply only if ADMIN_PASSWORD_HASH is not set in .env).
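
The /api/ask call can also be made from Python without shell quoting concerns. The sketch below mirrors the curl command above using only the standard library. It assumes the endpoint returns a JSON body (the response schema is not described here, so the result is returned unparsed beyond JSON decoding); the function names and the 180-second timeout are illustrative choices.

```python
import json
import urllib.request


def build_ask_request(base_url, token, question):
    """Build the same POST /api/ask request as the curl examples above."""
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/ask",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def ask(base_url, token, question, timeout=180.0):
    """Send the question and return the decoded JSON response (schema not assumed)."""
    request = build_ask_request(base_url, token, question)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage: `ask("http://localhost:8000", "<admin-jwt>", "What is the warranty period?")`, with the token obtained as described above.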

8. Health checks

curl http://localhost:8000/api/health/live      # liveness
curl http://localhost:8000/api/health/ready     # readiness (dependencies)
curl http://localhost:8000/api/metrics          # metrics snapshot
curl http://localhost:8000/api/admin/providers  # active routing profile + recent usage (auth)

9. Regression eval

For continuous quality checks against a curated 20-case dataset:

# Mock provider benchmark (no GK, no quota burn)
python scripts/regression_eval.py \
    --baseline ollama-small \
    --candidate mistral-small-latest \
    --max-cases 5 \
    --no-persist

# Live GK mixed routing (requires explicit paid/API opt-in)
python scripts/regression_eval.py \
    --baseline ministral-3b-latest \
    --candidate-profile gracekelly-mixed \
    --max-cases 20 \
    --allow-paid-apis

Without --allow-paid-apis, provider/model targets run in mock-provider-benchmark mode: answers and cost/latency metrics are simulated from evaluation/curated_cases.jsonl, so the command calls neither GraceKelly nor Mistral. With --no-persist set, it also writes nothing to the DB. Live provider calls happen only with explicit opt-in via --allow-paid-apis.

Results are written to reports/regression/<timestamp>-*.{json,md}. The PowerShell wrapper scripts\run_regression_via_gracekelly.ps1 -AllowLive starts disposable Postgres and Redis containers, then runs ingestion and the regression eval in one command, after explicit live opt-in.
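
To pick up the most recent report programmatically, the files can be ordered by name. This is a sketch under one stated assumption: the `<timestamp>` prefix sorts lexicographically in chronological order (for example a YYYYMMDD-style prefix), which this guide does not confirm. The helper name is hypothetical.

```python
from pathlib import Path


def latest_reports(report_dir="reports/regression"):
    """Return the newest .json and .md report paths, chosen by file name order.

    Values are None when the directory is missing or has no file of that type.
    """
    root = Path(report_dir)
    result = {".json": None, ".md": None}
    if not root.is_dir():
        return result
    for suffix in result:
        candidates = sorted(
            (p for p in root.glob(f"*{suffix}") if p.is_file()),
            key=lambda p: p.name,
        )
        if candidates:
            result[suffix] = candidates[-1]
    return result
```

If the timestamp format does not sort this way, sorting by `p.stat().st_mtime` instead of `p.name` is a reasonable fallback.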

10. Common issues

11. Where to go next

README.md — full list of env vars + public endpoints + Prometheus metrics.
docs/runbook.md — operational runbook for on-call (alerts, diagnostics, actions).
docs/disaster-recovery.md — DR scenarios A-F (data loss, encryption, encryption-key).
docs/operations/ — runbooks for backup, helm, gracekelly smoke.
docs/CHANGELOG.md — change history by arcs.