Architecture
GraceKelly is being rebuilt as a small and explicit orchestration core. The foundation must stay understandable under failure.
Current scope
Implemented:
- full API surface: orchestration, task inspection, health, readiness, model catalog, metrics
- auxiliary HTTP surfaces: analytics, detailed health, task export, task retry, and file-upload orchestration
- SSE streaming for single-model execution via /api/v1/orchestrate/stream
- canonical model registry with alias normalization
- typed task/step/event contracts with multi-model execution planning
- token counting and cost estimation per step (input_tokens, output_tokens, cost_usd)
- model pricing registry for cost estimation
- bounded browser-submit budgeting for per-task and per-hour quotas
- account-pool primitives and manager for provider cooldown and selection
- session-aware prompt shaping with configurable context-window limits
- dual-backend storage: in-memory (development) and PostgreSQL (durable)
- PostgreSQL operational tooling: schema validation plus JSON export/import snapshots
- execution adapters: dry-run, OpenAI-compatible API, Anthropic API, Perplexity browser (scripted and thin Playwright backends)
- embeddings client: Mistral embeddings for consensus clustering only
- cooperative cancel-on-quorum with per-model timeout, concurrency enforcement, and a minimal browser circuit breaker
- browser adapter lifecycle cleanup that resets idle session state on shutdown and detects stale runtime/session mismatches
- structured key-value logging across orchestrator, browser adapters, API routes, and PostgreSQL degradation paths
- request metrics histograms/counters plus optional OpenTelemetry bootstrap for observability
- opt-in usage telemetry middleware that appends one JSONL record per HTTP request to <repo>/logs/usage.jsonl, supporting honest 30-day usage audits before any simplification work; 429 responses returned before the correlation middleware runs still get a generated X-Request-ID
- operator surfaces: recent-task listing with multi-axis filtering, rich task detail with diagnostics
- built-in web UI with single-model, pairwise, five-model, and auto-routing execution patterns
- post-phase audit snapshots preserved under docs/audits/; the standalone strategic audit audit_opus_2026-04-26.md at the repo root scopes simplification options A/B/C/D
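Per-step cost estimation reduces to a registry lookup plus arithmetic on the recorded token counts. The sketch below is a minimal illustration, assuming the pricing registry stores USD prices per million input and output tokens; the ModelPrice type, the PRICING entries, the model names and the prices themselves are hypothetical placeholders, not GraceKelly's actual registry or any provider's real rates.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ModelPrice:
    """Hypothetical pricing entry: USD per one million tokens."""
    input_per_mtok: float
    output_per_mtok: float


# Hypothetical registry keyed by canonical model name; placeholder prices only.
PRICING: Dict[str, ModelPrice] = {
    "example-model-a": ModelPrice(input_per_mtok=3.0, output_per_mtok=15.0),
    "example-model-b": ModelPrice(input_per_mtok=0.5, output_per_mtok=1.5),
}


def estimate_cost_usd(model: str, input_tokens: int, output_tokens: int) -> Optional[float]:
    """Return the estimated cost of one step, or None if the model has no pricing entry."""
    price = PRICING.get(model)
    if price is None:
        # Unknown models are reported as "no estimate" rather than guessed at.
        return None
    cost = (input_tokens * price.input_per_mtok
            + output_tokens * price.output_per_mtok) / 1_000_000
    return round(cost, 6)
```

For example, a step with 1,200 input and 400 output tokens on the first placeholder model costs (1200 × 3 + 400 × 15) / 1,000,000 = 0.0096 USD. Returning None for unpriced models keeps "unknown cost" distinct from "free", which matters once models are added faster than prices.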
Not yet implemented:
- richer retry policies beyond the current deferral
- broader browser catalog refresh if account-tier drift continues
- cross-project integration glue
Excluded by design:
- SQLite
Module boundaries
- api.routes: HTTP contract only; no domain logic, no adapter imports
- api.routes.stream: SSE streaming endpoint for real-time execution output
- api.routes.analytics: aggregate model performance stats from recent task and step history
- api.routes.health_detailed: detailed adapter and embeddings health endpoint
- core.models: canonical model catalog and alias resolution
- core.orchestrator: use-case orchestration, event building, storage coordination
- core.contracts: execution adapter interface, result envelopes, failure taxonomy
- core.planning: execution plan construction and request validation
- core.router: adapter dispatch, concurrency gate, quorum aggregation
- core.account_pool: thread-safe account selection and cooldown tracking
- core.account_pool_manager: pool access wrapper for provider-side throttling decisions
- core.readiness: component health aggregation across profiles
- core.concurrency: thread-safe per-model concurrency gate
- core.execution_profile: profile-aware adapter requirement sets
- core.budget: per-task and per-hour browser submit budgeting
- core.embeddings: Mistral embeddings client used for consensus clustering only
- core.session_context: bounded session-history reconstruction for follow-up prompts
- request_metrics: in-process HTTP and adapter metrics backing /metrics
- telemetry: optional OpenTelemetry setup for FastAPI
- middleware.setup_usage_telemetry: opt-in JSONL append per request, gated by GRACEKELLY_USAGE_TELEMETRY_ENABLED; a sha256 prompt hash is recorded only for the orchestration POST routes, and the request body itself is not persisted
- tools.recon_weekly: weekly Perplexity DOM reconnaissance with a structural diff against .workflow/state/perplexity-selectors-baseline.json; drift writes logs/recon-drift.jsonl and .workflow/state/perplexity-selectors-drift.flag
- adapters.dry_run: simulated execution for testing and dry-run mode
- adapters.api.anthropic: Anthropic API adapter
- adapters.api.openai_compat: OpenAI-compatible API adapter
- adapters.browser.perplexity: Perplexity browser adapter (delegates to the automation port)
- adapters.browser.automation: browser automation port ABC and null implementation
- adapters.browser.playwright_driver: thin Playwright browser backend
- adapters.browser.scripted: scripted browser backend for testing
- adapters.browser.selectors: centralized Perplexity DOM anchors from live recon
- adapters.browser.session: browser session state management
- adapters.browser.policy: popup, auth, model verification, and submit policies
- storage.base: storage contract (TaskRepository ABC)
- storage.memory: thread-safe in-memory backend
- storage.postgres: PostgreSQL backend with migration tooling
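core.budget enforces per-task and per-hour browser submit quotas. The following is a minimal sketch of that idea, assuming a fixed per-task cap and a sliding one-hour window; the SubmitBudget class, its parameters and its try_acquire method are hypothetical and do not describe the module's real interface.

```python
import threading
import time
from collections import deque
from typing import Callable, Deque, Dict


class SubmitBudget:
    """Hypothetical per-task and per-hour browser submit budget (sliding window)."""

    def __init__(self, per_task: int, per_hour: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.per_task = per_task
        self.per_hour = per_hour
        self._clock = clock  # injectable so tests can control time
        self._lock = threading.Lock()
        self._task_counts: Dict[str, int] = {}
        self._recent: Deque[float] = deque()  # timestamps of accepted submits

    def try_acquire(self, task_id: str) -> bool:
        """Record one submit for task_id if both quotas allow it; return False otherwise."""
        with self._lock:
            now = self._clock()
            # Drop submits that have left the one-hour sliding window.
            while self._recent and now - self._recent[0] >= 3600.0:
                self._recent.popleft()
            if len(self._recent) >= self.per_hour:
                return False
            if self._task_counts.get(task_id, 0) >= self.per_task:
                return False
            self._recent.append(now)
            self._task_counts[task_id] = self._task_counts.get(task_id, 0) + 1
            return True
```

A refused submit returns False instead of raising, so the caller can decide whether to defer, fall back to an API adapter, or fail the step. In this sketch the per-task counter is never reset; a real implementation would presumably release it when the task finishes.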
Architectural decisions
- PostgreSQL is the first durable backend. SQLite is not part of the target architecture.
- Multi-model orchestration is a first-class requirement, not a later enhancement.
- Execution must support two adapter families:
  - browser adapters for UI-routed providers
  - API adapters for provider-backed execution
- A solo-user web UI is part of the primary operating surface for experimentation and inspection.
- Provider-specific naming drift must be normalized through the central model registry.
- Browser execution via Perplexity is the primary adapter. The user’s Perplexity Pro subscription provides access to multiple frontier models at no additional API cost. API adapters exist as optional fallbacks for direct provider access. Mistral is retained only for embeddings in consensus clustering, not as an LLM execution backend.
- Event logging must not be a critical dependency for accepting or executing a task.
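Normalizing naming drift through one central registry means every alias maps to exactly one canonical name and unknown names fail loudly at the edge. A minimal sketch, assuming simple case and separator folding; the canonical names, the aliases and the canonical_model function are hypothetical examples, not the contents of core.models.

```python
import re
from typing import Dict, Tuple

# Hypothetical catalog: canonical name -> known provider/UI aliases.
_CATALOG: Dict[str, Tuple[str, ...]] = {
    "claude-sonnet": ("claude sonnet", "sonnet", "anthropic/claude-sonnet"),
    "gpt-large": ("gpt large", "openai/gpt-large"),
}


def _normalize_key(name: str) -> str:
    # Case-fold and collapse runs of whitespace, underscores and hyphens into one "-".
    return re.sub(r"[\s_\-]+", "-", name.strip().lower())


# Build the lookup table once; canonical names also resolve to themselves.
_ALIASES: Dict[str, str] = {}
for canonical, aliases in _CATALOG.items():
    for alias in (canonical, *aliases):
        _ALIASES[_normalize_key(alias)] = canonical


def canonical_model(name: str) -> str:
    """Resolve a model name or alias to its canonical form, or raise ValueError."""
    try:
        return _ALIASES[_normalize_key(name)]
    except KeyError:
        raise ValueError(f"unknown model: {name!r}") from None
```

Calling this once when a request enters the system lets every later layer (planning, pricing, storage) compare plain canonical strings instead of repeating the alias logic.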
Design rules
- Every external dependency must sit behind an adapter boundary.
- Persistence is replaceable. Memory first, PostgreSQL next.
- Model names are canonicalized once at the edge.
- Browser execution is the primary path, since the Perplexity subscription gives access to frontier models. API execution is a fallback that uses the same orchestration contract.
- Observability must be append-only and isolated from request execution.
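The append-only, isolated observability rule can be illustrated with the usage-telemetry path: one JSON line per request, a sha256 hash of the prompt only for orchestration POST routes, no request body on disk, and any write failure swallowed so logging never affects request handling. This is a minimal sketch; the record fields, the ORCHESTRATION_ROUTES set, the accepted "enabled" values and the append_usage_record function are assumptions, and only the GRACEKELLY_USAGE_TELEMETRY_ENABLED name comes from the module description.

```python
import hashlib
import json
import os
import time
from pathlib import Path
from typing import Optional

# Assumption: which paths count as "orchestration POST routes".
ORCHESTRATION_ROUTES = {"/api/v1/orchestrate", "/api/v1/consensus", "/api/v1/compare"}


def telemetry_enabled() -> bool:
    # Assumption: these string values count as "enabled".
    value = os.environ.get("GRACEKELLY_USAGE_TELEMETRY_ENABLED", "")
    return value.strip().lower() in {"1", "true", "yes"}


def append_usage_record(log_path: Path, method: str, path: str, status: int,
                        duration_ms: float, prompt: Optional[str] = None) -> None:
    """Append one JSONL record; never raises on I/O errors, never stores the prompt text."""
    if not telemetry_enabled():
        return
    record = {
        "ts": time.time(),
        "method": method,
        "path": path,
        "status": status,
        "duration_ms": round(duration_ms, 1),
    }
    if method == "POST" and path in ORCHESTRATION_ROUTES and prompt is not None:
        # Only a digest is kept, so repeated prompts can be counted without persisting them.
        record["prompt_sha256"] = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    try:
        log_path.parent.mkdir(parents=True, exist_ok=True)
        with log_path.open("a", encoding="utf-8") as fh:  # append-only
            fh.write(json.dumps(record) + "\n")
    except OSError:
        # Observability is isolated: a failed write must not affect the request.
        pass
```

Opening the file in append mode per record keeps the log strictly append-only and avoids holding a file handle across requests, at the cost of one open per request.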
Known integrators
External clients integrate through the local V2 HTTP API on http://127.0.0.1:8011.
Verified clients (all migrated from V1 by 2026-04-25):
- RAG_Support_Assistant: provider-aware support bot with GraceKelly as a fallback LLM provider. The smoke harness scripts/gracekelly_smoke.py walks 8 steps (healthz, profile, simple ask, tool loop, schema dispatch, streaming, metrics, failover).
- agent_toolkit: LangGraph agent building blocks. The OrchestratorChatModel LangChain adapter dispatches one of six GKPatterns (SINGLE / SONAR / DUAL / FIVE_MODELS / CONSENSUS / MAXIMUM) to the matching V2 route (/orchestrate, /compare, /consensus, or /smart with reliability_level=high).
- juhub (Perplexity_Orchestrator2\juhub): daily AI debate scheduled at 08:30 via Windows Task Scheduler. backend/scheduler.py performs a pre-flight :8011/healthz/ready check and gracefully skips the run if V2 is not reachable; it does not auto-spawn V2 on its own.
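juhub's pre-flight boils down to probing /healthz/ready and skipping the run on any failure instead of starting V2. A minimal sketch using only the standard library; the v2_is_ready name, the 3-second timeout and treating any 2xx status as "ready" are assumptions, not a description of what backend/scheduler.py actually does.

```python
import urllib.error
import urllib.request


def v2_is_ready(base_url: str = "http://127.0.0.1:8011", timeout: float = 3.0) -> bool:
    """Return True only if GET {base_url}/healthz/ready answers with a 2xx status."""
    try:
        with urllib.request.urlopen(f"{base_url}/healthz/ready", timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, non-2xx (HTTPError) or a malformed URL: not ready.
        return False
```

A scheduled job would call v2_is_ready() first and return early when it is False, which matches the documented behaviour of skipping rather than auto-spawning V2.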
The legacy V1 orchestrator at Perplexity_Orchestrator2 (port 8001, endpoints /api/gk/*)
is deprecated as of 2026-04-25. See Perplexity_Orchestrator2\DEPRECATED.md.
The dry-run profile gate covers all eight sync routes used by external clients
(/api/v1/smart, /smart/v2, /orchestrate, /consensus, /debate, /compare,
/batch, /pipeline), as verified in docs/audits/2026-04-25-dry-run-gate-audit.md.
Mistral remains embeddings-only for consensus clustering and is not an LLM execution backend
(an unintended leftover Mistral execution adapter was removed in batches 101-b/c).
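In this role the embeddings only decide which answers agree; they never produce an answer themselves. A minimal sketch of that idea, assuming the answer vectors have already been fetched (no Mistral API call is shown) and using greedy cosine-similarity grouping with a hypothetical 0.85 threshold; GraceKelly's actual clustering method is not specified here, so treat this as a stand-in technique.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cluster_answers(vectors: Sequence[Sequence[float]],
                    threshold: float = 0.85) -> List[List[int]]:
    """Greedily group answer indices whose embedding is close to a cluster's first member."""
    clusters: List[List[int]] = []
    for i, vec in enumerate(vectors):
        for cluster in clusters:
            if cosine(vec, vectors[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            # No existing cluster is close enough: this answer starts a new one.
            clusters.append([i])
    # Largest cluster first: its size is the level of agreement among models.
    clusters.sort(key=len, reverse=True)
    return clusters
```

Comparing each answer against a cluster's first member keeps the pass linear in the number of clusters per answer, which is plenty for the handful of models in a consensus run.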
Operations tooling
- scripts/ecosystem_smoke.py: single-command health check across V2 and the 3 known clients. Used as the integrator regression gate.
- scripts/win-autostart/: Windows Task Scheduler artefacts that keep V2 always-on under user logon, with a set_profile.bat toggle for the execution profile.
- scripts/win-autostart/install_recon_cron.bat: registers a weekly Friday 03:00 task named "GraceKelly Selectors Recon" that runs gracekelly-recon-weekly, captures the live Perplexity DOM via Playwright, and diffs it against the stored baseline. The CLI loads the repo .env before resolving GRACEKELLY_BROWSER_PROFILE_DIR. Drift surfaces in logs/recon-drift.jsonl and as .workflow/state/perplexity-selectors-drift.flag.
- docs/audits/2026-04-25-dry-run-gate-audit.md: per-route audit table proving dry-run profile coverage.
- audit_opus_2026-04-26.md (repo root): BCG-style strategic audit. It scores engineering 9/10 versus strategic fit for personal use 5/10, and recommends Option B Simplify (-42% src LOC, -67% test LOC) once 30-day usage telemetry confirms which endpoints are actually exercised.
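The weekly recon compares a live DOM capture against the stored baseline and surfaces drift through logs/recon-drift.jsonl and the flag file. A minimal sketch of that diff-and-report step, assuming both captures are flat JSON objects mapping an anchor name to a selector string; the real format of .workflow/state/perplexity-selectors-baseline.json and the diff_selectors / record_drift helpers are hypothetical.

```python
import json
import time
from pathlib import Path
from typing import Dict, List


def diff_selectors(baseline: Dict[str, str], live: Dict[str, str]) -> Dict[str, List[str]]:
    """Return anchors that disappeared, appeared, or changed selector since the baseline."""
    return {
        "missing": sorted(baseline.keys() - live.keys()),
        "added": sorted(live.keys() - baseline.keys()),
        "changed": sorted(k for k in baseline.keys() & live.keys() if baseline[k] != live[k]),
    }


def record_drift(diff: Dict[str, List[str]], repo_root: Path) -> bool:
    """Append a drift record and touch the flag file when anything drifted; True on drift."""
    if not any(diff.values()):
        return False
    log = repo_root / "logs" / "recon-drift.jsonl"
    flag = repo_root / ".workflow" / "state" / "perplexity-selectors-drift.flag"
    log.parent.mkdir(parents=True, exist_ok=True)
    flag.parent.mkdir(parents=True, exist_ok=True)
    with log.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps({"ts": time.time(), **diff}) + "\n")
    flag.touch()  # a cheap marker other tooling can poll for
    return True
```

Keeping the JSONL history and the flag separate means the log records every drift event while the flag gives a single yes/no signal to check before the next browser run.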
Next steps
- Improve consensus/debate streaming beyond the current single-model streaming path.
- Broaden retry policies beyond the current deferral if operational demand appears.
- Add more models to the pricing registry as providers are added.