Execution modes

GraceKelly exposes one low-level primitive and a set of higher-level patterns built on top of it. The primitive runs an explicit plan you describe; the patterns decide the plan for you. This page explains the difference so you can pick the right endpoint. Full request and response schemas live in the API reference.

| You want to… | Endpoint | Group |
| --- | --- | --- |
| Run a prompt against a plan you control (models, quorum, merge) | POST /api/v1/orchestrate | Orchestration |
| Let the service classify the prompt and pick the pattern | POST /api/v1/smart | Smart |
| Same as smart, but with the Consensus V2 engine | POST /api/v1/smart/v2 | Smart |
| Converge several attempts into one agreed answer | POST /api/v1/consensus | Consensus |
| Stress-test a claim with a Devil’s-Advocate round | POST /api/v1/debate | Debate |
| See how different models answer the same prompt | POST /api/v1/compare | Compare |
| Run one model over many prompts at once | POST /api/v1/batch | Batch |
| Trade latency for reliability without choosing a pattern | POST /api/v1/pipeline | Pipeline |

The orchestrate endpoint is the base operation. You hand it a prompt and an explicit execution plan: which models to use, a quorum to reach, a merge_strategy for combining results, and flags such as dry_run, reasoning, and decompose. It executes synchronously and returns the final task snapshot. Every higher-level pattern ultimately drives this same path.

Reach for it when you already know exactly how a request should run and want no routing decisions made for you. The streaming variant (orchestrate/stream) emits incremental events, and orchestrate/upload accepts file attachments.
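As a rough illustration of what an explicit plan might look like on the wire, the sketch below builds (but does not send) a request for POST /api/v1/orchestrate using only the Python standard library. The host name, the `prompt` field name, the model identifiers, the `"majority"` merge strategy value, and the value types are all assumptions made for the example; the API reference is the authority on the real schema.

```python
import json
import urllib.request

# Hypothetical host; replace with your own deployment.
BASE_URL = "https://gracekelly.example.com"


def build_orchestrate_request(prompt, models, quorum, merge_strategy,
                              dry_run=False, reasoning=False, decompose=False):
    """Build a POST request carrying an explicit execution plan.

    Field names mirror the options named on this page; their exact
    types and allowed values are assumptions.
    """
    payload = {
        "prompt": prompt,                  # assumed field name
        "models": models,                  # which models to use
        "quorum": quorum,                  # quorum to reach
        "merge_strategy": merge_strategy,  # how results are combined
        "dry_run": dry_run,
        "reasoning": reasoning,
        "decompose": decompose,
    }
    return urllib.request.Request(
        BASE_URL + "/api/v1/orchestrate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_orchestrate_request(
    "Summarise the incident report.",
    models=["model-a", "model-b", "model-c"],  # placeholder identifiers
    quorum=2,
    merge_strategy="majority",                 # placeholder value
    dry_run=True,
)
```

Passing `request` to `urllib.request.urlopen` would send it. Because the call is described as synchronous, the response would presumably carry the final task snapshot.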

The higher-level pattern endpoints decide the plan for you, each optimising for a different goal.

smart classifies the prompt, assesses its complexity, and selects the execution pattern itself: a single call, Consensus V1, a role-based approach, or decomposition into subtasks. smart/v2 behaves the same but swaps in the Consensus V2 engine when consensus is needed — agglomerative (HAC) clustering, cross-pollination between attempts, debate rounds, and explicit divergence handling.

Use smart as the default front door when you do not want to think about patterns. Use smart/v2 when answer quality on hard, ambiguous prompts matters more than latency.
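The guidance above reduces to a single decision on the client side. A minimal sketch, where the helper name and its flag are hypothetical and only the two paths come from this page:

```python
def smart_endpoint(quality_over_latency: bool) -> str:
    """Pick the smart variant: v2 when answer quality matters more than latency."""
    return "/api/v1/smart/v2" if quality_over_latency else "/api/v1/smart"


default_path = smart_endpoint(False)    # everyday front door
hard_prompt_path = smart_endpoint(True)  # hard, ambiguous prompts
```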

The consensus endpoint generates several response variations per round, clusters them by semantic similarity, and iterates until the top cluster reaches the consensus target. The output is the answer the model agrees with itself on most often.

Use it to suppress one-off hallucinations and sampling noise on factual or analytical prompts.
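The loop described above can be sketched in a few lines. This is a toy model, not the service's implementation: exact matching on normalised text stands in for semantic clustering, answers are pooled across rounds, and `generate`, `variations_per_round`, `target`, and `max_rounds` are hypothetical names chosen for the example.

```python
from collections import Counter


def normalise(text):
    """Crude stand-in for semantic similarity: case- and whitespace-insensitive match."""
    return " ".join(text.lower().split())


def run_consensus(generate, variations_per_round=5, target=0.6, max_rounds=4):
    """Sample answers round by round until the largest cluster reaches `target`.

    `generate` is any zero-argument callable returning one answer string.
    Returns (top_answer, share_of_top_cluster, rounds_used).
    """
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    answers = []
    for round_number in range(1, max_rounds + 1):
        answers.extend(generate() for _ in range(variations_per_round))
        clusters = Counter(normalise(a) for a in answers)
        top_answer, top_size = clusters.most_common(1)[0]
        share = top_size / len(answers)
        if share >= target:
            break
    return top_answer, share, round_number


# Stubbed generator: four of five samples agree once normalised.
samples = iter(["Paris", "paris ", "Lyon", "Paris", "PARIS"])
answer, share, rounds = run_consensus(lambda: next(samples))
```

With the stub, the largest cluster already holds four of the five samples after the first round, so the loop stops there. A one-off outlier such as "Lyon" never reaches the target on its own, which is the noise-suppression effect the pattern is after.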

The debate endpoint produces an initial position, then runs a structured round: a Devil’s-Advocate challenge, a defense, and an improved final response. The answer is pressure-tested rather than averaged.

Use it for claims, recommendations, and decisions where the failure mode is confident-but-wrong, not noisy.
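To make the round structure concrete, here is a minimal sketch of the four steps as sequential calls. The `ask` callable (prompt in, text out), the prompt wording, and the result keys are all hypothetical; the service's actual prompts and response shape are not documented on this page.

```python
def run_debate(ask, claim):
    """Run one round: initial position, challenge, defense, improved final answer.

    `ask` is any callable that takes a prompt string and returns a reply string.
    """
    initial = ask(f"State your position on: {claim}")
    challenge = ask(
        "Act as a devil's advocate and attack this position:\n" + initial
    )
    defense = ask(
        "Defend the position against the challenge.\n"
        f"Position: {initial}\nChallenge: {challenge}"
    )
    final = ask(
        "Write an improved final response that keeps what survived.\n"
        f"Position: {initial}\nChallenge: {challenge}\nDefense: {defense}"
    )
    return {
        "initial": initial,
        "challenge": challenge,
        "defense": defense,
        "final": final,
    }
```

Each step feeds on the output of the previous one, so the steps are inherently sequential. That is the main difference from consensus, where the samples are independent of one another.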

The compare endpoint runs the same prompt on each requested model concurrently and returns every answer. With analyze=true and at least two successes, an extra call summarises where the models agree and differ.

Use it to evaluate models against each other, or to surface disagreement as a signal in its own right.
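The fan-out and the "at least two successes" rule can be sketched with a thread pool. Here `call_model` and `summarise` are hypothetical callables supplied by the caller, and the result shape is an assumption for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor


def compare(prompt, models, call_model, analyze=False, summarise=None):
    """Run `prompt` on every model concurrently and keep every outcome.

    call_model(model, prompt) returns an answer or raises on failure.
    summarise(prompt, answers) is only called when analysis is requested
    and at least two models succeeded.
    """
    def attempt(model):
        try:
            return model, {"ok": True, "answer": call_model(model, prompt)}
        except Exception as exc:  # one failing model must not sink the rest
            return model, {"ok": False, "error": str(exc)}

    with ThreadPoolExecutor(max_workers=max(1, len(models))) as pool:
        results = dict(pool.map(attempt, models))

    successes = {m: r["answer"] for m, r in results.items() if r["ok"]}
    analysis = None
    if analyze and summarise is not None and len(successes) >= 2:
        analysis = summarise(prompt, successes)
    return {"results": results, "analysis": analysis}
```

Failures are recorded per model rather than raised, so a single unavailable provider still leaves a usable comparison. With fewer than two successes there is nothing to compare, so the analysis step is skipped.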

The batch endpoint runs a single model over up to 20 prompts in parallel, returning per-prompt success or failure. It is a breadth tool, not a quality tool.
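Because a single call accepts at most 20 prompts, a larger workload has to be split on the client side. A small sketch of that split, with a hypothetical helper name:

```python
MAX_BATCH_SIZE = 20  # per-request prompt limit stated on this page


def chunk_prompts(prompts, size=MAX_BATCH_SIZE):
    """Split a list of prompts into consecutive groups of at most `size`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [prompts[i:i + size] for i in range(0, len(prompts), size)]


prompts = [f"Translate sentence {n}" for n in range(45)]
batches = chunk_prompts(prompts)
batch_sizes = [len(batch) for batch in batches]
```

Each resulting group would then go into its own batch request. Since success or failure is reported per prompt, only the failed prompts need to be retried.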

The pipeline endpoint runs a prompt through a pattern chosen from a reliability level rather than a named strategy. Set multi_model=true to fan out across all configured API providers and aggregate the results.

Use it when you want to ask for “more reliable” or “faster” without committing to a specific pattern.
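A request body for this endpoint might look like the sketch below. Only `multi_model` is named on this page; the `prompt` and `reliability` field names and the `"high"` value are guesses made for illustration, so check the API reference for the real schema.

```python
def build_pipeline_payload(prompt, reliability, multi_model=False):
    """Assemble a JSON-serialisable body for POST /api/v1/pipeline.

    `reliability` is a hypothetical field carrying the reliability level.
    """
    return {
        "prompt": prompt,            # assumed field name
        "reliability": reliability,  # assumed field name and value set
        "multi_model": multi_model,  # fan out across all configured providers
    }


payload = build_pipeline_payload(
    "Is this migration safe to run online?",
    reliability="high",  # placeholder level
    multi_model=True,
)
```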

Regardless of the entry point, work converges on the orchestrator, which resolves an adapter per model, applies circuit-breaker and budget policy, and assembles the result. See the architecture overview for the request lifecycle diagram and for the execution adapters that actually talk to each provider.