Infery.ai
For AI agents

Infrastructure for agents that run in production

Reliable routing, long-running tasks, predictable costs. Built for agents that make decisions without a human in the loop.

  • Deterministic fallbacks
  • Per-agent budgets
  • Long-context support

Your agent doesn't care which provider answered

It cares that the answer came back, matched the schema, and stayed under budget. We make sure all three happen.

Deterministic fallbacks

When a provider times out mid-task, your agent shouldn't fail. We route to the next model before your code notices.

Per-agent budgets

Cap spending per agent, per session, per end-customer. Hard cut-off on overrun — no runaway loops burning the monthly cap.

Long-context, long-running

Million-token windows. Multi-hour tool-calling loops. Async job patterns for tasks that outlast a single HTTP request.

Structured outputs guaranteed

Strict JSON schemas. No more regex-parsing model output and no more silent schema drift when you switch models.
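As a sketch of what a strict schema might look like in practice, here is a request payload following the OpenAI-style `response_format` / `json_schema` convention (the page advertises OpenAI compatibility). The model name, schema, and field names are illustrative placeholders, not Infery's actual API.

```python
import json

# Illustrative request payload enforcing a strict JSON schema.
# "strict": True tells the backend to reject any output that
# doesn't match the schema exactly.
payload = {
    "model": "gpt-5",  # or omit and let the router pick
    "messages": [{"role": "user", "content": "Summarise ticket #4821"}],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "ticket_summary",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "summary": {"type": "string"},
                    "priority": {
                        "type": "string",
                        "enum": ["low", "medium", "high"],
                    },
                },
                "required": ["summary", "priority"],
                "additionalProperties": False,
            },
        },
    },
}

# The payload is plain JSON, so it round-trips cleanly.
print(json.loads(json.dumps(payload))["response_format"]["type"])
```

Because the schema travels with the request, switching the pinned model does not silently change the output shape.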

Full observability

Every call logged. Trace latency, tokens, cost per agent run. Filter by agent, session, customer. Export to your warehouse.

Model-agnostic by design

Pin a model, or let the router pick. When a new frontier model lands, switch without rewriting your agent.

Resilience

Agents that survive provider outages

Chain models by preference. If Claude returns a 5xx, Gemini takes the call; if Gemini rate-limits, GPT does. Your agent stays running, and provider downtime becomes our problem, not yours.

  • Ordered fallback per API key
  • Retry budget per request
  • Circuit breakers around failing upstreams
  • Post-incident traces
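The routing behaviour described above can be sketched client-side: try providers in preference order, fall through on failure, and stop when the retry budget is spent. `call_model` is a stand-in for a real provider call; the model names are placeholders.

```python
# Sketch of ordered fallback with a per-request retry budget:
# each failed attempt (5xx, timeout, rate limit) consumes one
# retry and falls through to the next model in the chain.
def route(prompt, chain, call_model, retry_budget=3):
    last_error = None
    for model in chain[:retry_budget + 1]:
        try:
            return call_model(model, prompt)
        except Exception as exc:
            last_error = exc  # fall through to the next model
    raise RuntimeError(f"all providers failed: {last_error}")

# Usage with stubbed providers: the first two fail, the third answers.
def fake_call(model, prompt):
    if model in ("claude", "gemini"):
        raise TimeoutError(f"{model} timed out")
    return f"{model}: ok"

print(route("hi", ["claude", "gemini", "gpt"], fake_call))  # gpt: ok
```

The agent's code path sees a single successful response, which is the property the section promises.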
Cost control

Hard budgets per agent, per customer

Runaway loops don't cost you a month's worth of credits. Set a budget per agent-id or customer-id header; we enforce it at the edge.

  • Per-key, per-agent, per-session budgets
  • Hard cut-off at cap (no overage)
  • Real-time alerts on burn rate
  • Per-customer cost breakdowns
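The hard cut-off semantics above can be illustrated with a minimal budget guard: spend is tracked per tag and any charge that would cross the cap is rejected outright, never billed as overage. The class and the dollar amounts are illustrative, not the service's API.

```python
# Minimal sketch of hard budget enforcement per tag
# (agent-id, session-id, or customer-id).
class BudgetGuard:
    def __init__(self, cap_usd):
        self.cap = cap_usd
        self.spent = 0.0

    def charge(self, cost_usd):
        # Reject before running the request, so a runaway loop
        # stops at the cap instead of burning past it.
        if self.spent + cost_usd > self.cap:
            raise RuntimeError("budget exceeded: hard cut-off, no overage")
        self.spent += cost_usd

agent_budget = BudgetGuard(cap_usd=1.00)
agent_budget.charge(0.40)
agent_budget.charge(0.40)
try:
    agent_budget.charge(0.40)  # would overrun the $1.00 cap
except RuntimeError as err:
    print(err)
```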

Frequently asked questions

How does fallback behavior work for agents?

You define an ordered list of models per API key. On 5xx, timeout, or rate-limit, we retry the next one automatically. The agent sees one successful response — not an error and a retry loop.

Can I pin a model version for determinism?

Yes. Every model has pinned versions (e.g., gpt-5-4-20260210). Use pinned versions in production agents so behavior doesn't drift on model updates.

What's the SLA?

99.9% uptime on paid plans, 99.95% on Enterprise with custom contracts. Fallback chains give you provider-level redundancy on top of ours.

How do I monitor cost per agent?

Tag every request with an agent-id / session-id / customer-id header. The dashboard breaks down usage and cost per tag, with export to CSV or your warehouse.
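A request tagged for attribution might look like the following, using only the standard library. The base URL, API key variable, and tag values are placeholders; the header names follow the agent-id / session-id / customer-id scheme described above.

```python
import json
import urllib.request

# Sketch of tagging one chat-completion request so the dashboard
# can break down cost per agent, session, and customer.
req = urllib.request.Request(
    "https://api.infery.ai/v1/chat/completions",  # placeholder URL
    data=json.dumps({"model": "gpt-5", "messages": []}).encode(),
    headers={
        "Authorization": "Bearer $INFERY_API_KEY",  # placeholder key
        "agent-id": "billing-agent",
        "session-id": "sess_42",
        "customer-id": "cust_007",
    },
    method="POST",
)

# urllib stores header keys capitalised.
print(req.get_header("Agent-id"))  # billing-agent
```

With an OpenAI-compatible SDK, the same tags would typically go in the client's default headers so every call from that agent is attributed automatically.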

Can I run long-running async tasks?

Yes. For tasks that outlast a single HTTP request, use the async job API — submit, poll, or webhook on completion.
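The submit-then-poll half of that pattern can be sketched as below. The `submit` / `get_status` callables and the job payload shape are hypothetical stand-ins; only the pattern itself (submit, poll until done, or register a webhook instead) comes from the answer above.

```python
import time

def wait_for_job(submit, get_status, poll_interval=0.0, max_polls=100):
    """Submit a job, then poll its status until it completes."""
    job_id = submit()
    for _ in range(max_polls):
        job = get_status(job_id)
        if job["status"] == "done":
            return job["result"]
        time.sleep(poll_interval)
    raise TimeoutError(f"job {job_id} did not finish")

# Stubbed job store: reports "done" on the third poll.
state = {"polls": 0}

def fake_submit():
    return "job_1"

def fake_status(job_id):
    state["polls"] += 1
    if state["polls"] >= 3:
        return {"status": "done", "result": "report.pdf"}
    return {"status": "running"}

print(wait_for_job(fake_submit, fake_status))  # report.pdf
```

For multi-hour tasks a webhook on completion avoids polling entirely; the loop above is the fallback when you cannot receive callbacks.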

What about tool calls and MCP?

OpenAI-style tool calls work across every model. Native MCP (Model Context Protocol) server support is in beta.
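An OpenAI-style tool definition, for reference, is a small JSON structure like the one below; the page says this format works across every model. The function name and parameter schema are illustrative.

```python
# One OpenAI-style tool definition, sent alongside the messages.
# The model can then emit a tool call naming this function with
# arguments matching the parameter schema.
tools = [{
    "type": "function",
    "function": {
        "name": "get_order_status",
        "description": "Look up the status of an order by id.",
        "parameters": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
    },
}]

print(tools[0]["function"]["name"])  # get_order_status
```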

Put your agent in production

OpenAI-compatible. Fallback chains. Per-agent budgets. Everything production-grade agents need, nothing you don't.