Infery.ai
For developers

One API, every model

OpenAI-compatible endpoints. Automatic fallbacks. Per-key quotas. Drop-in replacement for the OpenAI SDK — works with LangChain, LlamaIndex, Vercel AI SDK, and every framework that speaks OpenAI.

  • OpenAI SDK compatible
  • Streaming + tools
  • Fallback chains built-in

Built by developers, for developers

Everything you need to put AI in production — without juggling five vendor contracts.

OpenAI-compatible endpoints

Swap the base URL, keep the SDK. Works with openai-python, openai-node, LangChain, LlamaIndex, and every framework that speaks OpenAI.
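
For example, pointing LangChain at the gateway is the same one-line change. A minimal sketch, assuming the langchain-openai package and the placeholder model and key from the example further down:

from langchain_openai import ChatOpenAI

# Same ChatOpenAI class; only base_url and api_key change.
llm = ChatOpenAI(
    model="gpt-5.5",
    base_url="https://api.infery.ai/v1",
    api_key="inf_live_..."
)

print(llm.invoke("Say hello in one word.").content)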

Automatic fallback chains

Route traffic across providers. If one 5xx's or rate-limits, the next in the chain serves the request — your code never notices.

Per-key quotas

Give each service, user or environment its own budget. No one team can burn the whole cap in a runaway loop.
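
A hypothetical sketch of creating a scoped key with its own budget (the /v1/keys endpoint and field names here are illustrative, not a documented Infery API):

import requests

# Hypothetical management call; the endpoint and fields are illustrative.
resp = requests.post(
    "https://api.infery.ai/v1/keys",
    headers={"Authorization": "Bearer inf_live_..."},
    json={
        "name": "staging-worker",        # one key per service/environment
        "monthly_budget_usd": 50,        # hard spend cap for this key
        "rate_limit_rpm": 120            # requests per minute
    },
    timeout=10
)
resp.raise_for_status()
print(resp.json())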

Streaming, tools, structured outputs

SSE streaming, tool calls, JSON mode, schema enforcement. Everything you expect from the OpenAI API, on every model.
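
A quick sketch of a tool call through the same client, using the standard OpenAI tool-calling shape (the get_weather tool and model name are illustrative):

from openai import OpenAI

client = OpenAI(base_url="https://api.infery.ai/v1", api_key="inf_live_...")

# Declare a tool with a JSON Schema for its arguments.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"]
        }
    }
}]

response = client.chat.completions.create(
    model="gpt-5.5",
    messages=[{"role": "user", "content": "Weather in Oslo?"}],
    tools=tools
)

# The model responds with a structured call instead of prose.
call = response.choices[0].message.tool_calls[0]
print(call.function.name, call.function.arguments)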

One invoice, one balance

Stop reconciling bills from five vendors. One credit balance, one receipt, one number to watch.

Observability out of the box

Per-request traces, latency, tokens, cost. Filter by API key, model, customer. Export to your warehouse.
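
A sketch of pulling traces for one key (the /v1/usage endpoint and its parameters are hypothetical, not documented):

import requests

# Hypothetical usage-export call; endpoint, params and fields are illustrative.
resp = requests.get(
    "https://api.infery.ai/v1/usage",
    headers={"Authorization": "Bearer inf_live_..."},
    params={"api_key_id": "key_123", "since": "2026-01-01"},
    timeout=10
)
for row in resp.json()["data"]:
    print(row["model"], row["latency_ms"], row["total_tokens"], row["cost_usd"])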

Built-in API for developers

Every plan includes OpenAI-compatible API access. Same models, same credits, same workspace. Drop-in replacement — just change the base URL.

No separate API product; API access is included in every subscription.

from openai import OpenAI

# Point the official OpenAI SDK at Infery by swapping the base URL.
client = OpenAI(
    base_url="https://api.infery.ai/v1",
    api_key="inf_live_..."
)

# Stream a chat completion token by token.
response = client.chat.completions.create(
    model="gpt-5.5",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True
)

for chunk in response:
    # Some chunks (e.g., the final one) carry no content delta.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

Reliability

Fallback chains that save your night

Define a list of models, ordered by preference. If the upstream provider 5xx's or rate-limits, we retry the next one before your request even returns an error. Your code stays simple; your service stays up. A sketch of a policy follows the list below.

  • Automatic retry on 5xx / timeout / rate limit
  • Transparent to the client SDK
  • Per-key fallback policies
  • Logged for post-incident review
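
A sketch of what an ordered fallback policy might look like (field names are hypothetical, not documented Infery config):

# Hypothetical per-key fallback policy; field names are illustrative.
fallback_policy = {
    "models": [
        "gpt-5.5",               # primary
        "claude-sonnet-latest"   # tried on 5xx, timeout, or rate limit
    ],
    "retry_on": ["5xx", "timeout", "rate_limit"],
    "max_attempts": 3
}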

Frequently asked questions

Which SDKs work out of the box?

Any OpenAI-compatible SDK: openai-python, openai-node, LangChain, LlamaIndex, Vercel AI SDK, Mastra, and anything that takes a base URL. Change one line and you're in.

What's the latency overhead?

Typically 5–15 ms of added latency versus a direct provider call. Streaming starts as soon as the upstream starts — we don't buffer.

Do you support tool calls and structured outputs?

Yes. Tool calls, JSON mode, and strict schema enforcement work across every provider that supports them. For providers that don't, we emulate via prompt engineering.
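
For example, strict schema enforcement uses the standard OpenAI response_format shape (model name and schema are illustrative):

from openai import OpenAI

client = OpenAI(base_url="https://api.infery.ai/v1", api_key="inf_live_...")

# Standard OpenAI structured-outputs shape: a strict JSON Schema.
response = client.chat.completions.create(
    model="gpt-5.5",
    messages=[{"role": "user", "content": "Extract: Ada Lovelace, born 1815."}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "person",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "born": {"type": "integer"}
                },
                "required": ["name", "born"],
                "additionalProperties": False
            }
        }
    }
)
print(response.choices[0].message.content)  # JSON matching the schema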

How do fallback chains work?

Define an ordered list of models in your API key config. On 5xx, timeout, or rate limit, we retry against the next one. Single response to the client, transparent to your code.

Can I pin a model version for reproducibility?

Yes — every model has both a floating alias (e.g., claude-sonnet-latest) and pinned versions (e.g., claude-sonnet-4-6-20260115).

What are the rate limits?

Per-plan and per-key. Limits are visible in the dashboard and in response headers. Enterprise plans come with custom quotas.
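
To read those headers with openai-python, with_raw_response exposes them alongside the parsed body (the exact x-ratelimit-* header name below is an assumption):

from openai import OpenAI

client = OpenAI(base_url="https://api.infery.ai/v1", api_key="inf_live_...")

# with_raw_response returns the HTTP response plus the typed object.
raw = client.chat.completions.with_raw_response.create(
    model="gpt-5.5",
    messages=[{"role": "user", "content": "Hello!"}]
)
completion = raw.parse()  # the usual ChatCompletion object

# Header name assumed; check the dashboard for the exact convention.
print(raw.headers.get("x-ratelimit-remaining-requests"))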

Ship your AI feature this week

Free trial credits. OpenAI-compatible. No credit card required to start.