LLM Gateways, Routing, and Fallbacks
An LLM gateway sits between your application and model providers. It centralizes model access, logging, routing, budgets, and fallback behavior.
Why gateways exist
Without a gateway, every app team has to handle these concerns on its own:
- API keys
- provider-specific request shapes
- retries
- model names
- usage logging
- cost limits
- rate limits
- failover
- audit trails
A gateway turns these concerns into shared infrastructure: they are implemented once and applied consistently across apps.
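As a rough illustration of what "shared infrastructure" can mean in code, the sketch below puts key-free model aliases, a per-team budget check, and usage logging behind one call. Everything here is hypothetical: the `Gateway` class, the `ProviderFn` adapter signature, and the alias and team names are assumptions made for the example, not any particular product's API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical adapter signature: (provider_model_name, prompt) -> completion text.
# A real adapter would hold the API key and translate to the provider's request shape.
ProviderFn = Callable[[str, str], str]


class BudgetExceeded(Exception):
    """Raised when a team's remaining budget cannot cover the estimated cost."""


@dataclass
class Gateway:
    providers: Dict[str, ProviderFn]       # provider name -> adapter
    aliases: Dict[str, Tuple[str, str]]    # logical model name -> (provider, model)
    budgets_usd: Dict[str, float]          # team -> remaining budget
    usage_log: List[dict] = field(default_factory=list)

    def complete(self, team: str, alias: str, prompt: str, est_cost_usd: float) -> str:
        # Cost limit is enforced centrally, before any provider is called.
        if self.budgets_usd.get(team, 0.0) < est_cost_usd:
            raise BudgetExceeded(team)
        # Apps use a logical alias; the gateway owns the real provider and model name.
        provider, model = self.aliases[alias]
        text = self.providers[provider](model, prompt)
        self.budgets_usd[team] -= est_cost_usd
        self.usage_log.append(
            {"team": team, "provider": provider, "model": model, "cost_usd": est_cost_usd}
        )
        return text
```

With this shape, swapping the model behind an alias is a change to the `aliases` table rather than a change in every application.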
Routing patterns
| Pattern | Example |
|---|---|
| static routing | the support bot always uses model A |
| cost routing | easy tasks go to a cheaper model |
| latency routing | mobile requests go to a faster model |
| capability routing | image tasks go to a multimodal model |
| fallback routing | if provider A fails, use provider B |
| eval routing | route to the model with the best measured success rate for the task |
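Several of these patterns can live in one routing function. The sketch below is one possible arrangement, not a recommended policy: the `Request` fields, the model names, and the order in which rules are checked are all assumptions made for illustration. Fallback routing is left out because it depends on call failures rather than request attributes.

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical static assignments: app name -> model.
STATIC_ROUTES = {"support-bot": "model-a"}


@dataclass
class Request:
    app: str
    has_image: bool = False
    client: str = "web"        # e.g. "web" or "mobile"
    difficulty: str = "hard"   # "easy" or "hard"; assumed to come from an upstream classifier


def route(req: Request) -> str:
    # Capability routing first: a text-only model is never valid for an image task.
    if req.has_image:
        return "multimodal-model"
    # Static routing: the app is pinned to a specific model.
    if req.app in STATIC_ROUTES:
        return STATIC_ROUTES[req.app]
    # Latency routing: mobile clients get the fast model.
    if req.client == "mobile":
        return "fast-model"
    # Cost routing: easy tasks get the cheap model.
    if req.difficulty == "easy":
        return "cheap-model"
    return "default-model"


def route_by_eval(success_rates: Dict[str, float]) -> str:
    # Eval routing: pick the model with the highest measured success rate
    # for this task type. The rates are assumed to come from offline evals.
    return max(success_rates, key=success_rates.get)
```

For example, `route(Request(app="notes", client="mobile"))` returns `"fast-model"`, and `route_by_eval({"model-a": 0.82, "model-b": 0.91})` returns `"model-b"`.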
What to log
- model and provider
- prompt version
- token usage
- latency
- cost
- user/app/team
- safety flags
- schema validation status
- tool calls
- trace ID
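One way to keep these fields consistent across apps is a single record type that the gateway emits for every call. The sketch below is an assumed shape, not a standard: the `CallRecord` name, the field names, and the choice of a random hex string for the trace ID are illustrative.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class CallRecord:
    provider: str
    model: str
    prompt_version: str
    input_tokens: int
    output_tokens: int
    latency_ms: float
    cost_usd: float
    user: str
    app: str
    team: str
    safety_flags: List[str] = field(default_factory=list)
    schema_valid: bool = True
    tool_calls: List[str] = field(default_factory=list)
    # Generated here for the example; in practice the ID would usually be
    # propagated from the caller so the record joins an existing trace.
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)

    def to_json(self) -> str:
        # One JSON object per call, suitable for a line-oriented log.
        return json.dumps(asdict(self), sort_keys=True)


record = CallRecord(
    provider="provider-a", model="model-a", prompt_version="v3",
    input_tokens=120, output_tokens=45, latency_ms=830.0, cost_usd=0.0021,
    user="u-1", app="support-bot", team="support",
)
print(record.to_json())
```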
Fallbacks are product decisions
If a model call fails, do not blindly switch to a weaker model for every task.
Before falling back, ask:
- Is a lower-quality answer acceptable?
- Should the user be told?
- Can the action be retried safely?
- Does the fallback support the same schema/tools?
- Is the request high risk?
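The questions above can be encoded as an explicit per-task policy, so that the fallback decision is made deliberately rather than by a generic retry loop. The sketch below is one hypothetical encoding: `FallbackPolicy`, `ModelSpec`, `Decision`, and the rule that any failed check blocks the fallback are assumptions, and a real product might weigh these differently.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class ModelSpec:
    name: str
    tools: FrozenSet[str] = frozenset()
    supports_json_schema: bool = False


@dataclass(frozen=True)
class FallbackPolicy:
    allow_lower_quality: bool   # Is a lower-quality answer acceptable?
    notify_user: bool           # Should the user be told?
    safe_to_retry: bool         # Can the action be retried safely?
    high_risk: bool             # Is the request high risk?


@dataclass(frozen=True)
class Decision:
    use_fallback: bool
    notify_user: bool
    reason: str


def decide_fallback(
    policy: FallbackPolicy,
    fallback: ModelSpec,
    required_tools: FrozenSet[str],
    needs_json_schema: bool,
) -> Decision:
    if policy.high_risk:
        return Decision(False, True, "high-risk request: fail instead of degrading")
    if not policy.allow_lower_quality:
        return Decision(False, True, "lower quality is not acceptable for this task")
    if not policy.safe_to_retry:
        return Decision(False, True, "action may not be safe to repeat")
    # Does the fallback support the same schema/tools?
    if not required_tools <= fallback.tools:
        return Decision(False, True, "fallback lacks required tools")
    if needs_json_schema and not fallback.supports_json_schema:
        return Decision(False, True, "fallback cannot satisfy the output schema")
    return Decision(True, policy.notify_user, "fallback permitted")
```

Returning a reason alongside the decision makes it easy to log why a request failed outright instead of degrading.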
Knowledge check
Q1: What is the main benefit of an LLM gateway?
It centralizes provider abstraction, reliability handling, cost control, and observability in one place instead of in every app.
Q2: Why can fallback be dangerous?
The fallback model may not meet the same safety or quality requirements, or support the same tools and schemas.