Router Policies
Configure model routing strategies, fallback chains, and rate limits.
Router policies control how inference requests are distributed across the targets of a logical model. Each policy defines a routing strategy, fallback behavior, and rate limits. Policies are applied per logical model.
Routing Strategies
MechaMental offers four routing strategies. Each determines how the router selects a target when a request comes in.
Priority (Ordered Fallback)
Targets are tried in their defined priority order. The request goes to the highest-priority target first. If that target fails or is rate-limited, the router moves to the next target in the chain.
Best for: General-purpose use. This is the default choice, with predictable behavior and built-in resilience.
Lowest Cost
The router selects the cheapest available target based on the model's per-token pricing. If the cheapest target is unavailable, the next cheapest is tried.
Best for: Cost-sensitive workloads where quality differences between targets are acceptable.
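As an illustration only, the Priority and Lowest Cost strategies can both be thought of as a sort over the available targets. The `Target` class and its fields (`priority`, `cost_per_1k_tokens`, `available`) are hypothetical names chosen for this sketch, not MechaMental's actual data model.

```python
from dataclasses import dataclass


@dataclass
class Target:
    # All field names here are assumptions made for the sketch.
    name: str
    priority: int               # assumed: lower number means higher priority
    cost_per_1k_tokens: float   # assumed: one blended per-token price
    available: bool = True


def order_targets(targets, strategy):
    """Return the available targets in the order they would be tried."""
    candidates = [t for t in targets if t.available]
    if strategy == "priority":
        return sorted(candidates, key=lambda t: t.priority)
    if strategy == "lowest_cost":
        return sorted(candidates, key=lambda t: t.cost_per_1k_tokens)
    raise ValueError(f"unsupported strategy: {strategy}")


targets = [
    Target("provider-a", priority=1, cost_per_1k_tokens=0.030),
    Target("provider-b", priority=2, cost_per_1k_tokens=0.002),
    Target("provider-c", priority=3, cost_per_1k_tokens=0.010, available=False),
]
```

With this data, the priority order is `provider-a`, then `provider-b`, while the lowest-cost order is reversed. `provider-c` is left out of both because it is unavailable.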
Lowest Latency
The router selects the target with the lowest recent response latency, based on rolling averages. This adapts dynamically as provider performance changes.
Best for: Real-time applications where response speed is critical.
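One plausible way to picture "rolling averages" is a fixed-size window of recent latency samples per target. The sketch below is an assumption about how such tracking could work, not a description of the router's internals. The class name, the window size, and the choice to rank unmeasured targets last are all illustrative.

```python
from collections import deque


class LatencyTracker:
    """Keeps the last `window` latency samples per target (hypothetical helper)."""

    def __init__(self, window=20):
        self.window = window
        self.samples = {}

    def record(self, target, latency_ms):
        # deque(maxlen=...) drops the oldest sample automatically.
        self.samples.setdefault(target, deque(maxlen=self.window)).append(latency_ms)

    def average(self, target):
        s = self.samples.get(target)
        # Assumption: a target with no samples yet ranks last.
        return sum(s) / len(s) if s else float("inf")

    def fastest(self, targets):
        return min(targets, key=self.average)
```

Because old samples fall out of the window, a target that was slow a while ago can become the preferred choice again once its recent responses speed up.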
Round Robin
Requests are distributed evenly across all available targets in rotation. If a target becomes unavailable, it is skipped until it recovers.
Best for: Load balancing across multiple provider accounts to avoid hitting rate limits on any single target.
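The rotation-with-skipping behavior can be sketched as a cursor that advances past unavailable targets. `RoundRobin` and the `is_available` callback are hypothetical names used only for this example.

```python
class RoundRobin:
    """Rotates through targets, skipping any that are currently unavailable."""

    def __init__(self, targets):
        self.targets = list(targets)
        self.next_index = 0

    def pick(self, is_available):
        n = len(self.targets)
        for offset in range(n):
            i = (self.next_index + offset) % n
            if is_available(self.targets[i]):
                # Resume the rotation just after the target we picked.
                self.next_index = (i + 1) % n
                return self.targets[i]
        return None  # every target is unavailable
```

A skipped target keeps its place in the rotation, so it starts receiving requests again as soon as `is_available` reports it healthy.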
Creating a Router Policy
Navigate to Router Policies
Go to Admin -> Router Policies in the sidebar. This page lists all existing policies and their assigned models.
Create a New Policy
Click New Policy and fill in:
- Name -- a descriptive name (e.g., "Production Priority", "Cost-Optimized Routing")
- Description -- explain the intended use case
- Strategy -- select one of the four routing strategies
Configure Fallback Behavior
Define how the router handles failures:
| Setting | Description |
|---|---|
| Retry Count | How many times to retry a failed target before moving to the next one in the chain |
| Timeout | Maximum wait time (in milliseconds) per target before considering it failed |
| Rate Limit Handling | Automatically skip targets that are currently rate-limited |
| Circuit Breaker | Temporarily disable targets with high error rates, re-enabling them after a cooldown period |
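To make the Circuit Breaker setting concrete, here is a minimal sketch that trips when the error rate over a window of recent requests reaches a threshold, then re-enables the target after a cooldown. The thresholds, window size, and class design are assumptions for illustration. The actual trip conditions are not specified here.

```python
import time
from collections import deque


class CircuitBreaker:
    def __init__(self, error_threshold=0.5, window=10, cooldown_s=30.0,
                 clock=time.monotonic):
        self.error_threshold = error_threshold
        self.cooldown_s = cooldown_s
        self.clock = clock                    # injectable for testing
        self.outcomes = deque(maxlen=window)  # True means the request failed
        self.opened_at = None

    def record(self, error):
        self.outcomes.append(bool(error))
        full = len(self.outcomes) == self.outcomes.maxlen
        if full and sum(self.outcomes) / len(self.outcomes) >= self.error_threshold:
            self.opened_at = self.clock()     # trip: disable the target
            self.outcomes.clear()             # start fresh after the cooldown

    def allows_requests(self):
        if self.opened_at is None:
            return True
        if self.clock() - self.opened_at >= self.cooldown_s:
            self.opened_at = None             # cooldown over: re-enable
            return True
        return False
```

The router would call `allows_requests()` before selecting a target and `record()` after each response.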
Set Rate Limits
Configure rate limits that this policy enforces:
- Requests per minute -- maximum API calls through this policy
- Tokens per minute -- maximum token throughput
- Daily cap -- maximum requests or tokens per day
- Monthly cap -- maximum requests or tokens per month
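A per-minute limit of this kind is often implemented as a sliding window. The sketch below assumes that approach. The router's actual algorithm may differ, and `WindowLimit` is a hypothetical name. The same class covers both requests (cost of 1) and tokens (cost equal to the token count).

```python
import time
from collections import deque


class WindowLimit:
    """Allows at most `limit` units of cost within any `window_s` seconds."""

    def __init__(self, limit, window_s=60.0, clock=time.monotonic):
        self.limit = limit
        self.window_s = window_s
        self.clock = clock          # injectable for testing
        self.entries = deque()      # (timestamp, cost) pairs
        self.used = 0

    def try_consume(self, cost=1):
        now = self.clock()
        # Drop entries that have aged out of the window.
        while self.entries and now - self.entries[0][0] >= self.window_s:
            _, old_cost = self.entries.popleft()
            self.used -= old_cost
        if self.used + cost > self.limit:
            return False
        self.entries.append((now, cost))
        self.used += cost
        return True
```

A daily cap could reuse the same idea with `window_s=86400`, though daily and monthly caps may instead reset on calendar boundaries.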
Assign to Logical Models
Select which logical models this policy applies to. Each logical model can have one active router policy. You can also set a policy as the default -- it applies to any logical model that does not have a specific policy assigned.
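Taken together, the steps above describe a policy with roughly the following shape. This is a sketch for orientation only: every class, field name, and default value is an assumption made for the example, not an actual MechaMental schema or API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FallbackSettings:
    retry_count: int = 2            # placeholder default
    timeout_ms: int = 30_000        # placeholder default
    skip_rate_limited: bool = True
    circuit_breaker: bool = True


@dataclass
class RateLimits:
    requests_per_minute: Optional[int] = None
    tokens_per_minute: Optional[int] = None
    daily_cap: Optional[int] = None
    monthly_cap: Optional[int] = None


@dataclass
class RouterPolicy:
    name: str
    strategy: str                   # e.g. "priority", "lowest_cost"
    description: str = ""
    fallback: FallbackSettings = field(default_factory=FallbackSettings)
    rate_limits: RateLimits = field(default_factory=RateLimits)
    logical_models: List[str] = field(default_factory=list)
    is_default: bool = False


def policy_for(model, policies):
    """A specifically assigned policy wins; otherwise use the default, if any."""
    for p in policies:
        if model in p.logical_models:
            return p
    return next((p for p in policies if p.is_default), None)


policies = [
    RouterPolicy("Production Priority", "priority", logical_models=["chat-large"]),
    RouterPolicy("Cost-Optimized Routing", "lowest_cost", is_default=True),
]
```

Here the hypothetical model `chat-large` resolves to "Production Priority", while any model without an assignment falls back to "Cost-Optimized Routing".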
Fallback Chains
Fallback chains are the core resilience mechanism. When a target fails, the router moves to the next target according to the routing strategy.
Cross-provider resilience
Include targets from different providers in your fallback chain so that a single provider's outage does not take the model offline. If one provider has an outage, traffic automatically shifts to targets on other providers.
How Fallback Works
- The router sends the request to the selected target (based on the strategy)
- If the target returns an error, the router retries up to the configured retry count
- If all retries fail, the target is marked as failed and the router moves to the next target
- If a target is rate-limited, the router skips it immediately without spending any retries
- If the circuit breaker has tripped for a target, the router skips it until the cooldown expires
- This continues until a target succeeds or all targets in the chain have been exhausted
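The steps above can be sketched as a loop over the ordered targets. This is a simplified illustration under stated assumptions: the function and exception names are hypothetical, `send` stands in for the actual provider call (and is assumed to raise `TargetError` on errors and timeouts), and "retry count" is read as the number of extra attempts after the first.

```python
class TargetError(Exception):
    """Raised by `send` when a target returns an error or times out."""


class AllTargetsFailed(Exception):
    """Raised when every target in the chain has been exhausted."""


def route(request, targets, send, retry_count=2,
          is_rate_limited=lambda t: False,
          breaker_open=lambda t: False):
    """Try targets in order (already sorted by the routing strategy)."""
    for target in targets:
        if is_rate_limited(target) or breaker_open(target):
            continue                      # skipped immediately, no retries spent
        for _ in range(1 + retry_count):  # first attempt plus retries
            try:
                return send(target, request)
            except TargetError:
                pass                      # retry the same target
        # Retries exhausted: fall through to the next target in the chain.
    raise AllTargetsFailed(f"no target could serve the request: {targets}")
```

With `retry_count=2`, a consistently failing first target is attempted three times before the second target receives the request.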
Rate Limits
Rate limits can be configured at multiple levels. They stack -- a request must pass all applicable limits.
| Level | Where to Configure | Description |
|---|---|---|
| Per Target | Logical model target settings | Limit traffic to a specific provider target |
| Per Policy | Router policy settings | Limit total traffic through the policy |
| Per Workspace | Model entitlement settings | Limit a workspace's usage of a model |
| Per Provider Account | Provider account settings | Limit total traffic to a provider account |
When any limit is reached, the router either skips to the next target (if fallbacks are available) or returns a rate limit error to the caller.
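As a rough model of stacking, a request is checked against every level that applies, and the first exhausted level blocks it. The level keys, function names, and return values below are hypothetical and chosen only to mirror the table and the paragraph above.

```python
LEVELS = ("target", "policy", "workspace", "provider_account")


def first_exceeded(usage, limits):
    """Return the first level whose limit is reached, or None if all pass."""
    for level in LEVELS:
        limit = limits.get(level)   # a missing level means no limit is set
        if limit is not None and usage.get(level, 0) >= limit:
            return level
    return None


def next_action(blocked_level, fallbacks_remaining):
    if blocked_level is None:
        return "send"
    return "skip_to_next_target" if fallbacks_remaining else "rate_limit_error"


limits = {"target": 100, "policy": 500, "workspace": 50}
usage = {"target": 10, "policy": 20, "workspace": 50}
```

In this example the target and policy limits have room to spare, but the workspace has used its full allowance, so the request is blocked at the `workspace` level.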
Set limits with headroom
Set rate limits with enough headroom to absorb normal usage spikes. If limits are too tight, legitimate requests are rejected during peak periods.