Router Policies

Router policies control how inference requests are distributed across the targets of a logical model. Each policy defines a routing strategy, fallback behavior, and rate limits. Policies are applied per logical model.

Routing Strategies

MechaMental offers four routing strategies. Each determines how the router selects a target when a request comes in.

Priority (Ordered Fallback)

Targets are tried in their defined priority order. The request goes to the highest-priority target first. If that target fails or is rate-limited, the router moves to the next target in the chain.

Best for: Most workloads. This is a sensible default, with predictable behavior and built-in resilience.

Lowest Cost

The router selects the cheapest available target based on each target's per-token pricing. If the cheapest target is unavailable, the next cheapest is tried.

Best for: Cost-sensitive workloads where quality differences between targets are acceptable.

Lowest Latency

The router selects the target with the lowest recent response latency, based on rolling averages. This adapts dynamically as provider performance changes.

Best for: Real-time applications where response speed is critical.

Round Robin

Requests are distributed evenly across all available targets in rotation. If a target becomes unavailable, it is skipped until it recovers.

Best for: Load balancing across multiple provider accounts to avoid hitting rate limits on any single target.
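
The four strategies differ only in how they order the available targets. The sketch below illustrates that idea in Python. It is not MechaMental's implementation: the `Target` fields, the strategy strings, and the `order_targets` function are hypothetical names chosen for this example, and it assumes a lower priority number means "tried first".

```python
from dataclasses import dataclass


@dataclass
class Target:
    name: str
    priority: int              # assumed convention: lower number = tried first
    cost_per_1k_tokens: float  # hypothetical pricing field
    avg_latency_ms: float      # rolling average, assumed to be maintained elsewhere
    available: bool = True


def order_targets(strategy: str, targets: list[Target], rr_counter: int = 0) -> list[Target]:
    """Return the available targets in the order the router would try them."""
    live = [t for t in targets if t.available]
    if strategy == "priority":
        return sorted(live, key=lambda t: t.priority)
    if strategy == "lowest_cost":
        return sorted(live, key=lambda t: t.cost_per_1k_tokens)
    if strategy == "lowest_latency":
        return sorted(live, key=lambda t: t.avg_latency_ms)
    if strategy == "round_robin":
        if not live:
            return []
        # Rotate the list so each request starts at the next target in turn.
        start = rr_counter % len(live)
        return live[start:] + live[:start]
    raise ValueError(f"unknown strategy: {strategy}")
```

In this sketch the caller would increment `rr_counter` after every request. Unavailable targets are filtered out before ordering, which matches the "skipped until it recovers" behavior described for Round Robin.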

Creating a Router Policy

Navigate to Router Policies

Go to Admin -> Router Policies in the sidebar. This page lists all existing policies and their assigned models.

Create a New Policy

Click New Policy and fill in:

Name -- a descriptive name (e.g., "Production Priority", "Cost-Optimized Routing")
Description -- explain the intended use case
Strategy -- select one of the four routing strategies

Configure Fallback Behavior

Define how the router handles failures:

Setting	Description
Retry Count	How many times to retry a failed target before moving to the next one in the chain
Timeout	Maximum wait time (in milliseconds) per target before considering it failed
Rate Limit Handling	Automatically skip targets that are currently rate-limited
Circuit Breaker	Temporarily disable targets with high error rates, re-enabling them after a cooldown period
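
To make the Circuit Breaker setting more concrete, here is a minimal Python sketch of the general pattern. It is a simplification under stated assumptions: it trips on a count of consecutive failures rather than an error rate, and the class name, `failure_threshold`, and `cooldown_s` are hypothetical, not MechaMental settings.

```python
import time


class CircuitBreaker:
    """Disable a target after repeated failures, re-enable it after a cooldown."""

    def __init__(self, failure_threshold: int = 5, cooldown_s: float = 30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.clock = clock      # injectable so the behavior can be tested
        self.failures = 0
        self.opened_at = None   # None means the breaker is closed (target usable)

    def allow(self) -> bool:
        """Return True if the target may receive traffic right now."""
        if self.opened_at is None:
            return True
        if self.clock() - self.opened_at >= self.cooldown_s:
            # Cooldown elapsed: re-enable the target and start counting afresh.
            self.opened_at = None
            self.failures = 0
            return True
        return False

    def record_success(self) -> None:
        self.failures = 0

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()
```

A router built this way would call `allow()` before sending to a target and `record_success()` or `record_failure()` afterwards.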

Set Rate Limits

Configure rate limits that this policy enforces:

Requests per minute -- maximum API calls through this policy
Tokens per minute -- maximum token throughput
Daily cap -- maximum requests or tokens per day
Monthly cap -- maximum requests or tokens per month
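
Each of these limits can be thought of as a counter over a time window. The following Python sketch shows one simple way to model that, a fixed-window counter. This is an assumption for illustration only; the source does not say which algorithm MechaMental uses, and `WindowLimit` is a hypothetical name.

```python
import time


class WindowLimit:
    """Fixed-window counter: at most `limit` units per `window_s` seconds."""

    def __init__(self, limit: int, window_s: float, clock=time.monotonic):
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        self.window_start = clock()
        self.used = 0

    def _roll(self) -> None:
        # Start a fresh window once the current one has fully elapsed.
        now = self.clock()
        if now - self.window_start >= self.window_s:
            self.window_start = now
            self.used = 0

    def has_room(self, amount: int = 1) -> bool:
        self._roll()
        return self.used + amount <= self.limit

    def consume(self, amount: int = 1) -> None:
        self._roll()
        self.used += amount


# Example: 60 requests per minute and 10,000 tokens per minute.
requests_per_minute = WindowLimit(limit=60, window_s=60)
tokens_per_minute = WindowLimit(limit=10_000, window_s=60)
```

A request would be admitted only when both counters have room, with `amount` set to 1 for the request counter and to the request's token count for the token counter.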

Assign to Logical Models

Select which logical models this policy applies to. Each logical model can have one active router policy. You can also set a policy as the default -- it applies to any logical model that does not have a specific policy assigned.
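
The assignment rule amounts to a lookup with a default. A minimal Python sketch, in which the function name and the model names are hypothetical:

```python
from typing import Optional


def resolve_policy(model: str,
                   assignments: dict[str, str],
                   default_policy: Optional[str]) -> Optional[str]:
    """Return the policy assigned to a logical model, else the default policy."""
    return assignments.get(model, default_policy)


# Hypothetical logical models; policy names reuse the examples given earlier.
assignments = {"chat-prod": "Production Priority"}
print(resolve_policy("chat-prod", assignments, "Cost-Optimized Routing"))
print(resolve_policy("embeddings", assignments, "Cost-Optimized Routing"))
```

Because each logical model has at most one active policy, a plain mapping from model to policy is enough to model the assignment.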

Fallback Chains

Fallback chains are the core resilience mechanism. When a target fails, the router moves to the next target according to the routing strategy.

Cross-provider resilience

Using targets from different providers in your fallback chain gives you cross-provider resilience. If one provider has an outage, traffic automatically shifts to targets on other providers.

How Fallback Works

1. The router sends the request to the selected target (based on the strategy).
2. If the target returns an error, the router retries up to the configured retry count.
3. If all retries fail, the target is marked as failed and the router moves to the next target.
4. If a target is rate-limited, the router skips it immediately (no retries wasted).
5. If the circuit breaker has tripped for a target, the router skips it until the cooldown expires.
6. This continues until a target succeeds or all targets in the chain have been exhausted.
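
The steps above can be sketched as a loop. This Python example is illustrative rather than the actual router code: `route_with_fallback`, `RateLimited`, and the `call` and `breaker_open` callbacks are hypothetical names, and it assumes "retry count" means retries in addition to the first attempt.

```python
from typing import Callable, Iterable


class RateLimited(Exception):
    """Raised by the (hypothetical) target call when the target is rate-limited."""


def route_with_fallback(
    targets: Iterable[str],
    call: Callable[[str], str],
    retry_count: int = 2,
    breaker_open: Callable[[str], bool] = lambda name: False,
) -> tuple[str, str]:
    """Try targets in the given order; return (target_name, response)."""
    for name in targets:
        if breaker_open(name):
            continue  # circuit breaker tripped: skip until the cooldown expires
        for _attempt in range(1 + retry_count):
            try:
                return name, call(name)
            except RateLimited:
                break     # rate-limited: skip immediately, no retries wasted
            except Exception:
                continue  # other error: retry the same target
        # Retries exhausted or target skipped: fall through to the next target.
    raise RuntimeError("all targets in the chain have been exhausted")
```

The `targets` argument is expected to be already ordered by the routing strategy, so the same loop serves all four strategies.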

Rate Limits

Rate limits can be configured at multiple levels. They stack -- a request must pass all applicable limits.

Level	Where to Configure	Description
Per Target	Logical model target settings	Limit traffic to a specific provider target
Per Policy	Router policy settings	Limit total traffic through the policy
Per Workspace	Model entitlement settings	Limit a workspace's usage of a model
Per Provider Account	Provider account settings	Limit total traffic to a provider account

When any limit is reached, the router either skips to the next target (if fallbacks are available) or returns a rate limit error to the caller.
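
Stacking means a request is admitted only if every configured level still has room. A small Python sketch of that check follows. The level keys mirror the table above, but the function, the data layout, and the order of evaluation are assumptions made for illustration.

```python
from typing import Optional

# Keys mirror the levels in the table; the checking order is an assumption.
LEVELS = ("per_target", "per_policy", "per_workspace", "per_provider_account")


def first_exceeded(usage: dict[str, int], caps: dict[str, int],
                   amount: int = 1) -> Optional[str]:
    """Return the first level whose cap would be exceeded, or None if all pass."""
    for level in LEVELS:
        if level in caps and usage.get(level, 0) + amount > caps[level]:
            return level
    return None


caps = {"per_target": 100, "per_policy": 500, "per_workspace": 50}
usage = {"per_target": 10, "per_policy": 200, "per_workspace": 50}
print(first_exceeded(usage, caps))
```

Here the request is blocked by the workspace limit even though the target and policy limits still have room. Levels without a configured cap are simply not checked.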

Set limits with headroom

Rate limits should be set with enough headroom for normal usage spikes. If limits are too tight, legitimate requests will be rejected during peak periods.

Model Configuration

Set up provider accounts, logical models, and entitlements that router policies operate on.

Billing & Limits

Track spending and set cost-based limits alongside router rate limits.

Example: Priority fallback with three providers

Example: Round Robin with rate limit avoidance
