MechaMental
Admin Guide

Router Policies

Configure model routing strategies, fallback chains, and rate limits.

Router policies control how inference requests are distributed across the targets of a logical model. Each policy defines a routing strategy, fallback behavior, and rate limits. Policies are applied per logical model.

Routing Strategies

MechaMental offers four routing strategies. Each determines how the router selects a target when a request comes in.

Priority (Ordered Fallback)

Targets are tried in their defined priority order. The request goes to the highest-priority target first. If that target fails or is rate-limited, the router moves to the next target in the chain.

Best for: General use. This is the default choice, offering predictable behavior with built-in resilience.
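The ordering described above can be sketched as follows. This is a minimal illustration, not MechaMental's implementation: the `Target` class, its field names, and the convention that a lower number means higher priority are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Target:
    name: str
    priority: int            # assumed convention: lower number = tried earlier
    available: bool = True   # False when the target has failed or is rate-limited


def priority_order(targets: list[Target]) -> list[Target]:
    """Return the available targets in the order they would be tried."""
    return sorted((t for t in targets if t.available), key=lambda t: t.priority)


targets = [
    Target("provider-b", priority=2),
    Target("provider-a", priority=1, available=False),
    Target("provider-c", priority=3),
]
order = [t.name for t in priority_order(targets)]
print(order)
```

Here the highest-priority target is unavailable, so the sketch falls through to the next one in the chain.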

Lowest Cost

The router selects the cheapest available target based on each target model's per-token pricing. If the cheapest target is unavailable, the router tries the next cheapest.

Best for: Cost-sensitive workloads where quality differences between targets are acceptable.
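As a rough sketch of this selection rule, the example below picks the cheapest target that is currently available. The `Target` class and the `cost_per_1k_tokens` field are hypothetical names chosen for illustration; the actual pricing data model is not described in this guide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Target:
    name: str
    cost_per_1k_tokens: float  # hypothetical pricing field
    available: bool = True


def cheapest_available(targets: list[Target]) -> Optional[Target]:
    """Pick the lowest-priced target that is currently available."""
    candidates = [t for t in targets if t.available]
    return min(candidates, key=lambda t: t.cost_per_1k_tokens, default=None)


targets = [
    Target("small-model", 0.20, available=False),
    Target("medium-model", 0.50),
    Target("large-model", 3.00),
]
choice = cheapest_available(targets)
print(choice.name if choice else "no target available")
```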

Lowest Latency

The router selects the target with the lowest recent response latency, based on rolling averages. This adapts dynamically as provider performance changes.

Best for: Real-time applications where response speed is critical.
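One way to picture latency-based selection is a fixed-size window of recent samples per target. The sketch below assumes a simple mean over the last N samples; the window size, the `LatencyTracker` class, and the treatment of targets with no samples are assumptions, and the router's actual averaging method may differ.

```python
from collections import deque
from statistics import fmean


class LatencyTracker:
    """Keeps the most recent latency samples per target (assumed window size)."""

    def __init__(self, window: int = 20):
        self._window = window
        self._samples: dict[str, deque] = {}

    def record(self, target: str, latency_ms: float) -> None:
        # A deque with maxlen drops the oldest sample automatically.
        self._samples.setdefault(target, deque(maxlen=self._window)).append(latency_ms)

    def average(self, target: str) -> float:
        samples = self._samples.get(target)
        # Assumption: a target with no samples yet is ranked last.
        return fmean(samples) if samples else float("inf")

    def fastest(self, targets: list[str]) -> str:
        return min(targets, key=self.average)


tracker = LatencyTracker(window=2)
for ms in (100, 120):
    tracker.record("provider-a", ms)
for ms in (80, 300):
    tracker.record("provider-b", ms)
first_choice = tracker.fastest(["provider-a", "provider-b"])

# provider-b recovers; its older, slower samples fall out of the window.
for ms in (50, 60):
    tracker.record("provider-b", ms)
second_choice = tracker.fastest(["provider-a", "provider-b"])
print(first_choice, second_choice)
```

Because only recent samples count, the choice shifts once a provider's performance changes.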

Round Robin

Requests are distributed evenly across all available targets in rotation. If a target becomes unavailable, it is skipped until it recovers.

Best for: Load balancing across multiple provider accounts to avoid hitting rate limits on any single target.
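The rotation with skipping can be sketched as below. The `RoundRobin` class and its `unavailable` set are illustrative assumptions; how the router actually tracks availability is not covered here.

```python
from typing import Optional


class RoundRobin:
    """Rotates through targets, skipping any that are marked unavailable."""

    def __init__(self, targets: list[str]):
        self._targets = targets
        self._next = 0
        self.unavailable: set[str] = set()

    def pick(self) -> Optional[str]:
        # Look at each target at most once per call.
        for _ in range(len(self._targets)):
            target = self._targets[self._next]
            self._next = (self._next + 1) % len(self._targets)
            if target not in self.unavailable:
                return target
        return None  # every target is currently unavailable


rr = RoundRobin(["a", "b", "c"])
healthy = [rr.pick() for _ in range(4)]

rr.unavailable.add("b")           # "b" goes down and is skipped
degraded = [rr.pick() for _ in range(3)]

rr.unavailable.discard("b")       # "b" recovers and rejoins the rotation
recovered = [rr.pick() for _ in range(2)]
print(healthy, degraded, recovered)
```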

Creating a Router Policy

Go to Admin -> Router Policies in the sidebar. This page lists all existing policies and their assigned models.

Create a New Policy

Click New Policy and fill in:

  • Name -- a descriptive name (e.g., "Production Priority", "Cost-Optimized Routing")
  • Description -- explain the intended use case
  • Strategy -- select one of the four routing strategies

Configure Fallback Behavior

Define how the router handles failures:

| Setting | Description |
|---|---|
| Retry Count | How many times to retry a failed target before moving to the next one in the chain |
| Timeout | Maximum wait time (in milliseconds) per target before considering it failed |
| Rate Limit Handling | Automatically skip targets that are currently rate-limited |
| Circuit Breaker | Temporarily disable targets with high error rates, re-enabling them after a cooldown period |
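To illustrate the circuit breaker setting, here is a minimal sketch. It simplifies "high error rates" to a count of consecutive failures, which is an assumption; the `CircuitBreaker` class, its parameter names, and its default values are hypothetical and do not reflect MechaMental's actual thresholds.

```python
import time


class CircuitBreaker:
    """Disables a target after repeated failures, then re-enables it after a cooldown."""

    def __init__(self, failure_threshold=5, cooldown_s=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # assumed: consecutive failures
        self.cooldown_s = cooldown_s
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # time at which the breaker tripped, if it has

    def allow(self) -> bool:
        if self._opened_at is None:
            return True
        if self._clock() - self._opened_at >= self.cooldown_s:
            # Cooldown elapsed: re-enable the target and start counting afresh.
            self._opened_at = None
            self._failures = 0
            return True
        return False

    def record_failure(self) -> None:
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._opened_at = self._clock()

    def record_success(self) -> None:
        self._failures = 0


# A controllable clock keeps the example deterministic.
now = [0.0]
breaker = CircuitBreaker(failure_threshold=2, cooldown_s=30.0, clock=lambda: now[0])
breaker.record_failure()
breaker.record_failure()
blocked = breaker.allow()      # breaker has tripped
now[0] = 31.0
recovered = breaker.allow()    # cooldown has passed
print(blocked, recovered)
```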

Set Rate Limits

Configure rate limits that this policy enforces:

  • Requests per minute -- maximum API calls through this policy
  • Tokens per minute -- maximum token throughput
  • Daily cap -- maximum requests or tokens per day
  • Monthly cap -- maximum requests or tokens per month
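The per-minute limits can be pictured as a sliding 60-second window over recent requests. The sketch below is one possible approach, assuming a sliding window rather than fixed calendar minutes; the `PerMinuteLimiter` class and its method names are invented for illustration. Daily and monthly caps would follow the same pattern with a longer window.

```python
from collections import deque


class PerMinuteLimiter:
    """Sliding-window check for requests per minute and tokens per minute."""

    def __init__(self, requests_per_minute: int, tokens_per_minute: int):
        self.rpm = requests_per_minute
        self.tpm = tokens_per_minute
        self._events = deque()  # (timestamp_seconds, tokens) per admitted request

    def try_acquire(self, now: float, tokens: int) -> bool:
        # Forget requests that are 60 seconds old or older.
        while self._events and now - self._events[0][0] >= 60.0:
            self._events.popleft()
        used_tokens = sum(t for _, t in self._events)
        if len(self._events) >= self.rpm or used_tokens + tokens > self.tpm:
            return False
        self._events.append((now, tokens))
        return True


limiter = PerMinuteLimiter(requests_per_minute=2, tokens_per_minute=1000)
results = [
    limiter.try_acquire(now=0.0, tokens=400),    # admitted
    limiter.try_acquire(now=1.0, tokens=400),    # admitted
    limiter.try_acquire(now=2.0, tokens=100),    # rejected: request limit reached
    limiter.try_acquire(now=60.5, tokens=700),   # rejected: would exceed token limit
    limiter.try_acquire(now=60.5, tokens=500),   # admitted: first request has expired
]
print(results)
```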

Assign to Logical Models

Select which logical models this policy applies to. Each logical model can have one active router policy. You can also set a policy as the default -- it applies to any logical model that does not have a specific policy assigned.
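The lookup rule described above amounts to "specific assignment first, default otherwise". A minimal sketch, using hypothetical model and policy names:

```python
from typing import Optional


def resolve_policy(
    logical_model: str,
    assignments: dict[str, str],
    default_policy: Optional[str] = None,
) -> Optional[str]:
    """Return the policy assigned to a logical model, else the default policy."""
    return assignments.get(logical_model, default_policy)


# Hypothetical names, for illustration only.
assignments = {"chat-large": "Production Priority"}
specific = resolve_policy("chat-large", assignments, "Cost-Optimized Routing")
fallback = resolve_policy("chat-small", assignments, "Cost-Optimized Routing")
print(specific, "|", fallback)
```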

Fallback Chains

Fallback chains are the core resilience mechanism. When a target fails, the router moves to the next target according to the routing strategy.

Cross-provider resilience

Using targets from different providers in your fallback chain gives you cross-provider resilience. If one provider has an outage, traffic automatically shifts to targets on other providers.

How Fallback Works

  1. The router sends the request to the selected target (based on the strategy)
  2. If the target returns an error, the router retries up to the configured retry count
  3. If all retries fail, the target is marked as failed and the router moves to the next target
  4. If a target is rate-limited, the router skips it immediately (no retries wasted)
  5. If the circuit breaker has tripped for a target, the router skips it until the cooldown expires
  6. This continues until a target succeeds or all targets in the chain have been exhausted
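The steps above can be sketched as a single loop. This is an illustration under stated assumptions, not the router's actual code: `route`, `TargetError`, `AllTargetsFailed`, and the callback parameters are hypothetical names, and the sketch assumes each target gets one initial attempt plus up to `retry_count` retries.

```python
from typing import Callable


class TargetError(Exception):
    """Raised by send() when a target returns an error (hypothetical)."""


class AllTargetsFailed(Exception):
    """Raised when every target in the chain has been exhausted (hypothetical)."""


def route(
    chain: list[str],
    send: Callable[[str], str],
    retry_count: int,
    is_rate_limited: Callable[[str], bool],
    breaker_open: Callable[[str], bool],
) -> str:
    for target in chain:
        # Steps 4 and 5: skip immediately, without spending retries.
        if is_rate_limited(target) or breaker_open(target):
            continue
        # Steps 1 to 3: one initial attempt plus the configured retries.
        for _ in range(1 + retry_count):
            try:
                return send(target)
            except TargetError:
                continue
    # Step 6: nothing in the chain succeeded.
    raise AllTargetsFailed("all targets in the chain were exhausted")


attempts = []


def send(target: str) -> str:
    attempts.append(target)
    if target == "primary":
        raise TargetError("upstream error")
    return f"response from {target}"


result = route(
    chain=["primary", "secondary", "tertiary"],
    send=send,
    retry_count=2,
    is_rate_limited=lambda t: t == "secondary",
    breaker_open=lambda t: False,
)
print(result, attempts)
```

In this run the primary target is attempted three times, the rate-limited secondary is skipped without an attempt, and the tertiary target serves the request.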

Rate Limits

Rate limits can be configured at multiple levels. They stack -- a request must pass all applicable limits.

| Level | Where to Configure | Description |
|---|---|---|
| Per Target | Logical model target settings | Limit traffic to a specific provider target |
| Per Policy | Router policy settings | Limit total traffic through the policy |
| Per Workspace | Model entitlement settings | Limit a workspace's usage of a model |
| Per Provider Account | Provider account settings | Limit total traffic to a provider account |

When any limit is reached, the router either skips to the next target (if fallbacks are available) or returns a rate limit error to the caller.
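Stacking means a target is usable only if every applicable limit still has capacity. The sketch below shows that check with made-up usage numbers; the `Limit` class and `first_allowed` function are hypothetical, and returning `None` stands in for the rate limit error returned to the caller.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Limit:
    level: str    # e.g. "target", "policy", "workspace", "provider_account"
    used: int
    maximum: int

    def has_capacity(self) -> bool:
        return self.used < self.maximum


def first_allowed(targets: dict[str, list[Limit]]) -> Optional[str]:
    """Return the first target whose applicable limits all have capacity."""
    for name, limits in targets.items():
        if all(limit.has_capacity() for limit in limits):
            return name
    return None  # stands in for a rate limit error returned to the caller


targets = {
    # The per-target limit is exhausted, so this target is skipped.
    "provider-a": [Limit("target", 100, 100), Limit("policy", 10, 500)],
    "provider-b": [Limit("target", 5, 100), Limit("policy", 10, 500)],
}
chosen = first_allowed(targets)
print(chosen)
```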

Set limits with headroom

Rate limits should be set with enough headroom for normal usage spikes. If limits are too tight, legitimate requests will be rejected during peak periods.
