Provider-agnostic model routing with intelligent fallbacks. Define logical models, configure provider accounts, and set up routing policies that keep your AI pipelines resilient and cost-effective.
Abstract model definitions that decouple your pipelines from specific providers. Instead of hardcoding "gpt-4o" or "claude-sonnet" into your steps, you define a logical model with the capabilities and defaults you need, then map it to one or more provider targets.
Declare what your model supports: streaming responses, tool/function calling, vision inputs, and structured JSON output (JSON mode). Pipelines automatically respect these capabilities.
Configure default parameters like temperature, max tokens, top-p, and stop sequences. Steps inherit these unless they specify their own overrides.
Each logical model maps to one or more concrete provider models. A single logical model like "primary-llm" can target OpenAI GPT-4o as its primary and Anthropic Claude as its fallback, all without your pipeline knowing the difference.
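As a rough illustration of the idea, a logical model can be pictured as a small record that carries capabilities, default parameters, and an ordered list of provider targets. The sketch below is not the product's actual schema: the class names, field names, capability labels, and parameter values are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class ProviderTarget:
    provider: str  # hypothetical provider key, e.g. "openai"
    model: str     # provider-specific model name
    role: str      # "primary" or "fallback" (assumed labels)


@dataclass
class LogicalModel:
    name: str
    capabilities: Set[str] = field(default_factory=set)
    defaults: Dict[str, object] = field(default_factory=dict)
    targets: List[ProviderTarget] = field(default_factory=list)

    def supports(self, capability: str) -> bool:
        """Return True if the logical model declares this capability."""
        return capability in self.capabilities

    def resolve_params(self, overrides: Optional[Dict[str, object]] = None) -> Dict[str, object]:
        """Merge step-level overrides on top of the model defaults."""
        return {**self.defaults, **(overrides or {})}


# Illustrative values only; the capability names and defaults are assumptions.
primary_llm = LogicalModel(
    name="primary-llm",
    capabilities={"streaming", "tools", "vision", "json_mode"},
    defaults={"temperature": 0.2, "max_tokens": 1024},
    targets=[
        ProviderTarget("openai", "gpt-4o", "primary"),
        ProviderTarget("anthropic", "claude-sonnet", "fallback"),
    ],
)

# A step that overrides only the temperature inherits max_tokens.
step_params = primary_llm.resolve_params({"temperature": 0.7})
```

A pipeline step would refer only to "primary-llm"; swapping or reordering the entries in `targets` changes which provider serves it without touching the step.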
Configure API credentials for model providers. Each provider account encapsulates authentication, usage limits, and health state so that your routing layer always knows where to send requests.
Set requests-per-minute and tokens-per-minute limits per account. The router respects these limits and shifts traffic to alternative providers when thresholds are reached.
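One plausible way to picture limit-aware routing is a sliding 60-second window per account that counts both requests and tokens. The following is a minimal sketch under that assumption; `AccountLimiter`, `pick_account`, and the account names are hypothetical, and the real router may use a different algorithm (for example a token bucket).

```python
import time
from collections import deque


class AccountLimiter:
    """Sliding-window RPM/TPM limiter for one provider account (illustrative)."""

    def __init__(self, rpm, tpm, clock=time.monotonic):
        self.rpm = rpm
        self.tpm = tpm
        self.clock = clock
        self.events = deque()  # (timestamp, tokens) for requests in the last 60 s

    def _prune(self):
        # Drop requests that are at least 60 seconds old.
        cutoff = self.clock() - 60.0
        while self.events and self.events[0][0] <= cutoff:
            self.events.popleft()

    def try_acquire(self, tokens):
        """Reserve capacity for one request; return False if a limit would be exceeded."""
        self._prune()
        used_tokens = sum(t for _, t in self.events)
        if len(self.events) + 1 > self.rpm or used_tokens + tokens > self.tpm:
            return False
        self.events.append((self.clock(), tokens))
        return True


def pick_account(limiters, tokens):
    """Return the first account (in insertion order) that still has capacity."""
    for name, limiter in limiters.items():
        if limiter.try_acquire(tokens):
            return name
    return None
```

With two accounts configured, requests go to the first one until its window is full, then shift to the second until older requests age out of the window.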
Assign dollar-amount budgets to each provider account. Track spend in real time and get alerts before you hit the ceiling.
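Budget tracking with an early warning can be sketched as a running total compared against two thresholds. The 80% alert ratio, the status strings, and the `AccountBudget` name below are assumptions for illustration; the source does not say when alerts actually fire.

```python
from dataclasses import dataclass


@dataclass
class AccountBudget:
    limit_usd: float
    alert_ratio: float = 0.8  # assumed: warn once 80% of the budget is spent
    spent_usd: float = 0.0

    def record(self, cost_usd: float) -> str:
        """Add a request's cost and return the resulting budget status."""
        self.spent_usd += cost_usd
        if self.spent_usd >= self.limit_usd:
            return "exhausted"
        if self.spent_usd >= self.limit_usd * self.alert_ratio:
            return "alert"
        return "ok"
```

A router could treat "alert" as a signal to notify the team and "exhausted" as a reason to stop sending traffic to that account.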
Continuous health checks track latency, error rates, and availability for each provider. Unhealthy accounts are automatically deprioritized.
Intelligent routing rules that match incoming requests to the right model based on step type and purpose. Policies are evaluated in priority order, and each one can define fallback behavior, circuit breakers, and match criteria.
Assign a numeric priority to each policy. The router evaluates policies from highest to lowest priority and applies the first one that matches. This gives you fine-grained control over which model handles which workload.
When the primary model is unavailable or returns errors, the router automatically falls back to secondary models defined in the policy. Requests keep flowing without any changes to your pipelines.
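The fallback behavior described here amounts to trying each target in order and returning the first successful response. A minimal sketch, assuming each target is a plain callable and that provider failures surface as a hypothetical `ProviderError`:

```python
class ProviderError(Exception):
    """Hypothetical error raised when a provider call fails."""


def call_with_fallback(targets, request):
    """Try (name, callable) targets in order; return (name, response) from the first that succeeds."""
    failed = []
    for name, call in targets:
        try:
            return name, call(request)
        except ProviderError:
            # Remember the failure and move on to the next target in the chain.
            failed.append(name)
    raise ProviderError(f"all targets failed: {failed}")
```

Only `ProviderError` is caught in this sketch, so programming errors still propagate instead of silently triggering a fallback.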
Built-in circuit breaker pattern with configurable failure thresholds and recovery windows. When a model exceeds its error threshold, the circuit opens and traffic is redirected to healthy alternatives until the recovery period elapses.
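The circuit breaker pattern can be sketched in a few lines. This version counts consecutive failures, opens once a threshold is reached, and allows a trial request after the recovery window has passed. The class name, parameter names, and default values are assumptions, not the product's configuration keys.

```python
import time


class CircuitBreaker:
    """Consecutive-failure circuit breaker with a fixed recovery window (illustrative)."""

    def __init__(self, failure_threshold=3, recovery_seconds=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.recovery_seconds = recovery_seconds
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def allow(self):
        """Return True if a request may be sent to this model."""
        if self.opened_at is None:
            return True
        # Once the recovery window has elapsed, let a trial request through.
        return self.clock() - self.opened_at >= self.recovery_seconds

    def record_success(self):
        # A success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.failure_threshold:
            # Open (or re-open) the circuit and restart the recovery window.
            self.opened_at = self.clock()
```

The router would call `allow()` before sending a request and skip to a healthy alternative when it returns False. A failed trial request re-opens the circuit for another full window.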
Policies match requests based on step type (inference, tool call, extraction) and purpose (chat, summarization, classification). This lets you route different workloads to the most suitable and cost-effective model.
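Priority-ordered matching on step type and purpose might look like the sketch below. It assumes that a larger number means higher priority and that an unset criterion matches anything; both are assumptions, as are the `Policy` fields, policy names, and model names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Policy:
    name: str
    priority: int                    # assumed: larger number is evaluated first
    model: str                       # logical model to route to
    step_type: Optional[str] = None  # None matches any step type (assumption)
    purpose: Optional[str] = None    # None matches any purpose (assumption)

    def matches(self, step_type: str, purpose: str) -> bool:
        return self.step_type in (None, step_type) and self.purpose in (None, purpose)


def route(policies, step_type, purpose):
    """Return the highest-priority policy that matches, or None if nothing matches."""
    for policy in sorted(policies, key=lambda p: p.priority, reverse=True):
        if policy.matches(step_type, purpose):
            return policy
    return None


# Illustrative policies: cheap model for classification, strong model for chat,
# and a low-priority catch-all.
policies = [
    Policy("catch-all", priority=1, model="primary-llm"),
    Policy("classify-cheap", priority=50, model="fast-llm", step_type="inference", purpose="classification"),
    Policy("chat-strong", priority=40, model="reasoning-llm", purpose="chat"),
]
```

Here `route(policies, "inference", "classification")` selects "classify-cheap", while a request that no specific policy covers falls through to "catch-all".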
Every policy tracks live metrics: total match count, success rate, and error rate. Use these stats to tune priorities, adjust fallback chains, and identify underperforming models.
Model management is the foundation of a resilient, cost-efficient AI platform. Here is what it unlocks for your team.
Logical models abstract the provider away. Move from OpenAI to Anthropic (or any other provider) by updating a single mapping, not every step in every pipeline.
Automatic fallback chains and circuit breakers ensure your AI keeps running even when a provider has an outage or hits rate limits.
Assign monthly budgets to each provider account and track spend in real time. Get alerts before costs spiral, and route less demanding workloads to cheaper models.
Use match criteria on step type and purpose to send complex reasoning to powerful models and simple classification to fast, inexpensive ones.