Your universal API proxy — one endpoint, 36+ providers, zero downtime.
Chat Completions • Embeddings • Image Generation • Video • Music • Audio • Reranking • 100% TypeScript
🌐 Website • 🚀 Quick Start • 💡 Features • 📖 Docs • 💰 Pricing • 💬 WhatsApp
🌐 Available in: 🇺🇸 English | 🇧🇷 Português (Brasil) | 🇪🇸 Español | 🇫🇷 Français | 🇮🇹 Italiano | 🇷🇺 Русский | 🇨🇳 中文 (简体) | 🇩🇪 Deutsch | 🇮🇳 हिन्दी | 🇹🇭 ไทย | 🇺🇦 Українська | 🇸🇦 العربية | 🇯🇵 日本語 | 🇻🇳 Tiếng Việt | 🇧🇬 Български | 🇩🇰 Dansk | 🇫🇮 Suomi | 🇮🇱 עברית | 🇭🇺 Magyar | 🇮🇩 Bahasa Indonesia | 🇰🇷 한국어 | 🇲🇾 Bahasa Melayu | 🇳🇱 Nederlands | 🇳🇴 Norsk | 🇵🇹 Português (Portugal) | 🇷🇴 Română | 🇵🇱 Polski | 🇸🇰 Slovenčina | 🇸🇪 Svenska | 🇵🇭 Filipino
Connect any AI-powered IDE or CLI tool through OmniRoute — free API gateway for unlimited coding.
OpenClaw ⭐ 205K | NanoBot ⭐ 20.9K | PicoClaw ⭐ 14.6K | ZeroClaw ⭐ 9.9K | IronClaw ⭐ 2.1K

OpenCode ⭐ 106K | Codex CLI ⭐ 60.8K | Claude Code ⭐ 67.3K | Gemini CLI ⭐ 94.7K | Kilo Code ⭐ 15.5K
📡 All agents connect via http://localhost:20128/v1 or http://cloud.omniroute.online/v1 — one config, unlimited models and quota
💬 Join our community! WhatsApp Group — Get help, share tips, and stay updated.
- Website: omniroute.online
- GitHub: github.com/diegosouzapw/OmniRoute
- Issues: github.com/diegosouzapw/OmniRoute/issues
- WhatsApp: Community Group
- Original Project: 9router by decolua
Stop wasting money and hitting limits:
- Subscription quota expires unused every month
- Rate limits stop you mid-coding
- Expensive APIs ($20–50/month per provider)
- Manual switching between providers
OmniRoute solves this:
- ✅ Maximize subscriptions - Track quota, use every bit before reset
- ✅ Auto fallback - Subscription → API Key → Cheap → Free, zero downtime
- ✅ Multi-account - Round-robin between accounts per provider
- ✅ Universal - Works with Claude Code, Codex, Gemini CLI, Cursor, Cline, OpenClaw, any CLI tool
┌─────────────┐
│ Your CLI │ (Claude Code, Codex, Gemini CLI, OpenClaw, Cursor, Cline...)
│ Tool │
└──────┬──────┘
│ http://localhost:20128/v1
↓
┌─────────────────────────────────────────┐
│ OmniRoute (Smart Router) │
│ • Format translation (OpenAI ↔ Claude) │
│ • Quota tracking + Embeddings + Images │
│ • Auto token refresh │
└──────┬──────────────────────────────────┘
│
├─→ [Tier 1: SUBSCRIPTION] Claude Code, Codex, Gemini CLI
│ ↓ quota exhausted
├─→ [Tier 2: API KEY] DeepSeek, Groq, xAI, Mistral, NVIDIA NIM, etc.
│ ↓ budget limit
├─→ [Tier 3: CHEAP] GLM ($0.6/1M), MiniMax ($0.2/1M)
│ ↓ budget limit
└─→ [Tier 4: FREE] iFlow, Qwen, Kiro (unlimited)
Result: Never stop coding, minimal cost
Every developer using AI tools faces these problems daily. OmniRoute was built to solve them all — from cost overruns to regional blocks, from broken OAuth flows to zero observability.
💸 1. "I pay for an expensive subscription but still get interrupted by limits"
Developers pay $20–200/month for Claude Pro, Codex Pro, or GitHub Copilot. Even so, quota has a ceiling — 5-hour windows, weekly limits, or per-minute rate limits. Mid-session the provider stops responding, and the developer loses flow and productivity.
How OmniRoute solves it:
- Smart 4-Tier Fallback — If subscription quota runs out, automatically redirects to API Key → Cheap → Free with zero manual intervention
- Real-Time Quota Tracking — Shows token consumption in real-time with reset countdown (5h, daily, weekly)
- Multi-Account Support — Multiple accounts per provider with auto round-robin — when one runs out, switches to the next
- Custom Combos — Customizable fallback chains with 6 balancing strategies (fill-first, round-robin, P2C, random, least-used, cost-optimized)
- Codex Business Quotas — Business/Team workspace quota monitoring directly in the dashboard
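The 4-tier fallback above can be pictured as an ordered scan over tiers. A minimal TypeScript sketch of the idea only — tier names, model IDs, and availability checks here are illustrative, not OmniRoute's internals:

```typescript
// Each tier exposes an availability check (quota/budget remaining);
// the router takes the first model from the first tier that still has room.
type Tier = { name: string; models: string[]; available: () => boolean };

function pickModel(tiers: Tier[]): string | null {
  for (const tier of tiers) {
    if (tier.available()) return tier.models[0];
  }
  return null; // every tier exhausted — nothing left to route to
}

const tiers: Tier[] = [
  { name: "subscription", models: ["cc/claude-opus-4-6"], available: () => false }, // quota used up
  { name: "api-key", models: ["deepseek/deepseek-chat"], available: () => false },  // budget limit hit
  { name: "cheap", models: ["glm/glm-4.7"], available: () => true },
  { name: "free", models: ["if/kimi-k2-thinking"], available: () => true },
];

console.log(pickModel(tiers)); // "glm/glm-4.7" — first tier with capacity wins
```

In the real router the availability checks are driven by live quota tracking and budget limits rather than hard-coded flags.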
🔌 2. "I need to use multiple providers but each has a different API"
OpenAI uses one format, Claude (Anthropic) uses another, Gemini yet another. If a dev wants to test models from different providers or fall back between them, they must reconfigure SDKs, change endpoints, and deal with incompatible formats. Custom providers (FriendLI, NIM) have non-standard model endpoints.
How OmniRoute solves it:
- Unified Endpoint — A single `http://localhost:20128/v1` serves as proxy for all 36+ providers
- Format Translation — Automatic and transparent: OpenAI ↔ Claude ↔ Gemini ↔ Responses API
- Response Sanitization — Strips non-standard fields (`x_groq`, `usage_breakdown`, `service_tier`) that break OpenAI SDK v1.83+
- Role Normalization — Converts `developer` → `system` for non-OpenAI providers; `system` → `user` for GLM/ERNIE
- Think Tag Extraction — Extracts `<think>` blocks from models like DeepSeek R1 into standardized `reasoning_content`
- Structured Output for Gemini — `json_schema` → `responseMimeType`/`responseSchema` automatic conversion
- `stream` defaults to `false` — Aligns with the OpenAI spec, avoiding unexpected SSE in Python/Rust/Go SDKs
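The role-normalization rule amounts to a small mapping — an assumed sketch for illustration, not OmniRoute's actual source:

```typescript
// Assumed sketch of the normalization rules described above.
function normalizeRole(role: string, provider: string): string {
  let r = role;
  // "developer" is OpenAI-specific; downgrade to "system" elsewhere.
  if (r === "developer" && provider !== "openai") r = "system";
  // Per the rule above, "system" becomes a user turn for GLM/ERNIE.
  if (r === "system" && (provider === "glm" || provider === "ernie")) r = "user";
  return r;
}

console.log(normalizeRole("developer", "glm"));   // "user"
console.log(normalizeRole("system", "deepseek")); // "system"
```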
🌐 3. "My AI provider blocks my region/country"
Providers like OpenAI/Codex block access from certain geographic regions. Users get errors like `unsupported_country_region_territory` during OAuth and API connections. This is especially frustrating for developers in developing countries.
How OmniRoute solves it:
- 3-Level Proxy Config — Configurable proxy at 3 levels: global (all traffic), per-provider (one provider only), and per-connection/key
- Color-Coded Proxy Badges — Visual indicators: 🟢 global proxy, 🟡 provider proxy, 🔵 connection proxy, always showing the IP
- OAuth Token Exchange Through Proxy — The OAuth flow also goes through the proxy, solving `unsupported_country_region_territory`
- Connection Tests via Proxy — Connection tests use the configured proxy (no more direct bypass)
- SOCKS5 Support — Full SOCKS5 proxy support for outbound routing
- TLS Fingerprint Spoofing — Browser-like TLS fingerprint via `wreq-js` to bypass bot detection
🆓 4. "I want to use AI for coding but I have no money"
Not everyone can pay $20–200/month for AI subscriptions. Students, devs from emerging countries, hobbyists, and freelancers need access to quality models at zero cost.
How OmniRoute solves it:
- Free Tier Providers Built-in — Native support for 100% free providers: iFlow (8 unlimited models), Qwen (3 unlimited models), Kiro (Claude for free), Gemini CLI (180K/month free)
- Free-Only Combos — Chain `gc/gemini-3-flash → if/kimi-k2-thinking → qw/qwen3-coder-plus` = $0/month with zero downtime
- NVIDIA NIM Free Credits — 1000 free credits integrated
- Cost Optimized Strategy — Routing strategy that automatically chooses the cheapest available provider
🔒 5. "I need to protect my AI gateway from unauthorized access"
When exposing an AI gateway to the network (LAN, VPS, Docker), anyone with the address can consume the developer's tokens/quota. Without protection, APIs are vulnerable to misuse, prompt injection, and abuse.
How OmniRoute solves it:
- API Key Management — Generation, rotation, and scoping per provider with a dedicated `/dashboard/api-manager` page
- Model-Level Permissions — Restrict API keys to specific models (`openai/*` wildcard patterns), with Allow All/Restrict toggle
- API Endpoint Protection — Require a key for `/v1/models` and block specific providers from the listing
- Auth Guard + CSRF Protection — All dashboard routes protected with `withAuth` middleware + CSRF tokens
- Rate Limiter — Per-IP rate limiting with configurable windows
- IP Filtering — Allowlist/blocklist for access control
- Prompt Injection Guard — Sanitization against malicious prompt patterns
- AES-256-GCM Encryption — Credentials encrypted at rest
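The per-IP rate limiter can be thought of as a counting window per client address. A sketch under assumed defaults (60 requests per 60-second fixed window; the real limiter's windows are configurable):

```typescript
// Fixed-window per-IP rate limiting sketch (illustrative defaults).
const hits = new Map<string, { windowStart: number; count: number }>();

function allow(ip: string, now: number, limit = 60, windowMs = 60_000): boolean {
  const entry = hits.get(ip);
  if (!entry || now - entry.windowStart >= windowMs) {
    // New window: reset the counter for this IP.
    hits.set(ip, { windowStart: now, count: 1 });
    return true;
  }
  entry.count += 1;
  return entry.count <= limit; // reject once the window's budget is spent
}
```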
🛑 6. "My provider went down and I lost my coding flow"
AI providers can become unstable, return 5xx errors, or hit temporary rate limits. If a dev depends on a single provider, they're interrupted. Without circuit breakers, repeated retries can crash the application.
How OmniRoute solves it:
- Circuit Breaker per-provider — Auto-open/close with configurable thresholds and cooldown (Closed/Open/Half-Open)
- Exponential Backoff — Progressive retry delays
- Anti-Thundering Herd — Mutex + semaphore protection against concurrent retry storms
- Combo Fallback Chains — If the primary provider fails, automatically falls through the chain with no intervention
- Combo Circuit Breaker — Auto-disables failing providers within a combo chain
- Health Dashboard — Uptime monitoring, circuit breaker states, lockouts, cache stats, p50/p95/p99 latency
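The Closed/Open/Half-Open cycle works roughly like this sketch. The 5-failure threshold and 30-minute cooldown mirror the token-refresh defaults mentioned elsewhere in this README; everything else is illustrative:

```typescript
// Circuit breaker sketch: failures trip the circuit open; after a
// cooldown it goes half-open to let a probe request through.
type State = "closed" | "open" | "half-open";

class CircuitBreaker {
  private failures = 0;
  private openedAt = 0;
  constructor(private threshold = 5, private cooldownMs = 30 * 60_000) {}

  state(now: number): State {
    if (this.failures < this.threshold) return "closed";
    return now - this.openedAt >= this.cooldownMs ? "half-open" : "open";
  }

  recordFailure(now: number): void {
    this.failures += 1;
    if (this.failures === this.threshold) this.openedAt = now; // circuit trips
  }

  recordSuccess(): void {
    this.failures = 0; // a successful probe closes the circuit again
  }
}
```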
🔧 7. "Configuring each AI tool is tedious and repetitive"
Developers use Cursor, Claude Code, Codex CLI, OpenClaw, Gemini CLI, Kilo Code... Each tool needs a different config (API endpoint, key, model). Reconfiguring when switching providers or models is a waste of time.
How OmniRoute solves it:
- CLI Tools Dashboard — Dedicated page with one-click setup for Claude Code, Codex CLI, OpenClaw, Kilo Code, Antigravity, Cline
- GitHub Copilot Config Generator — Generates `chatLanguageModels.json` for VS Code with bulk model selection
- Onboarding Wizard — Guided 4-step setup for first-time users
- One endpoint, all models — Configure `http://localhost:20128/v1` once, access 36+ providers
🔑 8. "Managing OAuth tokens from multiple providers is hell"
Claude Code, Codex, Gemini CLI, Copilot — all use OAuth 2.0 with expiring tokens. Developers need to re-authenticate constantly, deal with `client_secret is missing` and `redirect_uri_mismatch` errors, and handle failures on remote servers. OAuth on LAN/VPS is particularly problematic.
How OmniRoute solves it:
- Auto Token Refresh — OAuth tokens refresh in background before expiration
- OAuth 2.0 (PKCE) Built-in — Automatic flow for Claude Code, Codex, Gemini CLI, Copilot, Kiro, Qwen, iFlow
- Multi-Account OAuth — Multiple accounts per provider via JWT/ID token extraction
- OAuth LAN/Remote Fix — Private IP detection for `redirect_uri` + manual URL mode for remote servers
- OAuth Behind Nginx — Uses `window.location.origin` for reverse proxy compatibility
- Remote OAuth Guide — Step-by-step guide for Google Cloud credentials on VPS/Docker
📊 9. "I don't know how much I'm spending or where"
Developers use multiple paid providers but have no unified view of spending. Each provider has its own billing dashboard, but there's no consolidated view. Unexpected costs can pile up.
How OmniRoute solves it:
- Cost Analytics Dashboard — Per-token cost tracking and budget management per provider
- Budget Limits per Tier — Spending ceiling per tier that triggers automatic fallback
- Per-Model Pricing Configuration — Configurable prices per model
- Usage Statistics Per API Key — Request count and last-used timestamp per key
- Analytics Dashboard — Stat cards, model usage chart, provider table with success rates and latency
🐛 10. "I can't diagnose errors and problems in AI calls"
When a call fails, the dev doesn't know whether it was a rate limit, an expired token, a wrong format, or a provider error. Logs are fragmented across different terminals. Without observability, debugging is trial and error.
How OmniRoute solves it:
- Unified Logs Dashboard — 4 tabs: Request Logs, Proxy Logs, Audit Logs, Console
- Console Log Viewer — Real-time terminal-style viewer with color-coded levels, auto-scroll, search, filter
- SQLite Proxy Logs — Persistent logs that survive server restarts
- Translator Playground — 4 debugging modes: Playground (format translation), Chat Tester (round-trip), Test Bench (batch), Live Monitor (real-time)
- Request Telemetry — p50/p95/p99 latency + X-Request-Id tracing
- File-Based Logging with Rotation — Console interceptor captures everything to JSON log with size-based rotation
🏗️ 11. "Deploying and maintaining the gateway is complex"
Installing, configuring, and maintaining an AI proxy across different environments (local, VPS, Docker, cloud) is labor-intensive. Problems like hardcoded paths, EACCES on directories, port conflicts, and cross-platform builds add friction.
How OmniRoute solves it:
- npm global install — `npm install -g omniroute && omniroute` — done
- Docker Multi-Platform — AMD64 + ARM64 native (Apple Silicon, AWS Graviton, Raspberry Pi)
- Docker Compose Profiles — `base` (no CLI tools) and `cli` (with Claude Code, Codex, OpenClaw)
- Electron Desktop App — Native app for Windows/macOS/Linux with system tray, auto-start, offline mode
- Split-Port Mode — API and Dashboard on separate ports for advanced scenarios (reverse proxy, container networking)
- Cloud Sync — Config synchronization across devices via Cloudflare Workers
- DB Backups — Automatic backup, restore, export and import of all settings
🌍 12. "The interface is English-only and my team doesn't speak English"
Teams in non-English-speaking countries, especially in Latin America, Asia, and Europe, struggle with English-only interfaces. Language barriers reduce adoption and increase configuration errors.
How OmniRoute solves it:
- Dashboard i18n — 30 Languages — All 500+ keys translated including Arabic, Bulgarian, Danish, German, Spanish, Finnish, French, Hebrew, Hindi, Hungarian, Indonesian, Italian, Japanese, Korean, Malay, Dutch, Norwegian, Polish, Portuguese (PT/BR), Romanian, Russian, Slovak, Swedish, Thai, Ukrainian, Vietnamese, Chinese, Filipino, English
- RTL Support — Right-to-left support for Arabic and Hebrew
- Multi-Language READMEs — 30 complete documentation translations
- Language Selector — Globe icon in header for real-time switching
🔄 13. "I need more than chat — I need embeddings, images, audio"
AI isn't just chat completion. Devs need to generate images, transcribe audio, create embeddings for RAG, rerank documents, and moderate content. Each API has a different endpoint and format.
How OmniRoute solves it:
- Embeddings — `/v1/embeddings` with 6 providers and 9+ models
- Image Generation — `/v1/images/generations` with 10 providers and 20+ models (OpenAI, xAI, Together, Fireworks, Nebius, Hyperbolic, NanoBanana, Antigravity, SD WebUI, ComfyUI)
- Text-to-Video — `/v1/videos/generations` — ComfyUI (AnimateDiff, SVD) and SD WebUI
- Text-to-Music — `/v1/music/generations` — ComfyUI (Stable Audio Open, MusicGen)
- Audio Transcription — `/v1/audio/transcriptions` — Whisper + Nvidia NIM, HuggingFace, Qwen3
- Text-to-Speech — `/v1/audio/speech` — ElevenLabs, Nvidia NIM, HuggingFace, Coqui, Tortoise, Qwen3, + existing providers
- Moderations — `/v1/moderations` — Content safety checks
- Reranking — `/v1/rerank` — Document relevance reranking
- Responses API — Full `/v1/responses` support for Codex
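These endpoints accept OpenAI-style JSON bodies. As one example, an embeddings request might look like the following sketch (the model ID is a hypothetical placeholder — use one listed in your dashboard):

```typescript
// OpenAI-compatible embeddings body; POST it to
// http://localhost:20128/v1/embeddings with your OmniRoute API key.
const embeddingRequest = {
  model: "nvidia/nv-embedqa-e5-v5", // hypothetical provider-prefixed model ID
  input: ["How do I configure fallback combos?"],
};

console.log(JSON.stringify(embeddingRequest));
```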
🧪 14. "I have no way to test and compare quality across models"
Developers want to know which model is best for their use case — code, translation, reasoning — but comparing manually is slow. No integrated eval tools exist.
How OmniRoute solves it:
- LLM Evaluations — Golden set testing with 10 pre-loaded cases covering greetings, math, geography, code generation, JSON compliance, translation, markdown, safety refusal
- 4 Match Strategies — `exact`, `contains`, `regex`, `custom` (JS function)
exact,contains,regex,custom(JS function) - Translator Playground Test Bench — Batch testing with multiple inputs and expected outputs, cross-provider comparison
- Chat Tester — Full round-trip with visual response rendering
- Live Monitor — Real-time stream of all requests flowing through the proxy
📈 15. "I need to scale without losing performance"
As request volume grows, identical questions generate duplicate costs without caching, duplicate requests waste processing without idempotency, and per-provider rate limits must still be respected.
How OmniRoute solves it:
- Semantic Cache — Two-tier cache (signature + semantic) reduces cost and latency
- Request Idempotency — 5s deduplication window for identical requests
- Rate Limit Detection — Per-provider RPM, min gap, and max concurrent tracking
- Editable Rate Limits — Configurable defaults in Settings → Resilience with persistence
- API Key Validation Cache — 3-tier cache for production performance
- Health Dashboard with Telemetry — p50/p95/p99 latency, cache stats, uptime
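The 5-second idempotency window can be sketched as a signature → last-seen map (illustrative only; OmniRoute's actual cache also has a semantic tier):

```typescript
// Remember when each request signature was last seen; anything repeated
// within the window is treated as a duplicate and served once.
const seen = new Map<string, number>();

function isDuplicate(signature: string, now: number, windowMs = 5000): boolean {
  const last = seen.get(signature);
  seen.set(signature, now); // refresh the timestamp either way
  return last !== undefined && now - last < windowMs;
}

console.log(isDuplicate("req-abc", 0));    // false — first sighting
console.log(isDuplicate("req-abc", 3000)); // true — inside the 5 s window
console.log(isDuplicate("req-abc", 9000)); // false — window expired
```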
🤖 16. "I want to control model behavior globally"
Some developers want every response in a specific language or tone, or need to cap reasoning tokens. Configuring this in every tool or request is impractical.
How OmniRoute solves it:
- System Prompt Injection — Global prompt applied to all requests
- Thinking Budget Validation — Reasoning token allocation control per request (passthrough, auto, custom, adaptive)
- 6 Routing Strategies — Global strategies that determine how requests are distributed
- Wildcard Router — `provider/*` patterns route dynamically to any provider
- Combo Enable/Disable Toggle — Toggle combos directly from the dashboard
- Provider Toggle — Enable/disable all connections for a provider with one click
- Blocked Providers — Exclude specific providers from the `/v1/models` listing
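The `provider/*` patterns used for routing and key permissions reduce to a simple prefix check — a sketch of the assumed matching rule:

```typescript
// "openai/*" matches every model under the openai prefix;
// a pattern without "*" must match the model ID exactly.
function matchesPattern(pattern: string, modelId: string): boolean {
  if (pattern.endsWith("/*")) return modelId.startsWith(pattern.slice(0, -1));
  return pattern === modelId;
}

console.log(matchesPattern("openai/*", "openai/gpt-4o")); // true
console.log(matchesPattern("openai/*", "glm/glm-4.7"));   // false
```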
1. Install globally:
```bash
npm install -g omniroute
omniroute
```

🎉 Dashboard opens at http://localhost:20128
| Command | Description |
|---|---|
| `omniroute` | Start server (PORT=20128, API and dashboard on same port) |
| `omniroute --port 3000` | Set canonical/API port to 3000 |
| `omniroute --no-open` | Don't auto-open browser |
| `omniroute --help` | Show help |
Optional split-port mode:
```bash
PORT=20128 DASHBOARD_PORT=20129 omniroute
# API: http://localhost:20128/v1
# Dashboard: http://localhost:20129
```

When ports are split, the API port serves only OpenAI-compatible routes (`/v1`, `/chat/completions`, `/responses`, `/models`, `/codex/*`).
2. Connect a FREE provider:
Dashboard → Providers → Connect Claude Code or Antigravity → OAuth login → Done!
3. Use in your CLI tool:
Claude Code/Codex/Gemini CLI/OpenClaw/Cursor/Cline Settings:
Endpoint: http://localhost:20128/v1
API Key: [copy from dashboard]
Model: if/kimi-k2-thinking
That's it! Start coding with FREE AI models.
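Anything that speaks the OpenAI API can use the same endpoint programmatically. A minimal sketch — the API key below is a placeholder; generate a real one in the dashboard:

```typescript
// Standard OpenAI Chat Completions body; OmniRoute translates it to
// whatever format the provider behind the model ID expects.
const body = {
  model: "if/kimi-k2-thinking", // provider-prefixed model ID from the dashboard
  messages: [{ role: "user", content: "Explain combos in one sentence." }],
  stream: false, // OmniRoute defaults stream to false, per the OpenAI spec
};

// Hypothetical call — requires a running OmniRoute instance:
async function chat(): Promise<unknown> {
  const res = await fetch("http://localhost:20128/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: "Bearer sk_omniroute", // placeholder key
    },
    body: JSON.stringify(body),
  });
  return res.json();
}

console.log(body.model);
```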
Alternative — run from source:
```bash
cp .env.example .env
npm install
PORT=20128 DASHBOARD_PORT=20129 NEXT_PUBLIC_BASE_URL=http://localhost:20129 npm run dev
```

OmniRoute is available as a public Docker image on Docker Hub.
Quick run:
```bash
docker run -d \
  --name omniroute \
  --restart unless-stopped \
  -p 20128:20128 \
  -v omniroute-data:/app/data \
  diegosouzapw/omniroute:latest
```

With environment file:
```bash
# Copy and edit .env first
cp .env.example .env

docker run -d \
  --name omniroute \
  --restart unless-stopped \
  --env-file .env \
  -p 20128:20128 \
  -v omniroute-data:/app/data \
  diegosouzapw/omniroute:latest
```

Using Docker Compose:
```bash
# Base profile (no CLI tools)
docker compose --profile base up -d

# CLI profile (Claude Code, Codex, OpenClaw built-in)
docker compose --profile cli up -d
```

| Image | Tag | Size | Description |
|---|---|---|---|
| `diegosouzapw/omniroute` | `latest` | ~250MB | Latest stable release |
| `diegosouzapw/omniroute` | `1.0.3` | ~250MB | Current version |
🆕 NEW! OmniRoute is now available as a native desktop application for Windows, macOS, and Linux.
Run OmniRoute as a standalone desktop app — no terminal, no browser, no internet required for local models. The Electron-based app includes:
- 🖥️ Native Window — Dedicated app window with system tray integration
- 🔄 Auto-Start — Launch OmniRoute on system login
- 🔔 Native Notifications — Get alerts for quota exhaustion or provider issues
- ⚡ One-Click Install — NSIS (Windows), DMG (macOS), AppImage (Linux)
- 🌐 Offline Mode — Works fully offline with bundled server
```bash
# Development mode
npm run electron:dev

# Build for your platform
npm run electron:build        # Current platform
npm run electron:build:win    # Windows (.exe)
npm run electron:build:mac    # macOS (.dmg) — x64 & arm64
npm run electron:build:linux  # Linux (.AppImage)
```

When minimized, OmniRoute lives in your system tray with quick actions:
- Open dashboard
- Change server port
- Quit application
📖 Full documentation: electron/README.md
| Tier | Provider | Cost | Quota Reset | Best For |
|---|---|---|---|---|
| 💳 SUBSCRIPTION | Claude Code (Pro) | $20/mo | 5h + weekly | Already subscribed |
| | Codex (Plus/Pro) | $20-200/mo | 5h + weekly | OpenAI users |
| | Gemini CLI | FREE | 180K/mo + 1K/day | Everyone! |
| | GitHub Copilot | $10-19/mo | Monthly | GitHub users |
| 🔑 API KEY | NVIDIA NIM | FREE (1000 credits) | One-time | Free tier testing |
| | DeepSeek | Pay-per-use | None | Best price/quality |
| | Groq | Free tier + paid | Rate limited | Ultra-fast inference |
| | xAI (Grok) | Pay-per-use | None | Grok models |
| | Mistral | Free tier + paid | Rate limited | European AI |
| | OpenRouter | Pay-per-use | None | 100+ models |
| 💰 CHEAP | GLM-4.7 | $0.6/1M | Daily 10AM | Budget backup |
| | MiniMax M2.1 | $0.2/1M | 5-hour rolling | Cheapest option |
| | Kimi K2 | $9/mo flat | 10M tokens/mo | Predictable cost |
| 🆓 FREE | iFlow | $0 | Unlimited | 8 models free |
| | Qwen | $0 | Unlimited | 3 models free |
| | Kiro | $0 | Unlimited | Claude free |
💡 Pro Tip: Start with Gemini CLI (180K free/month) + iFlow (unlimited free) combo = $0 cost!
| Feature | What It Does |
|---|---|
| 🎯 Smart 4-Tier Fallback | Auto-route: Subscription → API Key → Cheap → Free |
| 📊 Real-Time Quota Tracking | Live token count + reset countdown per provider |
| 🔄 Format Translation | OpenAI ↔ Claude ↔ Gemini ↔ Cursor ↔ Kiro seamless + response sanitization |
| 👥 Multi-Account Support | Multiple accounts per provider with intelligent selection |
| 🔄 Auto Token Refresh | OAuth tokens refresh automatically with retry |
| 🎨 Custom Combos | 6 strategies: fill-first, round-robin, p2c, random, least-used, cost-optimized |
| 🧩 Custom Models | Add any model ID to any provider |
| 🌐 Wildcard Router | Route provider/* patterns to any provider dynamically |
| 🧠 Thinking Budget | Passthrough, auto, custom, and adaptive modes for reasoning models |
| 🔀 Model Aliases | Auto-forward deprecated model IDs to current replacements (built-in + custom) |
| ⚡ Background Degradation | Auto-route background tasks (titles, summaries) to cheaper models |
| 💬 System Prompt Injection | Global system prompt applied across all requests |
| 📄 Responses API | Full OpenAI Responses API (/v1/responses) support for Codex |
| Feature | What It Does |
|---|---|
| 🖼️ Image Generation | /v1/images/generations — 10 providers, 20+ models (cloud + local) |
| 📐 Embeddings | /v1/embeddings — 6 providers, 9+ models |
| 🎤 Audio Transcription | /v1/audio/transcriptions — Whisper + Nvidia NIM, HuggingFace, Qwen3 |
| 🔊 Text-to-Speech | /v1/audio/speech — ElevenLabs, Nvidia NIM, HuggingFace, Coqui, Tortoise, Qwen3 |
| 🎬 Video Generation | /v1/videos/generations — ComfyUI (AnimateDiff, SVD), SD WebUI |
| 🎵 Music Generation | /v1/music/generations — ComfyUI (Stable Audio Open, MusicGen) |
| 🛡️ Moderations | /v1/moderations — Content safety checks |
| 🔀 Reranking | /v1/rerank — Document relevance reranking |
| Feature | What It Does |
|---|---|
| 🔌 Circuit Breaker | Auto-open/close per-provider with configurable thresholds |
| 🛡️ Anti-Thundering Herd | Mutex + semaphore rate-limit for API key providers |
| 🧠 Semantic Cache | Two-tier cache (signature + semantic) reduces cost & latency |
| ⚡ Request Idempotency | 5s dedup window for duplicate requests |
| 🔒 TLS Fingerprint Spoofing | Bypass TLS-based bot detection via wreq-js |
| 🌐 IP Filtering | Allowlist/blocklist for API access control |
| 📊 Editable Rate Limits | Configurable RPM, min gap, and max concurrent at system level |
| 💾 Rate Limit Persistence | Learned limits survive restarts via SQLite with 60s debounce + 24h staleness |
| 🔄 Token Refresh Resilience | Per-provider circuit breaker (5 fails→30min) + 30s timeout per attempt |
| 🛡 API Endpoint Protection | Auth gating + provider blocking for the /models endpoint |
| 🔒 Proxy Visibility | Color-coded badges: 🟢 global, 🟡 provider, 🔵 per-connection with IP display |
| 🌐 3-Level Proxy Config | Configure proxies at global, per-provider, or per-connection level |
| Feature | What It Does |
|---|---|
| 📝 Request Logging | Debug mode with full request/response logs |
| 💾 SQLite Proxy Logs | Persistent proxy logs survive server restarts |
| 📊 Analytics Dashboard | Recharts-powered: stat cards, model usage chart, provider table |
| 📈 Progress Tracking | Opt-in SSE progress events for streaming |
| 🧪 LLM Evaluations | Golden set testing with 4 match strategies |
| 🔍 Request Telemetry | p50/p95/p99 latency aggregation + X-Request-Id tracing |
| 📋 Logs Dashboard | Unified 4-tab page: Request Logs, Proxy Logs, Audit Logs, Console |
| 🖥️ Console Log Viewer | Real-time terminal-style viewer with level filter, search, auto-scroll |
| 📑 File-Based Logging | Console interceptor captures all output to JSON log file with rotation |
| 🏥 Health Dashboard | System uptime, circuit breaker states, lockouts, cache stats |
| 💰 Cost Tracking | Budget management + per-model pricing configuration |
| Feature | What It Does |
|---|---|
| 💾 Cloud Sync | Sync config across devices via Cloudflare Workers |
| 🌐 Deploy Anywhere | Localhost, VPS, Docker, Cloudflare Workers |
| 🔑 API Key Management | Generate, rotate, and scope API keys per provider |
| 🧙 Onboarding Wizard | 4-step guided setup for first-time users |
| 🔧 CLI Tools Dashboard | One-click configure Claude, Codex, Cline, OpenClaw, Kilo, Antigravity |
| 🔄 DB Backups | Automatic backup, restore, export & import for all settings |
| 🌐 Internationalization | Full i18n with next-intl — 30 languages including RTL support |
| 🌍 Language Selector | Globe icon in header for real-time switching between 30 languages |
| 📂 Custom Data Directory | DATA_DIR env var to override default ~/.omniroute storage path |
📖 Feature Details
Create combos with automatic fallback:
Combo: "my-coding-stack"
1. cc/claude-opus-4-6 (your subscription)
2. nvidia/llama-3.3-70b (free NVIDIA API)
3. glm/glm-4.7 (cheap backup, $0.6/1M)
4. if/kimi-k2-thinking (free fallback)
→ Auto switches when quota runs out or errors occur
- Token consumption per provider
- Reset countdown (5-hour, daily, weekly)
- Cost estimation for paid tiers
- Monthly spending reports
Seamless translation between formats:
- OpenAI ↔ Claude ↔ Gemini ↔ OpenAI Responses
- Your CLI tool sends OpenAI format → OmniRoute translates → Provider receives native format
- Works with any tool that supports custom OpenAI endpoints
- Response sanitization — Strips non-standard fields for strict OpenAI SDK compatibility
- Role normalization — `developer` → `system` for non-OpenAI; `system` → `user` for GLM/ERNIE models
- Think tag extraction — `<think>` blocks → `reasoning_content` for thinking models
- Structured output — `json_schema` → Gemini's `responseMimeType`/`responseSchema`
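Think-tag extraction amounts to peeling a leading `<think>…</think>` block off the response text — an assumed sketch, not the project's actual code:

```typescript
// Split DeepSeek-R1-style output into visible content plus reasoning_content.
function extractThink(text: string): { content: string; reasoning_content?: string } {
  const m = text.match(/^<think>([\s\S]*?)<\/think>/);
  if (!m) return { content: text }; // no thinking block — pass through untouched
  return {
    content: text.slice(m[0].length).trimStart(),
    reasoning_content: m[1].trim(),
  };
}

console.log(extractThink("<think>check edge cases</think>Here is the fix."));
// { content: "Here is the fix.", reasoning_content: "check edge cases" }
```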
- Add multiple accounts per provider
- Auto round-robin or priority-based routing
- Fallback to next account when one hits quota
- OAuth tokens automatically refresh before expiration
- No manual re-authentication needed
- Seamless experience across all providers
- Create unlimited model combinations
- 6 strategies: fill-first, round-robin, power-of-two-choices, random, least-used, cost-optimized
- Share combos across devices with Cloud Sync
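Two of the six strategies are easy to picture in a few lines (illustrative sketches; the dashboard exposes them as named options):

```typescript
// fill-first: always prefer the earliest healthy target in the chain.
function fillFirst(targets: string[], healthy: (t: string) => boolean): string | undefined {
  return targets.find(healthy);
}

// round-robin: rotate through targets on a shared counter.
function roundRobin(targets: string[], counter: number): string {
  return targets[counter % targets.length];
}

console.log(fillFirst(["cc/claude-opus-4-6", "glm/glm-4.7"], (t) => t.startsWith("glm/"))); // "glm/glm-4.7"
console.log(roundRobin(["a", "b"], 3)); // "b"
```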
- System status (uptime, version, memory usage)
- Circuit breaker states per provider (Closed/Open/Half-Open)
- Rate limit status and active lockouts
- Signature cache statistics
- Latency telemetry (p50/p95/p99) + prompt cache
- Reset health status with one click
OmniRoute includes a powerful built-in Translator Playground with 4 modes for debugging, testing, and monitoring API translations:
| Mode | Description |
|---|---|
| 💻 Playground | Direct format translation — paste any API request body and instantly see how OmniRoute translates it between provider formats (OpenAI ↔ Claude ↔ Gemini ↔ Responses API). Includes example templates and format auto-detection. |
| 💬 Chat Tester | Send real chat requests through OmniRoute and see the full round-trip: your input, the translated request, the provider response, and the translated response back. Invaluable for validating combo routing. |
| 🧪 Test Bench | Batch testing mode — define multiple test cases with different inputs and expected outputs, run them all at once, and compare results across providers and models. |
| 📱 Live Monitor | Real-time request monitoring — watch incoming requests as they flow through OmniRoute, see format translations happening live, and identify issues instantly. |
Access: Dashboard → Translator (sidebar)
- Sync providers, combos, and settings across devices
- Automatic background sync
- Secure encrypted storage
Problem: Quota expires unused, rate limits during heavy coding
Combo: "maximize-claude"
1. cc/claude-opus-4-6 (use subscription fully)
2. glm/glm-4.7 (cheap backup when quota out)
3. if/kimi-k2-thinking (free emergency fallback)
Monthly cost: $20 (subscription) + ~$5 (backup) = $25 total
vs. $20 + hitting limits = frustration
Problem: Can't afford subscriptions, need reliable AI coding
Combo: "free-forever"
1. gc/gemini-3-flash (180K free/month)
2. if/kimi-k2-thinking (unlimited free)
3. qw/qwen3-coder-plus (unlimited free)
Monthly cost: $0
Quality: Production-ready models
Problem: Deadlines, can't afford downtime
Combo: "always-on"
1. cc/claude-opus-4-6 (best quality)
2. cx/gpt-5.2-codex (second subscription)
3. glm/glm-4.7 (cheap, resets daily)
4. minimax/MiniMax-M2.1 (cheapest, 5h reset)
5. if/kimi-k2-thinking (free unlimited)
Result: 5 layers of fallback = zero downtime
Problem: Need AI assistant in messaging apps, completely free
Combo: "openclaw-free"
1. if/glm-4.7 (unlimited free)
2. if/minimax-m2.1 (unlimited free)
3. if/kimi-k2-thinking (unlimited free)
Monthly cost: $0
Access via: WhatsApp, Telegram, Slack, Discord, iMessage, Signal...
💳 Subscription Providers
Dashboard → Providers → Connect Claude Code
→ OAuth login → Auto token refresh
→ 5-hour + weekly quota tracking
Models:

```
cc/claude-opus-4-6
cc/claude-sonnet-4-5-20250929
cc/claude-haiku-4-5-20251001
```

Pro Tip: Use Opus for complex tasks, Sonnet for speed. OmniRoute tracks quota per model!
Dashboard → Providers → Connect Codex
→ OAuth login (port 1455)
→ 5-hour + weekly reset
Models:

```
cx/gpt-5.2-codex
cx/gpt-5.1-codex-max
```

Dashboard → Providers → Connect Gemini CLI
→ Google OAuth
→ 180K completions/month + 1K/day
Models:

```
gc/gemini-3-flash-preview
gc/gemini-2.5-pro
```

Best Value: Huge free tier! Use this before paid tiers.
Dashboard → Providers → Connect GitHub
→ OAuth via GitHub
→ Monthly reset (1st of month)
Models:

```
gh/gpt-5
gh/claude-4.5-sonnet
gh/gemini-3-pro
```

🔑 API Key Providers
- Sign up: build.nvidia.com
- Get free API key (1000 inference credits included)
- Dashboard → Add Provider → NVIDIA NIM:
  - API Key: `nvapi-your-key`
Models: nvidia/llama-3.3-70b-instruct, nvidia/mistral-7b-instruct, and 50+ more
Pro Tip: OpenAI-compatible API — works seamlessly with OmniRoute's format translation!
- Sign up: platform.deepseek.com
- Get API key
- Dashboard → Add Provider → DeepSeek
Models: deepseek/deepseek-chat, deepseek/deepseek-coder
- Sign up: console.groq.com
- Get API key (free tier included)
- Dashboard → Add Provider → Groq
Models: groq/llama-3.3-70b, groq/mixtral-8x7b
Pro Tip: Ultra-fast inference — best for real-time coding!
- Sign up: openrouter.ai
- Get API key
- Dashboard → Add Provider → OpenRouter
Models: Access 100+ models from all major providers through a single API key.
💰 Cheap Providers (Backup)
- Sign up: Zhipu AI
- Get API key from Coding Plan
- Dashboard → Add API Key:
  - Provider: `glm`
  - API Key: `your-key`
Use: glm/glm-4.7
Pro Tip: Coding Plan offers 3× quota at 1/7 cost! Reset daily 10:00 AM.
- Sign up: MiniMax
- Get API key
- Dashboard → Add API Key
Use: minimax/MiniMax-M2.1
Pro Tip: Cheapest option for long context (1M tokens)!
- Subscribe: Moonshot AI
- Get API key
- Dashboard → Add API Key
Use: kimi/kimi-latest
Pro Tip: Fixed $9/month for 10M tokens = $0.90/1M effective cost!
🆓 FREE Providers (Emergency Backup)
Dashboard → Connect iFlow
→ iFlow OAuth login
→ Unlimited usage
Models:

```
if/kimi-k2-thinking
if/qwen3-coder-plus
if/glm-4.7
if/minimax-m2
if/deepseek-r1
```

Dashboard → Connect Qwen
→ Device code authorization
→ Unlimited usage
Models:

```
qw/qwen3-coder-plus
qw/qwen3-coder-flash
```

Dashboard → Connect Kiro
→ AWS Builder ID or Google/GitHub
→ Unlimited usage
Models:

```
kr/claude-sonnet-4.5
kr/claude-haiku-4.5
```

🎨 Create Combos
Dashboard → Combos → Create New
Name: premium-coding
Models:
1. cc/claude-opus-4-6 (Subscription primary)
2. glm/glm-4.7 (Cheap backup, $0.6/1M)
3. minimax/MiniMax-M2.1 (Cheapest fallback, $0.20/1M)
Use in CLI: premium-coding
Name: free-combo
Models:
1. gc/gemini-3-flash-preview (180K free/month)
2. if/kimi-k2-thinking (unlimited)
3. qw/qwen3-coder-plus (unlimited)
Cost: $0 forever!
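Conceptually, a combo is just an ordered list of model IDs that OmniRoute tries in sequence, falling back when one fails. The loop below is a hypothetical JavaScript sketch of that selection logic, not OmniRoute's actual implementation:

```javascript
// Illustrative combo fallback: try each model in order and return the
// first successful result. Sketch only — not OmniRoute source code.
async function runCombo(models, sendRequest) {
  const errors = [];
  for (const model of models) {
    try {
      return { model, result: await sendRequest(model) };
    } catch (err) {
      // e.g. quota exhausted, rate limit, provider outage — move to next model
      errors.push(`${model}: ${err.message}`);
    }
  }
  throw new Error(`All combo models failed: ${errors.join("; ")}`);
}
```

For `premium-coding`, this would mean `sendRequest` is tried against `cc/claude-opus-4-6` first, then `glm/glm-4.7`, then `minimax/MiniMax-M2.1`.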
🔧 CLI Integration
Settings → Models → Advanced:
OpenAI API Base URL: http://localhost:20128/v1
OpenAI API Key: [from OmniRoute dashboard]
Model: cc/claude-opus-4-6
Use the CLI Tools page in the dashboard for one-click configuration, or edit ~/.claude/settings.json manually.
export OPENAI_BASE_URL="http://localhost:20128"
export OPENAI_API_KEY="your-omniroute-api-key"
codex "your prompt"

Option 1 — Dashboard (recommended):
Dashboard → CLI Tools → OpenClaw → Select Model → Apply
Option 2 — Manual: Edit ~/.openclaw/openclaw.json:
{
"models": {
"providers": {
"omniroute": {
"baseUrl": "http://127.0.0.1:20128/v1",
"apiKey": "sk_omniroute",
"api": "openai-completions"
}
}
}
}

Note: OpenClaw only works with local OmniRoute. Use `127.0.0.1` instead of `localhost` to avoid IPv6 resolution issues.
Settings → API Configuration:
Provider: OpenAI Compatible
Base URL: http://localhost:20128/v1
API Key: [from OmniRoute dashboard]
Model: if/kimi-k2-thinking
Step 1: Add OmniRoute as a custom provider:
opencode
/connect
# Select "Other" → Enter ID: "omniroute" → Enter your OmniRoute API key

Step 2: Create/edit opencode.json in your project root:
{
"$schema": "https://opencode.ai/config.json",
"provider": {
"omniroute": {
"npm": "@ai-sdk/openai-compatible",
"name": "OmniRoute",
"options": {
"baseURL": "http://localhost:20128/v1"
},
"models": {
"cc/claude-sonnet-4-20250514": { "name": "Claude Sonnet 4" },
"gg/gemini-2.5-pro": { "name": "Gemini 2.5 Pro" },
"if/kimi-k2-thinking": { "name": "Kimi K2 (Free)" }
}
}
}
}

Step 3: Select the model in OpenCode:
/models
# Select any OmniRoute model from the list

Tip: Add any model available from your OmniRoute `/v1/models` endpoint to the `models` section. Use the `provider/model-id` format from your OmniRoute dashboard.
OmniRoute includes a built-in evaluation framework to test LLM response quality against a golden set. Access it via Analytics → Evals in the dashboard.
The pre-loaded "OmniRoute Golden Set" contains 10 test cases covering:
- Greetings, math, geography, code generation
- JSON format compliance, translation, markdown
- Safety refusal (harmful content), counting, boolean logic
| Strategy | Description | Example |
|---|---|---|
| `exact` | Output must match exactly | `"4"` |
| `contains` | Output must contain substring (case-insensitive) | `"Paris"` |
| `regex` | Output must match regex pattern | `"1.*2.*3"` |
| `custom` | Custom JS function returns true/false | `(output) => output.length > 10` |
Click to expand troubleshooting guide
"Language model did not provide messages"
- Provider quota exhausted → Check dashboard quota tracker
- Solution: Use combo fallback or switch to cheaper tier
Rate limiting
- Subscription quota out → Fallback to GLM/MiniMax
- Add combo:
cc/claude-opus-4-6 → glm/glm-4.7 → if/kimi-k2-thinking
OAuth token expired
- Auto-refreshed by OmniRoute
- If issues persist: Dashboard → Provider → Reconnect
High costs
- Check usage stats in Dashboard → Costs
- Switch primary model to GLM/MiniMax
- Use free tier (Gemini CLI, iFlow) for non-critical tasks
Dashboard/API ports are wrong
- `PORT` is the canonical base port (and the API port by default)
- `API_PORT` overrides only the OpenAI-compatible API listener
- `DASHBOARD_PORT` overrides only the dashboard/Next.js listener
- Set `NEXT_PUBLIC_BASE_URL` to your dashboard/public URL (for OAuth callbacks)
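A `.env` sketch illustrating these port variables — the values are examples only, not required defaults:

```shell
# Example .env fragment (illustrative values)
PORT=20128                                  # canonical base port (API port by default)
API_PORT=20128                              # override only the OpenAI-compatible API listener
DASHBOARD_PORT=3000                         # override only the dashboard/Next.js listener
NEXT_PUBLIC_BASE_URL=http://localhost:3000  # public dashboard URL (used for OAuth callbacks)
```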
Cloud sync errors
- Verify `BASE_URL` points to your running instance
- Verify `CLOUD_URL` points to your expected cloud endpoint
- Keep `NEXT_PUBLIC_*` values aligned with server-side values
First login not working
- Check `INITIAL_PASSWORD` in `.env`
- If unset, the fallback password is `123456`
No request logs
- Set `ENABLE_REQUEST_LOGS=true` in `.env`
Connection test shows "Invalid" for OpenAI-compatible providers
- Many providers don't expose a `/models` endpoint
- OmniRoute v1.0.6+ includes fallback validation via chat completions
- Ensure the base URL includes the `/v1` suffix
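The `/v1` suffix check can be expressed as a tiny helper. This is a hypothetical sketch, not part of OmniRoute:

```javascript
// Hypothetical helper: ensure a provider base URL ends with /v1
function normalizeBaseUrl(url) {
  const trimmed = url.replace(/\/+$/, ""); // drop trailing slashes
  return trimmed.endsWith("/v1") ? trimmed : `${trimmed}/v1`;
}
```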
⚠️ IMPORTANT for users running OmniRoute on a VPS/Docker/remote server
The Antigravity and Gemini CLI providers use Google OAuth 2.0 for authentication. Google requires that the redirect_uri used in the OAuth flow exactly match one of the URIs pre-registered in the application's Google Cloud Console.
The OAuth credentials bundled with OmniRoute are registered only for localhost. When you access OmniRoute on a remote server (e.g. https://omniroute.meuservidor.com), Google rejects the authentication with:
Error 400: redirect_uri_mismatch
You need to create an OAuth 2.0 Client ID in the Google Cloud Console with your server's URI.
1. Open the Google Cloud Console
Go to: https://console.cloud.google.com/apis/credentials
2. Create a new OAuth 2.0 Client ID
- Click "+ Create Credentials" → "OAuth client ID"
- Application type: "Web application"
- Name: any name you like (e.g. `OmniRoute Remote`)
3. Add the Authorized Redirect URIs
In the "Authorized redirect URIs" field, add:
https://seu-servidor.com/callback
Replace `seu-servidor.com` with your server's domain or IP (include the port if needed, e.g. `http://45.33.32.156:20128/callback`).
4. Save and copy the credentials
After you create it, Google will display the Client ID and Client Secret.
5. Configure the environment variables
In your .env (or in your Docker environment variables):
# For Antigravity:
ANTIGRAVITY_OAUTH_CLIENT_ID=seu-client-id.apps.googleusercontent.com
ANTIGRAVITY_OAUTH_CLIENT_SECRET=GOCSPX-seu-secret
# For Gemini CLI:
GEMINI_OAUTH_CLIENT_ID=seu-client-id.apps.googleusercontent.com
GEMINI_OAUTH_CLIENT_SECRET=GOCSPX-seu-secret
GEMINI_CLI_OAUTH_CLIENT_SECRET=GOCSPX-seu-secret
6. Restart OmniRoute
# If using npm:
npm run dev
# If using Docker:
docker restart omniroute
7. Try connecting again
Dashboard → Providers → Antigravity (or Gemini CLI) → OAuth
Google will now redirect correctly to https://seu-servidor.com/callback and authentication will work.
If you don't want to create your own credentials right now, you can still use the manual URL flow:
- OmniRoute will open Google's authorization URL
- After you authorize, Google will try to redirect to `localhost` (which fails on a remote server)
- Copy the full URL from your browser's address bar (even if the page doesn't load)
- Paste that URL into the field shown in OmniRoute's connection modal
- Click "Connect"
This workaround works because the authorization code in the URL is valid regardless of whether the redirect page loaded.
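The workaround hinges on the `code` query parameter embedded in the pasted redirect URL. A hypothetical sketch of what extracting it looks like:

```javascript
// Hypothetical: pull the OAuth authorization code out of a pasted redirect URL.
// The callback port and parameter names follow standard OAuth 2.0 conventions.
function extractAuthCode(redirectUrl) {
  const code = new URL(redirectUrl).searchParams.get("code");
  if (!code) throw new Error("No authorization code found in URL");
  return code;
}
```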
- Runtime: Node.js 18–22 LTS (⚠️ Node.js 24+ is not supported — `better-sqlite3` native binaries are incompatible)
- Language: TypeScript 5.9 — 100% TypeScript across `src/` and `open-sse/` (v1.0.6)
- Framework: Next.js 16 + React 19 + Tailwind CSS 4
- Database: LowDB (JSON) + SQLite (domain state + proxy logs)
- Streaming: Server-Sent Events (SSE)
- Auth: OAuth 2.0 (PKCE) + JWT + API Keys
- Testing: Node.js test runner (368+ unit tests)
- CI/CD: GitHub Actions (auto npm publish + Docker Hub on release)
- Website: omniroute.online
- Package: npmjs.com/package/omniroute
- Docker: hub.docker.com/r/diegosouzapw/omniroute
- Resilience: Circuit breaker, exponential backoff, anti-thundering herd, TLS spoofing
| Document | Description |
|---|---|
| User Guide | Providers, combos, CLI integration, deployment |
| API Reference | All endpoints with examples |
| Troubleshooting | Common problems and solutions |
| Architecture | System architecture and internals |
| Contributing | Development setup and guidelines |
| OpenAPI Spec | OpenAPI 3.0 specification |
| Security Policy | Vulnerability reporting and security practices |
| VM Deployment | Complete guide: VM + nginx + Cloudflare setup |
| Features Gallery | Visual dashboard tour with screenshots |
Click to see dashboard screenshots
| Page | Screenshot |
|---|---|
| Providers | ![]() |
| Combos | ![]() |
| Analytics | ![]() |
| Health | ![]() |
| Translator | ![]() |
| Settings | ![]() |
| CLI Tools | ![]() |
| Usage Logs | ![]() |
| Endpoint | ![]() |
OmniRoute has 210+ features planned across multiple development phases. Here are the key areas:
| Category | Planned Features | Highlights |
|---|---|---|
| 🧠 Routing & Intelligence | 25+ | Lowest-latency routing, tag-based routing, quota preflight, P2C account selection |
| 🔒 Security & Compliance | 20+ | SSRF hardening, credential cloaking, rate-limit per endpoint, management key scoping |
| 📊 Observability | 15+ | OpenTelemetry integration, real-time quota monitoring, cost tracking per model |
| 🔄 Provider Integrations | 20+ | Dynamic model registry, provider cooldowns, multi-account Codex, Copilot quota parsing |
| ⚡ Performance | 15+ | Dual cache layer, prompt cache, response cache, streaming keepalive, batch API |
| 🌐 Ecosystem | 10+ | WebSocket API, config hot-reload, distributed config store, commercial mode |
- 🔗 OpenCode Integration — Native provider support for the OpenCode AI coding IDE
- 🔗 TRAE Integration — Full support for the TRAE AI development framework
- 📦 Batch API — Asynchronous batch processing for bulk requests
- 🎯 Tag-Based Routing — Route requests based on custom tags and metadata
- 💰 Lowest-Cost Strategy — Automatically select the cheapest available provider
📝 Full feature specifications available in `docs/new-features/` (217 detailed specs)
- Fork the repository
- Create your feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
See CONTRIBUTING.md for detailed guidelines.
# Create a release — npm publish happens automatically
gh release create v1.0.6 --title "v1.0.6" --generate-notes

Special thanks to 9router by decolua — the original project that inspired this fork. OmniRoute builds upon that incredible foundation with additional features, multi-modal APIs, and a full TypeScript rewrite.
Special thanks to CLIProxyAPI — the original Go implementation that inspired this JavaScript port.
MIT License - see LICENSE for details.
omniroute.online








