Rate Limits
Routify applies three layers of rate limits:
- Global: protects the gateway from abuse (100k QPS aggregate)
- Per-user: set by account tier
- Per-model: protects upstream provider quotas
User tier limits
| Tier | RPM (requests/min) | RPH (requests/hour) | Tokens/day | Concurrent requests |
|---|---|---|---|---|
| Free | 60 | 600 | 100k | 2 |
| Pro | 300 | 10k | unlimited | 8 |
| VIP | 1500-3000 | 100k | unlimited | 32-64 |
| Enterprise | Custom | Custom | unlimited | 256 |
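
One way to stay under a tier's RPM or RPH value is to pace requests on the client before they are sent. The following is a minimal sketch, assuming a single-process client; `SlidingWindowLimiter` and its methods are hypothetical names for illustration, not part of any Routify SDK, and the server remains the authority on what counts against the limit.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests in any `window` seconds (client-side only)."""

    def __init__(self, limit: int, window: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._sent: Deque[float] = deque()  # timestamps of recent requests

    def wait_time(self) -> float:
        """Return how many seconds to wait before the next request fits."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self._sent and now - self._sent[0] >= self.window:
            self._sent.popleft()
        if len(self._sent) < self.limit:
            return 0.0
        # The window is full: wait until the oldest request ages out.
        return self.window - (now - self._sent[0])

    def record(self) -> None:
        """Note that a request was just sent."""
        self._sent.append(self.clock())
```

For the Free tier, for example, a client might build `SlidingWindowLimiter(limit=60, window=60.0)`, sleep for `wait_time()` seconds when it is non-zero, send the request, and then call `record()`. A second instance with `limit=600, window=3600.0` could cover the hourly value in the same way.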
Model-specific limits (Free and Pro tiers)
| Model | RPM cap | Concurrent |
|---|---|---|
| deepseek-v3.2 | 1000 | 50 |
| kimi-k2.5 | 800 | 30 |
| claude-opus-4-7 | 50 | 5 |
| claude-sonnet-4-6 | 200 | 15 |
| gpt-4o | 200 | 15 |
| o1 | 50 | 5 |
| kling-1.6 | 30 | 3 (1k/day platform-wide) |
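
A client can avoid tripping the Concurrent column by bounding its own in-flight requests per model. The sketch below uses one semaphore per model with the values from the table above. The helper name `model_slot` and the 30-second default timeout are assumptions for illustration. The source does not say whether these caps are counted per account or shared across accounts, so this only limits your own process.

```python
import threading
from contextlib import contextmanager
from typing import Dict, Iterator

# Concurrent caps copied from the model table above.
MODEL_CONCURRENCY: Dict[str, int] = {
    "deepseek-v3.2": 50,
    "kimi-k2.5": 30,
    "claude-opus-4-7": 5,
    "claude-sonnet-4-6": 15,
    "gpt-4o": 15,
    "o1": 5,
    "kling-1.6": 3,
}

# One bounded semaphore per model; each permit is one in-flight request.
_gates: Dict[str, threading.BoundedSemaphore] = {
    model: threading.BoundedSemaphore(cap)
    for model, cap in MODEL_CONCURRENCY.items()
}


@contextmanager
def model_slot(model: str, timeout: float = 30.0) -> Iterator[None]:
    """Hold one in-flight slot for `model`; raise TimeoutError if none frees up.

    Raises KeyError for a model that is not in MODEL_CONCURRENCY.
    """
    gate = _gates[model]
    if not gate.acquire(timeout=timeout):
        raise TimeoutError(f"no free slot for {model} after {timeout}s")
    try:
        yield
    finally:
        gate.release()
```

Wrapping each request in `with model_slot("claude-opus-4-7"):` keeps at most five calls to that model in flight from this process, whichever thread issues them.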
Hitting a limit
You'll get a 429 response with a Retry-After header (in seconds):
HTTP/1.1 429 Too Many Requests
Retry-After: 30

{"error": {"type": "rate_limit", "message": "RPM exceeded: 300 / 60s"}}

The standard OpenAI SDKs retry 429 responses automatically.
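
For clients that do not go through an SDK, the same behavior can be approximated by hand: wait for the number of seconds given in Retry-After, then try again. This is a minimal sketch, not a definitive implementation. `call_with_retry`, `retry_after_seconds`, and the `send` callable are hypothetical names, and the 1-second fallback, 120-second cap, and three attempts are assumed defaults rather than documented values.

```python
import time
from typing import Callable, Mapping, Tuple

# (status code, response headers, response body)
Response = Tuple[int, Mapping[str, str], str]


def retry_after_seconds(headers: Mapping[str, str],
                        default: float = 1.0,
                        max_wait: float = 120.0) -> float:
    """Read Retry-After as seconds, clamped to [0, max_wait].

    Falls back to `default` when the header is missing or not a number.
    Both `default` and `max_wait` are assumptions, not documented values.
    """
    for name, value in headers.items():
        if name.lower() == "retry-after":  # header names are case-insensitive
            try:
                seconds = float(value)
            except ValueError:
                return default
            return min(max_wait, max(0.0, seconds))
    return default


def call_with_retry(send: Callable[[], Response],
                    max_attempts: int = 3,
                    sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call `send` until it returns a non-429 status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    response = send()
    for _ in range(max_attempts - 1):
        if response[0] != 429:
            break
        # Honor the server's hint before trying again.
        sleep(retry_after_seconds(response[1]))
        response = send()
    return response
```

Here `send` stands for whatever function performs one HTTP request and returns its status, headers, and body. If every attempt is rate limited, the last 429 response is returned so the caller can surface the error message.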
Increasing limits
- Upgrade your tier (accounts are promoted automatically based on monthly spend)
- Contact us for enterprise quotas
Endpoint-specific limits
| Endpoint | Limit |
|---|---|
| POST /v1/chat/completions | per-tier above |
| POST /v1/embeddings | 1000 RPM, 100 concurrent |
| POST /api/auth/login | 10 / IP / min |
| POST /api/auth/register | 3 / IP / min |
| POST /api/payment/recharge | 5 / user / min |