## Long context

For inputs over 64k tokens, choose a model with a wide enough context window:
| Window | Model | $/M input |
|---|---|---|
| 128k | kimi-k2.5, glm-4.6, qwen3-coder | $0.30-0.60 |
| 200k | claude-opus-4-7, claude-sonnet-4-6, o1 | $3-15 |
| 256k | qwen3-max | $0.60 |
| 2M | gemini-3-pro | $1.25 |
### Tips for cost savings

- Most documents are under 50k tokens; default to deepseek-v3.2 (64k window)
- Switch to a long-context model only when truly needed
- Use prompt caching where supported (Anthropic, DeepSeek, GPT-4o): a cached prompt prefix costs ~10% of the normal input price on reruns
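The tips above can be sketched as a simple routing rule: estimate the input size, then pick the cheapest model from the table whose window fits. This is a minimal sketch, not part of the Routify API; the 4-characters-per-token estimate is a rough heuristic, and the window list is taken from the table above (one representative model per tier).

```python
# Cheapest-first list of (model, context window), from the table above.
WINDOWS = [
    ("deepseek-v3.2", 64_000),    # cheap default for most documents
    ("kimi-k2.5", 128_000),
    ("qwen3-max", 256_000),
    ("gemini-3-pro", 2_000_000),
]

def estimate_tokens(text: str) -> int:
    """Rough estimate: ~4 characters per token for English prose."""
    return len(text) // 4

def choose_model(input_tokens: int) -> str:
    """Return the first (cheapest) model whose window fits the input."""
    for name, window in WINDOWS:
        if input_tokens <= window:
            return name
    raise ValueError(f"{input_tokens} tokens exceeds every available window")
```

For an exact count, tokenize with the target model's tokenizer before routing; the heuristic here only picks a starting tier.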
### Context limit behavior
If you exceed the window, the request fails with an error like:

```json
{
  "error": {
    "type": "context_length_exceeded",
    "message": "Input tokens (130000) exceed model max (128000)"
  }
}
```

Routify won't truncate silently. Trim the input or switch models on your end.
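Since Routify never truncates for you, the caller has to detect this error and react. A minimal sketch, assuming only the error payload shape shown above (the helper name and the token-count parsing are illustrative, not part of the API):

```python
import re

def parse_overflow(response: dict):
    """If the response is a context_length_exceeded error, return
    (input_tokens, model_max) parsed from the message; otherwise None."""
    err = response.get("error") or {}
    if err.get("type") != "context_length_exceeded":
        return None
    # Message format from the docs: "Input tokens (N) exceed model max (M)"
    m = re.search(r"\((\d+)\).*\((\d+)\)", err.get("message", ""))
    return (int(m.group(1)), int(m.group(2))) if m else None
```

On a hit, either trim the input below `model_max` or retry with a wider-window model from the table.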