Long context

For inputs > 64k tokens, choose a model with a wide context window.

Window   Models                                   $/M input
128k     kimi-k2.5, glm-4.6, qwen3-coder          $0.30-0.60
200k     claude-opus-4-7, claude-sonnet-4-6, o1   $3-15
256k     qwen3-max                                $0.60
2M       gemini-3-pro                             $1.25
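
As a rough illustration of picking a tier, here is a minimal sketch that estimates token count with the common ~4 characters per token heuristic and selects a model from the table above. The heuristic and the tier choices are assumptions, not an exact accounting; use a real tokenizer if you need precise counts.

def pick_model(text: str) -> str:
    # ~4 characters per token is only an estimate
    est_tokens = len(text) // 4
    if est_tokens <= 64_000:
        return "deepseek-v3.2"      # default, cheapest
    if est_tokens <= 128_000:
        return "qwen3-coder"        # 128k tier
    if est_tokens <= 200_000:
        return "claude-sonnet-4-6"  # 200k tier
    if est_tokens <= 256_000:
        return "qwen3-max"          # 256k tier
    return "gemini-3-pro"           # 2M tier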

Tips for cost savings

  • Most documents are under 50k tokens, so default to deepseek-v3.2 (64k window)
  • Switch to a long-context model only when the input actually needs it
  • Use prompt caching where supported (Anthropic, DeepSeek, GPT-4o): a cached prompt prefix costs roughly 10% of normal input pricing on reruns (see the sketch after this list)
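
Below is a minimal caching sketch. It assumes Routify exposes an OpenAI-compatible chat completions endpoint and passes Anthropic's cache_control field through unchanged; the URL, file path, and payload shape are placeholders, so check the API reference for the exact format.

import os
import requests

# Placeholder path: the large, reused prefix you want cached across reruns.
LONG_REFERENCE_DOC = open("reference.md", encoding="utf-8").read()

resp = requests.post(
    "https://api.routify.example/v1/chat/completions",  # placeholder URL
    headers={"Authorization": f"Bearer {os.environ['ROUTIFY_API_KEY']}"},
    json={
        "model": "claude-sonnet-4-6",
        "messages": [
            {
                "role": "system",
                # Anthropic-style prompt caching: mark the reused prefix
                # as cacheable so repeat requests pay the cached rate.
                "content": [
                    {
                        "type": "text",
                        "text": LONG_REFERENCE_DOC,
                        "cache_control": {"type": "ephemeral"},
                    }
                ],
            },
            {"role": "user", "content": "Summarize section 3."},
        ],
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])

Only the prefix before the first varying message is cacheable, so keep the stable document in front and put per-request questions after it.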

Context limit behavior

If your input exceeds the model's context window, the request fails with:

{
  "error": {
    "type": "context_length_exceeded",
    "message": "Input tokens (130000) exceed model max (128000)"
  }
}

Routify will not truncate your input silently. Trim the prompt or switch to a larger-window model on your end, as in the sketch below.
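
One way to handle this is a fallback ladder: retry the same request against progressively larger windows when the context_length_exceeded error comes back. This sketch assumes the OpenAI-compatible endpoint and response shape used above; the URL is a placeholder and the model ladder comes from the table at the top of this page.

import os
import requests

API_URL = "https://api.routify.example/v1/chat/completions"  # placeholder URL
HEADERS = {"Authorization": f"Bearer {os.environ['ROUTIFY_API_KEY']}"}

# Cheapest / narrowest window first, widest last.
LADDER = ["deepseek-v3.2", "qwen3-coder", "claude-sonnet-4-6", "gemini-3-pro"]

def complete(messages):
    for model in LADDER:
        resp = requests.post(
            API_URL,
            headers=HEADERS,
            json={"model": model, "messages": messages},
            timeout=120,
        )
        body = resp.json()
        error = body.get("error") or {}
        if error.get("type") == "context_length_exceeded":
            continue  # input too large for this window, try the next tier
        resp.raise_for_status()
        return body["choices"][0]["message"]["content"]
    raise RuntimeError("input does not fit any configured context window")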