## Long context

For inputs over 64k tokens, choose a model with a wide enough context window:
| Window | Model | $/M input |
|---|---|---|
| 128k | kimi-k2.5, glm-4.6, qwen3-coder | $0.30-0.60 |
| 200k | claude-opus-4-7, claude-sonnet-4-6, o1 | $3-15 |
| 256k | qwen3-max | $0.60 |
| 2M | gemini-3-pro | $1.25 |
### Tips for cost savings

- Most documents are under 50k tokens; default to deepseek-v3.2 (64k window)
- Switch to a long-context model only when truly needed
- Use prompt caching where supported (Anthropic, DeepSeek, GPT-4o): a cached prompt prefix costs ~10% of the normal input price on reruns
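The tips above can be sketched as a simple routing rule: estimate the input size, then pick the cheapest model from the table whose window fits. This is a minimal sketch, not part of the Routify API; the 4-characters-per-token estimate is a rough heuristic, and the window list is taken from the table above (one representative model per tier).

```python
# Cheapest-first list of (model, context window), from the table above.
WINDOWS = [
    ("deepseek-v3.2", 64_000),    # cheap default for most documents
    ("kimi-k2.5", 128_000),
    ("qwen3-max", 256_000),
    ("gemini-3-pro", 2_000_000),
]

def estimate_tokens(text: str) -> int:
    """Rough estimate: ~4 characters per token for English prose."""
    return len(text) // 4

def choose_model(input_tokens: int) -> str:
    """Return the first (cheapest) model whose window fits the input."""
    for name, window in WINDOWS:
        if input_tokens <= window:
            return name
    raise ValueError(f"{input_tokens} tokens exceeds every available window")
```

For an exact count, tokenize with the target model's tokenizer before routing; the heuristic here only picks a starting tier.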
### Context limit behavior
If you exceed the window, the request fails with an error like:

```json
{
  "error": {
    "type": "context_length_exceeded",
    "message": "Input tokens (130000) exceed model max (128000)"
  }
}
```

Routify won't truncate silently. Trim the input or switch models on your end.
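Since Routify never truncates for you, the caller has to detect this error and react. A minimal sketch, assuming only the error payload shape shown above (the helper name and the token-count parsing are illustrative, not part of the API):

```python
import re

def parse_overflow(response: dict):
    """If the response is a context_length_exceeded error, return
    (input_tokens, model_max) parsed from the message; otherwise None."""
    err = response.get("error") or {}
    if err.get("type") != "context_length_exceeded":
        return None
    # Message format from the docs: "Input tokens (N) exceed model max (M)"
    m = re.search(r"\((\d+)\).*\((\d+)\)", err.get("message", ""))
    return (int(m.group(1)), int(m.group(2))) if m else None
```

On a hit, either trim the input below `model_max` or retry with a wider-window model from the table.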