Streaming vs non-streaming

When to stream

  • Interactive UIs (chat, IDE assistants) — show tokens as they arrive
  • Long completions (> 30 sec) — avoid timeouts
  • Token-by-token logging — for monitoring or moderation

When NOT to stream

  • Batch jobs — the overhead of SSE outweighs its benefits
  • JSON-mode responses — easier to parse a complete payload
  • Token accounting — a non-streaming response returns the usage field reliably
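The two lists above can be condensed into a small decision helper. This is a hypothetical sketch; the function name, parameters, and the 30-second threshold are illustrative, not part of the Routify API.

```python
# Hypothetical helper condensing the streaming guidance above;
# names and the 30 s threshold are illustrative only.
def should_stream(*, interactive: bool, expected_seconds: float,
                  json_mode: bool, needs_exact_usage: bool) -> bool:
    """Return True when streaming is the better fit."""
    if json_mode or needs_exact_usage:
        # Complete payloads are easier to parse and carry usage reliably.
        return False
    # Interactive UIs and long completions (> 30 s) benefit from streaming.
    return interactive or expected_seconds > 30
```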

Example

# Assumes an OpenAI-compatible SDK client; the client setup below is
# illustrative, not Routify-specific.
from openai import OpenAI

client = OpenAI()  # reads the API key and base URL from the environment

stream = client.chat.completions.create(
    model="deepseek-v3.2",
    messages=[{"role": "user", "content": "Write 5 ideas"}],
    stream=True,
)
for chunk in stream:  # a sync stream is iterated with a plain for loop
    print(chunk.choices[0].delta.content or "", end="")
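Each chunk carries only a delta, so the client reassembles the full message by concatenating the pieces. A minimal sketch using dict-shaped chunks (the shape mirrors the SDK objects above; real code would use the typed objects):

```python
# Minimal sketch: reassemble a streamed message from delta chunks.
# Chunks are plain dicts mirroring the SDK's chunk shape.
def join_deltas(chunks: list[dict]) -> str:
    parts = []
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:  # role-only or empty deltas carry no content
                parts.append(content)
    return "".join(parts)
```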

Routify-specific notes

  • Latency: <50ms TTFB for streaming on most channels
  • Routify's gateway disables proxy_buffering automatically for SSE responses
  • We report total_tokens in the usage field of the final data chunk, sent just before the [DONE] sentinel