Streaming vs non-streaming
When to stream
- Interactive UIs (chat, IDE assistants) — show tokens as they arrive
- Long completions (> 30 sec) — avoid timeouts
- Token-by-token logging — for monitoring or moderation
When NOT to stream
- Batch jobs — overhead of SSE outweighs benefits
- JSON-mode responses — easier to parse complete payload
- Token-counting heuristics — non-streaming responses return `usage` reliably
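The points above can be illustrated with a short sketch: a non-streaming response arrives as one complete JSON payload, so JSON-mode content can be parsed in a single step and the `usage` block read directly. The response body below is a made-up example in the OpenAI-compatible shape; the field names and values are assumptions for illustration, not a captured Routify response.

```python
import json

# Hypothetical complete (non-streaming) response body, OpenAI-compatible shape.
response_body = json.dumps({
    "choices": [{"message": {"role": "assistant",
                             "content": '{"ideas": ["a", "b"]}'}}],
    "usage": {"prompt_tokens": 12, "completion_tokens": 9, "total_tokens": 21},
})

resp = json.loads(response_body)
# JSON mode: the whole payload is present, so one parse suffices.
parsed = json.loads(resp["choices"][0]["message"]["content"])
# usage is populated on every non-streaming response.
total = resp["usage"]["total_tokens"]
```

With streaming, by contrast, the same content would arrive as many partial deltas that have to be reassembled before parsing.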
Example
```python
stream = client.chat.completions.create(
    model="deepseek-v3.2",
    messages=[{"role": "user", "content": "Write 5 ideas"}],
    stream=True,
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")
```
Routify-specific notes
- Latency: <50ms TTFB for streaming on most channels
- Routify's gateway disables `proxy_buffering` automatically for SSE responses
- We track `total_tokens` in the final `[DONE]` chunk's `usage` field
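A minimal sketch of consuming the raw SSE lines and capturing the trailing `usage` block. The sample lines and the exact chunk that carries `usage` are assumptions (the parser accepts it from whichever chunk includes it), and `parse_sse` is a hypothetical helper, not part of the Routify client.

```python
import json

def parse_sse(lines):
    """Accumulate delta text and capture the usage block from SSE lines."""
    parts, usage = [], None
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip comments and keep-alive lines
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {}).get("content")
            if delta:
                parts.append(delta)
        if chunk.get("usage"):
            usage = chunk["usage"]  # assumed: attached near the end of the stream
    return "".join(parts), usage
```

This is also where the `total_tokens` tracking mentioned above would hook in: once `usage` is non-None, the stream's token count is known.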