Streaming vs non-streaming
When to stream
- Interactive UIs (chat, IDE assistants) — show tokens as they arrive
- Long completions (> 30 sec) — avoid timeouts
- Token-by-token logging — for monitoring or moderation
When NOT to stream
- Batch jobs — overhead of SSE outweighs benefits
- JSON-mode responses — easier to parse complete payload
- Token-counting heuristics — non-streaming responses return `usage` reliably
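The points above can be illustrated with a short sketch: a non-streaming response arrives as one complete JSON payload, so JSON-mode content can be parsed in a single step and the `usage` block read directly. The response body below is a made-up example in the OpenAI-compatible shape; the field names and values are assumptions for illustration, not a captured Routify response.

```python
import json

# Hypothetical complete (non-streaming) response body, OpenAI-compatible shape.
response_body = json.dumps({
    "choices": [{"message": {"role": "assistant",
                             "content": '{"ideas": ["a", "b"]}'}}],
    "usage": {"prompt_tokens": 12, "completion_tokens": 9, "total_tokens": 21},
})

resp = json.loads(response_body)
# JSON mode: the whole payload is present, so one parse suffices.
parsed = json.loads(resp["choices"][0]["message"]["content"])
# usage is populated on every non-streaming response.
total = resp["usage"]["total_tokens"]
```

With streaming, by contrast, the same content would arrive as many partial deltas that have to be reassembled before parsing.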
Example
```python
stream = client.chat.completions.create(
    model="deepseek-v3.2",
    messages=[{"role": "user", "content": "Write 5 ideas"}],
    stream=True,
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")
```
Routify-specific notes
- Latency: <50ms TTFB for streaming on most channels
- Routify's gateway disables `proxy_buffering` automatically for SSE responses
- We track `total_tokens` in the final `[DONE]` chunk's `usage` field
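A minimal sketch of consuming the raw SSE lines and capturing the trailing `usage` block. The sample lines and the exact chunk that carries `usage` are assumptions (the parser accepts it from whichever chunk includes it), and `parse_sse` is a hypothetical helper, not part of the Routify client.

```python
import json

def parse_sse(lines):
    """Accumulate delta text and capture the usage block from SSE lines."""
    parts, usage = [], None
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip comments and keep-alive lines
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {}).get("content")
            if delta:
                parts.append(delta)
        if chunk.get("usage"):
            usage = chunk["usage"]  # assumed: attached near the end of the stream
    return "".join(parts), usage
```

This is also where the `total_tokens` tracking mentioned above would hook in: once `usage` is non-None, the stream's token count is known.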