Streaming
Server-Sent Events formats for real-time token streaming
Both /yule/chat and /v1/chat/completions support Server-Sent Events (SSE) streaming when the request body sets "stream": true.
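For example, a request opts in by adding the flag to its normal JSON body (the messages field here is illustrative; see the respective endpoint docs for the full request shape):

```json
{
  "messages": [{ "role": "user", "content": "Hello" }],
  "stream": true
}
```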
Yule-Native Streaming
POST /yule/chat with "stream": true returns SSE events with typed payloads.
Token Event
Sent for each generated token:
data: {"type":"token","text":"Hello"}Done Event
Sent when generation completes. Includes usage, integrity proof, and timing:
data: {"type":"done","usage":{"prompt_tokens":15,"completion_tokens":8,"total_tokens":23},"integrity":{"model_merkle_root":"ffc7e1fd...","model_verified":true,"sandbox_active":true},"timing":{"prefill_ms":1200.5,"decode_ms":2400.3,"tokens_per_second":3.33},"finish_reason":"stop"}Error Event
Error Event
Sent if something goes wrong during generation:
data: {"type":"error","error":"decode_step failed: ..."}Full Event Schema
interface YuleStreamEvent {
type: "token" | "done" | "error";
text?: string; // present on "token"
usage?: Usage; // present on "done"
integrity?: IntegrityInfo; // present on "done"
timing?: TimingInfo; // present on "done"
finish_reason?: string; // present on "done"
error?: string; // present on "error"
}OpenAI Streaming
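The Usage, IntegrityInfo, and TimingInfo shapes are not spelled out here, but their fields follow from the done event example above; as a sketch derived from that example (not an exhaustive schema):

```typescript
interface Usage {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
}

interface IntegrityInfo {
  model_merkle_root: string; // hex digest, e.g. "ffc7e1fd..."
  model_verified: boolean;
  sandbox_active: boolean;
}

interface TimingInfo {
  prefill_ms: number;
  decode_ms: number;
  tokens_per_second: number; // decode-phase throughput
}
```

Putting the schema to work, here is a minimal consumer sketch. The base URL, the request's messages field, and the Node runtime (process.stdout) are assumptions, and it assumes LF-delimited frames; the data: line framing itself is standard SSE:

```typescript
// Sketch: stream a Yule-native chat completion and handle each event type.
async function streamYuleChat(prompt: string): Promise<void> {
  const res = await fetch("http://localhost:8080/yule/chat", { // assumed base URL
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ messages: [{ role: "user", content: prompt }], stream: true }),
  });
  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  let buffer = "";
  for (;;) {
    const { value, done } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    // SSE frames are separated by a blank line; each frame carries one data: payload.
    let sep: number;
    while ((sep = buffer.indexOf("\n\n")) !== -1) {
      const frame = buffer.slice(0, sep);
      buffer = buffer.slice(sep + 2);
      for (const line of frame.split("\n")) {
        if (!line.startsWith("data: ")) continue;
        const event: YuleStreamEvent = JSON.parse(line.slice(6));
        if (event.type === "token") {
          process.stdout.write(event.text ?? "");
        } else if (event.type === "done") {
          console.log("\nfinished:", event.finish_reason, event.usage);
        } else {
          throw new Error(event.error);
        }
      }
    }
  }
}
```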
OpenAI Streaming
POST /v1/chat/completions with "stream": true returns standard OpenAI SSE chunks:
Content Chunk
data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1739577600,"model":"yule","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}Final Chunk
data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1739577600,"model":"yule","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}This format is compatible with the OpenAI SDK's streaming interface.