Streaming
Server-Sent Events formats for real-time token streaming
Both /yule/chat and /v1/chat/completions support Server-Sent Events (SSE) streaming when the request body sets "stream": true.
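For example, a request opts in by adding the flag to its normal JSON body (the messages field here is illustrative; see the respective endpoint docs for the full request shape):

```json
{
  "messages": [{ "role": "user", "content": "Hello" }],
  "stream": true
}
```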
Yule-Native Streaming
POST /yule/chat with "stream": true returns SSE events with typed payloads.
Token Event
Sent for each generated token:
data: {"type":"token","text":"Hello"}Done Event
Sent when generation completes. Includes usage, integrity proof, and timing:
data: {"type":"done","usage":{"prompt_tokens":15,"completion_tokens":8,"total_tokens":23},"integrity":{"model_merkle_root":"ffc7e1fd...","model_verified":true,"sandbox_active":true},"timing":{"prefill_ms":1200.5,"decode_ms":2400.3,"tokens_per_second":3.33},"finish_reason":"stop"}Error Event
Error Event
Sent if something goes wrong during generation:
data: {"type":"error","error":"decode_step failed: ..."}Full Event Schema
interface YuleStreamEvent {
type: "token" | "done" | "error";
text?: string; // present on "token"
usage?: Usage; // present on "done"
integrity?: IntegrityInfo; // present on "done"
timing?: TimingInfo; // present on "done"
finish_reason?: string; // present on "done"
error?: string; // present on "error"
}OpenAI Streaming
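The Usage, IntegrityInfo, and TimingInfo shapes are not spelled out here, but their fields follow from the done event example above; as a sketch derived from that example (not an exhaustive schema):

```typescript
interface Usage {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
}

interface IntegrityInfo {
  model_merkle_root: string; // hex digest, e.g. "ffc7e1fd..."
  model_verified: boolean;
  sandbox_active: boolean;
}

interface TimingInfo {
  prefill_ms: number;
  decode_ms: number;
  tokens_per_second: number; // decode-phase throughput
}
```

Putting the schema to work, here is a minimal consumer sketch. The base URL, the request's messages field, and the Node runtime (process.stdout) are assumptions, and it assumes LF-delimited frames; the data: line framing itself is standard SSE:

```typescript
// Sketch: stream a Yule-native chat completion and handle each event type.
async function streamYuleChat(prompt: string): Promise<void> {
  const res = await fetch("http://localhost:8080/yule/chat", { // assumed base URL
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ messages: [{ role: "user", content: prompt }], stream: true }),
  });
  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  let buffer = "";
  for (;;) {
    const { value, done } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    // SSE frames are separated by a blank line; each frame carries one data: payload.
    let sep: number;
    while ((sep = buffer.indexOf("\n\n")) !== -1) {
      const frame = buffer.slice(0, sep);
      buffer = buffer.slice(sep + 2);
      for (const line of frame.split("\n")) {
        if (!line.startsWith("data: ")) continue;
        const event: YuleStreamEvent = JSON.parse(line.slice(6));
        if (event.type === "token") {
          process.stdout.write(event.text ?? "");
        } else if (event.type === "done") {
          console.log("\nfinished:", event.finish_reason, event.usage);
        } else {
          throw new Error(event.error);
        }
      }
    }
  }
}
```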
OpenAI Streaming
POST /v1/chat/completions with "stream": true returns standard OpenAI SSE chunks:
Content Chunk
data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1739577600,"model":"yule","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}Final Chunk
data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1739577600,"model":"yule","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}This format is compatible with the OpenAI SDK's streaming interface.