API Overview
Authentication, API surfaces, and design philosophy
Authentication
All endpoints require a Bearer token in the Authorization header:
Authorization: Bearer yule_b49913e2c05162951af4f87d62c2c9a6555eb91299c7fdccThe token is printed to stderr when the server starts. You can also provide your own token with --token.
Requests without a valid token receive 401 Unauthorized.
Two API Surfaces
Yule exposes two sets of endpoints:
Yule-Native (/yule/*)
Every response includes integrity proof: the model's Merkle root, verification status, and sandbox state. This is Yule's primary API. No other local inference server does this.
| Method | Path | Description |
|---|---|---|
| GET | /yule/health | Server status, uptime, model info |
| GET | /yule/model | Full model metadata and Merkle root |
| POST | /yule/chat | Chat with integrity proof and timing |
| POST | /yule/tokenize | Tokenize text, return token IDs |
OpenAI-Compatible (/v1/*)
Standard OpenAI format for drop-in compatibility with existing tools, SDKs, and UIs that speak the OpenAI protocol.
| Method | Path | Description |
|---|---|---|
| POST | /v1/chat/completions | Chat completions (streaming + non-streaming) |
| GET | /v1/models | List loaded models |
Why Both?
OpenAI-compatible endpoints exist so you can point any tool that supports a custom base URL at Yule without code changes. But they lose what makes Yule different: the integrity proof.
If you're building something that talks to Yule directly, use the Yule-native endpoints. You get Merkle roots, sandbox status, and timing data in every response.