Yule Endpoints
Native Yule API with integrity proof in every response
GET /yule/health
Server health check.
```shell
curl -H "Authorization: Bearer $TOKEN" http://localhost:11434/yule/health
```

Response

```json
{
  "status": "healthy",
  "version": "0.1.0",
  "uptime_seconds": 42,
  "model": "tinyllama",
  "architecture": "Llama",
  "sandbox": true
}
```

GET /yule/model
Full model metadata.
```shell
curl -H "Authorization: Bearer $TOKEN" http://localhost:11434/yule/model
```

Response

```json
{
  "name": "tinyllama",
  "architecture": "Llama",
  "parameters": "1.1B",
  "context_length": 2048,
  "embedding_dim": 2048,
  "layers": 22,
  "vocab_size": 32000,
  "tensor_count": 201,
  "merkle_root": "ffc7e1fd6016a6f9ba2ca390a43681453a46ec6054f431aeb6244487932b0e65"
}
```

POST /yule/chat
Chat completion with integrity proof.
Request
```json
{
  "messages": [
    { "role": "system", "content": "You are helpful." },
    { "role": "user", "content": "Hello" }
  ],
  "max_tokens": 100,
  "temperature": 0.7,
  "top_p": 0.9,
  "stream": false
}
```

All fields except `messages` are optional. Defaults: `max_tokens: 512`, `temperature: 0.7`, `top_p: 0.9`, `stream: false`.
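Because every field except `messages` has a documented default, a client only needs to send what it overrides. A minimal sketch of a request builder that applies those defaults (the helper is illustrative, not part of any official Yule client):

```python
def build_chat_request(messages, **overrides):
    """Return a /yule/chat request body with the documented defaults applied.

    Any keyword argument (max_tokens, temperature, top_p, stream)
    overrides the corresponding default.
    """
    body = {
        "messages": messages,
        "max_tokens": 512,
        "temperature": 0.7,
        "top_p": 0.9,
        "stream": False,
    }
    body.update(overrides)
    return body

# Override only max_tokens; the other fields keep their defaults.
req = build_chat_request(
    [{"role": "user", "content": "Hello"}],
    max_tokens=100,
)
```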
Response
```json
{
  "id": "yule-18947c080149320c",
  "text": "Hello! How can I help you today?",
  "finish_reason": "stop",
  "usage": {
    "prompt_tokens": 15,
    "completion_tokens": 8,
    "total_tokens": 23
  },
  "integrity": {
    "model_merkle_root": "ffc7e1fd6016a6f9ba2ca390a43681453a46ec6054f431aeb6244487932b0e65",
    "model_verified": true,
    "sandbox_active": true
  },
  "timing": {
    "prefill_ms": 1200.5,
    "decode_ms": 2400.3,
    "tokens_per_second": 3.33
  }
}
```

The `integrity` and `timing` fields are what distinguish this endpoint from the OpenAI-compatible one. The Merkle root lets you verify that the loaded model matches the one you verified with `yule verify`.
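A client can pin the expected Merkle root (obtained out of band, e.g. from `yule verify` or GET /yule/model) and reject any response that does not prove the pinned, verified model. A minimal sketch; the helper and its policy of also requiring the sandbox are illustrative choices, not a Yule library API:

```python
# Merkle root of the model you verified out of band (sample value from above).
EXPECTED_ROOT = "ffc7e1fd6016a6f9ba2ca390a43681453a46ec6054f431aeb6244487932b0e65"

def check_integrity(response, expected_root=EXPECTED_ROOT):
    """Accept a /yule/chat response only if its integrity proof matches.

    Requires model_verified, an active sandbox (a policy choice), and an
    exact match on the pinned Merkle root.
    """
    integrity = response.get("integrity", {})
    return (
        integrity.get("model_verified") is True
        and integrity.get("sandbox_active") is True
        and integrity.get("model_merkle_root") == expected_root
    )
```

A caller would parse the JSON response body and drop (or alert on) any reply for which `check_integrity` returns `False`.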
Streaming
Set "stream": true to get SSE events. See Streaming.
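SSE bodies arrive as `data:` lines separated by blank lines. A sketch of a parser for a fully buffered stream, assuming one JSON object per `data:` line (the usual shape for token-streaming endpoints); the exact Yule event schema is on the Streaming page, so treat any field names as placeholders:

```python
import json

def parse_sse_events(raw):
    """Yield the JSON payload of each `data:` line in an SSE response body.

    Assumes every event carries a single JSON object on one `data:` line;
    consult the Streaming page for the actual Yule event fields.
    """
    for line in raw.splitlines():
        if line.startswith("data: "):
            yield json.loads(line[len("data: "):])
```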
POST /yule/tokenize
Tokenize text without running inference.
Request
```json
{
  "text": "Hello world"
}
```

Response

```json
{
  "tokens": [15043, 3186],
  "count": 2
}
```
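One common use of this endpoint is budget checking: tokenize the prompt first, then confirm that prompt plus requested completion fits the model's context (2048 tokens for tinyllama, per GET /yule/model). The helper below is an illustrative sketch, not a Yule API:

```python
def fits_in_context(prompt_tokens, max_tokens, context_length=2048):
    """True if the prompt plus the requested completion fits the context.

    `context_length` defaults to the tinyllama value reported by
    GET /yule/model; query that endpoint for other models.
    """
    return prompt_tokens + max_tokens <= context_length

# With the sample above: "Hello world" tokenizes to 2 tokens.
fits_in_context(2, 100)     # 102 <= 2048 -> True
fits_in_context(2000, 100)  # 2100 > 2048 -> False
```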