Yule
Getting Started

Quick Start

Run your first inference with Yule

Run Inference

yule run ./tinyllama.gguf --prompt "Explain gravity in one sentence"

Output streams to stdout as tokens are generated. Stats print to stderr when done:

loaded: Llama (1.1B, 22 layers, dim 2048)
parse: 15.2ms
tokenizer: 32000 tokens, loaded in 5.4ms
weights: mapped in 0.1ms

prompt: 12 tokens
prefill: 6364.7ms (1.9 tok/s)
Gravity is the force that attracts objects with mass toward each other.

generated: 14 tokens in 4200.0ms (3.33 tok/s)
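Because the completion streams to stdout while the stats print to stderr, ordinary shell redirection is enough to split them. A minimal sketch using the same command as above (file names are arbitrary):

```shell
# Save the generated text and the timing stats to separate files.
# Tokens stream to stdout; load/prefill/generation stats go to
# stderr, so plain redirection keeps them apart.
yule run ./tinyllama.gguf \
  --prompt "Explain gravity in one sentence" \
  > answer.txt 2> stats.log
```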

Start the API Server

yule serve ./tinyllama.gguf

The server prints a capability token to stderr:

loading model: ./tinyllama.gguf
model loaded: Llama (201 tensors, merkle: ffc7e1fd6016a6f9)

  token: yule_b49913e2c05162951af4f87d62c2c9a6555eb91299c7fdcc

listening on 127.0.0.1:11434
  yule api:  http://127.0.0.1:11434/yule/health
  openai:    http://127.0.0.1:11434/v1/chat/completions

All API requests must include this token as a Bearer token in the Authorization header:

curl -H "Authorization: Bearer yule_b499..." http://localhost:11434/yule/health
{
  "status": "healthy",
  "version": "0.1.0",
  "uptime_seconds": 27,
  "model": "tinyllama",
  "architecture": "Llama",
  "sandbox": false
}
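The same Bearer token works against the OpenAI-compatible endpoint listed at startup. A hedged sketch: the request body below follows the standard OpenAI chat-completions shape, which is an assumption about exactly which fields Yule accepts, and the token value is the truncated placeholder from the example above.

```shell
# Reuse the capability token printed by `yule serve`.
export YULE_TOKEN="yule_b499..."

# Chat completion via the OpenAI-compatible route. The payload
# uses the standard OpenAI fields (model, messages); supported
# optional fields are not confirmed here.
curl http://localhost:11434/v1/chat/completions \
  -H "Authorization: Bearer $YULE_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tinyllama",
    "messages": [
      {"role": "user", "content": "Explain gravity in one sentence"}
    ]
  }'
```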

