Getting Started
Quick Start
Run your first inference with Yule
Run Inference
```shell
yule run ./tinyllama.gguf --prompt "Explain gravity in one sentence"
```

Output streams to stdout as tokens are generated. Stats print to stderr when done:

```text
loaded: Llama (1.1B, 22 layers, dim 2048)
parse: 15.2ms
tokenizer: 32000 tokens, loaded in 5.4ms
weights: mapped in 0.1ms
prompt: 12 tokens
prefill: 6364.7ms (1.9 tok/s)
Gravity is the force that attracts objects with mass toward each other.
generated: 14 tokens in 4200.0ms (3.33 tok/s)
```

Start the API Server
```shell
yule serve ./tinyllama.gguf
```

The server prints a capability token to stderr:

```text
loading model: ./tinyllama.gguf
model loaded: Llama (201 tensors, merkle: ffc7e1fd6016a6f9)
token: yule_b49913e2c05162951af4f87d62c2c9a6555eb91299c7fdcc
listening on 127.0.0.1:11434
yule api: http://127.0.0.1:11434/yule/health
openai: http://127.0.0.1:11434/v1/chat/completions
```

All API requests require the token as a Bearer token:
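The `/v1/chat/completions` endpoint is OpenAI-compatible, so it presumably accepts a standard chat-completion request body. A minimal sketch in Python using only the standard library — the token and model name below are placeholders; substitute the values your `yule serve` instance printed at startup:

```python
import json
import urllib.request

# Placeholders: use the token and model your server actually printed.
BASE_URL = "http://127.0.0.1:11434"
TOKEN = "yule_b49913e2c05162951af4f87d62c2c9a6555eb91299c7fdcc"

# Standard OpenAI-style chat-completion body.
payload = {
    "model": "tinyllama",
    "messages": [{"role": "user", "content": "Explain gravity in one sentence"}],
}

req = urllib.request.Request(
    f"{BASE_URL}/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {TOKEN}",
        "Content-Type": "application/json",
    },
)

# Uncomment to send once the server is running:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

The same request works from any OpenAI-compatible client library pointed at the server's base URL, as long as the capability token is passed as the API key.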
```shell
curl -H "Authorization: Bearer yule_b499..." http://localhost:11434/yule/health
```

```json
{
  "status": "healthy",
  "version": "0.1.0",
  "uptime_seconds": 27,
  "model": "tinyllama",
  "architecture": "Llama",
  "sandbox": false
}
```

What's Next
- CLI Reference for all flags and options
- API Reference for the full endpoint documentation
- Security for how sandbox and verification work