Yule
Security-first local AI inference runtime. Pure Rust, zero cloud dependencies.
Yule is a local LLM inference runtime built from scratch in Rust. It runs GGUF models on your machine with cryptographic model verification, process sandboxing, and a full API server. No cloud, no telemetry, no trust assumptions.
Why Yule?
Most local inference tools treat security as an afterthought: GGUF parsers written in C++ have had multiple CVEs, model files are downloaded and executed with full system access, and there is no way to verify that a model hasn't been tampered with.
Yule takes a different approach: memory-safe parsers, blake3 Merkle verification over every tensor, kernel-enforced process sandboxing, and an integrity proof in every API response.
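To make the verification idea concrete, here is a minimal sketch of Merkle hashing over tensor data using the blake3 crate. It assumes leaves are whole-tensor blake3 hashes combined by pairwise concatenation; Yule's actual tree layout, chunking, and domain separation may differ, so treat this as an illustration of the technique rather than the runtime's implementation.

```rust
// Illustrative only: leaf definition and pairing scheme are assumptions,
// not Yule's documented format. Requires the `blake3` crate.
use blake3::Hash;

/// Hash one tensor's raw bytes into a Merkle leaf.
fn leaf_hash(tensor_bytes: &[u8]) -> Hash {
    blake3::hash(tensor_bytes)
}

/// Fold a level of hashes pairwise until a single root remains.
fn merkle_root(mut level: Vec<Hash>) -> Option<Hash> {
    if level.is_empty() {
        return None;
    }
    while level.len() > 1 {
        level = level
            .chunks(2)
            .map(|pair| {
                let mut hasher = blake3::Hasher::new();
                hasher.update(pair[0].as_bytes());
                // An odd trailing node is re-hashed alone.
                if let Some(right) = pair.get(1) {
                    hasher.update(right.as_bytes());
                }
                hasher.finalize()
            })
            .collect();
    }
    level.pop()
}

fn main() {
    // Stand-in tensor payloads; real input would be the GGUF tensor data.
    let tensors: Vec<Vec<u8>> = vec![vec![0u8; 16], vec![1u8; 16], vec![2u8; 16]];
    let leaves: Vec<Hash> = tensors.iter().map(|t| leaf_hash(t)).collect();
    if let Some(root) = merkle_root(leaves) {
        // Compare against a pinned root obtained out of band before loading.
        println!("model root: {}", root.to_hex());
    }
}
```

A per-tensor tree means a tampered weight can be pinpointed to a specific tensor instead of merely failing a whole-file checksum.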
Quick Links
Getting Started
- Install — Build from source, requirements
- Quick Start — Run your first inference
CLI
- yule run — Direct inference from the command line
- yule serve — Start the API server
- yule verify — Inspect and verify model files
API
- Overview — Auth, API surfaces, design philosophy
- Yule Endpoints — /yule/* with integrity proof
- OpenAI Endpoints — /v1/* for tool compatibility (see the sketch after this list)
- Streaming — SSE event formats
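Because the /v1/* surface speaks the OpenAI wire format, any generic OpenAI client should work against it. A hedged sketch using the reqwest and serde_json crates; the port, model id, and absence of an auth header are assumptions for illustration, not Yule's documented defaults — check the API overview for the real values.

```rust
// Requires `reqwest` (with the "blocking" and "json" features) and `serde_json`.
use serde_json::json;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = reqwest::blocking::Client::new();
    let resp = client
        // Host and port are assumptions; substitute your server address.
        .post("http://127.0.0.1:8080/v1/chat/completions")
        .json(&json!({
            "model": "local-model", // placeholder id, not a real default
            "messages": [{ "role": "user", "content": "Hello" }]
        }))
        .send()?
        .text()?;
    println!("{resp}");
    Ok(())
}
```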
Architecture
- Overview — Crate structure, inference thread model
- Security — Sandbox, Merkle trees, auth
- Supported Models — Architectures, quantizations, SIMD