Yule
Security-first local AI inference runtime. Pure Rust, zero cloud dependencies.
Yule is a local LLM inference runtime built from scratch in Rust. It runs GGUF models on your machine with cryptographic model verification, process sandboxing, and a full API server. No cloud, no telemetry, no trust assumptions.
Why Yule?
Most local inference tools treat security as an afterthought: GGUF parsers written in C++ have had multiple CVEs, model files are downloaded and executed with full system access, and there is no way to verify that a model hasn't been tampered with.
Yule takes a different approach: memory-safe parsers, blake3 Merkle verification over every tensor, kernel-enforced process sandboxing, and an integrity proof in every API response.
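To make the verification idea concrete, here is a minimal sketch of Merkle hashing over tensor data using the blake3 crate. It assumes leaves are whole-tensor blake3 hashes combined by pairwise concatenation; Yule's actual tree layout, chunking, and domain separation may differ, so treat this as an illustration of the technique rather than the runtime's implementation.

```rust
// Illustrative only: leaf definition and pairing scheme are assumptions,
// not Yule's documented format. Requires the `blake3` crate.
use blake3::Hash;

/// Hash one tensor's raw bytes into a Merkle leaf.
fn leaf_hash(tensor_bytes: &[u8]) -> Hash {
    blake3::hash(tensor_bytes)
}

/// Fold a level of hashes pairwise until a single root remains.
fn merkle_root(mut level: Vec<Hash>) -> Option<Hash> {
    if level.is_empty() {
        return None;
    }
    while level.len() > 1 {
        level = level
            .chunks(2)
            .map(|pair| {
                let mut hasher = blake3::Hasher::new();
                hasher.update(pair[0].as_bytes());
                // An odd trailing node is re-hashed alone.
                if let Some(right) = pair.get(1) {
                    hasher.update(right.as_bytes());
                }
                hasher.finalize()
            })
            .collect();
    }
    level.pop()
}

fn main() {
    // Stand-in tensor payloads; real input would be the GGUF tensor data.
    let tensors: Vec<Vec<u8>> = vec![vec![0u8; 16], vec![1u8; 16], vec![2u8; 16]];
    let leaves: Vec<Hash> = tensors.iter().map(|t| leaf_hash(t)).collect();
    if let Some(root) = merkle_root(leaves) {
        // Compare against a pinned root obtained out of band before loading.
        println!("model root: {}", root.to_hex());
    }
}
```

A per-tensor tree means a tampered weight can be pinpointed to a specific tensor instead of merely failing a whole-file checksum.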
Quick Links
Getting Started
- Install — Build from source, requirements
- Quick Start — Run your first inference
CLI
- yule run — Direct inference from the command line
- yule serve — Start the API server
- yule verify — Inspect and verify model files
API
- Overview — Auth, API surfaces, design philosophy
- Yule Endpoints — /yule/* with integrity proof
- OpenAI Endpoints — /v1/* for tool compatibility (see the sketch after this list)
- Streaming — SSE event formats
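Because the /v1/* surface speaks the OpenAI wire format, any generic OpenAI client should work against it. A hedged sketch using the reqwest and serde_json crates; the port, model id, and absence of an auth header are assumptions for illustration, not Yule's documented defaults — check the API overview for the real values.

```rust
// Requires `reqwest` (with the "blocking" and "json" features) and `serde_json`.
use serde_json::json;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = reqwest::blocking::Client::new();
    let resp = client
        // Host and port are assumptions; substitute your server address.
        .post("http://127.0.0.1:8080/v1/chat/completions")
        .json(&json!({
            "model": "local-model", // placeholder id, not a real default
            "messages": [{ "role": "user", "content": "Hello" }]
        }))
        .send()?
        .text()?;
    println!("{resp}");
    Ok(())
}
```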
Architecture
- Overview — Crate structure, inference thread model
- Security — Sandbox, Merkle trees, auth
- Supported Models — Architectures, quantizations, SIMD