Model Registry
Downloading models manually was fine for me, useless for anyone else.
Model Registry — Pull, Cache, Run
Before this, using Yule meant finding a GGUF file somewhere, downloading it manually, and passing the full path to yule run. Fine for me, useless for anyone else.
HuggingFace Client
The registry talks to HuggingFace's HTTP API to discover and download GGUF models. It can enumerate files in a repo, filter for .gguf files, and grab file metadata (size, download URL).
No authentication required for public models. HuggingFace tokens are supported for gated models but not enforced.
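A minimal sketch of the URL and filter logic such a client needs, using only the standard library. The function names are hypothetical and not taken from Yule's source. The URL shapes assume HuggingFace's public layout (repo metadata under /api/models/, file downloads under /resolve/main/), and the HTTP request itself is left out.

```rust
// Hypothetical helpers for illustration; the actual HTTP calls are omitted.

/// Endpoint that returns repo metadata (including its file list) as JSON.
fn repo_api_url(repo_id: &str) -> String {
    format!("https://huggingface.co/api/models/{}", repo_id)
}

/// Direct download URL for one file on the `main` revision (assumed default).
fn download_url(repo_id: &str, filename: &str) -> String {
    format!("https://huggingface.co/{}/resolve/main/{}", repo_id, filename)
}

/// Keep only GGUF files from a repo listing (case-insensitive extension match).
fn gguf_files<'a>(filenames: &[&'a str]) -> Vec<&'a str> {
    filenames
        .iter()
        .copied()
        .filter(|name| name.to_ascii_lowercase().ends_with(".gguf"))
        .collect()
}

/// Optional bearer-token header: present for gated models, absent otherwise.
fn auth_header(token: Option<&str>) -> Option<(&'static str, String)> {
    token.map(|t| ("Authorization", format!("Bearer {}", t)))
}
```

Keeping the token as an Option mirrors the "supported but not enforced" behaviour: public repos are requested with no header at all.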
Quantization Detection
Given a filename like tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf, the registry extracts the quantization label (Q4_K_M) via pattern matching on the filename. GGUF files don't always embed quant info in metadata, but the filename convention is near-universal.
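One way the filename matching could look, as a dependency-free sketch. The heuristic here is an assumption: split the name on '.' and '-', scan from the right, and accept the first token that looks like Q4_K_M, Q8_0, IQ2_XXS, or a float format such as F16. The function names are illustrative, not Yule's.

```rust
/// Extract a quantization label such as `Q4_K_M` from a GGUF filename.
/// Scans tokens right to left so version strings earlier in the name are ignored.
fn quant_label(filename: &str) -> Option<String> {
    filename
        .rsplit(|c: char| c == '.' || c == '-')
        .find(|tok| looks_like_quant(tok))
        .map(|tok| tok.to_ascii_uppercase())
}

/// Heuristic check for a quantization token (case-insensitive).
fn looks_like_quant(token: &str) -> bool {
    let t = token.to_ascii_uppercase();
    // Unquantized float formats.
    if matches!(t.as_str(), "F16" | "F32" | "BF16") {
        return true;
    }
    // "Q" or "IQ" followed by a digit, then only letters, digits, or underscores.
    let rest = t.strip_prefix("IQ").or_else(|| t.strip_prefix('Q'));
    match rest {
        Some(r) => {
            r.chars().next().map_or(false, |c| c.is_ascii_digit())
                && r.chars().all(|c| c.is_ascii_alphanumeric() || c == '_')
        }
        None => false,
    }
}
```

Requiring a digit right after the Q keeps model names like Qwen2 from being mistaken for a quant label.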
Download and Cache
Async download via reqwest with progress tracking. Downloads write to a temp file and atomically rename on completion — no partial files polluting the cache.
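The temp-file-then-rename pattern can be sketched synchronously with the standard library. This is not the reqwest-based async code described above: `source` is any `Read` standing in for the HTTP response body, and the `.part` suffix and function names are assumptions made for illustration.

```rust
use std::fs::{self, File};
use std::io::{self, Read, Write};
use std::path::{Path, PathBuf};

/// Copy `src` to `dst` in chunks, reporting the running byte total after each write.
fn copy_with_progress<R: Read, W: Write, F: FnMut(u64)>(
    src: &mut R,
    dst: &mut W,
    mut on_progress: F,
) -> io::Result<u64> {
    let mut buf = vec![0u8; 64 * 1024];
    let mut written = 0u64;
    loop {
        let n = src.read(&mut buf)?;
        if n == 0 {
            return Ok(written); // end of stream
        }
        dst.write_all(&buf[..n])?;
        written += n as u64;
        on_progress(written);
    }
}

/// Stream `source` into `<dest>.part`, then rename it to `dest` once complete.
/// On failure the partial file is removed, so `dest` only ever holds a full download.
fn download_atomic<R: Read, F: FnMut(u64)>(
    mut source: R,
    dest: &Path,
    mut on_progress: F,
) -> io::Result<u64> {
    if let Some(parent) = dest.parent() {
        fs::create_dir_all(parent)?;
    }
    // The temp file sits next to the target so the rename stays on one filesystem.
    let mut tmp_name = dest.as_os_str().to_owned();
    tmp_name.push(".part");
    let tmp = PathBuf::from(tmp_name);

    let outcome = File::create(&tmp).and_then(|mut file| {
        let n = copy_with_progress(&mut source, &mut file, &mut on_progress)?;
        file.sync_all()?; // flush to disk before the file becomes visible
        Ok(n)
    });
    match outcome {
        Ok(n) => {
            fs::rename(&tmp, dest)?;
            Ok(n)
        }
        Err(e) => {
            let _ = fs::remove_file(&tmp); // best-effort cleanup
            Err(e)
        }
    }
}
```

The rename is only atomic when both paths are on the same filesystem, which is why the temp file is created alongside the destination rather than in a system temp directory.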
Models are cached at ~/.yule/models/{publisher}/{repo}/{filename}. A cache.json sidecar stores metadata: download timestamp, file size, source URL.
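The cache layout maps directly onto path joins. A small sketch, with assumptions called out: Rust's standard library does not expand ~, so the home directory is read from HOME (or USERPROFILE), and cache.json is assumed to sit in the same directory as the model file. All function names are hypothetical.

```rust
use std::env;
use std::path::{Path, PathBuf};

/// Default cache root, i.e. `~/.yule/models`, resolved from the environment.
/// Returns None when no home directory variable is set.
fn default_cache_root() -> Option<PathBuf> {
    let home = env::var_os("HOME").or_else(|| env::var_os("USERPROFILE"))?;
    Some(PathBuf::from(home).join(".yule").join("models"))
}

/// `{root}/{publisher}/{repo}/{filename}`
fn cached_model_path(root: &Path, publisher: &str, repo: &str, filename: &str) -> PathBuf {
    root.join(publisher).join(repo).join(filename)
}

/// Assumed sidecar location: `cache.json` next to the model file.
fn sidecar_path(model_path: &Path) -> PathBuf {
    model_path.with_file_name("cache.json")
}
```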
Model references work in two formats:
- Registry reference: publisher/repo, which resolves to HuggingFace
- Local path: ./model.gguf or /absolute/path.gguf, which bypasses the registry entirely
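Telling the two reference formats apart could look like the sketch below. The exact rules are assumptions: anything starting with ./, ../, or /, or ending in .gguf, is treated as a local path, and everything else must be exactly publisher/repo. The ModelRef type and parse_model_ref are illustrative names.

```rust
use std::path::PathBuf;

#[derive(Debug, PartialEq)]
enum ModelRef {
    /// `publisher/repo`, resolved against HuggingFace.
    Registry { publisher: String, repo: String },
    /// A file on disk; the registry is never consulted.
    Local(PathBuf),
}

fn parse_model_ref(input: &str) -> Option<ModelRef> {
    // Anything that looks like a filesystem path skips the registry.
    let looks_local = input.starts_with("./")
        || input.starts_with("../")
        || input.starts_with('/')
        || input.to_ascii_lowercase().ends_with(".gguf");
    if looks_local {
        return Some(ModelRef::Local(PathBuf::from(input)));
    }
    // Otherwise expect exactly two non-empty segments: publisher and repo.
    let (publisher, repo) = input.split_once('/')?;
    if publisher.is_empty() || repo.is_empty() || repo.contains('/') {
        return None;
    }
    Some(ModelRef::Registry {
        publisher: publisher.to_string(),
        repo: repo.to_string(),
    })
}
```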
Merkle on Pull
When a model is downloaded, I immediately compute the blake3 Merkle root over the tensor data. This root is stored in the result and can be compared against a manifest's expected root. yule pull + yule verify gives end-to-end integrity.
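To show the shape of a Merkle root computation without pulling in a dependency, here is a sketch that swaps blake3 for a toy 64-bit FNV-1a hash. That substitution is deliberate and not cryptographically secure; real code would use blake3 for both leaf and node hashes. The chunk size, the leaf/node prefix bytes, and the rule for odd nodes are all assumptions, not Yule's actual scheme.

```rust
type Digest = u64;

/// Stand-in hash (64-bit FNV-1a). NOT cryptographic; a placeholder for blake3.
fn toy_hash(data: &[u8]) -> Digest {
    let mut h: u64 = 0xcbf29ce484222325;
    for &b in data {
        h ^= b as u64;
        h = h.wrapping_mul(0x100000001b3);
    }
    h
}

/// Leaf hash: prefix byte 0x00 keeps leaves distinct from interior nodes.
fn hash_leaf(chunk: &[u8]) -> Digest {
    let mut buf = Vec::with_capacity(chunk.len() + 1);
    buf.push(0x00);
    buf.extend_from_slice(chunk);
    toy_hash(&buf)
}

/// Interior node hash: prefix byte 0x01, then both child digests.
fn hash_node(left: Digest, right: Digest) -> Digest {
    let mut buf = Vec::with_capacity(17);
    buf.push(0x01);
    buf.extend_from_slice(&left.to_le_bytes());
    buf.extend_from_slice(&right.to_le_bytes());
    toy_hash(&buf)
}

/// Hash fixed-size chunks into leaves, then pair levels upward until one root remains.
fn merkle_root(data: &[u8], chunk_size: usize) -> Digest {
    assert!(chunk_size > 0, "chunk_size must be non-zero");
    let mut level: Vec<Digest> = data.chunks(chunk_size).map(|c| hash_leaf(c)).collect();
    if level.is_empty() {
        level.push(hash_leaf(&[])); // empty input still gets a well-defined root
    }
    while level.len() > 1 {
        level = level
            .chunks(2)
            .map(|pair| match pair {
                [l, r] => hash_node(*l, *r),
                [only] => *only, // an unpaired node is promoted unchanged
                _ => unreachable!(),
            })
            .collect();
    }
    level[0]
}
```

Comparing the computed root against a manifest's expected root is then a single equality check, and any changed byte in any chunk changes the root.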
CLI
yule pull TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF # download + merkle
yule list # show cached models
Notes
HuggingFace first because that's where 90% of GGUF models live. TheBloke, bartowski, etc. A custom registry protocol later.
No parallel chunked downloads yet: one HTTP GET, streamed to disk. Range requests would add chunk coordination and partial-failure recovery in exchange for a marginal speedup on models under 40GB.
cache.json instead of SQLite — one JSON file per model, human-readable. For 3-10 cached models a directory walk is instant.
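The directory walk behind a listing command can be sketched with std::fs alone. This assumes the cache root contains publisher and repo directories with .gguf files inside; list_cached_models is a hypothetical name, and reading the cache.json metadata is left out.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Collect every `.gguf` file under `root`, sorted for stable output.
fn list_cached_models(root: &Path) -> io::Result<Vec<PathBuf>> {
    let mut found = Vec::new();
    let mut stack = vec![root.to_path_buf()]; // directories still to visit
    while let Some(dir) = stack.pop() {
        for entry in fs::read_dir(&dir)? {
            let path = entry?.path();
            if path.is_dir() {
                stack.push(path);
            } else if path
                .extension()
                .map_or(false, |ext| ext.eq_ignore_ascii_case("gguf"))
            {
                found.push(path);
            }
        }
    }
    found.sort();
    Ok(found)
}
```

With a handful of models this touches a few dozen directory entries at most, which is why an index database buys nothing here.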