Model Registry — Pull, Cache, Run

Before this, using Yule meant finding a GGUF file somewhere, downloading it manually, and passing the full path to yule run. Fine for me, useless for anyone else.

HuggingFace Client

The registry talks to HuggingFace's HTTP API to discover and download GGUF models. It can enumerate files in a repo, filter for .gguf files, and grab file metadata (size, download URL).
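
A minimal sketch of the URL side of this. It assumes the public Hub endpoints: the repo listing at `https://huggingface.co/api/models/{repo_id}` (file names appear under `siblings[].rfilename` in its JSON) and downloads via `https://huggingface.co/{repo_id}/resolve/{revision}/{filename}`. The function names are hypothetical. HTTP and JSON parsing are left out, and file names are not percent-encoded.

```rust
/// Base URL of the public HuggingFace Hub.
const HF_BASE: &str = "https://huggingface.co";

/// Metadata endpoint for a repo (hypothetical helper name).
pub fn model_info_url(repo_id: &str) -> String {
    format!("{}/api/models/{}", HF_BASE, repo_id)
}

/// Direct download URL for one file at a given revision (branch, tag, or commit).
pub fn resolve_url(repo_id: &str, revision: &str, filename: &str) -> String {
    format!("{}/{}/resolve/{}/{}", HF_BASE, repo_id, revision, filename)
}

/// Keep only GGUF files from a repo listing (case-insensitive extension check).
pub fn gguf_files<'a>(names: &[&'a str]) -> Vec<&'a str> {
    names
        .iter()
        .copied()
        .filter(|n| n.to_ascii_lowercase().ends_with(".gguf"))
        .collect()
}
```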

Public models need no authentication. HuggingFace tokens are supported for gated models but never required.
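
HuggingFace accepts user access tokens as a standard `Authorization: Bearer` header, so an optional token reduces to an optional header. A small sketch, assuming the token arrives as an `Option<&str>` from wherever the CLI reads it. `auth_header` is a hypothetical name.

```rust
/// Build the Authorization header for HuggingFace requests, if a token is present.
/// Blank or missing tokens yield no header, so public downloads are unaffected.
pub fn auth_header(token: Option<&str>) -> Option<(&'static str, String)> {
    token
        .map(str::trim)
        .filter(|t| !t.is_empty())
        .map(|t| ("Authorization", format!("Bearer {}", t)))
}
```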

Quantization Detection

Given a filename like tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf, the registry extracts the quantization label (Q4_K_M) by pattern matching on the filename. GGUF files don't always embed quant info in their metadata, but the filename convention is near-universal.
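
One way to do that pattern matching without a regex crate: split the stem on `.` and `-` and take the last token that looks like a quant label (`Q<digit>…`, `IQ<digit>…`, or `F16`/`F32`/`BF16`). This is a sketch of the idea, not necessarily the rule set Yule uses; `quant_label` and `is_quant` are hypothetical names.

```rust
/// Extract a quantization label such as "Q4_K_M" from a GGUF filename.
pub fn quant_label(filename: &str) -> Option<String> {
    let stem = filename.strip_suffix(".gguf")?;
    // Scan from the end: the quant tag conventionally sits just before ".gguf".
    stem.split(|c| c == '.' || c == '-')
        .rev()
        .find(|tok| is_quant(tok))
        .map(|tok| tok.to_ascii_uppercase())
}

fn is_quant(tok: &str) -> bool {
    let t = tok.to_ascii_uppercase();
    // Unquantized float formats.
    if matches!(t.as_str(), "F16" | "F32" | "BF16") {
        return true;
    }
    // Q4_K_M, Q8_0, IQ3_XS, ...: prefix, then a digit, then [A-Z0-9_].
    match t.strip_prefix("IQ").or_else(|| t.strip_prefix('Q')) {
        Some(rest) => {
            rest.chars().next().map_or(false, |c| c.is_ascii_digit())
                && rest.chars().all(|c| c.is_ascii_alphanumeric() || c == '_')
        }
        None => false,
    }
}
```

Names like `Qwen2` are not mistaken for a quant tag, because the character after the `Q` must be a digit.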

Download and Cache

Async download via reqwest with progress tracking. Downloads write to a temp file and atomically rename on completion — no partial files polluting the cache.
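
The temp-file-then-rename pattern can be sketched with the standard library alone. Here the HTTP body is stood in for by any `std::io::Read` source. In the real client the bytes would come from the reqwest response, which this sketch does not model. The `.part` suffix and the function names are assumptions. The temp file sits in the destination's directory so the rename stays on one filesystem, where it is atomic on POSIX.

```rust
use std::fs::{self, File};
use std::io::{self, Read, Write};
use std::path::Path;

/// Stream `src` into `dest` via a sibling temp file, reporting bytes written so far.
pub fn download_atomic<R: Read>(
    mut src: R,
    dest: &Path,
    mut on_progress: impl FnMut(u64),
) -> io::Result<u64> {
    let mut tmp_name = dest
        .file_name()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "destination has no file name"))?
        .to_os_string();
    tmp_name.push(".part");
    let tmp = dest.with_file_name(tmp_name);

    let result = stream_to(&mut src, &tmp, &mut on_progress)
        .and_then(|n| fs::rename(&tmp, dest).map(|()| n));
    if result.is_err() {
        // Never leave a partial file behind in the cache.
        let _ = fs::remove_file(&tmp);
    }
    result
}

fn stream_to<R: Read>(src: &mut R, tmp: &Path, on_progress: &mut impl FnMut(u64)) -> io::Result<u64> {
    let mut out = File::create(tmp)?;
    let mut buf = vec![0u8; 64 * 1024];
    let mut written = 0u64;
    loop {
        let n = match src.read(&mut buf) {
            Ok(0) => break, // end of stream
            Ok(n) => n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        out.write_all(&buf[..n])?;
        written += n as u64;
        on_progress(written);
    }
    out.sync_all()?; // flush to disk before the rename makes the file visible
    Ok(written)
}
```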

Models are cached at ~/.yule/models/{publisher}/{repo}/{filename}. A cache.json sidecar stores metadata: download timestamp, file size, source URL.
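
A sketch of the layout and sidecar. It assumes the caller has already expanded `~` into a root path. The `cache.json` field names (`downloaded_at`, `size_bytes`, `source_url`) are hypothetical, chosen to match the three pieces of metadata listed above. JSON is written by hand only to keep the example dependency-free; a real implementation would more likely use serde_json.

```rust
use std::path::{Path, PathBuf};

/// Location of a cached model: {root}/{publisher}/{repo}/{filename}.
pub fn cache_path(root: &Path, publisher: &str, repo: &str, filename: &str) -> PathBuf {
    root.join(publisher).join(repo).join(filename)
}

/// Hypothetical shape of the cache.json sidecar.
pub struct CacheEntry {
    pub downloaded_at: u64, // Unix timestamp, seconds
    pub size_bytes: u64,
    pub source_url: String,
}

/// Minimal JSON string escaping: quotes, backslashes, and control characters.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

impl CacheEntry {
    pub fn to_json(&self) -> String {
        format!(
            "{{\"downloaded_at\": {}, \"size_bytes\": {}, \"source_url\": \"{}\"}}",
            self.downloaded_at,
            self.size_bytes,
            json_escape(&self.source_url)
        )
    }
}
```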

Model references work in two formats:

Registry reference: publisher/repo — resolves to HuggingFace
Local path: ./model.gguf or /absolute/path.gguf — bypasses registry entirely
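
Telling the two formats apart needs only a simple heuristic. The sketch below treats anything starting with `./`, `../`, or `/`, or ending in `.gguf`, as a local path, and anything of the form `publisher/repo` as a registry reference. The rule, `ModelRef`, and `parse_model_ref` are assumptions for illustration, not Yule's confirmed behavior.

```rust
use std::path::PathBuf;

#[derive(Debug, PartialEq)]
pub enum ModelRef {
    /// A file on disk; the registry is bypassed entirely.
    Local(PathBuf),
    /// A HuggingFace repo, resolved through the registry.
    Registry { publisher: String, repo: String },
}

pub fn parse_model_ref(s: &str) -> Result<ModelRef, String> {
    let looks_local = s.starts_with("./")
        || s.starts_with("../")
        || s.starts_with('/')
        || s.ends_with(".gguf");
    if looks_local {
        return Ok(ModelRef::Local(PathBuf::from(s)));
    }
    // Exactly one slash, with non-empty parts on both sides.
    match s.split_once('/') {
        Some((publisher, repo)) if !publisher.is_empty() && !repo.is_empty() && !repo.contains('/') => {
            Ok(ModelRef::Registry { publisher: publisher.to_string(), repo: repo.to_string() })
        }
        _ => Err(format!("not a local .gguf path or publisher/repo reference: {}", s)),
    }
}
```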

Merkle on Pull

When a model finishes downloading, I immediately compute the blake3 Merkle root over its tensor data. The root is stored with the pull result and can be compared against the expected root in a manifest, so yule pull followed by yule verify gives end-to-end integrity.
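
The text doesn't spell out the tree shape, so here is one plausible layout as a sketch: one leaf per tensor (hash of its bytes), parents hash the concatenation of two children, and an odd node is carried up unchanged. The hash function is a parameter so the example compiles without the blake3 crate. With that crate you would pass `|b| *blake3::hash(b).as_bytes()`. Leaf order, odd-node handling, and `merkle_root` itself are assumptions.

```rust
pub type Digest = [u8; 32];

/// Binary Merkle root over per-tensor byte slices; None for an empty model.
pub fn merkle_root<F>(tensors: &[&[u8]], hash: F) -> Option<Digest>
where
    F: Fn(&[u8]) -> Digest,
{
    if tensors.is_empty() {
        return None;
    }
    // Leaves: one digest per tensor, in file order.
    let mut level: Vec<Digest> = tensors.iter().map(|&t| hash(t)).collect();
    while level.len() > 1 {
        let mut next = Vec::with_capacity((level.len() + 1) / 2);
        for pair in level.chunks(2) {
            match pair {
                [left, right] => {
                    // Parent = H(left || right).
                    let mut buf = [0u8; 64];
                    buf[..32].copy_from_slice(left);
                    buf[32..].copy_from_slice(right);
                    next.push(hash(&buf[..]));
                }
                [odd] => next.push(*odd), // unpaired node moves up as-is
                _ => unreachable!(),
            }
        }
        level = next;
    }
    Some(level[0])
}
```

Per-tensor leaves would let a mismatch be narrowed to a single tensor. Whether Yule exposes that is not stated here.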

CLI

yule pull TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF   # download + merkle
yule list                                            # show cached models

Notes

HuggingFace first, because that's where 90% of GGUF models live (TheBloke, bartowski, etc.). A custom registry protocol comes later.

No parallel chunked downloads yet: one HTTP GET, streamed to disk. Range requests would add chunk coordination and partial-failure recovery, all for a marginal speedup on models under 40GB.

cache.json instead of SQLite: one human-readable JSON file per model. With 3 to 10 cached models, a directory walk is instant.
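
What that walk might look like for yule list, as a sketch. It assumes the {publisher}/{repo}/{filename} layout above and counts only complete `.gguf` files, so sidecars and any in-flight temp file with a different suffix (such as a hypothetical `.part`) are skipped. `CachedModel` and `list_cached` are hypothetical names.

```rust
use std::fs;
use std::io;
use std::path::Path;

#[derive(Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct CachedModel {
    pub publisher: String,
    pub repo: String,
    pub filename: String,
}

/// Walk {root}/{publisher}/{repo}/ and collect every *.gguf file, sorted.
pub fn list_cached(root: &Path) -> io::Result<Vec<CachedModel>> {
    let mut found = Vec::new();
    if !root.exists() {
        return Ok(found); // nothing pulled yet
    }
    for publisher_dir in fs::read_dir(root)? {
        let publisher_dir = publisher_dir?;
        if !publisher_dir.file_type()?.is_dir() {
            continue;
        }
        for repo_dir in fs::read_dir(publisher_dir.path())? {
            let repo_dir = repo_dir?;
            if !repo_dir.file_type()?.is_dir() {
                continue;
            }
            for file in fs::read_dir(repo_dir.path())? {
                let file = file?;
                let name = file.file_name().to_string_lossy().into_owned();
                // Skips cache.json and unfinished downloads.
                if file.file_type()?.is_file() && name.ends_with(".gguf") {
                    found.push(CachedModel {
                        publisher: publisher_dir.file_name().to_string_lossy().into_owned(),
                        repo: repo_dir.file_name().to_string_lossy().into_owned(),
                        filename: name,
                    });
                }
            }
        }
    }
    found.sort();
    Ok(found)
}
```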
