
# yule serve

Start the local API server.

## Usage

```
yule serve <model> [options]
```

## Arguments

| Argument | Description |
| --- | --- |
| `model` | Path to a `.gguf` model file |

## Options

| Flag | Default | Description |
| --- | --- | --- |
| `--bind <addr>` | `127.0.0.1:11434` | Address and port to listen on |
| `--token <token>` | Auto-generated | Use a specific auth token instead of generating one |
| `--no-sandbox` | `false` | Disable process sandboxing |

## What Happens on Start

1. The model file is parsed and its weights are memory-mapped.
2. A Merkle tree is computed over all tensor data (BLAKE3, 1 MB leaves; see the sketch after this list).
3. The inference thread spawns with the model loaded.
4. The auth token is generated (or the provided one is registered).
5. The HTTP server starts listening.
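Step 2 produces the short `merkle:` fingerprint shown in the startup log. The exact tree construction isn't documented on this page; the sketch below is a minimal illustration of the general idea (BLAKE3 over 1 MB leaves, with parents hashing the concatenation of their children) using the third-party Python `blake3` package, not yule's actual implementation.

```python
# Illustrative only: yule's real tree layout, padding, and domain separation
# may differ. Requires the third-party blake3 package (pip install blake3).
from blake3 import blake3

LEAF_SIZE = 1024 * 1024  # 1 MB leaves, as noted in step 2


def merkle_root(data: bytes) -> bytes:
    """Hash 1 MB leaves, then pairwise-combine hashes up to a single root."""
    level = [
        blake3(data[i:i + LEAF_SIZE]).digest()
        for i in range(0, len(data), LEAF_SIZE)
    ] or [blake3(b"").digest()]  # degenerate case: empty input
    while len(level) > 1:
        # Hash adjacent pairs; an odd node at the end is carried up unchanged.
        level = [
            blake3(level[i] + level[i + 1]).digest()
            if i + 1 < len(level) else level[i]
            for i in range(0, len(level), 2)
        ]
    return level[0]


# Prints a 16-hex-char fingerprint like the one in the startup log.
print(merkle_root(b"\x00" * (3 * LEAF_SIZE)).hex()[:16])
```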

The server prints the token and endpoint URLs to stderr:

```
loading model: ./model.gguf
model loaded: Llama (201 tensors, merkle: ffc7e1fd6016a6f9)

  token: yule_b49913e2c05162951af4f87d62c2c9a6555eb91299c7fdcc

listening on 127.0.0.1:11434
  yule api:  http://127.0.0.1:11434/yule/health
  openai:    http://127.0.0.1:11434/v1/chat/completions
```

## Auth

Every request must include the token in an `Authorization: Bearer` header:

```sh
curl -H "Authorization: Bearer yule_b499..." http://localhost:11434/yule/health
```

Requests without a valid token get a `401 Unauthorized` response.
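Since the second endpoint is OpenAI-compatible, any OpenAI-style client or plain HTTP works once the token is supplied. Below is a minimal Python sketch using `requests`; the `model` value is an assumption (this page doesn't say what the field should contain for a single loaded model), and the token is the placeholder from above.

```python
# Minimal sketch of calling the OpenAI-compatible endpoint with the token
# printed at startup. The "model" value is an assumption, not documented here.
import requests

resp = requests.post(
    "http://127.0.0.1:11434/v1/chat/completions",
    headers={"Authorization": "Bearer yule_b499..."},  # token from stderr
    json={
        "model": "local",  # placeholder; check what the server expects
        "messages": [{"role": "user", "content": "Hello!"}],
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```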

## Sandbox

By default, the server process is placed in a Windows Job Object sandbox with memory limits, no child-process spawning, and UI restrictions. Use `--no-sandbox` to disable this (not recommended for untrusted models).

See Security for details.

## Examples

```sh
# default settings
yule serve ./model.gguf

# custom port with a fixed token
yule serve ./model.gguf --bind 0.0.0.0:8080 --token my-secret-key

# no sandbox (development only)
yule serve ./model.gguf --no-sandbox
```
