# yule run

Run inference directly from the command line.
## Usage

```
yule run <model> --prompt <text> [options]
```

## Arguments

| Argument | Description |
|---|---|
| `model` | Path to a `.gguf` model file |
## Options

| Flag | Default | Description |
|---|---|---|
| `--prompt <text>` | Required | The input prompt |
| `--max-tokens <n>` | 512 | Maximum tokens to generate |
| `--temperature <f>` | 0.7 | Sampling temperature (0.0 = greedy, higher = more random) |
| `--no-sandbox` | false | Disable process sandboxing |
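
Sandboxing is on by default. If it conflicts with your environment (for example, some containers or restricted CI runners; this scenario is an assumption, check your platform), you can disable it. A minimal sketch:

```bash
# run without the process sandbox
# (assumes sandboxing conflicts with your environment; leave it enabled otherwise)
yule run ./model.gguf --prompt "Hello" --no-sandbox
```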
## Examples

```bash
# basic inference
yule run ./model.gguf --prompt "What is Rust?"

# deterministic output
yule run ./model.gguf --prompt "Translate to French: Hello" --temperature 0.0

# short response
yule run ./model.gguf --prompt "One word for happiness:" --max-tokens 5
```

## Output
Tokens stream to stdout as they're generated. Model loading stats and generation metrics print to stderr, so you can pipe the output cleanly:

```bash
yule run ./model.gguf --prompt "Hello" > output.txt
```

The stderr output includes parse time, tokenizer stats, prefill time, and decode throughput (tok/s).
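
Because the metrics go to stderr, standard shell redirection lets you capture or silence them independently of the generated text. A small sketch (the file names are illustrative):

```bash
# save generated text and metrics to separate files
yule run ./model.gguf --prompt "Hello" > output.txt 2> metrics.log

# print only the generated text, discarding the metrics
yule run ./model.gguf --prompt "Hello" 2>/dev/null
```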