# yule run

Run inference directly from the command line.
## Usage

```
yule run <model> --prompt <text> [options]
```

## Arguments

| Argument | Description |
|---|---|
| `model` | Path to a `.gguf` model file |
## Options

| Flag | Default | Description |
|---|---|---|
| `--prompt <text>` | Required | The input prompt |
| `--max-tokens <n>` | 512 | Maximum tokens to generate |
| `--temperature <f>` | 0.7 | Sampling temperature (0.0 = greedy, higher = more random) |
| `--no-sandbox` | false | Disable process sandboxing |
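
Sandboxing is on by default. If it conflicts with your environment (for example, some containers or restricted CI runners; this scenario is an assumption, check your platform), you can disable it. A minimal sketch:

```bash
# run without the process sandbox
# (assumes sandboxing conflicts with your environment; leave it enabled otherwise)
yule run ./model.gguf --prompt "Hello" --no-sandbox
```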
## Examples

```bash
# basic inference
yule run ./model.gguf --prompt "What is Rust?"

# deterministic output
yule run ./model.gguf --prompt "Translate to French: Hello" --temperature 0.0

# short response
yule run ./model.gguf --prompt "One word for happiness:" --max-tokens 5
```

## Output
Tokens stream to stdout as they're generated. Model loading stats and generation metrics print to stderr, so you can pipe the output cleanly:

```bash
yule run ./model.gguf --prompt "Hello" > output.txt
```

The stderr output includes parse time, tokenizer stats, prefill time, and decode throughput (tok/s).
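
Because the metrics go to stderr, standard shell redirection lets you capture or silence them independently of the generated text. A small sketch (the file names are illustrative):

```bash
# save generated text and metrics to separate files
yule run ./model.gguf --prompt "Hello" > output.txt 2> metrics.log

# print only the generated text, discarding the metrics
yule run ./model.gguf --prompt "Hello" 2>/dev/null
```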