
yule run

Run inference directly from the command line

Usage

yule run <model> --prompt <text> [options]

Arguments

Argument   Description
model      Path to a .gguf model file

Options

Flag               Default   Description
--prompt <text>    Required  The input prompt
--max-tokens <n>   512       Maximum tokens to generate
--temperature <f>  0.7       Sampling temperature (0.0 = greedy, higher = more random)
--no-sandbox       false     Disable process sandboxing

Examples

# basic inference
yule run ./model.gguf --prompt "What is Rust?"

# deterministic output
yule run ./model.gguf --prompt "Translate to French: Hello" --temperature 0.0

# short response
yule run ./model.gguf --prompt "One word for happiness:" --max-tokens 5
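# run without the process sandbox (combines with any flags above)
yule run ./model.gguf --prompt "Hello" --no-sandbox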

Output

Tokens stream to stdout as they're generated. Model loading stats and generation metrics print to stderr, so you can pipe the output cleanly:

yule run ./model.gguf --prompt "Hello" > output.txt

The stderr output includes parse time, tokenizer stats, prefill time, and decode throughput (tok/s).
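Because the two streams are separate, standard shell redirection also lets you keep the metrics on their own:

# save the generated text and the stderr metrics to separate files
yule run ./model.gguf --prompt "Hello" > output.txt 2> metrics.log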
