OpenAI Endpoints
Drop-in compatible endpoints for existing tools
POST /v1/chat/completions
Standard OpenAI chat completions format.
Request
{
  "model": "anything",
  "messages": [
    { "role": "user", "content": "Hello" }
  ],
  "max_tokens": 100,
  "temperature": 0.7,
  "stream": false
}

The model field is accepted but ignored, since Yule serves a single model.
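Outside any SDK, the endpoint can be exercised with a plain HTTP POST. A minimal sketch in Python, assuming a Yule API token in a YULE_TOKEN environment variable (the variable name is illustrative) and the third-party requests library:

import os
import requests

resp = requests.post(
    "http://localhost:11434/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['YULE_TOKEN']}"},
    json={
        "model": "anything",  # accepted but ignored; Yule serves a single model
        "messages": [{"role": "user", "content": "Hello"}],
        "max_tokens": 100,
        "temperature": 0.7,
        "stream": False,
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])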
Response
{
  "id": "chatcmpl-18947c080149320c",
  "object": "chat.completion",
  "created": 1739577600,
  "model": "yule",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 5,
    "completion_tokens": 7,
    "total_tokens": 12
  }
}

Streaming
Set "stream": true for SSE chunks in OpenAI format. See Streaming.
GET /v1/models
List available models.
curl -H "Authorization: Bearer $TOKEN" http://localhost:11434/v1/models

{
  "object": "list",
  "data": [
    {
      "id": "tinyllama",
      "object": "model",
      "owned_by": "local"
    }
  ]
}
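The same listing is available through the OpenAI SDK's models API, for example:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="yule_b499...")
for model in client.models.list():
    print(model.id)  # e.g. "tinyllama"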
Using with Existing Tools

Point any OpenAI-compatible client at Yule by setting the base URL:
# Python (openai SDK)
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="yule_b499..."
)

response = client.chat.completions.create(
    model="anything",
    messages=[{"role": "user", "content": "Hello"}]
)
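The returned object follows the response shape documented above, so the reply text is available as:

print(response.choices[0].message.content)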
# curl
curl http://localhost:11434/v1/chat/completions \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"model":"m","messages":[{"role":"user","content":"Hello"}]}'