Chat Completions

Full reference for POST /v1/chat/completions.

The Chat Completions endpoint is the primary interface for generating AI responses. It is fully compatible with the OpenAI Chat Completions API and adds Forge-specific parameters for routing, memory, security, caching, and ensemble processing.

Endpoint

POST /v1/chat/completions

Request Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Yes | Model ID or "auto" for intelligent routing |
| messages | array | Yes | Array of message objects with role and content |
| temperature | number | No | Sampling temperature (0-2). Default: 1 |
| max_tokens | integer | No | Maximum tokens in the response |
| stream | boolean | No | Enable Server-Sent Events streaming. Default: false |
| top_p | number | No | Nucleus sampling parameter (0-1). Default: 1 |
| n | integer | No | Number of completions to generate. Default: 1 |
| stop | string or array | No | Stop sequences |
| tools | array | No | Function/tool definitions for tool calling |
| forge | object | No | Forge-specific extensions (see below) |
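As a quick client-side sanity check, the range constraints in the table above can be sketched in Python. This is an illustrative helper, not part of any Forge SDK; the function name and error messages are made up for the example:

```python
def validate_params(params: dict) -> list[str]:
    """Return a list of problems with a chat-completions request body,
    based on the parameter table above. An empty list means it passes."""
    problems = []
    if "model" not in params:
        problems.append("model is required")
    if not params.get("messages"):
        problems.append("messages must be a non-empty array")
    if not 0 <= params.get("temperature", 1) <= 2:
        problems.append("temperature must be between 0 and 2")
    if not 0 <= params.get("top_p", 1) <= 1:
        problems.append("top_p must be between 0 and 1")
    n = params.get("n", 1)
    if not (isinstance(n, int) and n >= 1):
        problems.append("n must be a positive integer")
    return problems
```

The server validates these ranges anyway (returning a 400), but checking locally gives faster feedback during development.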

Forge Extensions

{
  "forge": {
    "cache": {
      "enabled": true,
      "ttl": 3600,
      "namespace": "default"
    },
    "security": {
      "level": "standard",
      "pii": { "detect": true, "redact": true }
    },
    "memory": {
      "enabled": true,
      "userId": "user_123",
      "layers": ["vector", "graph", "state"]
    },
    "ensemble": {
      "enabled": false,
      "strategy": "best-of-n",
      "n": 3
    },
    "routing": {
      "costSensitivity": "medium",
      "failover": true,
      "maxRetries": 3
    }
  }
}
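Every forge field is optional, so a client typically merges a few overrides onto a baseline. A minimal deep-merge sketch in Python; note that the "defaults" here simply mirror the example JSON above and are an assumption, not documented server defaults, and the helper is illustrative rather than part of any SDK:

```python
import copy

# Assumed baseline mirroring the example forge object above.
FORGE_DEFAULTS = {
    "cache": {"enabled": True, "ttl": 3600, "namespace": "default"},
    "security": {"level": "standard", "pii": {"detect": True, "redact": True}},
    "memory": {"enabled": True, "userId": None, "layers": ["vector", "graph", "state"]},
    "ensemble": {"enabled": False, "strategy": "best-of-n", "n": 3},
    "routing": {"costSensitivity": "medium", "failover": True, "maxRetries": 3},
}

def merge_forge_options(overrides: dict) -> dict:
    """Deep-merge user-supplied forge options over the baseline,
    so partial overrides keep unrelated nested keys intact."""
    def merge(base: dict, over: dict) -> dict:
        out = copy.deepcopy(base)
        for key, value in over.items():
            if isinstance(value, dict) and isinstance(out.get(key), dict):
                out[key] = merge(out[key], value)
            else:
                out[key] = value
        return out
    return merge(FORGE_DEFAULTS, overrides)
```

For example, overriding only memory.userId leaves memory.layers and the cache settings untouched.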

Response Format

{
  "id": "forge-abc123",
  "object": "chat.completion",
  "created": 1709000000,
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 9,
    "total_tokens": 21
  },
  "forge_metadata": {
    "provider": "openai",
    "routing_time_ms": 3,
    "security_scan": "passed",
    "cache_hit": false,
    "cost_usd": 0.00031,
    "trace_id": "trace_xyz789"
  }
}
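The body is OpenAI-shaped with forge_metadata added at the top level, so plain dict access is enough to pull out both the reply and the Forge-specific metrics. A small sketch (no SDK assumed; the helper name is illustrative):

```python
def summarize_response(resp: dict) -> dict:
    """Extract the assistant reply plus a few Forge metrics from a
    chat-completion response shaped like the example above."""
    choice = resp["choices"][0]
    meta = resp.get("forge_metadata", {})  # absent on non-Forge gateways
    return {
        "content": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        "total_tokens": resp["usage"]["total_tokens"],
        "provider": meta.get("provider"),
        "cache_hit": meta.get("cache_hit"),
        "cost_usd": meta.get("cost_usd"),
    }
```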

Streaming

Set stream: true to receive Server-Sent Events. Each event contains a delta chunk:

data: {"id":"forge-abc123","choices":[{"delta":{"content":"Hello"},"index":0}]}
data: {"id":"forge-abc123","choices":[{"delta":{"content":"!"},"index":0}]}
data: {"id":"forge-abc123","choices":[{"delta":{},"finish_reason":"stop","index":0}]}
data: [DONE]
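A client reads the stream line by line, JSON-decodes each data: payload, and concatenates the delta.content fragments until the [DONE] sentinel. A minimal parsing sketch in Python, operating on already-received lines (a real client would read them from the HTTP response body):

```python
import json

def assemble_stream(lines) -> str:
    """Join delta.content fragments from SSE `data:` lines into the
    full assistant message, stopping at the [DONE] sentinel."""
    parts = []
    for line in lines:
        if not line.startswith("data:"):
            continue  # skip blank keep-alives and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts)
```

Fed the four example events above, this yields "Hello!". Note the final content chunk carries an empty delta plus finish_reason rather than text.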

Error Codes

| Code | Description |
| --- | --- |
| 400 | Invalid request parameters |
| 401 | Invalid or missing API key |
| 402 | Payment required (x402) |
| 403 | Feature not available on current tier |
| 429 | Rate limit exceeded |
| 500 | Internal server error |
| 503 | All providers unavailable |
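Of these, 429, 500, and 503 are transient and worth retrying with exponential backoff, while the 4xx client errors should surface immediately. A sketch of that policy; the status-code split follows the table, but the backoff schedule itself is an illustrative choice, not a documented requirement:

```python
# Transient errors worth retrying, per the table above.
RETRYABLE = {429, 500, 503}

def backoff_schedule(code: int, max_retries: int = 3, base: float = 0.5):
    """Return the sleep durations (seconds) to wait before each retry
    of a failed request, or [] if the error should not be retried."""
    if code not in RETRYABLE:
        return []
    return [base * (2 ** attempt) for attempt in range(max_retries)]
```

For example, a 429 with the defaults yields waits of 0.5, 1.0, and 2.0 seconds, while a 401 yields no retries, since resending the same bad key cannot help.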

curl Example

curl -X POST https://api.optima-forge.com/v1/chat/completions \
  -H "Authorization: Bearer $FORGE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is the capital of France?"}
    ],
    "temperature": 0.7,
    "max_tokens": 150,
    "forge": {
      "cache": {"enabled": true},
      "security": {"level": "standard"},
      "memory": {"enabled": true, "userId": "user_123"}
    }
  }'

JavaScript Example

import { Forge } from "@optima-forge/sdk";

const forge = new Forge({ apiKey: process.env.FORGE_API_KEY });

const response = await forge.chat.completions.create({
  model: "auto",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "What is the capital of France?" },
  ],
  temperature: 0.7,
  max_tokens: 150,
  forge: {
    cache: { enabled: true },
    security: { level: "standard" },
    memory: { enabled: true, userId: "user_123" },
  },
});

console.log(response.choices[0].message.content);

Python Example

import os

from optima_forge import Forge

forge = Forge(api_key=os.environ["FORGE_API_KEY"])

response = forge.chat.completions.create(
    model="auto",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
    ],
    temperature=0.7,
    max_tokens=150,
    forge={
        "cache": {"enabled": True},
        "security": {"level": "standard"},
        "memory": {"enabled": True, "userId": "user_123"},
    },
)

print(response.choices[0].message.content)