Chat Completions

Full reference for POST /v1/chat/completions.

The Chat Completions endpoint is the primary interface for generating AI responses. It is fully compatible with the OpenAI Chat Completions API and adds Forge-specific parameters for routing, memory, security, caching, and ensemble processing.

Endpoint

POST /v1/chat/completions

Request Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Yes | Model ID or "auto" for intelligent routing |
| messages | array | Yes | Array of message objects with role and content |
| temperature | number | No | Sampling temperature (0-2). Default: 1 |
| max_tokens | integer | No | Maximum tokens in the response |
| stream | boolean | No | Enable Server-Sent Events streaming. Default: false |
| top_p | number | No | Nucleus sampling parameter (0-1). Default: 1 |
| n | integer | No | Number of completions to generate. Default: 1 |
| stop | string or array | No | Stop sequences |
| tools | array | No | Function/tool definitions for tool calling |
| forge | object | No | Forge-specific extensions (see below) |
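As a quick client-side sanity check, the range constraints in the table above can be sketched in Python. This is an illustrative helper, not part of any Forge SDK; the function name and error messages are made up for the example:

```python
def validate_params(params: dict) -> list[str]:
    """Return a list of problems with a chat-completions request body,
    based on the parameter table above. An empty list means it passes."""
    problems = []
    if "model" not in params:
        problems.append("model is required")
    if not params.get("messages"):
        problems.append("messages must be a non-empty array")
    if not 0 <= params.get("temperature", 1) <= 2:
        problems.append("temperature must be between 0 and 2")
    if not 0 <= params.get("top_p", 1) <= 1:
        problems.append("top_p must be between 0 and 1")
    n = params.get("n", 1)
    if not (isinstance(n, int) and n >= 1):
        problems.append("n must be a positive integer")
    return problems
```

The server validates these ranges anyway (returning a 400), but checking locally gives faster feedback during development.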

Forge Extensions

{
  "forge": {
    "cache": {
      "enabled": true,
      "ttl": 3600,
      "namespace": "default"
    },
    "security": {
      "level": "standard",
      "pii": { "detect": true, "redact": true }
    },
    "memory": {
      "enabled": true,
      "userId": "user_123",
      "layers": ["vector", "graph", "state"]
    },
    "ensemble": {
      "enabled": false,
      "strategy": "best-of-n",
      "n": 3
    },
    "routing": {
      "costSensitivity": "medium",
      "failover": true,
      "maxRetries": 3
    }
  }
}
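Every forge field is optional, so a client typically merges a few overrides onto a baseline. A minimal deep-merge sketch in Python; note that the "defaults" here simply mirror the example JSON above and are an assumption, not documented server defaults, and the helper is illustrative rather than part of any SDK:

```python
import copy

# Assumed baseline mirroring the example forge object above.
FORGE_DEFAULTS = {
    "cache": {"enabled": True, "ttl": 3600, "namespace": "default"},
    "security": {"level": "standard", "pii": {"detect": True, "redact": True}},
    "memory": {"enabled": True, "userId": None, "layers": ["vector", "graph", "state"]},
    "ensemble": {"enabled": False, "strategy": "best-of-n", "n": 3},
    "routing": {"costSensitivity": "medium", "failover": True, "maxRetries": 3},
}

def merge_forge_options(overrides: dict) -> dict:
    """Deep-merge user-supplied forge options over the baseline,
    so partial overrides keep unrelated nested keys intact."""
    def merge(base: dict, over: dict) -> dict:
        out = copy.deepcopy(base)
        for key, value in over.items():
            if isinstance(value, dict) and isinstance(out.get(key), dict):
                out[key] = merge(out[key], value)
            else:
                out[key] = value
        return out
    return merge(FORGE_DEFAULTS, overrides)
```

For example, overriding only memory.userId leaves memory.layers and the cache settings untouched.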

Response Format

{
  "id": "forge-abc123",
  "object": "chat.completion",
  "created": 1709000000,
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 9,
    "total_tokens": 21
  },
  "forge_metadata": {
    "provider": "openai",
    "routing_time_ms": 3,
    "security_scan": "passed",
    "cache_hit": false,
    "cost_usd": 0.00031,
    "trace_id": "trace_xyz789"
  }
}
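The body is OpenAI-shaped with forge_metadata added at the top level, so plain dict access is enough to pull out both the reply and the Forge-specific metrics. A small sketch (no SDK assumed; the helper name is illustrative):

```python
def summarize_response(resp: dict) -> dict:
    """Extract the assistant reply plus a few Forge metrics from a
    chat-completion response shaped like the example above."""
    choice = resp["choices"][0]
    meta = resp.get("forge_metadata", {})  # absent on non-Forge gateways
    return {
        "content": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        "total_tokens": resp["usage"]["total_tokens"],
        "provider": meta.get("provider"),
        "cache_hit": meta.get("cache_hit"),
        "cost_usd": meta.get("cost_usd"),
    }
```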

Streaming

Set stream: true to receive Server-Sent Events. Each event contains a delta chunk:

data: {"id":"forge-abc123","choices":[{"delta":{"content":"Hello"},"index":0}]}
data: {"id":"forge-abc123","choices":[{"delta":{"content":"!"},"index":0}]}
data: {"id":"forge-abc123","choices":[{"delta":{},"finish_reason":"stop","index":0}]}
data: [DONE]
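A client reads the stream line by line, JSON-decodes each data: payload, and concatenates the delta.content fragments until the [DONE] sentinel. A minimal parsing sketch in Python, operating on already-received lines (a real client would read them from the HTTP response body):

```python
import json

def assemble_stream(lines) -> str:
    """Join delta.content fragments from SSE `data:` lines into the
    full assistant message, stopping at the [DONE] sentinel."""
    parts = []
    for line in lines:
        if not line.startswith("data:"):
            continue  # skip blank keep-alives and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts)
```

Fed the four example events above, this yields "Hello!". Note the final content chunk carries an empty delta plus finish_reason rather than text.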

Error Codes

| Code | Description |
| --- | --- |
| 400 | Invalid request parameters |
| 401 | Invalid or missing API key |
| 402 | Payment required (x402) |
| 403 | Feature not available on current tier |
| 429 | Rate limit exceeded |
| 500 | Internal server error |
| 503 | All providers unavailable |
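Of these, 429, 500, and 503 are transient and worth retrying with exponential backoff, while the 4xx client errors should surface immediately. A sketch of that policy; the status-code split follows the table, but the backoff schedule itself is an illustrative choice, not a documented requirement:

```python
# Transient errors worth retrying, per the table above.
RETRYABLE = {429, 500, 503}

def backoff_schedule(code: int, max_retries: int = 3, base: float = 0.5):
    """Return the sleep durations (seconds) to wait before each retry
    of a failed request, or [] if the error should not be retried."""
    if code not in RETRYABLE:
        return []
    return [base * (2 ** attempt) for attempt in range(max_retries)]
```

For example, a 429 with the defaults yields waits of 0.5, 1.0, and 2.0 seconds, while a 401 yields no retries, since resending the same bad key cannot help.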

curl Example

curl -X POST https://api.optima-forge.com/v1/chat/completions \
  -H "Authorization: Bearer $FORGE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is the capital of France?"}
    ],
    "temperature": 0.7,
    "max_tokens": 150,
    "forge": {
      "cache": {"enabled": true},
      "security": {"level": "standard"},
      "memory": {"enabled": true, "userId": "user_123"}
    }
  }'

JavaScript Example

import { Forge } from "@optima-forge/sdk";

const forge = new Forge({ apiKey: process.env.FORGE_API_KEY });

const response = await forge.chat.completions.create({
  model: "auto",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "What is the capital of France?" },
  ],
  temperature: 0.7,
  max_tokens: 150,
  forge: {
    cache: { enabled: true },
    security: { level: "standard" },
    memory: { enabled: true, userId: "user_123" },
  },
});

console.log(response.choices[0].message.content);

Python Example

import os

from optima_forge import Forge

forge = Forge(api_key=os.environ["FORGE_API_KEY"])

response = forge.chat.completions.create(
    model="auto",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
    ],
    temperature=0.7,
    max_tokens=150,
    forge={
        "cache": {"enabled": True},
        "security": {"level": "standard"},
        "memory": {"enabled": True, "userId": "user_123"},
    },
)

print(response.choices[0].message.content)