Tutorials

Getting Started with Forge in 5 Minutes

Optima Forge Team, Developer Relations
2025-12-05 · 5 min read
tutorial · quickstart · api · getting-started

Getting started with Forge takes less than five minutes. By the end of this tutorial you will have made your first API request, seen automatic model routing in action, and understood the core parameters that differentiate Forge from calling providers directly.

Step 1: Create Your Account

Head to optima-forge.com/dashboard and sign up with your email or GitHub account. We use Clerk for authentication, which means you get SSO, SAML, and multi-factor auth out of the box. Once you are in the dashboard, navigate to Settings and then API Keys. Click "Create API Key," give it a name like "dev-testing," and copy the key. It starts with forge_ and you will not see it again after this screen.

Step 2: Make Your First Request

Forge exposes a fully OpenAI-compatible API at https://api.optima-forge.com/v1/chat/completions. If you have ever called OpenAI, the shape of the request is identical. Open your terminal and run:

curl -X POST https://api.optima-forge.com/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'

That is it. You just made your first Forge request. The response follows the standard OpenAI chat completion format, so any SDK or framework that supports OpenAI will work with Forge by changing the base URL and API key.
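The same request can be made from Python. Here is a sketch using only the standard library; the URL and the "model": "auto" field come from this tutorial, and the API key is a placeholder. (Equivalently, the official openai Python SDK works as-is if you pass base_url="https://api.optima-forge.com/v1" and your Forge key.)

```python
import json
import os
import urllib.request

FORGE_URL = "https://api.optima-forge.com/v1/chat/completions"
API_KEY = os.environ.get("FORGE_API_KEY", "forge_YOUR_API_KEY")  # key from Step 1

payload = {
    "model": "auto",
    "messages": [{"role": "user", "content": "What is the capital of France?"}],
}
req = urllib.request.Request(
    FORGE_URL,
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
)

# Uncomment to actually send the request:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```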

Step 3: Understand Model Routing

Notice that we set "model": "auto". This tells Forge to pick the best model for your request using its cascading intent classifier. The classifier runs three tiers: fast regex and keyword matching, a semantic-router embedding lookup, and if needed, an LLM-based fallback. The entire classification takes under 5ms using an ONNX-optimized BERT model.
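The three-tier cascade can be sketched as follows. The regex rules, the embedding lookup, and the LLM fallback here are illustrative stand-ins, not Forge's actual classifier:

```python
import re

# Tier 1: cheap regex/keyword rules catch obvious intents in microseconds.
KEYWORD_RULES = [
    (re.compile(r"\b(def |function|compile|stack trace)\b", re.I), "coding"),
    (re.compile(r"\b(poem|story|lyrics)\b", re.I), "creative"),
    (re.compile(r"^(what|who|when|where) (is|are|was)\b", re.I), "factual"),
]

def embedding_lookup(prompt: str):
    """Tier 2 stand-in: a real router embeds the prompt and does a
    nearest-neighbor search over labeled example embeddings."""
    return None  # pretend nothing matched confidently

def llm_classify(prompt: str) -> str:
    """Tier 3 stand-in: fall back to asking a small LLM for the intent."""
    return "general"

def classify(prompt: str) -> str:
    # Each tier only runs if the cheaper one above it failed to decide.
    for pattern, intent in KEYWORD_RULES:
        if pattern.search(prompt):
            return intent
    return embedding_lookup(prompt) or llm_classify(prompt)

print(classify("What is the capital of France?"))  # factual
```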

For a simple factual question like "What is the capital of France?" Forge will route to a fast, cheap model. For a complex coding task, it routes to a more capable model. For a creative writing task, it picks a model known for strong creative output. This automatic routing delivers roughly 85% cost savings compared to always using the most expensive model, with minimal quality degradation.
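To see where savings of that magnitude come from, here is a toy blended-cost calculation; the per-token prices and the traffic split are made-up illustrative numbers, not Forge's actual pricing or routing mix:

```python
# Hypothetical per-1M-token prices and a hypothetical routing split.
PRICE = {"cheap": 0.50, "mid": 5.00, "premium": 25.00}  # $ per 1M tokens
SPLIT = {"cheap": 0.70, "mid": 0.25, "premium": 0.05}   # share of requests

# Blended cost when most traffic goes to cheaper models.
blended = sum(PRICE[m] * SPLIT[m] for m in PRICE)       # $ per 1M tokens
savings = 1 - blended / PRICE["premium"]
print(f"blended ${blended:.2f}/1M tokens, {savings:.0%} cheaper than premium-only")
```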

You can also specify a model directly: "model": "claude-opus-4-20250514" or "model": "gpt-4o". Forge routes to that specific model but still applies security scanning, caching, and observability.

Step 4: Use the Forge Object

The optional forge object in the request body unlocks the platform's advanced capabilities. Here is a request that enables semantic caching, strict security, and memory:

curl -X POST https://api.optima-forge.com/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [
      {"role": "user", "content": "Summarize the key points of GDPR"}
    ],
    "forge": {
      "cache": true,
      "security": "strict",
      "memory": { "session": "onboarding-demo" },
      "priority": "balanced"
    }
  }'

cache enables semantic caching. If a semantically similar question was asked recently, Forge returns the cached response instantly instead of making a new LLM call. This reduces latency to under 10ms and costs to zero for cache hits.
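Semantic caching differs from exact-match caching in that it keys on meaning rather than exact bytes. A minimal sketch of the idea, using a toy bag-of-words embedding and cosine similarity (real systems use a proper embedding model and an approximate-nearest-neighbor index, and 0.8 is an arbitrary threshold chosen for illustration):

```python
import math

def toy_embed(text: str) -> dict:
    """Toy stand-in for an embedding model: bag-of-words counts."""
    vec = {}
    for word in text.lower().split():
        vec[word] = vec.get(word, 0.0) + 1.0
    return vec

def cosine(a: dict, b: dict) -> float:
    dot = sum(a[k] * b.get(k, 0.0) for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold
        self.entries = []  # list of (embedding, cached answer)

    def get(self, prompt: str):
        qv = toy_embed(prompt)
        for vec, answer in self.entries:
            if cosine(qv, vec) >= self.threshold:
                return answer  # cache hit: no LLM call needed
        return None

    def put(self, prompt: str, answer: str) -> None:
        self.entries.append((toy_embed(prompt), answer))

cache = SemanticCache()
cache.put("summarize the key points of gdpr", "GDPR summary...")
# A semantically similar rephrasing still hits the cache.
print(cache.get("summarize the key points of gdpr please"))
```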

security controls the ForgeGuard pipeline intensity. Options are "standard" (default), "strict" (full seven-layer scan), and "minimal" (basic input validation only). Strict mode runs LlamaFirewall, DeBERTa-v3 semantic analysis, Presidio PII detection, and more.

memory attaches a session identifier. All messages within the same session are stored in Forge's three-layer memory system (vector search via Qdrant, graph relationships via Neo4j, and real-time state via Redis CRDTs). This means your next request with the same session ID can reference previous conversations, even if it routes to a different provider.
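In practice this just means sending the same session identifier on every request in a conversation. A sketch of building two such payloads; the forge.memory.session field is as shown in this tutorial, and the helper function is hypothetical:

```python
def forge_payload(user_message: str, session: str) -> dict:
    """Build a Forge chat request body that attaches a memory session."""
    return {
        "model": "auto",
        "messages": [{"role": "user", "content": user_message}],
        "forge": {"memory": {"session": session}},
    }

# Both requests share a session, so the second can reference the first
# even if Forge routes them to different providers.
first = forge_payload("My name is Ada and I work on compilers.", "onboarding-demo")
second = forge_payload("What did I say my job was?", "onboarding-demo")
print(first["forge"]["memory"]["session"] == second["forge"]["memory"]["session"])  # True
```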

priority controls the routing strategy. Options are "speed" (fastest response), "quality" (best model), "balanced" (default), and "cost" (cheapest option that meets minimum quality thresholds).

Step 5: Check the Dashboard

Go back to your dashboard at optima-forge.com/dashboard. You will see your requests logged with full traces in Langfuse, including the model selected, latency breakdown, token counts, cost, and security scan results. Every request gets an OpenTelemetry-compatible trace you can export to your existing observability stack.

Next Steps

You are now up and running with Forge. From here you can explore the Python and TypeScript SDKs, build your first ForgeBot agent, or dive into the security pipeline configuration. The free tier includes 1,000 requests per month with full access to caching, security, and memory — no credit card required.

