Semantic AI Cache

Forge Cache Tool

Eliminate redundant LLM calls with semantic caching.

Forge Cache matches incoming queries against previously cached responses using semantic similarity rather than exact string matching. If a user asks "What is React?" and you have already cached a response to "Explain React to me", the cache hits, dramatically reducing LLM costs on repetitive queries.
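To illustrate the idea behind semantic matching: queries are embedded as vectors, and two queries "match" when the similarity of their vectors clears the configured threshold. The sketch below uses cosine similarity over toy 2-D vectors; Forge Cache's actual embedding model and scoring are internal, so this is only a conceptual sketch, not the service's implementation.

```javascript
// Conceptual sketch: semantic matching via cosine similarity of embeddings.
// (Forge Cache's real embedding model and math are internal.)
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy 2-D "embeddings" — real embeddings have hundreds of dimensions.
const whatIsReact = [0.9, 0.1];
const explainReact = [0.85, 0.2];
const sim = cosineSimilarity(whatIsReact, explainReact); // ≈ 0.99, a hit at threshold 0.85
```

Two phrasings of the same question land close together in embedding space, so the score comfortably clears a threshold like 0.85, while an unrelated query would not.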

$12/month

Features

  • Semantic similarity matching
  • Configurable similarity threshold (0.0–1.0)
  • TTL-based auto-expiration
  • Pattern-based cache invalidation
  • Hit rate analytics
  • Cost savings tracking
  • Instant cache responses (<5ms)

API Endpoints

POST    /v1/tools/cache/lookup       Semantic cache lookup
POST    /v1/tools/cache/store        Store a response in the cache
GET     /v1/tools/cache/stats        Cache statistics
DELETE  /v1/tools/cache/invalidate   Invalidate entries
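The stats and invalidation endpoints follow the same auth and base URL as the lookup/store calls. The sketch below shows one plausible way to call them; the request body shape (a `pattern` field for pattern-based invalidation, e.g. `'faq:*'`) is an assumption based on the feature list, not a confirmed schema, so check the API reference before relying on it.

```javascript
// Assumed request shape for pattern-based invalidation — the `pattern`
// field is inferred from the feature list, not a documented schema.
const BASE = 'https://api.optima-forge.com/v1/tools/cache';

function buildInvalidateRequest(pattern, apiKey) {
  return {
    method: 'DELETE',
    headers: {
      'Authorization': `Bearer ${apiKey}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ pattern }), // e.g. 'faq:*' to drop all FAQ entries
  };
}

// Usage (network calls shown for context):
// await fetch(`${BASE}/invalidate`, buildInvalidateRequest('faq:*', 'ftk_cache_your_key_here'));
// const stats = await (await fetch(`${BASE}/stats`, {
//   headers: { 'Authorization': 'Bearer ftk_cache_your_key_here' },
// })).json();
```

Building the request object separately keeps the invalidation logic testable without touching the network.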

Use Cases

  • Cache FAQ responses to cut LLM costs by 80% or more
  • Speed up chatbot responses to common questions
  • Reduce latency on high-traffic AI endpoints
  • A/B test with cache bypass for comparison

Quick Start

// Check cache before calling LLM
const lookup = await fetch('https://api.optima-forge.com/v1/tools/cache/lookup', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer ftk_cache_your_key_here',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({ query: userMessage, threshold: 0.85 }),
});

const { hit, cached_response } = await lookup.json();

if (hit) {
  return cached_response; // Instant, free
}

// Cache miss — call LLM, then store
const llmResponse = await callLLM(userMessage);
await fetch('https://api.optima-forge.com/v1/tools/cache/store', {
  method: 'POST',
  headers: { 'Authorization': 'Bearer ftk_cache_your_key_here', 'Content-Type': 'application/json' },
  body: JSON.stringify({ query: userMessage, response: llmResponse, ttl_seconds: 3600 }),
});
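The lookup/store pattern from the Quick Start can be wrapped in a single cache-aside helper so every LLM call site stays one line. This is a sketch using the same endpoints and fields as above; the `fetchImpl` parameter is an addition of this example (not part of the API) that makes the helper testable without the network.

```javascript
// Cache-aside helper: semantic lookup first, generate + store on a miss.
// Endpoints and body fields follow the Quick Start; `fetchImpl` is injectable
// purely so the helper can be exercised without real network calls.
const API = 'https://api.optima-forge.com/v1/tools/cache';

async function getCachedOrGenerate(query, generate, {
  apiKey,
  threshold = 0.85,
  ttl = 3600,
  fetchImpl = fetch,
} = {}) {
  const headers = {
    'Authorization': `Bearer ${apiKey}`,
    'Content-Type': 'application/json',
  };

  // 1. Semantic lookup — instant and free on a hit.
  const res = await fetchImpl(`${API}/lookup`, {
    method: 'POST',
    headers,
    body: JSON.stringify({ query, threshold }),
  });
  const { hit, cached_response } = await res.json();
  if (hit) return cached_response;

  // 2. Miss: call the LLM, then store fire-and-forget so a failed
  // store never breaks the request path.
  const response = await generate(query);
  fetchImpl(`${API}/store`, {
    method: 'POST',
    headers,
    body: JSON.stringify({ query, response, ttl_seconds: ttl }),
  }).catch(() => {});
  return response;
}

// Usage:
// const answer = await getCachedOrGenerate(userMessage, callLLM, {
//   apiKey: 'ftk_cache_your_key_here',
// });
```

Storing asynchronously after a miss keeps the user-facing latency equal to a plain LLM call, while subsequent similar queries hit the cache.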

Ready to start?

Get your API key in seconds. $12/month, cancel anytime.

Subscribe Now