Semantic AI Cache

Forge Cache Tool

Eliminate redundant LLM calls with semantic caching.

Forge Cache matches incoming queries against previously cached responses using semantic similarity rather than exact string matching. If a user asks "What is React?" and you have already cached a response to "Explain React to me", the cache hits, dramatically reducing LLM costs on repetitive queries.
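To illustrate the idea behind semantic matching: queries are embedded as vectors, and two queries "match" when the similarity of their vectors clears the configured threshold. The sketch below uses cosine similarity over toy 2-D vectors; Forge Cache's actual embedding model and scoring are internal, so this is only a conceptual sketch, not the service's implementation.

```javascript
// Conceptual sketch: semantic matching via cosine similarity of embeddings.
// (Forge Cache's real embedding model and math are internal.)
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy 2-D "embeddings" — real embeddings have hundreds of dimensions.
const whatIsReact = [0.9, 0.1];
const explainReact = [0.85, 0.2];
const sim = cosineSimilarity(whatIsReact, explainReact); // ≈ 0.99, a hit at threshold 0.85
```

Two phrasings of the same question land close together in embedding space, so the score comfortably clears a threshold like 0.85, while an unrelated query would not.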

$12/month

Features

  • Semantic similarity matching
  • Configurable similarity threshold (0.0–1.0)
  • TTL-based auto-expiration
  • Pattern-based cache invalidation
  • Hit rate analytics
  • Cost savings tracking
  • Instant cache responses (<5ms)

API Endpoints

POST    /v1/tools/cache/lookup       Semantic cache lookup
POST    /v1/tools/cache/store        Store a response in the cache
GET     /v1/tools/cache/stats        Cache statistics
DELETE  /v1/tools/cache/invalidate   Invalidate entries
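The stats and invalidation endpoints follow the same auth and base URL as the lookup/store calls. The sketch below shows one plausible way to call them; the request body shape (a `pattern` field for pattern-based invalidation, e.g. `'faq:*'`) is an assumption based on the feature list, not a confirmed schema, so check the API reference before relying on it.

```javascript
// Assumed request shape for pattern-based invalidation — the `pattern`
// field is inferred from the feature list, not a documented schema.
const BASE = 'https://api.optima-forge.com/v1/tools/cache';

function buildInvalidateRequest(pattern, apiKey) {
  return {
    method: 'DELETE',
    headers: {
      'Authorization': `Bearer ${apiKey}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ pattern }), // e.g. 'faq:*' to drop all FAQ entries
  };
}

// Usage (network calls shown for context):
// await fetch(`${BASE}/invalidate`, buildInvalidateRequest('faq:*', 'ftk_cache_your_key_here'));
// const stats = await (await fetch(`${BASE}/stats`, {
//   headers: { 'Authorization': 'Bearer ftk_cache_your_key_here' },
// })).json();
```

Building the request object separately keeps the invalidation logic testable without touching the network.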

Use Cases

  • Cache FAQ responses to cut LLM costs by 80% or more
  • Speed up chatbot responses to common questions
  • Reduce latency on high-traffic AI endpoints
  • A/B test with cache bypass for comparison

Quick Start

// Check cache before calling LLM
const lookup = await fetch('https://api.optima-forge.com/v1/tools/cache/lookup', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer ftk_cache_your_key_here',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({ query: userMessage, threshold: 0.85 }),
});

const { hit, cached_response } = await lookup.json();

if (hit) {
  return cached_response; // Instant, free
}

// Cache miss — call LLM, then store
const llmResponse = await callLLM(userMessage);
await fetch('https://api.optima-forge.com/v1/tools/cache/store', {
  method: 'POST',
  headers: { 'Authorization': 'Bearer ftk_cache_your_key_here', 'Content-Type': 'application/json' },
  body: JSON.stringify({ query: userMessage, response: llmResponse, ttl_seconds: 3600 }),
});
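The lookup/store pattern from the Quick Start can be wrapped in a single cache-aside helper so every LLM call site stays one line. This is a sketch using the same endpoints and fields as above; the `fetchImpl` parameter is an addition of this example (not part of the API) that makes the helper testable without the network.

```javascript
// Cache-aside helper: semantic lookup first, generate + store on a miss.
// Endpoints and body fields follow the Quick Start; `fetchImpl` is injectable
// purely so the helper can be exercised without real network calls.
const API = 'https://api.optima-forge.com/v1/tools/cache';

async function getCachedOrGenerate(query, generate, {
  apiKey,
  threshold = 0.85,
  ttl = 3600,
  fetchImpl = fetch,
} = {}) {
  const headers = {
    'Authorization': `Bearer ${apiKey}`,
    'Content-Type': 'application/json',
  };

  // 1. Semantic lookup — instant and free on a hit.
  const res = await fetchImpl(`${API}/lookup`, {
    method: 'POST',
    headers,
    body: JSON.stringify({ query, threshold }),
  });
  const { hit, cached_response } = await res.json();
  if (hit) return cached_response;

  // 2. Miss: call the LLM, then store fire-and-forget so a failed
  // store never breaks the request path.
  const response = await generate(query);
  fetchImpl(`${API}/store`, {
    method: 'POST',
    headers,
    body: JSON.stringify({ query, response, ttl_seconds: ttl }),
  }).catch(() => {});
  return response;
}

// Usage:
// const answer = await getCachedOrGenerate(userMessage, callLLM, {
//   apiKey: 'ftk_cache_your_key_here',
// });
```

Storing asynchronously after a miss keeps the user-facing latency equal to a plain LLM call, while subsequent similar queries hit the cache.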

Ready to start?

Get your API key in seconds. $12/month, cancel anytime.

Subscribe Now