Semantic AI Cache
Forge Cache Tool
Eliminate redundant LLM calls with semantic caching.
Forge Cache matches incoming queries against previously cached responses using semantic similarity rather than exact string matching. If a user asks "What is React?" and you have already cached a response to "Explain React to me", the cache hits, dramatically reducing LLM costs on repetitive queries.
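Under the hood, semantic matching of this kind typically compares embedding vectors, not strings. As a minimal sketch (the vectors below are made-up toy values for illustration; Forge Cache's actual embedding model is not documented here), cosine similarity over embeddings is the kind of score a threshold like 0.85 would be applied to:

```javascript
// Cosine similarity between two embedding vectors:
// dot(a, b) / (|a| * |b|), ranging from -1 to 1.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy 3-dimensional "embeddings" (real models use hundreds of dimensions).
const whatIsReact = [0.9, 0.1, 0.2];  // hypothetical: 'What is React?'
const explainReact = [0.8, 0.2, 0.3]; // hypothetical: 'Explain React to me'
const unrelated = [0.1, 0.9, 0.1];    // hypothetical: an unrelated query

const THRESHOLD = 0.85;
console.log(cosineSimilarity(whatIsReact, explainReact) >= THRESHOLD); // similar phrasings hit
console.log(cosineSimilarity(whatIsReact, unrelated) >= THRESHOLD);    // unrelated query misses
```

Raising the threshold trades hit rate for precision: closer to 1.0 means near-duplicate queries only, lower values match looser paraphrases.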
Features
- Semantic similarity matching
- Configurable similarity threshold (0.0 to 1.0)
- TTL-based auto-expiration
- Pattern-based cache invalidation
- Hit rate analytics
- Cost savings tracking
- Instant cache responses (<5ms)
API Endpoints
POST /v1/tools/cache/lookup - Semantic cache lookup
POST /v1/tools/cache/store - Store a response in the cache
GET /v1/tools/cache/stats - Cache statistics
DELETE /v1/tools/cache/invalidate - Invalidate matching entries
Use Cases
Cache FAQ responses to cut LLM costs by 80%+
Speed up chatbot responses for common questions
Reduce latency on high-traffic AI endpoints
A/B test with cache bypass for comparison
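The cache-bypass A/B test above can be as simple as routing a fixed fraction of traffic straight to the LLM and comparing outcomes. A minimal sketch; the `bypassRate` knob, the injectable `rng`, and the `lookupOrCallLLM` helper are illustrative, not part of the Forge Cache API:

```javascript
// Decide whether this request should skip the cache entirely, so
// cached and uncached response quality/latency can be compared.
// rng is injectable for deterministic testing; defaults to Math.random.
function shouldBypassCache(bypassRate, rng = Math.random) {
  return rng() < bypassRate;
}

// Example: send 10% of traffic straight to the LLM.
// if (shouldBypassCache(0.1)) {
//   response = await callLLM(userMessage);         // fresh LLM answer
// } else {
//   response = await lookupOrCallLLM(userMessage); // hypothetical cached path
// }
```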
Quick Start
// Check the cache before calling the LLM
async function answer(userMessage) {
  const lookup = await fetch('https://api.optima-forge.com/v1/tools/cache/lookup', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer ftk_cache_your_key_here',
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ query: userMessage, threshold: 0.85 }),
  });
  const { hit, cached_response } = await lookup.json();
  if (hit) {
    return cached_response; // Instant, free
  }

  // Cache miss: call the LLM, then store the response
  const llmResponse = await callLLM(userMessage);
  await fetch('https://api.optima-forge.com/v1/tools/cache/store', {
    method: 'POST',
    headers: { 'Authorization': 'Bearer ftk_cache_your_key_here', 'Content-Type': 'application/json' },
    body: JSON.stringify({ query: userMessage, response: llmResponse, ttl_seconds: 3600 }),
  });
  return llmResponse;
}

Ready to start?
Get your API key in seconds. $12/month, cancel anytime.