Memory
Three-layer persistent memory system.
Forge's memory system provides persistent, cross-provider conversational memory through three complementary layers. Unlike provider-specific context windows that reset between sessions, Forge Memory maintains continuity across any LLM provider, enabling your AI applications to truly remember users, facts, and relationships.
Three Memory Layers
Vector Memory (Qdrant)
The vector layer stores conversation chunks and documents as high-dimensional embeddings in Qdrant. This enables semantic search — finding contextually relevant past conversations even when the wording differs. When memory is enabled, Forge automatically embeds each conversation turn and retrieves relevant history before sending your request to the LLM.
- Powered by Qdrant (degrades gracefully to Turso native vector when Qdrant is unavailable)
- Embedding model: configurable, defaults to text-embedding-3-small
- Top-k retrieval with similarity threshold filtering
- Namespace isolation per user, per bot, or per conversation
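Under the hood, top-k retrieval with a similarity threshold amounts to ranking stored chunks by cosine similarity against the query embedding and discarding weak matches. A minimal sketch in pure Python (toy 2-dimensional vectors stand in for real embeddings; function names are illustrative, not Forge's API):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, memories, k=3, threshold=0.75):
    """Rank stored (text, vector) chunks by similarity, drop those below the threshold."""
    scored = [(cosine(query_vec, vec), text) for text, vec in memories]
    scored = [(score, text) for score, text in scored if score >= threshold]
    scored.sort(reverse=True)
    return scored[:k]
```

In production the vectors come from the configured embedding model and the scoring happens inside Qdrant, but the filtering semantics are the same: only chunks above the threshold are injected into the prompt.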
Graph Memory (Neo4j + Graphiti)
The graph layer builds a knowledge graph of entities, relationships, and facts extracted from conversations. Using Graphiti, Forge identifies people, organizations, preferences, and facts mentioned by users and stores them as connected nodes. This enables relationship-aware recall — "What did Alice say about the Q3 report?" can traverse the graph to find Alice's statements linked to the Q3 report entity.
- Powered by Neo4j with Graphiti for automated entity extraction
- Degrades to Turso graph tables when Neo4j is unavailable
- Entity types: Person, Organization, Topic, Fact, Preference, Event
- Automatic relationship inference across conversations
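The Alice/Q3 example above boils down to a two-hop graph traversal. A minimal sketch using an in-memory triple list (entity IDs and relationship names are illustrative, not Graphiti's actual schema):

```python
# Hypothetical triples extracted from past conversations.
edges = [
    ("alice", "SAID", "stmt_1"),
    ("stmt_1", "MENTIONS", "q3_report"),
    ("bob", "SAID", "stmt_2"),
    ("stmt_2", "MENTIONS", "budget"),
]

def statements_about(person, topic, triples):
    """Two-hop traversal: person -SAID-> statement -MENTIONS-> topic."""
    said = {o for s, p, o in triples if s == person and p == "SAID"}
    return [s for s, p, o in triples if s in said and p == "MENTIONS" and o == topic]

# statements_about("alice", "q3_report", edges) -> ["stmt_1"]
```

In Neo4j this is a single Cypher pattern match, but the recall logic is the same: follow relationships from the person entity to find linked statements.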
State Memory (Redis CRDTs)
The state layer manages real-time session state using Conflict-Free Replicated Data Types (CRDTs) in Redis. This ensures that when a user switches between providers mid-conversation (e.g., starting with GPT-4o and switching to Claude), the session state — including tool results, accumulated context, and user preferences — transfers seamlessly.
- LWW-Register for single-value state (current topic, language preference)
- OR-Set for collections (mentioned entities, active tools)
- MV-Register for concurrent writes from ensemble queries
- Automatic state TTL with configurable expiration
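The merge semantics of the first two CRDT types can be sketched in a few lines (a simplified in-memory model, not Forge's actual Redis implementation):

```python
class LWWRegister:
    """Last-writer-wins register: merging keeps the value with the newest timestamp."""
    def __init__(self, value, ts):
        self.value, self.ts = value, ts

    def merge(self, other):
        return self if self.ts >= other.ts else other

def or_set_merge(adds_a, removes_a, adds_b, removes_b):
    """OR-Set merge: an element is present if at least one of its unique
    add-tags has not been observed by a remove."""
    adds = adds_a | adds_b          # sets of (element, unique_tag) pairs
    removes = removes_a | removes_b
    return {elem for elem, tag in adds if (elem, tag) not in removes}
```

Because merges are commutative and idempotent, two providers can update session state concurrently and converge to the same result regardless of message ordering, which is what makes mid-conversation provider switches safe.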
Enabling Memory via API
Enable memory per request by including a forge.memory object in the chat completion body:
{
  "model": "auto",
  "messages": [{"role": "user", "content": "My favorite color is blue."}],
  "forge": {
    "memory": {
      "enabled": true,
      "userId": "user_123",
      "layers": ["vector", "graph", "state"],
      "namespace": "default"
    }
  }
}
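The same request body can be built programmatically. A sketch in Python (the endpoint path is an assumption; only the forge.memory shape comes from the example above):

```python
# Hypothetical chat-completions endpoint; adjust to your deployment.
FORGE_URL = "https://api.optima-forge.com/v1/chat/completions"

def memory_request(user_id, content, layers=("vector", "graph", "state")):
    """Build a chat-completion body with Forge memory enabled for one user."""
    return {
        "model": "auto",
        "messages": [{"role": "user", "content": content}],
        "forge": {
            "memory": {
                "enabled": True,
                "userId": user_id,
                "layers": list(layers),
                "namespace": "default",
            }
        },
    }

body = memory_request("user_123", "My favorite color is blue.")
# Send with your HTTP client of choice, e.g.:
# requests.post(FORGE_URL, json=body, headers={"Authorization": f"Bearer {key}"})
```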
Querying Memory Directly
You can query memory outside of chat completions using the Memory API:
# Search vector memory
curl -X POST https://api.optima-forge.com/v1/memory/search \
  -H "Authorization: Bearer $FORGE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"userId": "user_123", "query": "favorite color", "limit": 5}'
# Get graph entities
curl https://api.optima-forge.com/v1/memory/graph/user_123/entities \
  -H "Authorization: Bearer $FORGE_API_KEY"
# Get current session state
curl https://api.optima-forge.com/v1/memory/state/user_123/session_abc \
  -H "Authorization: Bearer $FORGE_API_KEY"
Cross-Provider Continuity
Memory is the key to cross-provider continuity. When a user starts a conversation through GPT-4o and later switches to Claude or Gemini, Forge injects relevant memory context into the new provider's prompt. The user experience is seamless — the AI "remembers" everything regardless of which model is actually responding. This is powered by the Blackboard pattern, where all providers read from and write to the same shared memory store.
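The Blackboard pattern described here can be sketched as a single shared store that every provider adapter reads from and appends to (a simplified in-process model; the real store spans the vector, graph, and state layers):

```python
class Blackboard:
    """Shared memory store: all provider adapters write facts to one place
    and read the full accumulated context back."""
    def __init__(self):
        self.facts = []  # list of (provider, fact) pairs, in arrival order

    def write(self, provider, fact):
        self.facts.append((provider, fact))

    def context_for(self, provider):
        # Every provider sees facts written by all providers, itself included.
        return [fact for _, fact in self.facts]

bb = Blackboard()
bb.write("gpt-4o", "user prefers blue")
bb.write("claude", "user works at Acme")
# A provider joining mid-conversation sees the full history:
# bb.context_for("gemini") -> ["user prefers blue", "user works at Acme"]
```

The key property is that recall does not depend on which provider produced a fact, so switching models never loses context.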
Additional Memory Technologies
- Mem0: Hybrid datastore for structured user memory with automatic summarization
- Letta: Sleep-time compute for background memory consolidation and reflection
- LightRAG: Graph-RAG for document retrieval augmented with knowledge graph structure
- Storacha: IPFS/Filecoin storage for verifiable, immutable memory snapshots (UCAN auth)