Three-Layer Memory for Persistent Context
Vector search, graph relationships, and real-time state give your AI applications memory that persists across conversations, sessions, and providers.
Why Memory Matters
Without persistent memory, every AI conversation starts from zero. Users repeat context, agents forget prior decisions, and relationships between concepts are lost. Forge solves this with a three-layer memory system that stores what was said (vector), who said it and how things relate (graph), and what is happening right now (state). Memory persists across provider switches, so a conversation that starts on GPT-4o can seamlessly continue on Claude without losing context.
Three Layers, One Memory
Each layer serves a distinct purpose and degrades gracefully when its primary backing store is unavailable. In production, all three layers work together. In development, Turso handles everything.
Vector Layer
Stores semantic embeddings of conversations, documents, and tool outputs. When a user asks a question, Forge searches across all stored vectors to find relevant context from previous interactions, uploaded files, and ingested knowledge bases. Qdrant provides HNSW indexing for sub-millisecond retrieval at scale. When Qdrant is unavailable, Forge automatically falls back to Turso's native vector column type, which supports cosine similarity search on embedded SQLite replicas.
- EmbeddingGemma-300M for high-quality 256-dim embeddings
- Configurable chunk sizes and overlap for document ingestion
- Namespace isolation per tenant, per agent, and per session
- Automatic embedding of assistant responses for later retrieval
- MMR (Maximal Marginal Relevance) for diverse search results
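As a minimal sketch of the retrieval behavior described above, the pure-Python code below implements cosine-similarity search plus MMR re-ranking. The function names and tiny two-dimensional "embeddings" are illustrative only, not Forge's actual API or vector dimensionality:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

def vector_search(query, store, top_k=3):
    """Rank stored (id, embedding) pairs by similarity to the query."""
    scored = [(doc_id, cosine(query, emb)) for doc_id, emb in store]
    return sorted(scored, key=lambda s: s[1], reverse=True)[:top_k]

def mmr(query, store, top_k=3, lam=0.7):
    """Maximal Marginal Relevance: trade query relevance against
    redundancy with already-selected results, weighted by lam."""
    candidates = list(store)
    selected = []
    while candidates and len(selected) < top_k:
        def score(item):
            _, emb = item
            relevance = cosine(query, emb)
            redundancy = max((cosine(emb, s_emb) for _, s_emb in selected),
                             default=0.0)
            return lam * relevance - (1 - lam) * redundancy
        best = max(candidates, key=score)
        candidates.remove(best)
        selected.append(best)
    return [doc_id for doc_id, _ in selected]
```

With a low `lam`, MMR prefers a diverse second result over a near-duplicate of the first, which is the point of the diversity bullet above.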
Graph Layer
Captures relationships between entities mentioned in conversations. When a user discusses a project, its team members, deadlines, and dependencies, the graph layer stores these as nodes and edges. Later queries can traverse the graph to answer questions like "Who is working on the project that John mentioned last Tuesday?" Neo4j provides native graph traversal, while Graphiti handles temporal knowledge graphs with versioned facts. The Turso fallback uses adjacency-list tables with recursive CTEs for basic graph queries.
- Graphiti temporal knowledge graphs with fact versioning
- Entity extraction via LLM with relationship classification
- Supports KNOWS, WORKS_ON, DEPENDS_ON, MENTIONS, and custom edge types
- Automatic entity resolution and deduplication
- Graph-powered RAG for relationship-aware context assembly
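The adjacency-list fallback described above can be sketched with Python's built-in sqlite3 module. The `edges` schema and sample rows below are hypothetical, but the recursive CTE is the standard shape for a transitive walk over an adjacency list:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE edges (src TEXT, rel TEXT, dst TEXT);
INSERT INTO edges VALUES
  ('john',   'MENTIONS',   'apollo'),
  ('alice',  'WORKS_ON',   'apollo'),
  ('apollo', 'DEPENDS_ON', 'billing');
""")

def neighbors(conn, start, max_depth=3):
    """Walk the adjacency list transitively with a recursive CTE,
    returning every node reachable from start within max_depth hops."""
    rows = conn.execute("""
        WITH RECURSIVE walk(node, depth) AS (
            SELECT ?, 0
            UNION
            SELECT e.dst, w.depth + 1
            FROM edges e JOIN walk w ON e.src = w.node
            WHERE w.depth < ?
        )
        SELECT DISTINCT node FROM walk WHERE depth > 0
    """, (start, max_depth)).fetchall()
    return {r[0] for r in rows}
```

Answering "who works on the project John mentioned" is then a matter of combining this walk with a reverse lookup on the `WORKS_ON` edge type.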
State Layer
Manages real-time session state using Conflict-free Replicated Data Types (CRDTs). The state layer tracks the current conversation context, user preferences, active tool sessions, and agent workflow progress. CRDTs ensure consistency without coordination: multiple Forge instances can read and write state simultaneously without locks or race conditions.
- LWW-Register for user preferences and session metadata
- OR-Set for tracking active tools, enabled features, and agent memberships
- MV-Register for conflict-aware values in multi-writer scenarios
- Automatic state expiry with configurable TTLs per key type
- State snapshots for session recovery after process restarts
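A minimal LWW-Register, the simplest of the CRDT types listed above, might look like the sketch below. It shows the merge semantics only; the class and field names are illustrative, not Forge's implementation:

```python
import time
from dataclasses import dataclass

@dataclass
class LWWRegister:
    """Last-writer-wins register: the write with the highest
    (timestamp, node_id) pair wins, so replicas converge deterministically."""
    value: object = None
    timestamp: float = 0.0
    node_id: str = ""

    def set(self, value, timestamp=None, node_id=""):
        ts = timestamp if timestamp is not None else time.time()
        if (ts, node_id) > (self.timestamp, self.node_id):
            self.value, self.timestamp, self.node_id = value, ts, node_id

    def merge(self, other):
        """Merging is just a set() with the other replica's metadata."""
        self.set(other.value, other.timestamp, other.node_id)
```

The node ID breaks timestamp ties, which is what makes concurrent writes resolve the same way on every replica. An OR-Set follows the same pattern with per-element add/remove tags instead of a single timestamped value.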
Advanced Memory Patterns
Beyond the three core layers, Forge implements several advanced memory patterns that enable sophisticated multi-model reasoning and long-term knowledge management.
Blackboard Pattern
Multiple LLMs write to a shared memory space (the "blackboard") during parallel execution. Each model reads what others have contributed and adds its own analysis. The blackboard coordinates multi-model reasoning without direct inter-model communication, enabling ensemble approaches where GPT-4o provides analysis, Claude provides critique, and Gemini provides synthesis.
Mem0 Hybrid Datastore
Mem0 provides a unified memory abstraction that combines vector search, graph lookup, and key-value retrieval in a single query. Instead of querying each layer separately, Mem0 fuses results from all three layers and returns a ranked, deduplicated context window that fits within the target model's token budget.
Letta Sleep-Time Compute
Between conversations, Forge runs background memory consolidation during "sleep time." Raw conversation fragments are compressed into structured summaries, conflicting facts are resolved, and stale information is decayed. This mimics how human memory consolidates during sleep, keeping the knowledge base accurate and compact over time.
LightRAG Graph-RAG
LightRAG combines graph traversal with vector retrieval for context assembly. When a query touches multiple related concepts, LightRAG walks the knowledge graph to find connected entities, then uses vector similarity to rank and filter the results. This produces context windows that are both semantically relevant and structurally coherent.
Cross-Provider Memory Continuity
Traditional LLM integrations lose all context when you switch providers. A conversation that starts on OpenAI cannot continue on Anthropic because each provider maintains its own isolated context window.
Forge decouples memory from the provider. All three memory layers are provider-agnostic, meaning the same vector embeddings, graph relationships, and session state are available regardless of which model handles the next request. This enables powerful workflows: start a conversation on a fast, cheap model for initial exploration, then seamlessly switch to a premium model for final analysis, with full context preserved.
Memory continuity also powers Forge's auto-routing. When the quality router switches providers mid-conversation due to cost or capability reasons, the user experience is uninterrupted because memory travels with the session, not the provider.
How It Works
- User discusses Q3 revenue targets → Vector + Graph updated
- User asks a follow-up about team assignments → Context retrieved from all 3 layers
- User requests a summary document → Full conversation history available
- Quick cost check routed to a cheaper model → Same session state, same context
Give your AI a memory
Enable persistent memory with a single session ID in your API call. Forge handles storage, retrieval, and cross-provider continuity automatically.
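For illustration, a request carrying a session ID might be assembled like this. The endpoint URL and field names are placeholders, not Forge's documented API:

```python
def build_chat_request(session_id, message, model="auto"):
    """Assemble a chat request; the session_id is what ties the call to
    Forge's persistent memory. URL and field names are illustrative."""
    return {
        "url": "https://forge.example.com/v1/chat",  # placeholder endpoint
        "body": {
            "session_id": session_id,
            "model": model,
            "messages": [{"role": "user", "content": message}],
        },
    }

req = build_chat_request("sess-42", "Summarize our Q3 discussion")
```

Reusing the same `session_id` on the next call, with any model, is what carries the vector, graph, and state context forward.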