Unified Memory

Three-Layer Memory for Persistent Context

Vector search, graph relationships, and real-time state give your AI applications memory that persists across conversations, sessions, and providers.

Why Memory Matters

Without persistent memory, every AI conversation starts from zero. Users repeat context, agents forget prior decisions, and relationships between concepts are lost. Forge solves this with a three-layer memory system that stores what was said (vector), who said it and how things relate (graph), and what is happening right now (state). Memory persists across provider switches, so a conversation that starts on GPT-4o can seamlessly continue on Claude without losing context.

At a glance: 3 memory layers · 256-dim embedding vectors · CRDT-based conflict-free state

Three Layers, One Memory

Each layer serves a distinct purpose and degrades gracefully when its primary backing store is unavailable. In production, all three layers work together. In development, Turso handles everything.

Vector Layer

Primary: Qdrant | Fallback: Turso Native Vector

Stores semantic embeddings of conversations, documents, and tool outputs. When a user asks a question, Forge searches across all stored vectors to find relevant context from previous interactions, uploaded files, and ingested knowledge bases. Qdrant provides HNSW indexing for sub-millisecond retrieval at scale. When Qdrant is unavailable, Forge automatically falls back to Turso's native vector column type, which supports cosine similarity search on embedded SQLite replicas.

  • EmbeddingGemma-300M for high-quality 256-dim embeddings
  • Configurable chunk sizes and overlap for document ingestion
  • Namespace isolation per tenant, per agent, and per session
  • Automatic embedding of assistant responses for later retrieval
  • MMR (Maximal Marginal Relevance) for diverse search results
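The primary/fallback behavior described above can be sketched as follows. This is illustrative only: the `VectorLayer` class, its `search` signature, and the in-memory fallback store stand in for Forge's actual Qdrant client and Turso-backed storage, which the source does not document.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

class VectorLayer:
    """Try the primary vector store first; degrade to a local fallback."""

    def __init__(self, qdrant=None):
        self.qdrant = qdrant  # hypothetical primary-store client, may be None
        self.local = []       # fallback store: (id, vector, payload) tuples

    def add(self, doc_id, vector, payload):
        self.local.append((doc_id, vector, payload))

    def search(self, query_vector, top_k=3):
        if self.qdrant is not None:
            try:
                return self.qdrant.search(query_vector, top_k)
            except ConnectionError:
                pass  # primary unavailable: fall back to local cosine search
        scored = [(cosine_similarity(query_vector, v), doc_id, payload)
                  for doc_id, v, payload in self.local]
        scored.sort(reverse=True)
        return [(doc_id, payload) for _, doc_id, payload in scored[:top_k]]
```

The fallback is a brute-force scan, which matches the role Turso's vector columns play here: correct results at development scale, with the HNSW-indexed primary handling production load.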

Graph Layer

Primary: Neo4j + Graphiti | Fallback: Turso Graph Tables

Captures relationships between entities mentioned in conversations. When a user discusses a project, its team members, deadlines, and dependencies, the graph layer stores these as nodes and edges. Later queries can traverse the graph to answer questions like "Who is working on the project that John mentioned last Tuesday?" Neo4j provides native graph traversal, while Graphiti handles temporal knowledge graphs with versioned facts. The Turso fallback uses adjacency-list tables with recursive CTEs for basic graph queries.

  • Graphiti temporal knowledge graphs with fact versioning
  • Entity extraction via LLM with relationship classification
  • Supports KNOWS, WORKS_ON, DEPENDS_ON, MENTIONS, and custom edge types
  • Automatic entity resolution and deduplication
  • Graph-powered RAG for relationship-aware context assembly
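The adjacency-list fallback mentioned above is standard SQL, so a minimal sketch is possible. The table name, column names, and sample data below are illustrative, not Forge's actual schema; the query answers a question in the spirit of the example by following MENTIONS and then WORKS_ON edges (note that the recursive walk also reaches transitive dependencies):

```python
import sqlite3

# Adjacency-list edge table plus a recursive CTE: the shape of the
# Turso fallback described above (schema and data are illustrative).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE edges (src TEXT, rel TEXT, dst TEXT);
    INSERT INTO edges VALUES
        ('john',   'MENTIONS',   'apollo'),
        ('alice',  'WORKS_ON',   'apollo'),
        ('apollo', 'DEPENDS_ON', 'atlas'),
        ('bob',    'WORKS_ON',   'atlas');
""")

# "Who works on the project John mentioned?" — seed with MENTIONS,
# expand transitively, then look for WORKS_ON edges into that set.
rows = conn.execute("""
    WITH RECURSIVE reachable(node) AS (
        SELECT dst FROM edges WHERE src = 'john' AND rel = 'MENTIONS'
        UNION
        SELECT e.dst FROM edges e JOIN reachable r ON e.src = r.node
    )
    SELECT DISTINCT e.src FROM edges e
    JOIN reachable r ON e.dst = r.node
    WHERE e.rel = 'WORKS_ON'
""").fetchall()
```

The `UNION` (rather than `UNION ALL`) keeps the traversal terminating even if the edge table contains cycles.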

State Layer

Primary: Redis CRDTs | Fallback: In-Memory (single node)

Manages real-time session state using Conflict-free Replicated Data Types. The state layer tracks things like current conversation context, user preferences, active tool sessions, and agent workflow progress. CRDTs ensure consistency without coordination, meaning multiple Forge instances can read and write state simultaneously without locks or race conditions.

  • LWW-Register for user preferences and session metadata
  • OR-Set for tracking active tools, enabled features, and agent memberships
  • MV-Register for conflict-aware values in multi-writer scenarios
  • Automatic state expiry with configurable TTLs per key type
  • State snapshots for session recovery after process restarts
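Two of the CRDTs named above can be sketched in a few lines; these are textbook-style implementations for illustration, not Forge's Redis-backed versions. An LWW-Register merges by keeping the newest timestamp, and an OR-Set tags every add with a unique id so a remove only deletes tags it has actually observed:

```python
import time
import uuid

class LWWRegister:
    """Last-writer-wins register: merge keeps the newest-timestamped value."""

    def __init__(self, value=None, ts=0.0):
        self.value, self.ts = value, ts

    def set(self, value, ts=None):
        self.value, self.ts = value, ts if ts is not None else time.time()

    def merge(self, other):
        return self if self.ts >= other.ts else other

class ORSet:
    """Observed-remove set: concurrent add wins over an unobserved remove."""

    def __init__(self):
        self.adds = {}      # element -> set of unique add tags
        self.removes = {}   # element -> set of tags seen and removed

    def add(self, elem):
        self.adds.setdefault(elem, set()).add(uuid.uuid4().hex)

    def remove(self, elem):
        # Remove only the tags this replica has observed.
        self.removes.setdefault(elem, set()).update(self.adds.get(elem, set()))

    def merge(self, other):
        merged = ORSet()
        for src in (self, other):
            for e, tags in src.adds.items():
                merged.adds.setdefault(e, set()).update(tags)
            for e, tags in src.removes.items():
                merged.removes.setdefault(e, set()).update(tags)
        return merged

    def contains(self, elem):
        return bool(self.adds.get(elem, set()) - self.removes.get(elem, set()))
```

Because `merge` is commutative, associative, and idempotent, replicas can exchange state in any order and converge, which is what lets multiple Forge instances write without locks.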

Advanced Memory Patterns

Beyond the three core layers, Forge implements several advanced memory patterns that enable sophisticated multi-model reasoning and long-term knowledge management.

Blackboard Pattern

Multiple LLMs write to a shared memory space (the "blackboard") during parallel execution. Each model reads what others have contributed and adds its own analysis. The blackboard coordinates multi-model reasoning without direct inter-model communication, enabling ensemble approaches where GPT-4o provides analysis, Claude provides critique, and Gemini provides synthesis.
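A minimal sketch of the pattern, with the model calls stubbed out (a real system would hit provider APIs where the `post` calls are): analysts write to the shared board in parallel, then the critic and synthesizer each read everything posted so far before contributing, so no model ever messages another directly.

```python
import asyncio

class Blackboard:
    """Shared memory space models append to; no model-to-model messaging."""

    def __init__(self):
        self.entries = []
        self.lock = asyncio.Lock()

    async def post(self, model, role, content):
        async with self.lock:
            self.entries.append({"model": model, "role": role, "content": content})

    async def read(self):
        async with self.lock:
            return list(self.entries)

async def run_ensemble(task):
    board = Blackboard()
    # Phase 1: analysts write in parallel (stubbed model calls).
    await asyncio.gather(
        board.post("gpt-4o", "analysis", f"Analysis of: {task}"),
        board.post("mistral", "analysis", f"Second opinion on: {task}"),
    )
    # Phase 2: the critic reads the board, then adds its critique.
    seen = await board.read()
    await board.post("claude", "critique", f"Critique of {len(seen)} entries")
    # Phase 3: synthesis over the full blackboard.
    seen = await board.read()
    await board.post("gemini", "synthesis", f"Synthesis over {len(seen)} entries")
    return await board.read()
```

The blackboard itself is the only coordination point: ordering between phases is explicit, while writes within a phase are free to race.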

Mem0 Hybrid Datastore

Mem0 provides a unified memory abstraction that combines vector search, graph lookup, and key-value retrieval in a single query. Instead of querying each layer separately, Mem0 fuses results from all three layers and returns a ranked, deduplicated context window that fits within the target model's token budget.
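One common way to fuse ranked lists from heterogeneous retrievers is reciprocal rank fusion; the sketch below uses it to illustrate the fuse-dedupe-budget step (the function name, word-count token estimate, and `k=60` constant are illustrative choices, not Mem0's actual algorithm):

```python
def fuse_layers(vector_hits, graph_hits, state_hits, token_budget, k=60):
    """Fuse ranked (id, text) hit lists via reciprocal rank fusion,
    deduplicate by id, and pack results into a token budget.
    Token cost is crudely approximated by word count."""
    scores, texts = {}, {}
    for hits in (vector_hits, graph_hits, state_hits):
        for rank, (hit_id, text) in enumerate(hits):
            # An item ranked highly by several layers accumulates score.
            scores[hit_id] = scores.get(hit_id, 0.0) + 1.0 / (k + rank + 1)
            texts.setdefault(hit_id, text)
    ranked = sorted(scores, key=scores.get, reverse=True)
    context, used = [], 0
    for hit_id in ranked:
        cost = len(texts[hit_id].split())
        if used + cost > token_budget:
            continue  # skip items that would overflow the budget
        context.append(texts[hit_id])
        used += cost
    return context
```

An item surfaced by two layers outranks an item surfaced by only one, which is the key property a hybrid datastore wants from its fusion step.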

Letta Sleep-Time Compute

Between conversations, Forge runs background memory consolidation during "sleep time." Raw conversation fragments are compressed into structured summaries, conflicting facts are resolved, and stale information is decayed. This mimics how human memory consolidates during sleep, keeping the knowledge base accurate and compact over time.
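The decay step of such a consolidation pass might look like the following sketch. The half-life model, the relevance floor, and the fact shape are all assumptions for illustration; the source does not specify how Letta-style decay is parameterized:

```python
def consolidate(facts, now, half_life=86400.0):
    """One sleep-time pass: keep the newest value per key, then drop
    facts whose exponentially decayed relevance falls below a floor.
    Each fact is a (key, value, timestamp) tuple; times in seconds."""
    newest = {}
    for key, value, ts in facts:
        if key not in newest or ts > newest[key][1]:
            newest[key] = (value, ts)  # conflicting facts: newest wins
    kept = {}
    for key, (value, ts) in newest.items():
        relevance = 0.5 ** ((now - ts) / half_life)
        if relevance >= 0.25:  # drop facts older than about two half-lives
            kept[key] = value
    return kept
```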

LightRAG Graph-RAG

LightRAG combines graph traversal with vector retrieval for context assembly. When a query touches multiple related concepts, LightRAG walks the knowledge graph to find connected entities, then uses vector similarity to rank and filter the results. This produces context windows that are both semantically relevant and structurally coherent.
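The walk-then-rank flow can be sketched as below; the helper names and the mapping of documents to entity sets are illustrative stand-ins, not LightRAG's actual interfaces. Graph expansion restricts the candidate pool to structurally connected material, and vector similarity then orders that pool:

```python
def graph_expand(entities, edges, hops=1):
    """Collect entities reachable within `hops` directed edges."""
    frontier, seen = set(entities), set(entities)
    for _ in range(hops):
        frontier = {dst for src, dst in edges if src in frontier} - seen
        seen |= frontier
    return seen

def graph_rag(query_entities, edges, doc_index, similarity, top_k=2):
    """Keep only docs mentioning graph-connected entities, then rank
    those candidates by vector similarity to the query."""
    related = graph_expand(query_entities, edges)
    candidates = [(doc, ents) for doc, ents in doc_index.items()
                  if ents & related]  # structural filter
    candidates.sort(key=lambda c: similarity(c[0]), reverse=True)
    return [doc for doc, _ in candidates[:top_k]]
```

Filtering before ranking is what makes the result "structurally coherent": a semantically similar document about an unrelated entity never enters the candidate set.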

Cross-Provider Memory Continuity

Traditional LLM integrations lose all context when you switch providers. A conversation that starts on OpenAI cannot continue on Anthropic because each provider maintains its own isolated context window.

Forge decouples memory from the provider. All three memory layers are provider-agnostic, meaning the same vector embeddings, graph relationships, and session state are available regardless of which model handles the next request. This enables powerful workflows: start a conversation on a fast, cheap model for initial exploration, then seamlessly switch to a premium model for final analysis, with full context preserved.

Memory continuity also powers Forge's auto-routing. When the quality router switches providers mid-conversation due to cost or capability reasons, the user experience is uninterrupted because memory travels with the session, not the provider.
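The essence of provider-agnostic memory is that history is keyed by session id, never by provider. The stub below demonstrates only that idea; the class and method names are invented for illustration and are not Forge's client API:

```python
class SessionMemory:
    """Memory keyed by session id, not by provider (illustrative stub)."""

    def __init__(self):
        self.sessions = {}  # session_id -> list of (provider, message)

    def chat(self, session_id, provider, message):
        # Any provider appending to the session sees the full prior history,
        # so switching providers mid-conversation loses nothing.
        history = self.sessions.setdefault(session_id, [])
        history.append((provider, message))
        return len(history)

forge = SessionMemory()
forge.chat("s-42", "gpt-4o", "Q3 revenue targets are $2M")
turns = forge.chat("s-42", "claude", "Who owns the Q3 targets?")
```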

How It Works

1. GPT-4o: User discusses Q3 revenue targets. Vector and graph layers updated.
2. Claude: User asks a follow-up about team assignments. Context retrieved from all three layers.
3. Gemini: User requests a summary document. Full conversation history available.
4. Mistral: Quick cost check routed to a cheaper model. Same session state, same context.

Give your AI a memory

Enable persistent memory with a single session ID in your API call. Forge handles storage, retrieval, and cross-provider continuity automatically.