Vector search, graph relationships, and real-time state give your AI applications memory that persists across conversations, sessions, and providers.
Without persistent memory, every AI conversation starts from zero. Users repeat context, agents forget prior decisions, and relationships between concepts are lost. Forge solves this with a three-layer memory system that stores what was said (vector), who said it and how things relate (graph), and what is happening right now (state). Memory persists across provider switches, so a conversation that starts on GPT-4o can seamlessly continue on another provider without losing context.
Each layer serves a distinct purpose and degrades gracefully when its primary backing store is unavailable. In production, all three layers work together. In development, Turso handles everything.
The vector layer stores semantic embeddings of conversations, documents, and tool outputs. When a user asks a question, Forge searches across all stored vectors to find relevant context from previous interactions, uploaded files, and ingested knowledge bases. Qdrant provides HNSW indexing for sub-millisecond retrieval at scale. When Qdrant is unavailable, Forge automatically falls back to Turso's native vector column type, which supports cosine similarity search on embedded SQLite replicas.
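Conceptually, the fallback path is a brute-force cosine-similarity scan rather than an HNSW index. A minimal sketch in plain Python (Forge's internal API is not documented here, so the function names and sample vectors are illustrative):

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: dot(a, b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def search(query_vec, stored, top_k=2):
    # Rank stored (id, vector) pairs by similarity to the query vector.
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in stored]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [doc_id for doc_id, _ in scored[:top_k]]

stored = [
    ("greeting", [1.0, 0.0, 0.0]),
    ("revenue",  [0.0, 1.0, 0.1]),
    ("teams",    [0.1, 0.9, 0.3]),
]
print(search([0.0, 1.0, 0.0], stored))  # ['revenue', 'teams']
```

The trade-off is the usual one: an exhaustive scan is exact but linear in collection size, while HNSW trades a little recall for logarithmic-time lookups.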
The graph layer captures relationships between entities mentioned in conversations. When a user discusses a project, its team members, deadlines, and dependencies, the graph layer stores these as nodes and edges. Later queries can traverse the graph to answer questions like "Who is working on the project that John mentioned last Tuesday?" Neo4j provides native graph traversal, while Graphiti handles temporal knowledge graphs with versioned facts. The Turso fallback uses adjacency-list tables with recursive CTEs for basic graph queries.
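The adjacency-list-plus-recursive-CTE approach can be demonstrated with SQLite directly (the table and entity names below are made up for illustration; Forge's actual schema is not shown in this document):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE nodes (id TEXT PRIMARY KEY, label TEXT);
    CREATE TABLE edges (src TEXT, dst TEXT, rel TEXT);
""")
conn.executemany("INSERT INTO nodes VALUES (?, ?)", [
    ("john", "person"), ("apollo", "project"), ("deadline_q3", "deadline"),
])
conn.executemany("INSERT INTO edges VALUES (?, ?, ?)", [
    ("john", "apollo", "works_on"),
    ("apollo", "deadline_q3", "has_deadline"),
])

# Recursive CTE: collect every node reachable from "john".
rows = conn.execute("""
    WITH RECURSIVE reachable(id) AS (
        SELECT ?
        UNION
        SELECT e.dst FROM edges e JOIN reachable r ON e.src = r.id
    )
    SELECT id FROM reachable
""", ("john",)).fetchall()
print(sorted(r[0] for r in rows))  # ['apollo', 'deadline_q3', 'john']
```

`UNION` (rather than `UNION ALL`) deduplicates visited nodes, which also terminates the recursion if the graph contains cycles.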
The state layer manages real-time session state using Conflict-free Replicated Data Types (CRDTs). It tracks things like current conversation context, user preferences, active tool sessions, and agent workflow progress. CRDTs ensure consistency without coordination, meaning multiple Forge instances can read and write state simultaneously without locks or race conditions.
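To make the "consistency without coordination" claim concrete, here is a minimal last-writer-wins map, one of the simplest CRDTs (a sketch, not Forge's actual state implementation; a production CRDT would also break timestamp ties deterministically, e.g. by replica ID):

```python
def merge_lww(a, b):
    # Merge two replicas; each value is a (timestamp, value) pair,
    # and the newer timestamp wins per key.
    merged = dict(a)
    for key, (ts, val) in b.items():
        if key not in merged or ts > merged[key][0]:
            merged[key] = (ts, val)
    return merged

replica_a = {"active_tool": (5, "search"), "preference": (1, "dark")}
replica_b = {"active_tool": (3, "none"),   "preference": (2, "light")}

state = merge_lww(replica_a, replica_b)
print(state)  # {'active_tool': (5, 'search'), 'preference': (2, 'light')}
```

Because the merge is commutative, associative, and idempotent, replicas converge to the same state no matter the order in which updates arrive, which is why no locks are needed.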
Beyond the three core layers, Forge implements several advanced memory patterns that enable sophisticated multi-model reasoning and long-term knowledge management.
Multiple LLMs write to a shared memory space (the "blackboard") during parallel execution. Each model reads what others have contributed and adds its own analysis. The blackboard coordinates multi-model reasoning without direct inter-model communication, enabling ensemble approaches where GPT-4o provides analysis, Anthropic models provide critique, and Gemini provides synthesis.
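The coordination pattern can be sketched as a shared append-only log that each model reads before contributing (the class and model names are illustrative assumptions; the document does not specify Forge's blackboard API):

```python
class Blackboard:
    """Shared memory space that multiple models read from and append to."""

    def __init__(self):
        self.entries = []

    def post(self, model, role, content):
        # Each model contributes an entry tagged with its role.
        self.entries.append({"model": model, "role": role, "content": content})

    def read(self, exclude_model=None):
        # A model reads everyone else's contributions before adding its own.
        return [e for e in self.entries if e["model"] != exclude_model]

bb = Blackboard()
bb.post("gpt-4o", "analysis", "Revenue grew 12% quarter over quarter.")
prior = bb.read(exclude_model="claude")          # critique model reads first
bb.post("claude", "critique", f"Reviewed {len(prior)} entries; growth claim lacks a source.")
bb.post("gemini", "synthesis", "Merge the analysis with the critique's caveat.")
```

The key property is that models never call each other directly; all coordination flows through the shared structure.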
Mem0 provides a unified memory abstraction that combines vector search, graph lookup, and key-value retrieval in a single query. Instead of querying each layer separately, Mem0 fuses results from all three layers and returns a ranked, deduplicated context window that fits within the target model's token budget.
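The fuse-rank-trim step described above can be sketched as follows (a simplified illustration, not Mem0's actual algorithm; the word-count token proxy and sample scores are assumptions):

```python
def fuse(results_by_layer, token_budget):
    # Deduplicate across layers, keeping the highest score per item.
    best = {}
    for layer, items in results_by_layer.items():
        for text, score in items:
            if text not in best or score > best[text]:
                best[text] = score
    ranked = sorted(best.items(), key=lambda kv: kv[1], reverse=True)
    # Greedily fill the context window under a crude token estimate.
    context, used = [], 0
    for text, score in ranked:
        cost = len(text.split())  # stand-in for a real tokenizer
        if used + cost <= token_budget:
            context.append(text)
            used += cost
    return context

results = {
    "vector": [("Q3 target is $2M", 0.9), ("Team has 5 engineers", 0.6)],
    "graph":  [("John leads Apollo", 0.8), ("Q3 target is $2M", 0.7)],
    "kv":     [("User prefers bullet summaries", 0.5)],
}
print(fuse(results, token_budget=12))
```

Note the duplicate fact appears once, scored by its best-ranking source, and the lowest-ranked item is dropped once the budget is exhausted.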
Between conversations, Forge runs background memory consolidation during "sleep time." Raw conversation fragments are compressed into structured summaries, conflicting facts are resolved, and stale information is decayed. This mimics how human memory consolidates during sleep, keeping the knowledge base accurate and compact over time.
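The decay half of that process can be illustrated with exponential time decay, a common choice for memory-freshness scoring (a sketch only; the half-life, threshold, and field names are assumptions, and the document does not describe Forge's actual decay function):

```python
import math

def consolidate(facts, now, half_life=30.0, threshold=0.2):
    # Decay each fact's relevance by age; prune facts that fall
    # below the retention threshold.
    kept = []
    for fact in facts:
        age = now - fact["last_seen"]
        score = fact["relevance"] * math.exp(-math.log(2) * age / half_life)
        if score >= threshold:
            kept.append({**fact, "relevance": score})
    return kept

facts = [
    {"text": "fresh fact", "last_seen": 70, "relevance": 1.0},  # 30 units old
    {"text": "stale fact", "last_seen": 10, "relevance": 1.0},  # 90 units old
]
print([f["text"] for f in consolidate(facts, now=100)])  # ['fresh fact']
```

With a half-life of 30, the fresh fact decays to 0.5 and survives, while the stale fact decays to 0.125 and is pruned.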
LightRAG combines graph traversal with vector retrieval for context assembly. When a query touches multiple related concepts, LightRAG walks the knowledge graph to find connected entities, then uses vector similarity to rank and filter the results. This produces context windows that are both semantically relevant and structurally coherent.
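The two-stage retrieval (graph expansion, then vector ranking) could look roughly like this (a simplified sketch, not LightRAG's implementation; the toy graph and embeddings are invented for illustration):

```python
import math

def neighbors(graph, seeds, hops=1):
    # Stage 1: expand seed entities through the knowledge graph.
    frontier, seen = set(seeds), set(seeds)
    for _ in range(hops):
        frontier = {dst for src in frontier for dst in graph.get(src, [])} - seen
        seen |= frontier
    return seen

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def lightrag_context(graph, embeddings, query_vec, seeds, top_k=2):
    # Stage 2: rank the graph-expanded candidates by vector similarity.
    candidates = neighbors(graph, seeds)
    ranked = sorted(candidates, key=lambda e: cosine(embeddings[e], query_vec), reverse=True)
    return ranked[:top_k]

graph = {"apollo": ["john", "deadline_q3"], "john": ["apollo"]}
embeddings = {"apollo": [1.0, 0.0], "john": [0.9, 0.1], "deadline_q3": [0.0, 1.0]}
print(lightrag_context(graph, embeddings, [1.0, 0.0], seeds=["apollo"]))
```

Graph expansion guarantees structural coherence (only connected entities are considered); vector ranking then keeps the semantically closest of those.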
Traditional LLM integrations lose all context when you switch providers. A conversation that starts on OpenAI cannot continue on Anthropic because each provider maintains its own isolated context window.
Forge decouples memory from the provider. All three memory layers are provider-agnostic, meaning the same vector embeddings, graph relationships, and session state are available regardless of which model handles the next request. This enables powerful workflows: start a conversation on a fast, cheap model for initial exploration, then seamlessly switch to a premium model for final analysis, with full context preserved.
Memory continuity also powers Forge's auto-routing. When the quality router switches providers mid-conversation for cost or capability reasons, the user experience is uninterrupted because memory travels with the session, not the provider.
User discusses Q3 revenue targets → vector and graph layers updated
User asks a follow-up about team assignments → context retrieved from all three layers
User requests a summary document → full conversation history available
Quick cost check routed to a cheaper model → same session state, same context
Enable persistent memory with a single session ID in your API call. Forge handles storage, retrieval, and cross-provider continuity automatically.
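A request might look like the following (the endpoint shape, field names, and model identifiers are illustrative assumptions; consult Forge's API reference for the real schema):

```python
# Hypothetical request payloads: only the session ID must stay constant;
# the model/provider can change freely between requests.
payload = {
    "session_id": "sess_q3_planning",   # same ID on every request
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Summarize our Q3 targets."}],
}

# A later request switches providers but reuses the same session,
# so all three memory layers carry over automatically.
followup = {
    **payload,
    "model": "claude-sonnet",
    "messages": [{"role": "user", "content": "Draft the summary doc."}],
}
print(followup["session_id"])  # sess_q3_planning
```

The point of the sketch: continuity is keyed on the session, so switching `model` between requests costs nothing in context.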