Managed extract-retrieve · 14 frameworks · 5 use cases

Managed extract-retrieve

An LLM extracts discrete memories; you recall them by semantic search.

An LLM reads your interactions and extracts discrete memories (facts, preferences, events) into a store. Recall is by semantic search over that store, often behind a hosted API you call rather than run yourself. The vendor typically owns the extraction logic and the embeddings pipeline. This is the most crowded family: it is fast to drop in, has large integration ecosystems, and offers dual session/user scope. The production-polish end (SOC 2 / HIPAA, connectors, context fencing, sub-300 ms recall) lives here too.
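To make the extract-then-recall loop concrete, here is a minimal sketch. It is not any vendor's API: `extract_memories` is a rule-based stand-in for the LLM extraction step, bag-of-words cosine similarity stands in for an embedding model, and every name in it (`MemoryStore`, `add`, `recall`) is hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def extract_memories(transcript: list[str]) -> list[str]:
    """Stand-in for LLM extraction: keep turns that look like durable
    facts or preferences. A real system would prompt a model here."""
    cues = ("i prefer", "i am", "i'm", "my ", "i live", "i work")
    return [turn for turn in transcript if turn.lower().startswith(cues)]


def embed(text: str) -> Counter:
    # Assumption: token counts replace a real embedding vector.
    return Counter(tokenize(text))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class MemoryStore:
    def __init__(self) -> None:
        self._items: list[tuple[str, Counter]] = []

    def add(self, transcript: list[str]) -> None:
        # Write path: extract discrete memories, then index each one.
        for fact in extract_memories(transcript):
            self._items.append((fact, embed(fact)))

    def recall(self, query: str, k: int = 3) -> list[str]:
        # Read path: rank stored memories by similarity to the query.
        q = embed(query)
        ranked = sorted(self._items, key=lambda it: cosine(q, it[1]), reverse=True)
        return [text for text, _ in ranked[:k]]
```

In this sketch small talk never reaches the store, so recall searches only the extracted facts. That is the trade the family makes: a compact, searchable store in exchange for whatever the extraction step drops.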

What makes this family unique

Ease and polish. Nothing else gives you a working memory layer in an afternoon, and nothing else ships the compliance + multi-tenant + connectors package that production B2B support demands. The trade-off is that 'extract facts then search them' is a weaker primitive than reasoning (reasoning-user-model), temporal supersession (kg-graphrag), or a queryable graph (kg-graphrag) — but for the broad middle of 'remember user facts and recall them', it is the default.

Frameworks in this family

14 catalogued.

Mem0

60k

Managed extract-retrieve

Fastest setup + largest integration ecosystem; dual session/user scope.

Self-host: moderate · Freemium · Apache-2.0

Best for: Fastest drop-in memory with the biggest integration ecosystem · Token-cost-sensitive production agents (single-pass extraction, sub-7k tokens/call) · AWS Agent SDK users and SOC 2 / HIPAA workloads


Supermemory

28k

Managed extract-retrieve

Production polish: sub-300ms, SOC2/HIPAA, connectors, context fencing.

Managed only · Freemium · OSS*

Best for: Polished managed memory API with SOC 2 / HIPAA compliance · Coding-agent memory via MCP (Claude Code / OpenCode plugins) · One API over mixed data (files, email, PDFs, chat)


RetainDB

Managed extract-retrieve

Hybrid BM25 + vector + rerank gives the exact-token recall that semantic search misses.

Managed only · Paid · OSS*

Best for: Agents needing lossless recall — full chronology, no semantic-search step · Codes / IDs / error strings and preference-recall-heavy apps

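A rough sketch of hybrid lexical + vector retrieval, to show why an exact token such as an error code survives. This is not RetainDB's implementation: the BM25 scorer is a plain Okapi variant, character-trigram cosine stands in for a dense embedding, reciprocal rank fusion is an assumed way of combining the two rankings, and the final rerank stage is left out. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9_]+", text.lower())


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    tokenized = [tokenize(d) for d in docs]
    avg_len = sum(len(t) for t in tokenized) / len(tokenized)
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if tf[term] == 0:
                continue  # exact-token match required for a lexical hit
            idf = math.log(1 + (len(docs) - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def trigram_vector(text: str) -> Counter:
    # Assumption: character trigrams replace a learned embedding.
    s = text.lower()
    return Counter(s[i:i + 3] for i in range(len(s) - 2))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def to_ranks(scores: list[float]) -> dict[int, int]:
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return {doc_id: rank for rank, doc_id in enumerate(order, start=1)}


def hybrid_search(query: str, docs: list[str], k: int = 3, rrf_k: int = 60) -> list[str]:
    lexical = to_ranks(bm25_scores(query, docs))
    q_vec = trigram_vector(query)
    dense = to_ranks([cosine(q_vec, trigram_vector(d)) for d in docs])
    # Reciprocal rank fusion: sum 1 / (rrf_k + rank) across both rankings.
    fused = {i: 1 / (rrf_k + lexical[i]) + 1 / (rrf_k + dense[i]) for i in range(len(docs))}
    top = sorted(fused, key=fused.get, reverse=True)[:k]
    return [docs[i] for i in top]
```

Fusing ranks rather than raw scores avoids having to put BM25 and cosine values on a common scale; a production system would typically pass the fused top-k to a reranker.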

Memori

GibsonAI

15k

Managed extract-retrieve

Memory in plain SQL — no vector DB, fully inspectable, portable.

Self-host: trivial · Free + paid · Apache-2.0

Best for: Teams that want to skip the vector DB and keep memory in inspectable SQL


MemOS

MemTensor

10k

Managed extract-retrieve

A self-hostable 'memory operating system' that packages long-term memory into MemCube units and manages their lifecycle (store / retrieve / update / schedule) outside the model.

Self-host: moderate · Free / OSS · Apache-2.0

Best for: Teams wanting a self-hosted memory layer with hybrid retrieval and skill reuse


MemoryOS

BAI-LAB

1.5k

Managed extract-retrieve

An OS-inspired memory layer for personalized AI agents that organizes user memory into short-, mid-, and long-term tiers and migrates entries between them. Published as an EMNLP 2025 Oral.

Self-host: moderate · Free / OSS · Apache-2.0

Best for: Personalized conversational agents needing tiered long-term user memory

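The short-, mid-, and long-term tiering can be pictured with a small sketch. The migration policy below is an assumption made for illustration, not MemoryOS's actual rules: the short-term tier overflows oldest-first into the mid-term tier, and a mid-term entry is promoted to long-term once it has been recalled `promote_after` times. `TieredMemory` and its methods are hypothetical names.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(eq=False)  # compare entries by identity, not by field values
class Entry:
    text: str
    hits: int = 0  # number of times the entry has been recalled


class TieredMemory:
    def __init__(self, short_capacity: int = 3, promote_after: int = 2) -> None:
        self.short: deque[Entry] = deque()
        self.mid: list[Entry] = []
        self.long: list[Entry] = []
        self.short_capacity = short_capacity
        self.promote_after = promote_after

    def add(self, text: str) -> None:
        self.short.append(Entry(text))
        # Overflow: the oldest short-term entries migrate to the mid tier.
        while len(self.short) > self.short_capacity:
            self.mid.append(self.short.popleft())

    def recall(self, keyword: str) -> list[str]:
        needle = keyword.lower()
        matches = [e.text for e in self.long if needle in e.text.lower()]
        for entry in list(self.mid):
            if needle in entry.text.lower():
                entry.hits += 1
                matches.append(entry.text)
                # Promotion: frequently recalled entries become long-term.
                if entry.hits >= self.promote_after:
                    self.mid.remove(entry)
                    self.long.append(entry)
        matches += [e.text for e in self.short if needle in e.text.lower()]
        return matches
```

Substring matching keeps the example short; the point is only that entries move between tiers based on age and use rather than staying where they were written.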

MIRIX

Mirix-AI

3.5k

Managed extract-retrieve

A modular multi-agent memory system that augments any LLM. Specialized agents manage six memory types (Core, Episodic, Semantic, Procedural, Resource, Knowledge Vault) under a coordinator that orchestrates updates and retrieval. Ships a desktop app that builds a personal memory base from on-screen activity.

Self-host: moderate · Free / OSS · Apache-2.0

Best for: Personal assistants needing multimodal, screen-aware long-term memory


TencentDB Agent Memory

Tencent

6.3k

Managed extract-retrieve

Fully-local long-term memory for AI agents built on two pillars: layered long-term memory (a semantic pyramid L0 Conversation → L1 Atom → L2 Scenario → L3 Persona) and symbolic short-term memory that offloads verbose tool logs to files while keeping a compact Mermaid 'canvas' in context. Distributed as a TypeScript/npm package; integrates with OpenClaw and the Hermes gateway.

Self-host: moderate · Free / OSS · MIT

Best for: Long-horizon agent tasks needing token-efficient, fully-local memory with traceable layered recall


SimpleMem

Aiming Lab

3.6k

Managed extract-retrieve

A lifelong memory stack for LLM agents built on 'semantically lossless compression' — store dense, high-information memory so an agent recalls more while spending far fewer tokens. Ships as one `simplemem` Python package that auto-routes across three pillars: SimpleMem (text efficiency core), Omni-SimpleMem (multimodal: text/image/audio/video), and EvolveMem (self-evolving retrieval). Also offered as a cloud-hosted and self-hostable MCP server. Backed by arXiv papers (2601.02553, 2604.01007, 2605.13941).

Self-host: moderate · Free + paid · MIT

Best for: Token-budget-constrained agents needing dense lifelong memory with intent-aware retrieval, optionally across modalities


TeleMem

TeleAI

468

Managed extract-retrieve

An agent memory management layer positioned as a high-performance drop-in replacement for Mem0 (`import telemem as mem0`), optimized for multi-turn dialogue, character modeling, long-term storage, and semantic retrieval. Pipeline: character-aware summarization → semantic-clustering deduplication → efficient storage → precise retrieval. Extends to multimodal video memory (frame extraction → captioning → vector DB) with ReAct-style multi-step video QA. Backed by a tech report (arXiv 2601.06037).

Self-host: moderate · Free / OSS · Apache-2.0

Best for: Teams wanting a local, Mem0-compatible memory layer with strong per-character isolation and optional video memory


mnemory

Filip Pytloun

182

Managed extract-retrieve

A self-hosted MCP server (plus REST API) that adds persistent, personalized long-term memory to any MCP-compatible assistant (Claude Code, ChatGPT, Cursor, Open WebUI, and more). A single unified LLM call performs fact extraction, metadata classification, deduplication, and contradiction resolution at once. Two-tier design: fast searchable summaries in a vector store, plus a detailed artifact store retrieved on demand.

Self-host: moderate · Free / OSS · Apache-2.0

Best for: Self-hosters wanting a private, MCP-native memory server with automatic fact extraction, dedup, and contradiction handling

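The two-tier design can be sketched as a searchable index of short summaries plus a separate artifact store that is read only when a summary is selected. This is an illustration under stated assumptions, not mnemory's code: keyword overlap stands in for vector search, plain dicts stand in for both stores, and `TwoTierStore`, `save`, `search`, and `fetch_artifact` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Summary:
    memory_id: str
    text: str


class TwoTierStore:
    def __init__(self) -> None:
        self._summaries: dict[str, Summary] = {}  # tier 1: small, searched on every query
        self._artifacts: dict[str, str] = {}      # tier 2: large, fetched on demand

    def save(self, memory_id: str, summary: str, artifact: str) -> None:
        self._summaries[memory_id] = Summary(memory_id, summary)
        self._artifacts[memory_id] = artifact

    def search(self, query: str, k: int = 3) -> list[Summary]:
        # Assumption: word overlap replaces a vector-store similarity query.
        q = set(query.lower().split())
        scored = [
            (len(q & set(s.text.lower().split())), s)
            for s in self._summaries.values()
        ]
        ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
        return [s for score, s in ranked if score > 0][:k]

    def fetch_artifact(self, memory_id: str) -> str:
        # Only called for summaries the agent decides to expand.
        return self._artifacts[memory_id]
```

Keeping the detailed artifact out of the search path means the context window pays for full detail only on the memories that are actually used.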

archon-memory-core

Divergence Router

Managed extract-retrieve

An in-process, local-first Python memory library (`pip install archon-memory-core`) whose thesis is that memory should get better the longer it is used. Built on ChromaDB + Ollama, it pairs ranked top-1 retrieval with supersede-aware nightly consolidation, type-aware salience, an entity graph, active forgetting, and full replay/observability. Positions itself as a memory policy library (not an agent runtime), with LangChain and LlamaIndex adapters.

Self-host: moderate · Free / OSS · Apache-2.0

Best for: Agents that accumulate contradictory facts over long horizons and want a local, consolidating memory library with built-in forgetting


PowerMem

OceanBase / ob-labs

722

Managed extract-retrieve

Persistent, self-evolving AI memory plugin for coding agents and applications. Combines LLM-driven memory extraction with a two-layer Experience + Skill distillation system: raw interactions are first compressed into Experience memories, then recurring patterns are further abstracted into reusable Skill entries. Ebbinghaus-style time-decay keeps memory collections pruned and relevant over time. Exposes a unified backend via Python SDK, HTTP REST server, MCP server, and CLI.

Self-host: moderate · Free / OSS · Apache-2.0

Best for: AI coding agents and multi-agent systems that need both factual recall and reusable procedural workflows distilled from past sessions · Teams wanting a production-ready memory backend that spans multiple agent clients (Claude Code, Codex, OpenCode, Cline) via a shared server

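Ebbinghaus-style time decay is commonly modelled as exponential retention, R = exp(-t / S), where t is the time since the memory was last reinforced and S is its strength. The sketch below applies that formula to pruning. The field names, the unit (days), and the 0.3 threshold are assumptions for illustration, not PowerMem's actual parameters.

```python
import math
from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    age_days: float       # time since the memory was last reinforced
    strength_days: float  # larger values mean slower forgetting


def retention(memory: Memory) -> float:
    """Exponential forgetting curve: 1.0 when fresh, approaching 0 with age."""
    return math.exp(-memory.age_days / memory.strength_days)


def prune(memories: list[Memory], threshold: float = 0.3) -> list[Memory]:
    """Keep only memories whose retention is still at or above the threshold."""
    return [m for m in memories if retention(m) >= threshold]
```

Under this model, a memory that keeps being recalled can have its `age_days` reset or its `strength_days` raised, so frequently used entries outlive one-off details.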

Nemori

Nemori AI

204

Managed extract-retrieve

Self-organising long-term memory substrate for agentic LLM workflows, grounded in Event Segmentation Theory (EST) and Predictive Processing (PP). Ingests multi-turn conversations, segments them into topically coherent episodes via LLM-powered boundary detection, distils durable semantic knowledge from each episode, and exposes a unified search surface for downstream reasoning. Designed as a minimalist production-ready core: PostgreSQL for structured metadata, Qdrant for vector similarity.

Self-host: moderate · Free / OSS · MIT

Best for: Agentic LLM workflows needing structured long-term memory with semantically coherent episodes and a unified search surface across episodic and semantic stores


Use cases this family is built for

Top-down recommendations from the use-case playbook. Each names the one binding constraint that picks the tool, the primary pick (which may sit in another family when the case spans more than one), and runner-ups.

Multi-agent system with shared or perspectival memory

Binding constraint: Model 'what does agent A know vs agent B,' or 'what does the support persona think the customer wants.'

Pick

Honcho — The peer primitive (agents and ideas can be peers, many-to-many sessions) is the only native model of perspective.

Runner-up

Mem0 — A shared store with clean agent_id/run_id isolation — but no perspective modeling.

From Agentic Memory: Use-Case Playbook 2026 · last verified 2026-06-28

Production B2B / customer-support agent at scale

Binding constraint: Multi-tenant isolation + compliance (SOC2/HIPAA) + customer 'current state' (plan, tier, tickets) that changes.

Pick

Supermemory — SOC2/HIPAA (Scale tier), connectors, context fencing, sub-300ms, multi-tenant.

Runner-up

Zep (Graphiti) — If temporal state (a customer's current plan/tickets that drift) is the heart of it.

Hard no: Cognee — no SOC2/HIPAA as of mid-2026, disqualifying for regulated data.


Cost-sensitive, high-volume ingestion

Binding constraint: Lots of data, tight budget.

Pick

Memori — Drops the vector DB and runs on SQL + LLM extraction — ~80–90% cheaper infra than vector-backed stores.

Runners-up

OpenViking — Tiered L0/L1/L2 loading gives ~80–90% token savings on reads. Pick this when cost is per-call tokens, not infrastructure.
Honcho — Also cheap at $2/1M ingested if managed is acceptable.


Coding agent / dev assistant

Binding constraint: Exact-token recall (function names, error codes, variable names — semantic search fails here) + capturing decisions before a long session is compacted.

Pick

ByteRover — The pre-compression hook grabs 'we decided X / Y didn't work' before the context window summarizes it away, and markdown lives next to code in git.

Runner-up

RetainDB — If exact-symbol retrieval precision is the specific pain — hybrid BM25 + vector + rerank gives exact-token recall that semantic search misses.


Voice agent / latency-critical

Binding constraint: Retrieval can't stall a live conversation.

Pick

ByteRover — No LLM in the read path = predictable sub-100ms recall.

Runner-up

Supermemory — Sub-300ms if you want managed richness and can spend the latency budget.


Last verified 2026-06-28