Memory & Persistence
rag7 is built on LangGraph and supports two levels of memory:
| | checkpointer= | memory_store= |
|---|---|---|
| Scope | Per thread (conversation) | Per user (cross-thread) |
| What's stored | Full graph state | Key Q&A facts |
| Survives restarts | With SQLite/Postgres | With SQLite/Postgres |
| Use case | Resume a conversation | Remember user preferences across sessions |
For simple multi-turn chat within a single session, the history= parameter on chat() is enough — no config needed.
In-process memory (MemorySaver)
Lost on restart. Good for single-session apps or testing.
```python
from rag7 import init_agent
from langgraph.checkpoint.memory import MemorySaver

rag = init_agent(
    "docs",
    model="openai:gpt-5.4",
    backend="qdrant",
    backend_url="http://localhost:6333",
    checkpointer=MemorySaver(),
)

config = {"configurable": {"thread_id": "user-alice"}}
state = rag.invoke("What is hybrid search?", config=config)
state = rag.invoke("Give me an example.", config=config)  # graph remembers the first turn
```
Persistent memory (SQLite)
Survives restarts. Good for chatbots and long-running apps.
```python
from rag7 import init_agent
from langgraph.checkpoint.sqlite import SqliteSaver

with SqliteSaver.from_conn_string("./memory.db") as checkpointer:
    rag = init_agent(
        "docs",
        model="openai:gpt-5.4",
        backend="qdrant",
        backend_url="http://localhost:6333",
        checkpointer=checkpointer,
    )
    config = {"configurable": {"thread_id": "user-alice"}}
    state = rag.invoke("What is hybrid search?", config=config)
```
Persistent memory (PostgreSQL)
Production-grade. Requires pip install langgraph-checkpoint-postgres.
```python
from rag7 import init_agent
from langgraph.checkpoint.postgres import PostgresSaver

# PostgresSaver.from_conn_string returns a context manager, so use it
# in a with block rather than assigning the result directly.
with PostgresSaver.from_conn_string(
    "postgresql://user:pass@localhost/mydb"
) as checkpointer:
    checkpointer.setup()  # creates the checkpoint tables on first run
    rag = init_agent(
        "docs",
        model="openai:gpt-5.4",
        backend="qdrant",
        checkpointer=checkpointer,
    )
    config = {"configurable": {"thread_id": "user-alice"}}
    state = rag.invoke("What is hybrid search?", config=config)
```
Thread IDs
Each thread_id is an independent memory scope. Use one per user, session, or conversation:
```python
# Different users — separate memory
rag.invoke("Who are you?", config={"configurable": {"thread_id": "user-alice"}})
rag.invoke("Who are you?", config={"configurable": {"thread_id": "user-bob"}})
```
Smarter long-term memory (mem0)
mem0 uses an LLM to extract discrete facts from each exchange, deduplicate them, and resolve conflicts — so "I moved to Berlin" replaces "I live in Munich" rather than creating two entries. It is the recommended choice when memory quality matters.
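The replace-don't-accumulate behaviour can be pictured with a toy sketch (plain Python, not mem0's actual implementation): each extracted fact is keyed by a "slot", so a new value overwrites a conflicting old one instead of creating a second entry.

```python
# Toy sketch of mem0-style conflict resolution (not mem0's real code):
# facts are keyed by a slot, so a new value replaces a conflicting one.
def reconcile(memory: dict, slot: str, value: str) -> dict:
    """Store `value` under `slot`, replacing any conflicting fact."""
    memory[slot] = value  # "moved to Berlin" replaces "lives in Munich"
    return memory

memory = {}
reconcile(memory, "location", "Munich")
reconcile(memory, "location", "Berlin")  # conflict: same slot, new value wins
# memory == {"location": "Berlin"} — one entry, not two
```

mem0's real pipeline does this with an LLM deciding which facts conflict, but the net effect on the stored memory is the same.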
How rag7 wires it in
The LangGraph state machine has two memory nodes — read_memory runs
before retrieval, write_memory runs after generation. When you pass
mem0_memory= to the agent, rag7 routes both nodes through mem0:
| Node | mem0 call | What rag7 passes |
|---|---|---|
| read_memory | search(question, filters={"user_id": ...}) | Top 5 hits become a context block prepended to retrieval. |
| write_memory | add(messages, user_id=...) after the answer | mem0's LLM extracts facts; conflicts get resolved. |
Both nodes read user_id from the request config (config["configurable"]["user_id"]).
If a request arrives without a user_id, rag7 falls back to "default".
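That lookup amounts to a two-level .get() with a default (a sketch of the behaviour described above, not rag7's actual source):

```python
def resolve_user_id(config: dict) -> str:
    # Read config["configurable"]["user_id"]; fall back to "default"
    # when either key is missing.
    return config.get("configurable", {}).get("user_id", "default")

resolve_user_id({"configurable": {"user_id": "alice"}})  # -> "alice"
resolve_user_id({})                                      # -> "default"
```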
async-vs-sync detection happens automatically — pass AsyncMemory()
and rag7 uses asearch/aadd directly; pass Memory() and rag7
runs the sync calls in a thread pool so the event loop stays free.
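The detection described above can be sketched as: if the memory object exposes an async method, await it directly; otherwise offload the sync call with asyncio.to_thread so the loop isn't blocked. (Illustrative only — the stub classes below stand in for mem0's Memory/AsyncMemory.)

```python
import asyncio

async def call_search(memory, query: str):
    # Async memory: call its native coroutine directly.
    if hasattr(memory, "asearch"):
        return await memory.asearch(query)
    # Sync memory: run in a worker thread so the event loop stays free.
    return await asyncio.to_thread(memory.search, query)

class SyncMem:                    # stands in for mem0.Memory
    def search(self, q): return [f"sync:{q}"]

class AsyncMem:                   # stands in for mem0.AsyncMemory
    async def asearch(self, q): return [f"async:{q}"]

print(asyncio.run(call_search(SyncMem(), "x")))   # ['sync:x']
print(asyncio.run(call_search(AsyncMem(), "x")))  # ['async:x']
```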
Install
pip install mem0ai
Minimal example
```python
from rag7 import init_agent
from mem0 import Memory  # or AsyncMemory

rag = init_agent(
    "docs",
    model="openai:gpt-5.4",
    backend="qdrant",
    backend_url="http://localhost:6333",
    mem0_memory=Memory(),
)

config = {"configurable": {"user_id": "alice"}}
rag.invoke("I prefer answers in German.", config=config)
rag.invoke("What is hybrid search?", config=config)
# mem0 extracted the language preference and recalled it on the second call.
```
Configuring mem0's own backends
Memory() defaults to OpenAI embeddings + an embedded vector DB. To
point it at the same vector store you already use for retrieval, pass
mem0 a config dict:
```python
from rag7 import init_agent
from mem0 import Memory

mem = Memory.from_config({
    "vector_store": {
        "provider": "qdrant",
        "config": {"host": "localhost", "port": 6333, "collection_name": "user_memories"},
    },
    "llm": {
        "provider": "openai",
        "config": {"model": "gpt-5.4-mini"},
    },
    "embedder": {
        "provider": "openai",
        "config": {"model": "text-embedding-3-small"},
    },
})

rag = init_agent("docs", model="openai:gpt-5.4", backend="qdrant", mem0_memory=mem)
```
See the mem0 docs for the full provider list (Anthropic, Azure OpenAI, Postgres + pgvector, Chroma, etc.).
Inspecting recalled memories
state.trace contains a read_memory entry whenever mem0 returned hits:
```python
state = rag.invoke("What is hybrid search?", config={"configurable": {"user_id": "alice"}})
for step in state.trace:
    if step["node"] == "read_memory":
        print("Recalled facts:\n", step["memories"])
```
Manually scoping users
user_id partitions all memory ops. Two ways to set it:
```python
# 1. Via the per-call config (most common)
rag.invoke(question, config={"configurable": {"user_id": "alice"}})

# 2. Or bake a default into the agent and let mem0 see it
import functools
ainvoke = functools.partial(rag.ainvoke, config={"configurable": {"user_id": "alice"}})
```
Failure modes & gotchas
- No mem0 LLM → mem0 still stores raw turns but skips fact extraction; quality matches memory_store= (raw Q&A). Configure mem0's own LLM to get the deduplication benefit.
- Slow first call → mem0's first call seeds embeddings; expect ~1–2 s extra latency on cold start. Subsequent calls are fast.
- Async event loop already running (Jupyter, FastAPI handlers) → use AsyncMemory(). rag7 detects aadd/asearch and avoids the thread-pool round-trip.
- mem0 errors are swallowed — rag7's memory nodes catch exceptions silently, so memory hiccups never break the QA path. Check state.trace for missing read_memory events when debugging.
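The swallow-and-continue behaviour amounts to a guard like this (a sketch of the described behaviour, not rag7's source): the memory call is wrapped in a broad try/except, and a failure simply yields no recalled facts.

```python
def safe_read_memory(search_fn, question: str) -> list:
    # Any memory-backend failure degrades to "no memories recalled"
    # instead of aborting the QA path.
    try:
        return search_fn(question)
    except Exception:
        return []

def broken_search(q):
    raise ConnectionError("mem0 backend unreachable")

safe_read_memory(broken_search, "What is hybrid search?")  # -> []
```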
Combining with checkpointer
mem0_memory= is orthogonal to checkpointer=. The checkpointer
persists graph state per thread_id (resume a conversation);
mem0 persists extracted facts per user_id (carry preferences
across sessions). Pass both for full coverage.
Long-term memory (memory_store)
Cross-thread memory that persists facts across different conversations and users. After each answer the agent writes a Q&A summary; before each retrieval it reads relevant past exchanges and uses them as context.
```python
from rag7 import init_agent
from langgraph.store.memory import InMemoryStore

rag = init_agent(
    "docs",
    model="openai:gpt-5.4",
    backend="qdrant",
    backend_url="http://localhost:6333",
    memory_store=InMemoryStore(),
)

# Scope memories to a user with user_id
config = {"configurable": {"user_id": "alice"}}
rag.invoke("I prefer answers in German.", config=config)
rag.invoke("What is hybrid search?", config=config)
# Second call remembers the language preference from the first
```
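The write-then-read cycle described above can be sketched in plain Python (a toy stand-in for the LangGraph store, not rag7's implementation): after each answer a Q&A summary is appended under the user's namespace, and before retrieval a naive keyword match pulls back related past exchanges.

```python
# Toy stand-in for a namespaced memory store with keyword recall.
store: dict[tuple, list[str]] = {}

def write_memory(user_id: str, question: str, answer: str) -> None:
    # Append a Q&A summary under the user's namespace after each answer.
    store.setdefault(("memories", user_id), []).append(f"Q: {question} A: {answer}")

def read_memory(user_id: str, question: str) -> list[str]:
    # Before retrieval, pull back past exchanges sharing a keyword.
    words = set(question.lower().split())
    return [m for m in store.get(("memories", user_id), [])
            if words & set(m.lower().split())]

write_memory("alice", "What is hybrid search?", "Dense + sparse retrieval combined.")
read_memory("alice", "Tell me more about hybrid search")  # recalls the earlier Q&A
```

A real store replaces the keyword match with semantic search over embeddings, but the read-before / write-after shape is the same.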
Combine both for full memory:
```python
from rag7 import init_agent
from langgraph.checkpoint.sqlite import SqliteSaver
from langgraph.store.memory import InMemoryStore

# SqliteSaver.from_conn_string is a context manager, so create the agent
# inside the with block.
with SqliteSaver.from_conn_string("./memory.db") as checkpointer:
    rag = init_agent(
        "docs",
        model="openai:gpt-5.4",
        backend="qdrant",
        checkpointer=checkpointer,
        memory_store=InMemoryStore(),
    )
    config = {"configurable": {"thread_id": "session-1", "user_id": "alice"}}
    state = rag.invoke("What is hybrid search?", config=config)

    # state.trace includes a 'read_memory' entry showing what was recalled
    for step in state.trace:
        if step["node"] == "read_memory":
            print("Recalled:", step.get("memories"))
```
For production, replace InMemoryStore with AsyncPostgresStore:
```python
from langgraph.store.postgres import AsyncPostgresStore

# AsyncPostgresStore.from_conn_string is an async context manager.
async with AsyncPostgresStore.from_conn_string("postgresql://user:pass@localhost/mydb") as store:
    await store.setup()  # creates tables on first run
    rag = init_agent("docs", model="openai:gpt-5.4", memory_store=store)
```
When to use memory vs history
| | history= on chat() | checkpointer= | memory_store= | mem0_memory= |
|---|---|---|---|---|
| Scope | Single session | Per thread | Per user (cross-thread) | Per user (cross-thread) |
| What's stored | Answer text | Full graph state | Raw Q&A strings | Extracted facts |
| Deduplication | — | — | No | Yes (LLM-based) |
| Survives restarts | No | With SQLite/Postgres | With Postgres store | With mem0 store |
| Use case | Simple multi-turn | Resumable chatbots | Basic long-term context | Smart user preferences |
| Config key | (none) | thread_id | user_id | user_id |
Tip
Combine them for full coverage: history= for multi-turn context within the current session, checkpointer= to resume the thread, and mem0_memory= to recall extracted facts from previous sessions.