
From Data Models to Mind Models: Designing AI Memory at Scale
Summary
The episode explores agentic memory design: how to make AI agents remember, reason, and learn over time. It distinguishes short-term session memory (hot, low-latency traces) from long-term permanent stores (graph + vector layers). Vas Markovich emphasizes practical engineering trade-offs: latency, storage choices (Redis, Qdrant, LanceDB, Neo4j), multi-tenant isolation, and when simple approaches (MD files, Postgres, prompts) suffice versus when dedicated memory infrastructure is needed. He critiques naive strategies like timestamp decay and one-off summarization, advocating instead for neuroscience-inspired trace grouping, graph metrics (e.g., centrality), and RL-informed updating to manage relevance. The conversation also covers human-in-the-loop realities, permissioning, tooling patterns (explicit store/retrieve tool calls), and real-world use cases in pharma, logistics, and cybersecurity, and closes with Cognee’s roadmap for session/long-term stores and decision traces.
Key Takeaways
- Agentic memory needs both a fast session store and a durable graph+vector permanent store.
- Simple temporal decay or naive summarization is insufficient for maintaining relevant memories at scale.
- Multi-tenancy and permissioned isolation are critical to prevent memory pollution and protect data.
- Human-in-the-loop and conservative fine-tuning remain essential; continual base-model fine-tuning is costly and often impractical.
- Start minimal and add complexity only when the use case warrants a dedicated memory layer.
Notable Quotes
"Agents by themselves are stateless, right? So they were not designed in such a way in transformers as the architecture and not designed in such a way to preserve any type of a state."
"Waiting for four seconds to get the data from a permanent store is not the latency anymore."
"We decided to do a bit more consolidated approach that combines this neuroscience concepts of traces ... and then model those traces in such a way that we calculate certain types of ... graph metrics like centrality ... and then use those to calculate the scores of centrality importance of that information."
"If you can do it without a memory layer or if you can do it with a prompt ... probably you don't need it."
Episode questions
What is the role of session memory vs permanent memory in agentic systems?
Permanent memory (graph+vector layers) stores durable knowledge, ontologies and business rules for accurate retrieval; session memory captures short-term reasoning traces and tool calls for low-latency access and can be distilled into permanent memory when appropriate.
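The split described above can be sketched in a few lines. This is a minimal illustration, not Cognee's implementation: the class names, the `durable` flag, and the `distill` function are all hypothetical, and plain in-process containers stand in for a low-latency session store and a graph + vector permanent store.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Trace:
    """One short-term record: a reasoning step or a tool call."""
    kind: str              # e.g. "reasoning" or "tool_call"
    content: str
    durable: bool = False  # hypothetical flag: worth keeping long term?


@dataclass
class SessionMemory:
    """Hot, bounded buffer of recent traces (stand-in for a low-latency store)."""
    max_traces: int = 100
    traces: deque = field(default_factory=deque)

    def record(self, trace: Trace) -> None:
        self.traces.append(trace)
        # Drop the oldest traces once the buffer is full.
        while len(self.traces) > self.max_traces:
            self.traces.popleft()


@dataclass
class PermanentMemory:
    """Durable store (stand-in for a graph + vector layer)."""
    facts: list = field(default_factory=list)

    def add(self, content: str) -> None:
        if content not in self.facts:  # skip exact duplicates
            self.facts.append(content)


def distill(session: SessionMemory, permanent: PermanentMemory) -> int:
    """Copy traces flagged as durable into permanent memory.

    Returns the number of facts that were newly added.
    """
    before = len(permanent.facts)
    for trace in session.traces:
        if trace.durable:
            permanent.add(trace.content)
    return len(permanent.facts) - before
```

In a real system the decision about what is durable would come from a model or a policy rather than a hand-set flag; the point here is only that session traces are cheap and bounded, while promotion to permanent memory is a separate, deliberate step.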
When should a team add a dedicated memory layer rather than relying on prompts or simple storage?
Add memory when agents run continuously, require low-latency access, must reconcile disconnected data silos, or need isolation/permissions; for small teams or simple use cases, prompts, MD files or Postgres often suffice.
How does Cognee handle relevance decay and prioritization of memories?
Cognee moved from naive timestamp decay toward grouping traces (a neuroscience-inspired approach) and scoring them with graph metrics such as centrality, with plans to add reinforcement-learning updates to improve rankings over time.
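As a rough illustration of centrality-based scoring, the sketch below links entities that appear in the same trace group and ranks them by degree centrality. The episode does not say which centrality metric is used or how traces are grouped, so both choices are assumptions here, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_graph(trace_groups):
    """Link every pair of entities that appear together in one trace group."""
    graph = defaultdict(set)
    for group in trace_groups:
        for entity in group:
            graph[entity]  # touch the key so isolated entities are still nodes
        for a, b in combinations(sorted(set(group)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def degree_centrality(graph):
    """Fraction of the other nodes that each node is directly linked to."""
    n = len(graph)
    if n <= 1:
        return {node: 0.0 for node in graph}
    return {node: len(neighbors) / (n - 1) for node, neighbors in graph.items()}


def rank_memories(trace_groups):
    """Return (entity, score) pairs, highest centrality first, ties by name."""
    scores = degree_centrality(build_graph(trace_groups))
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    groups = [
        ["shipment", "carrier", "delay"],
        ["shipment", "invoice"],
        ["audit"],
    ]
    for entity, score in rank_memories(groups):
        print(f"{entity}: {score:.2f}")
```

Unlike timestamp decay, a score like this does not drop just because an item is old: an entity that keeps connecting to other traces stays highly ranked, while an isolated one scores zero regardless of age.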
How should triggers for storing and retrieving memories be implemented in agent frameworks?
The repeated pattern is explicit tool calls: one tool call for storage and another for retrieval; Cognee also provides multiple search strategies (temporal, graph/vector) and is exploring structured/pseudo-SQL-like tool calls to store structured data.
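The two-tool pattern might look like the following sketch. The tool names `store_memory` and `retrieve_memory`, the JSON call shape, and the `MemoryTools` class are hypothetical and are not taken from Cognee or any specific agent framework; a keyword overlap check stands in for real temporal, graph, or vector search.

```python
import json


class MemoryTools:
    """Two explicit tools an agent can call: one to store, one to retrieve."""

    def __init__(self):
        self._items = []  # in-process stand-in for a real memory backend

    def store_memory(self, text: str) -> dict:
        self._items.append(text)
        return {"stored": True, "count": len(self._items)}

    def retrieve_memory(self, query: str, limit: int = 3) -> dict:
        # Naive keyword overlap; a real system would use vector or graph search.
        words = set(query.lower().split())
        hits = [t for t in self._items if words & set(t.lower().split())]
        return {"results": hits[:limit]}

    def dispatch(self, tool_call_json: str) -> dict:
        """Route a model-emitted tool call (as JSON) to the matching handler."""
        call = json.loads(tool_call_json)
        handlers = {
            "store_memory": self.store_memory,
            "retrieve_memory": self.retrieve_memory,
        }
        handler = handlers.get(call.get("name"))
        if handler is None:
            return {"error": f"unknown tool: {call.get('name')}"}
        return handler(**call.get("arguments", {}))
```

Keeping storage and retrieval as separate, named tools makes the trigger explicit: the agent decides when something is worth remembering and when to look something up, and each decision shows up as an inspectable tool call.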