Data Engineering Podcast

From Data Models to Mind Models: Designing AI Memory at Scale

Feb 22, 2026

Summary

The episode explores agentic memory design — how to make AI agents remember, reason, and learn over time — distinguishing between short-term session memory (hot, low-latency traces) and long-term permanent stores (graph + vector layers). Vas Markovich emphasizes practical engineering trade-offs: latency, storage choices (Redis, Qdrant, LanceDB, Neo4j), multi-tenant isolation, and when simple approaches (Markdown files, Postgres, prompts) suffice versus when dedicated memory infrastructure is needed. He critiques naive strategies like timestamp decay and one-off summarization, advocating instead for neuroscience-inspired trace grouping, graph metrics (e.g., centrality), and RL-informed updating to manage relevance. The conversation also covers human-in-the-loop realities, permissioning, tooling patterns (explicit store/retrieve tool calls), and real-world use cases in pharma, logistics, and cybersecurity, closing with Cognee's roadmap for session/long-term stores and decision traces.

Key Takeaways

  1. Agentic memory needs both a fast session store and a durable graph+vector permanent store.
  2. Simple temporal decay or naive summarization is insufficient for maintaining relevant memories at scale.
  3. Multi-tenancy and permissioned isolation are critical to prevent memory pollution and protect data.
  4. Human-in-the-loop and conservative fine-tuning remain essential; continual base-model fine-tuning is costly and often impractical.
  5. Start minimal and add complexity only when the use case warrants a dedicated memory layer.
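The two-tier layout from the first takeaway — a hot session store backed by a durable permanent store — can be sketched roughly as follows. In a real deployment the session tier would be Redis and the permanent tier a graph+vector database such as Neo4j or Qdrant; plain Python structures stand in here purely for illustration, and the class and method names are hypothetical.

```python
from collections import OrderedDict
from dataclasses import dataclass, field


@dataclass
class MemoryLayer:
    """Sketch of a two-tier agent memory: a small, low-latency session
    store plus a slower, durable permanent store."""
    session_capacity: int = 100
    session: OrderedDict = field(default_factory=OrderedDict)  # hot traces
    permanent: dict = field(default_factory=dict)              # durable store

    def remember(self, key: str, trace: str) -> None:
        # Write to the session store; when capacity is exceeded,
        # consolidate the least-recently-used trace into the
        # permanent store instead of discarding it.
        self.session[key] = trace
        self.session.move_to_end(key)
        if len(self.session) > self.session_capacity:
            old_key, old_trace = self.session.popitem(last=False)
            self.permanent[old_key] = old_trace

    def recall(self, key: str):
        # Check the fast session tier first, then fall back to the
        # (higher-latency) permanent tier.
        if key in self.session:
            self.session.move_to_end(key)  # refresh recency
            return self.session[key]
        return self.permanent.get(key)
```

For example, with `session_capacity=2`, a third `remember` call pushes the oldest trace down into the permanent tier, where `recall` can still find it.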

Notable Quotes

"Agents by themselves are stateless, right? So they were not designed in such a way in transformers as the architecture and not designed in such a way to preserve any type of a state."

"Waiting for four seconds to get the data from a permanent store is not the latency anymore."

"We decided to do a bit more consolidated approach that combines this neuroscience concepts of traces ... and then model those traces in such a way that we calculate certain types of ... graph metrics like centrality ... and then use those to calculate the scores of centrality importance of that information."
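The centrality-based scoring Markovich describes in this quote can be illustrated with a minimal sketch: memory traces become graph nodes, traces that co-occur share an edge, and a centrality measure ranks each trace's importance. Cognee's actual metrics and weighting are not specified in the episode; simple degree centrality and the example trace names below are assumptions for illustration only.

```python
def degree_centrality(edges: list[tuple[str, str]]) -> dict[str, float]:
    """Score each trace node by normalized degree: the fraction of
    other nodes it is directly connected to."""
    nodes = {n for edge in edges for n in edge}
    degree = {n: 0 for n in nodes}
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    denom = max(len(nodes) - 1, 1)
    return {n: d / denom for n, d in degree.items()}


# Hypothetical traces from a debugging session; traces that appeared
# in the same episode are linked. Highly central traces would be kept
# or surfaced first, rather than decaying purely by timestamp.
edges = [
    ("login_bug", "auth_service"),
    ("auth_service", "token_expiry"),
    ("auth_service", "user_report"),
    ("token_expiry", "user_report"),
]
scores = degree_centrality(edges)
```

Here `auth_service` touches every other trace, so it scores highest and would be the last candidate for forgetting, which is the intuition behind using centrality rather than raw recency.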

"If you can do it without a memory layer or if you can do it with a prompt ... probably you don't need it."