
Summary
The podcast episode features an in-depth conversation with George Sivulka, CEO and founder of Hebbia, an AI company transforming white-collar workflows by integrating AI as "agent employees" within organizations. Hebbia's Matrix platform automates massive volumes of document reading, equivalent to tens of thousands of years of human work annually, delivering near-zero-hallucination outputs in highly regulated industries including finance, law, and defense.

The episode emphasizes a hybrid workforce model in which AI agents are treated as organizational nodes, communicating through tools like Slack and email but requiring active human management through prompting, highlighted as the emergent management skill in AI-powered workplaces. Unlike the prevalent Retrieval-Augmented Generation (RAG) architecture widely used in AI, Hebbia employs its own Instruction Set Design (ISD) architecture, which prioritizes reasoning and minimizes hallucinations by orchestrating multiple large language model (LLM) inferences at runtime. This "inference-time super scaling" approach enables Hebbia to process billions of pages reliably, supporting sensitive decision-making.

The conversation also critiques the AI industry's overemphasis on vertical specialization and advocates for generalization and adaptability in AI platforms, in line with meta-learning principles. Discussions of organizational design draw parallels to Amazon's modular startup structure to show how integrating AI agents reshapes company hierarchies and workflows into modular, flexible hybrid teams. The episode further addresses the challenges of adopting AI within legacy infrastructures and the importance of blending technical innovation with cultural and managerial transformation, including retraining managers in prompt engineering.
Finally, Hebbia’s roadmap points toward building AI capabilities beyond chatbots, focusing on agent-based workflows that unlock new enterprise value, redefine junior knowledge worker roles, and drive competitive advantages in AI-augmented organizations.
Key Takeaways
1. Hebbia introduces the concept of "agent employees": AI entities treated as autonomous organizational nodes integrated alongside human workers. These agents can communicate within enterprise tools like email and Slack, but require human oversight to align their workflows and iteratively improve performance.
2. Rejecting the prevalent Retrieval-Augmented Generation (RAG) architecture, Hebbia employs an Instruction Set Design (ISD) architecture that executes multiple large language model (LLM) inference calls at runtime to improve accuracy and virtually eliminate hallucinations.
3. Hebbia quantifies its platform's impact at around 4 to 5 billion pages processed annually, roughly 50,000 human years of reading, dramatically accelerating document analysis in the finance and legal sectors.
4. Prompting and managing AI agents is emerging as a vital new managerial skill: because current systems can only take a single step at a time, effective prompt engineering is pivotal to keeping agents accurate and driving tasks to completion iteratively.
5. Hebbia advocates generalization over vertical specialization in AI applications, positing that horizontally capable AI platforms adaptable across varied domains yield better long-term utility and flexibility.
6. Integrating AI agent employees is expected to catalyze diverse future organizational forms, fully human, fully AI, and, most commonly, hybrid, enabled by the internet's democratization of talent and by API-like AI modules.
7. Organizational design profoundly influences AI product development and integration; Hebbia draws inspiration from Amazon's AWS model, where each offering is run as an independent startup with its own GM, fostering innovation and product ecosystem coherence.
8. Hebbia's emergence from foundational meta-learning research at Stanford enabled it to pivot quickly toward applied enterprise AI, combining deep theoretical insight with pragmatic innovation around inference scaling and multi-agent orchestration.
9. Legacy infrastructure and entrenched business practices slow AI adoption, even though AI agents could contribute more to GDP than human workers within a decade; overcoming this requires simultaneous technological, cultural, and organizational transformation.
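The inference-time scaling idea in takeaways 2 and 4 can be made concrete with a minimal sketch. This is not Hebbia's actual ISD architecture, which the episode does not specify; it is a hypothetical majority-vote orchestrator illustrating the general technique of spending more LLM calls at runtime to raise accuracy, with `call_model` standing in for any real model endpoint.

```python
from collections import Counter
from typing import Callable, Tuple

def majority_answer(prompt: str, call_model: Callable[[str], str],
                    n: int = 5) -> Tuple[str, float]:
    """Ask the same question n times and keep the most common answer.

    Spending extra inference calls at runtime and cross-checking the
    results trades compute for reliability; the agreement ratio doubles
    as a rough confidence signal for flagging likely hallucinations.
    """
    answers = [call_model(prompt) for _ in range(n)]
    best, count = Counter(answers).most_common(1)[0]
    return best, count / n

# Deterministic stub standing in for a real model endpoint (hypothetical).
_replies = iter(["2021", "2021", "2019", "2021", "2021"])
answer, agreement = majority_answer(
    "What year was the contract signed?", lambda prompt: next(_replies)
)
# answer == "2021", agreement == 0.8
```

With the stub above, four of five calls agree, so the orchestrator returns "2021" with 0.8 agreement; answers below a chosen agreement threshold could be escalated to a human manager, matching the oversight model in takeaway 1.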
Notable Quotes
"You'll actually have hybrid AI and human employees working alongside each other. People that are really good at prompting will be the best managers. Intelligence will become too cheap to meter."
"We’re one of the first people to say, hey, you get way better accuracy from using more large language model calls at runtime. From an organizational design perspective, you can start to define the agent employee as another node in your org chart."
"I think that the idea behind the blog post and talking about an agent employee actually stemmed from a way that I think organizational design is starting to change at our customers. And I actually likened it to an old organizational trichotomy of remote work, where the internet came out and it basically decoupled the output of labor from where you were. So you could be all in the same room in New York or you could be across the world, and the internet basically democratized access to talent."
"And so the entire essence of the piece was, hey, just as we have remote employees, we have hybrid organizations, fully remote organizations and in-person organizations. You're going to start to have, you know, fully human organizations, actually fully AI organizations like the one person billion dollar startup or, you know, these things that are effectively just APIs. And you'll actually have, and this will be the most common thing, hybrid AI and human employees, agent and human employees working alongside each other."
"And if you are a modern AI organization, you probably are using some sort of chat or RAG application. And how good you are at prompting that AI is how good you are at managing that AI. And instead of, you know, kind of letting the AI go and prosecute a task over and over and over and self-correct, it can only take a single step."
"Amazon is a perfect example of how this happened, how org design shaped what they built. If you look at AWS's offerings, every single offering in that big menu is a different startup altogether with its own GM. And that intentional organizational decision impacts their product and impacts how it works and impacts whether or not those products work together really elegantly."
"AI agents will contribute more to GDP than human workers within a decade. How backloaded is that? In other words, what's your sense as somebody who's in the proverbial trenches every day of the reality of agents in the enterprise industry and what they can do and what's overhyped?"
"So our early Matrix product two years ago, we were one of the first people to say, hey, you get way better accuracy from using more large language model calls at runtime. And so we built lots of infrastructure to scale that up. We actually built an agents team before it was even called agents to actually go out and run these larger jobs. And I think that's probably the most interesting research direction moving forward is like, hey, let's go and say this current scaling law of training has slowed down. The scaling law of inference or test time compute is very interesting. Let's double click there."
"The idea of a context window is that you can think about all of the things in the context window. So as humans, we currently can remember whatever, the last 10 minutes of conversation. But then we also have bits and pieces over our life and our training and our early careers that we've collected that make us really good employees. AI, you've got a system prompt and then whatever, up to a million tokens to jam as much context as you can in there. When you think about what you actually want AI to do, you want it to reason over all of your data. Applications like RAG can jam the context window with as many search results as you want, but it's not going to reason over that data. It'll just search for stuff that exists. By elongating the context window, ideally, you'd be able to connect the dots, find stuff that's there, but also stuff that's not there, stuff that's missing."
"At the end of the day, the stragglers, the long tail of adopters will still take time. And I think that'll happen in the decade. The change to credit cards happened over five to 10 years. And now this change should just point and click and credit cards happen even faster than that. But there will still be stragglers. It'll still take time."
"Hebbia isn’t just another chatbot. We built a platform that processes billions of pages with near-zero hallucinations. We killed RAG because it created too many hallucinations. Instead, our ISD architecture runs agent teams that reason over the data rather than just retrieve a few search results. This infrastructure approach is necessary to deliver real-world value to some of the most regulated industries like defense, finance, and law."
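The retrieval-versus-reasoning distinction in the quote above can be illustrated with a toy sketch that assumes nothing about Hebbia's actual implementation: `keyword_retrieve` mimics a RAG-style retrieval step, which can only surface text that already matches a query, while `full_context_check` examines the whole corpus at once and can therefore also report what is missing.

```python
documents = [
    "Section 4: Termination requires 30 days written notice.",
    "Section 7: Governing law is the State of New York.",
]

def keyword_retrieve(query: str, docs: list, k: int = 1) -> list:
    # Rank documents by naive keyword overlap -- a stand-in for the
    # retrieval step in a RAG pipeline. It can only return passages
    # whose text already matches the query ("stuff that exists").
    terms = set(query.lower().split())
    scored = sorted(docs, key=lambda d: -len(terms & set(d.lower().split())))
    return scored[:k]

def full_context_check(required: list, docs: list) -> list:
    # Looking at the entire corpus at once can also report what is
    # absent -- e.g. a clause the contract should contain but doesn't.
    corpus = " ".join(docs).lower()
    return [topic for topic in required if topic not in corpus]

print(keyword_retrieve("termination notice period", documents))
print(full_context_check(["termination", "indemnification"], documents))
```

With these documents, retrieval on "termination notice period" surfaces Section 4, while the full-context check reports that "indemnification" never appears anywhere, the kind of "stuff that's missing" finding the quote says search-based retrieval alone cannot produce.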
"Matrix was billed as the interface to AGI. What's almost my reality as a user of the product? What do I have in front of me? What does it do for me? If the most important job in the future will be how you actually manage AI agents or how you prompt these things at scale, you can think of Matrix as actually running a bunch of sub-agents or an interface like even a Trello board where you can assign a lot of tasks and then a bunch of agents will do these things."
"One of the things that is a massive fallacy in AI applications today is verticalization as paramount. All of the VCs, a lot of entrepreneurs as well, believe because it's been true for the last 20 years that building a very verticalized piece of software is the only answer. And so you've got plenty of startups that are like, okay, I'm only going to be AI for blank, AI for compliance, or AI for law, or AI for X, Y, or Z."
"Generalization will beat specialization every single time. And what I mean by that is if you, let's say you want an investing agent, your investing agent will become a researching agent, and the researching agent should become a learning agent and then you get to something closer to AGI. Or if you're a legal agent, your legal agent will become maybe like a diligence agent or like something that's very hyper-logical."
"The reason why you as an investor beat the market isn't because you're all using the same ChatGPT wrapper prompt. And the reason why you as a lawyer are getting paid $2,000 an hour isn't because, oh, I have the same shortcuts from an AI-for-law company. It's because you are an expert and you can customize the greatest and the best tools to the way that your firm works, to the way that you actually can find an edge."