
Summary
The podcast episode "What Is an AI Agent?", hosted by Derrick Harris, features partners from the a16z Infra team (Guido Appenzeller, Matt Bornstein, and Yoko Li) delving into the multifaceted topic of AI agents. The discussion begins by grappling with the lack of a clear, unified definition of AI agents, noting that the term spans a spectrum from simple LLM-powered chat systems to aspirational autonomous systems with near-AGI capabilities. The panel settles on a working definition: agents are systems capable of multi-step reasoning and decision-making, built from chains of LLM calls embedded in dynamic decision trees and often integrating external tools and data sources.

The conversation highlights the architectural similarities between AI agents and traditional SaaS applications, emphasizing that LLM inference is the primary computational bottleneck, handled by specialized GPU infrastructure, while the orchestration layer remains lightweight and scalable. The panel then explores the challenges of pricing AI agents in an evolving market where costs are trending toward marginal operational expenses but buyers increasingly expect pricing tied to the value delivered rather than to compute usage alone. A key complexity arises from agents being used both directly by humans and indirectly through inter-agent interactions, which complicates traditional usage-based and per-seat pricing models and suggests room for innovative hybrid pricing strategies.

The episode also touches on user-experience considerations, particularly for conversational or companion AIs, where metered pricing risks undermining authenticity and user trust. The debate extends to whether agents will replace or merely augment human labor, with consensus leaning toward augmentation in the near term, given AI's current limitations in creativity, intent, and autonomy.
Examples such as the mobile game Pokémon Go illustrate how application-layer value creation and network effects can justify pricing well above underlying infrastructure costs, a dynamic likely to appear in AI agent markets. Finally, the hosts discuss technical challenges in embedding stochastic LLM outputs into deterministic program control flows, predicting that specialized fine-tuned applications built atop foundational models will be the likely winners. Throughout, the episode balances technical, product, business, and ethical perspectives, revealing both the promise and current ambiguities in the AI agent landscape.
Key Takeaways
1. AI agents are best understood as systems exhibiting multi-step reasoning and decision-making processes, conceptualized as chained large language model (LLM) calls embedded within dynamic decision trees. Unlike simple chatbots producing singular responses, these agents evaluate inputs sequentially, invoke external tools or data sources, and iteratively determine next steps autonomously. This layered reasoning enables AI agents to perform complex, adaptive tasks beyond mere prompt-response interactions.
2. There exists significant disagreement about what qualifies as an AI agent, ranging from broad definitions encompassing almost any AI-powered application to narrow, aspirational criteria demanding features like long-term persistence, autonomous learning, and knowledge base integration akin to Artificial General Intelligence (AGI). This definitional ambiguity reflects a continuum rather than a binary classification, contributing to marketing, product, and research confusion.
3. The pricing of AI agents is in flux due to their dual usage modes—being invoked by humans and by other agents—challenging traditional SaaS pricing models like per-seat or usage-based billing. This hybrid nature suggests that new, flexible pricing paradigms may be necessary to capture the economic value agents provide accurately and sustainably.
4. As AI infrastructures become more efficient and operational costs, especially those involving GPU-backed LLM deployment, decline, AI agent pricing is shifting from cost-plus towards value-based models that relate price to perceived user savings or ROI. Buyers increasingly exhibit sophistication regarding backend costs and require vendors to justify pricing through demonstrable economic benefits.
5. AI agents share a software architecture analogous to traditional SaaS applications, comprising an LLM prompt loop orchestrating multi-step reasoning, external tool invocations, and stateless logic for orchestration, while heavy LLM inference runs on specialized GPU infrastructure. This division allows lightweight agent control logic to scale to many instances on standard servers, optimizing resource utilization.
6. Pricing conversational or companion AI agents on a per-interaction or per-response basis can degrade the user experience by making interactions feel transactional and undermining the perceived authenticity and emotional bonding users expect from companions. Alternative pricing approaches, such as flat or subscription fees, may better preserve trust and engagement.
7. AI agents primarily augment rather than fully replace human labor at present, often automating portions of workflows and reducing workload without eliminating jobs entirely. This distinction shapes practical deployment strategies and tempers aggressive automation narratives driven by hype.
8. Due to application-level monopolies and network effects, as exemplified by pricing dynamics in games like Pokémon Go where virtual goods command prices far above their infrastructure costs, AI agent offerings that create unique, defensible user value can justify premium pricing independent of raw compute costs.
9. A fundamental technical challenge remains in integrating stochastic, variable LLM outputs into deterministic control flows of software, limiting AI agent robustness and reliability. Overcoming this requires novel architectures, which likely favors specialized vertical applications built on or fine-tuned from foundational models rather than generalist foundational model providers.
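The takeaways above describe an agent as a loop of chained LLM calls that branches on each reply and occasionally invokes external tools. A minimal sketch of that loop, with a stubbed `fake_llm` and a toy `lookup` tool standing in for a real model API and real tool integrations (all names here are hypothetical, not any actual library):

```python
import json

def fake_llm(messages):
    """Hypothetical stand-in for a hosted LLM call. Asks for a tool
    once, then returns a final answer; a real model decides dynamically."""
    if any(m["role"] == "tool" for m in messages):
        return json.dumps({"action": "final", "answer": "Paris"})
    return json.dumps({"action": "tool", "tool": "lookup",
                       "input": "capital of France"})

# Toy tool registry; real agents might expose these via MCP or an API layer.
TOOLS = {"lookup": lambda query: "France's capital is Paris."}

def run_agent(task, llm=fake_llm, max_steps=5):
    """Core agent loop: call the LLM, branch on its structured decision,
    feed tool results back into the context, repeat until done."""
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        decision = json.loads(llm(messages))
        if decision["action"] == "final":
            return decision["answer"]
        result = TOOLS[decision["tool"]](decision["input"])
        messages.append({"role": "tool", "content": result})
    raise RuntimeError("step budget exhausted")

answer = run_agent("What is the capital of France?")
```

Note how all the heavy computation sits behind the `llm` callable: the loop itself is lightweight control logic, which is why, as takeaway 5 observes, many such loops can share a single ordinary server.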
Notable Quotes
""Most AI applications, and in particular, if we want to call them AI agent applications, they have their sales pitch around, you should pay us X because we're saving you, you know, it's like a classic ROI calculation. Established value. Value-based pricing. But in practice, I think most buyers are actually pretty sophisticated about what's going on under the hood. They know it's pretty simple stuff happening. And so it's like, hey, what does it cost you to run all these GPUs? And we'll pay you some premium over that. And I think that's how a lot of vendors are pricing in practice these days. I mean, long time you'd expect pretty healthy margins, just like in SaaS, right?""
""And I actually don't know where to put agents here. But it could be used by either, right? It could be used by either. An agent could be using an agent or a human could be using an agent.""
""Like you can't charge someone every sentence they talk to their companion, although some of the foundational models do. There are services that will charge you per response. I haven't used them, but they do exist. Yeah. I see. Wow. Okay. So usually it's kind of weird to charge someone like buy tokens of how much they talk to the companion, rather than like a flat monthly fee. It doesn't feel like a true friend. Right. Exactly. It's very transactional.""
""So an agent, we said you have sort of an overall loop with an LLM and prompts that feeds into itself plus external tool use. The LLM itself, you probably want to run a separate infrastructure just because it's highly specialized. You need these vast GPU farms. You can't easily run today's large LLMs in a single GPU. So that's a very specialized infrastructure. That's externally. So the LLM call is external. The state management, well, today in SaaS applications, we do all the state management externally in databases or something like that. So you probably also want to externalize that, right? And then what remains is fairly lightweight logic where I basically, I'm taking context that I retrieve somehow from databases. I assemble that into a prompt. I run the prompt. And then I occasionally invoke tools. Maybe I do that with MCP or something like that with an external server. But the core loop is actually pretty lightweight. And I can run a gazillion agents on a single server. Not a gazillion, but many agents on a single server. I don't need a lot of compute performance for that. Does that sound about right?""
""When you try to actually incorporate the output from an LLM into the control flow of your program, that is actually a very hard, very unsolved problem. That, you know, to your point, there are relatively minor architectural differences today, but this may actually drive more significant changes in the future. I actually think the winners will be the specialists, not the foundational models. It's the people who will build on top of the foundational models or fine-tune the foundational models.""
"So the episode features A16Z partners Guido Appenzeller, Matt Bornstein, and Yoko Lee, and you'll hear me throughout to smooth the transitions between topics. We had a lot of fun recording this, and you'll hear it all, starting with the question of whether there really is a uniform definition of AI agents after these disclosures, including how we should define them, how we should think about the jobs they do, and how the companies building them should think about pricing them."
"I mean, that said, it seems like we're seeing to some degree a specialization of user interfaces in sort of two directions, right? There's, let's say, a cursor or something like that, which really emphasizes the tight loop between the user, the tight feedback loop between the user and the LLM and the thing I'm working on, right? So I want immediate gratification when I do something, you know, and sort of response time matters."
""What's more, the ultimate value of any given agent, which is still to be determined for the vast majority of them, is to what degree they can actually replace or simply augment human workers.""
""We’ve heard this narrative from a couple of startups that they’re basically saying like, hey, you know, we can price the software we’re building much, much higher because this is an agent. So we can go to a company and say you’re replacing a human worker with this agent. The human worker makes, I don’t know, $50,000 a year. And therefore, this agent, you can get for only $30,000 a year. This sounds really compelling from a first glance.""
"There's some people who basically say for something to be a real agent, it has to be something fairly close to AGI, right? It needs to persist over long periods of time. It needs to be able to learn. It needs to have a knowledge base. It needs to work independently on problems."
""Now, on the flip side, we all know that the cost of a product over time converges towards the marginal cost of production. And so today, if I used to use a translator, maybe to translate a page of text, today you use ChatGPT. I do not pay ChatGPT like I paid my translator. I paid a tiny fraction of a cent, which is via the API, which is the actual cost.""
"Look, I kind of think agent is just a word for AI applications, right? Anything that uses AI kind of can be an agent."
""I think part of the ethos and part of the confusion around agents is this idea that we actually will develop human replacements... Before we had AI, we had people called agents. And we still have all kinds of people called agents. And it just doesn’t seem like that’s happening, right? Not in the replacement sense.""
"I think the cleanest definition I've seen of an agent is just something that does complex planning and something that interacts with outside systems. The problem with that definition is all LLMs now do both of those things, right? They have built-in planning in many cases, and they at least consume information, you know, at least from the internet, maybe from some servers that expose information through MCP or some other protocol. So the line really is very blurry."
""I think agent will be multiple functions with LMs in the middle. If you have a low-level agent and I'm giving this low-level agent a task and I get back a task result, it looks a little bit like a classic API call. But with the LM in the middle to make decisions on what to do for that API call. So, but I understood, but that's sort of how this function works internally.""