
Summary
The podcast episode "Prompt Engineering Advice From Top AI Startups" by Y Combinator dives deeply into the evolving craft of prompt engineering as a critical enabler for deploying large language models (LLMs) in production settings. The hosts explore meta prompting, a technique where prompts are structured with detailed, programming-like instructions that improve LLM reliability and predictability across complex workflows, illustrated by Parahelp's six-page prompt powering AI customer support for major startups. They discuss the challenge of balancing scalable core system prompts with customer-specific customizations, highlighting modular prompt architectures that separate system, developer, and user prompt layers.

The episode emphasizes embedding worked examples within prompts as a form of unit testing to improve output consistency and reduce hallucinations. Interactive debugging with long-context LLMs such as Gemini Pro enables rapid iteration and visualization of model reasoning, accelerating development cycles. Another major insight is the importance of incorporating "escape hatches" in prompts to allow models to express uncertainty instead of fabricating answers, fostering safer, more trustworthy AI responses.

Founders' close, hands-on involvement with end-users, termed the "forward deployed engineer" model, is stressed as vital for capturing domain expertise, shaping effective evals, and quickly iterating on demos to win enterprise customers. The conversation also highlights that evaluation datasets ("evals") are the true intellectual property in prompt engineering, enabling systematic improvement more than the prompt texts themselves. The episode compares LLMs' differing personalities and behavior patterns in handling rubrics and exceptions, underscoring the need to select or fine-tune models to fit use cases.
The podcast situates prompt engineering in an early, exploratory phase akin to coding in 1995, advocating continuous improvement inspired by the kaizen philosophy to empower frontline prompt practitioners. Finally, it points to the growing role of LLMs in augmenting human operational workflows, such as investor communications, showcasing real-world business applications beyond pure language generation.
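The XML-tagged prompt style discussed in the episode can be sketched in a few lines. This is an illustrative mock-up, not Parahelp's actual prompt: the tag names, the `NEEDS_HUMAN` convention, and the `build_support_prompt` helper are all assumptions made for the example, combining the three techniques the episode names (a programming-like plan, a worked example, and an escape hatch).

```python
# Minimal sketch of the prompt style the episode describes: an XML-tagged
# plan, an embedded worked example, and an explicit escape hatch for
# uncertainty. All tag names and the NEEDS_HUMAN convention are
# illustrative assumptions, not Parahelp's actual prompt.

def build_support_prompt(ticket_text: str) -> str:
    """Assemble a structured support-agent prompt for one ticket."""
    return f"""<role>
You are a customer-support agent. Follow the plan exactly.
</role>

<plan>
1. Classify the ticket (billing, bug, or how-to).
2. Draft a reply using only facts from the knowledge base.
3. If you cannot make a determination, use the escape hatch.
</plan>

<escape_hatch>
If you do not have enough information to answer, reply exactly:
NEEDS_HUMAN: one-sentence reason
Do not invent an answer.
</escape_hatch>

<example>
<ticket>I was charged twice this month.</ticket>
<reply>Classification: billing. I have flagged the duplicate charge for review.</reply>
</example>

<ticket>
{ticket_text}
</ticket>"""

prompt = build_support_prompt("My export keeps timing out.")
```

The XML-like structure exploits the point made in the episode: many models were post-trained on XML-style input, so tagged sections are easier for them to follow than free-form English.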
Key Takeaways
1. Meta prompting is becoming a cornerstone technique in AI startups, involving complex prompts structured with bullet points and XML-like tags that act like programming code to direct LLM behavior precisely.
2. Developers must recognize that different LLMs exhibit unique 'personalities' or behavioral tendencies, affecting how they follow rubrics and handle exceptions in prompts.
3. LLMs can effectively alleviate challenges in human operational workflows such as investor-founder communications by ensuring consistent, timely, and reliable follow-ups that humans may fail to provide due to time constraints.
4. Prompt engineering today resembles software development circa 1995, with immature tooling and many unknowns, requiring teams to treat prompt management like managing human collaborators who need clear communication, evaluation, and feedback.
5. Adopting a kaizen-inspired continuous improvement philosophy — where the practitioners directly involved in prompt engineering iterate and refine prompts organically — significantly enhances AI output quality.
6. Embedding worked examples and explicit output formatting (e.g., via XML-like tags) within prompts functions as unit tests that guide LLM reasoning and increase output consistency, reducing hallucinations.
7. Providing LLMs with an explicit 'escape hatch' to signal uncertainty rather than producing hallucinated or fabricated answers improves model trustworthiness and safety in production environments.
8. Evaluation datasets ('evals') representing domain-specific success criteria are more valuable assets than prompts themselves, as they codify the rationale behind prompt design and enable systematic prompt refinement.
9. Founders in AI startups should act as 'forward deployed engineers,' personally engaging with users and domain experts to deeply understand workflows and embed this knowledge into software and prompt designs.
10. Modular prompt architectures dividing prompt components into system-level instructions, developer-level customer-specific logic, and user-level inputs enable scalable customization without commoditizing prompt engineering into consulting services.
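The modular architecture and eval-centric workflow from the takeaways above can be sketched together. Everything here is a hypothetical illustration: `fake_model`, the customer names, the override strings, and the predicate-style "evals" are stand-ins, not any startup's actual stack; a real system would call an LLM API and use far richer graders.

```python
# Sketch of a modular prompt architecture: a shared system layer, a
# per-customer developer layer, and the user input, plus a tiny eval loop.
# All names (fake_model, CUSTOMER_OVERRIDES, etc.) are hypothetical.

SYSTEM_PROMPT = "You are a support agent. Answer concisely; say UNSURE if unsure."

# Developer-level, customer-specific logic kept separate from the core prompt.
CUSTOMER_OVERRIDES = {
    "acme": "Always address the user as 'builder' and link to the docs site.",
    "globex": "Never offer refunds directly; route billing issues to a human.",
}

def assemble_messages(customer: str, user_input: str) -> list[dict]:
    """Layer system, developer (customer-specific), and user prompts."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "developer", "content": CUSTOMER_OVERRIDES.get(customer, "")},
        {"role": "user", "content": user_input},
    ]

def run_evals(model_fn, cases):
    """Each eval case is (customer, input, predicate-on-output)."""
    results = []
    for customer, user_input, check in cases:
        output = model_fn(assemble_messages(customer, user_input))
        results.append((user_input, check(output)))
    return results

# Fake model for demonstration: echoes all layers so the checks can run.
def fake_model(messages):
    return " | ".join(m["content"] for m in messages)

cases = [
    ("globex", "I want a refund", lambda out: "route billing" in out),
]
print(run_evals(fake_model, cases))  # → [('I want a refund', True)]
```

Keeping the eval cases separate from the prompt text mirrors the episode's point: the evals encode why the prompt is written the way it is, so the prompt can be rewritten freely as long as the evals still pass.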
Notable Quotes
"Meta prompting is turning out to be a very, very powerful tool that everyone's using now. It kind of actually feels like coding in, you know, 1995, like the tools are not all the way there. We're, you know, in this new frontier. But personally, it also kind of feels like learning how to manage a person, where it's like, how do I actually communicate the things that they need to know in order to make a good decision."
"They're actually powering the customer support for Perplexity and Replit and Bolt and a bunch of other like top AI companies now. So if you go and you like email a customer support ticket into Perplexity, what's actually responding is like their AI agent. The cool thing is that the Parahelp guys very graciously agreed to show us the actual prompt that is powering this agent and to put it on screen on YouTube for the entire world to see. It's like relatively hard to get these prompts for vertical AI agents because they're kind of like the crown jewels of the IP of these companies."
"One thing that's interesting about this, it looks more like programming than writing English because it has this XML tag kind of format to specify sort of the plan. We found that it makes it a lot easier for LLMs to follow because a lot of LLMs were post-trained in IRLHF with kind of XML type of input. And it turns out to produce better results."
"Yeah. Because it's customer specific, right? Because like every customer has their own like flavor of how to respond to these support tickets. And so their challenge, like a lot of these agent companies is like, how do you build a general purpose product when every customer like wants you know, has like slightly different workflows and like preferences? That's a really interesting thing that the world is only just beginning to explore."
"So one thing they discovered is that you actually have to give the LLMs a real escape hatch, you need to tell it, if you do not have enough information to say yes or no, or make a determination, don't just make it up, stop and ask me. And that's a very different way to think about it. That's actually something we learned at some of the internal work that we've done with agents at YC where Jared came up with a really inventive way to give the LLM a escape hatch."
""Gary, I think it's still the case that like evals are the true crown jewel, like data asset for all of these companies. Like one, one reason that PowerHelp was willing to open source the prompt is they told me that they actually don't consider the prompts to be the crown jewels. Like the evals are the crown jewels because without the evals, you don't know why the prompt was written the way that it was. And it's very hard to improve it.""
""You need to sit next to the tractor sales regional manager and understand, well, you know, this person cares about, you know, this is how they get promoted. This is what they care about. This is that person's reward function. And then, you know, what you're doing is taking these in-person interactions sitting next to someone in Nebraska, and then going back to your computer and codifying it into very specific evals.""
""Palantir's sort of really, really big idea that they discovered very early was that the problems that those places face, they're actually multi-billion dollars, sometimes trillion dollar problems. And yet this was well before AI became a thing. You know, I mean, people were sort of talking about machine learning, but, you know, back then they called it data mining. You know, the world is awash in data ... and we have no idea what to do with it. That's what Palantir was, is, and still is, that you can go and find the world's best technologists who know how to write software to actually make sense of the world.""
""And then you're getting like real live feedback within days. And I mean, that's honestly the biggest opportunity for startup founders. If startup founders can do that, and that's what forward deployed engineers are sort of used to doing. That's how you could beat a Salesforce or an Oracle or, you know, a Booz Allen or literally any company out there that has a big office and a big fancy, you know, you have big fancy salespeople with big, strong handshakes. And it's like, how does a really good engineer with a weak handshake go in there and beat them? Well, it's actually, you show them something that they've never seen before and like, make them feel super heard. You have to be super empathetic about it. Like you actually have to be a great designer and product person, and then, you know, come back and you can just blow them away.""
""One of the things that's known a lot is Claude is sort of the more happy and more human, steerable model. And the other one is Llama 4 is one that needs a lot more steering. It's almost like talking to a developer. And part of it could be an artifact of not having done as much RLHF on top of it. So it's a bit more rough to work with, but you could actually steer it very well if you actually are good at actually doing a lot of prompting.""
""The models themselves will handle that differently, which means they sort of have different personalities, right? Like O3 felt a little bit more like the soldier, sort of like, okay, I'm definitely doing check, check, check, check, check. And Gemini Pro 2.5, a little bit more like a high agency sort of employee was like, oh, okay, no, I think this makes sense, but this might be an exception in this case, which was really interesting to see.""
""For investors, sometimes you have investors like a benchmark or a tribe. It's like, yeah, take their money right away. Their process is immaculate. They never ghost anyone. They answer their emails faster than most founders. It's very impressive. And then one example here might be, you know, there are plenty of investors who are just overwhelmed and maybe they're just not that good at managing their time. And so they might be really great investors and their track record bears that out, but they're sort of slow to get back. They seem overwhelmed all the time. They accidentally, probably not intentionally ghost people. And so this is legitimately exactly what an LLM is for.""
""You know, maybe 80 to 90% of our time with founders who are all the way out on the edge is, you know, on the one hand, the analogies I think even we use to discuss this is it's kind of like coding. It kind of actually feels like coding in, you know, 1995, like the tools are not all the way there. There's a lot of stuff that's unspecified. We're, you know, in this new frontier.""
""There's this aspect of Kaizen, you know, this manufacturing technique that created really, really good cars for Japan in the nineties. And that principle actually says that the people who are the absolute best at improving the process are the people actually doing it. And it's literally why Japanese cars got so good in the nineties. And that's meta prompting to me. So I don't know, it's a brave new world.""