
Prompt Management, Tracing, and Evals: The New Table Stakes for GenAI Ops
Summary
The episode outlines the operational foundations required to run reliable, cost-effective LLM-powered applications, focusing on observability, prompt management, and evaluation workflows. Aman Agarwal presents OpenLit's OTEL-first approach to converting opaque model interactions into stepwise traces, enabling debugging across models, tools, and data stores. He highlights common blind spots: runaway token costs, brittle prompt and secret handling, and a lack of reproducible experiments. He then shows how vendor-neutral standards and centralized collector management (OpAMP) reduce lock-in. The conversation also covers experimentation patterns (multi-model comparisons, routing), closing the loop from evals back to prompt and dataset improvements, and trade-offs where OpenLit may not fit (proprietary stacks or hosted SaaS requirements).
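The stepwise tracing described above can be sketched in plain Python, with no OTEL dependency: each model, tool, or datastore call becomes a span carrying attributes such as token counts, so cost and latency can be attributed per step. Span names like "llm.generate" and the attribute keys are illustrative, not OpenLit's actual schema.

```python
import time
from contextlib import contextmanager

SPANS = []   # finished spans, as a trace exporter would receive them
_STACK = []  # current span ancestry, used to link parent and child

@contextmanager
def span(name, **attributes):
    # Open a span, remember its parent, time it, and record it on exit.
    record = {"name": name,
              "parent": _STACK[-1]["name"] if _STACK else None,
              "attributes": attributes}
    _STACK.append(record)
    start = time.perf_counter()
    try:
        yield record
    finally:
        record["duration_s"] = time.perf_counter() - start
        _STACK.pop()
        SPANS.append(record)

# One traced request: a retrieval step, then a generation step.
with span("rag.request", user="u-123"):
    with span("vectordb.retrieve", top_k=4):
        pass  # ... query the vector store here ...
    with span("llm.generate", model="example-model") as s:
        # Token counts would come from the provider's response.
        s["attributes"]["input_tokens"] = 812
        s["attributes"]["output_tokens"] = 96

for s in SPANS:
    print(s["name"], "parent:", s["parent"], s["attributes"])
```

Because every span records its parent and its token attributes, a runaway-cost investigation becomes a query over spans rather than guesswork over opaque API bills.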
Key Takeaways
1. Runaway token usage and cost are a major operational risk for LLM apps.
2. Observability and stepwise tracing are essential to understand and debug LLM behavior.
3. Adopt vendor-neutral, open standards (OpenTelemetry / OpAMP) to avoid lock-in and enable flexible tooling.
4. Decouple prompt and secret management from application code so production behavior can be updated reliably without redeploying.
5. Close the loop with experimentation, automated evals, and routing to continuously improve models and prompts.
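Takeaway 5's "close the loop" pattern can be sketched as follows: run candidate models over a small eval set, score each answer with an automated judge, and keep the cheapest model whose average score clears a quality threshold. The model names, costs, and judge logic here are illustrative stand-ins, not OpenLit APIs.

```python
CANDIDATES = [           # (name, cost per 1K tokens), cheapest first
    ("small-model", 0.15),
    ("large-model", 2.50),
]

EVAL_SET = ["q1", "q2", "q3"]  # in practice: curated prompts + references

def generate(model: str, question: str) -> str:
    # Stand-in for a real provider call.
    return f"{model} answer to {question}"

def judge(question: str, answer: str) -> float:
    # Stand-in for an LLM-as-judge eval scoring hallucination/bias/toxicity.
    # Toy rule so the sketch is deterministic: the larger model scores higher.
    return 0.9 if answer.startswith("large") else 0.7

def pick_model(threshold: float) -> str:
    # Walk candidates cheapest-first; return the first one that clears the bar.
    for name, _cost in CANDIDATES:
        scores = [judge(q, generate(name, q)) for q in EVAL_SET]
        if sum(scores) / len(scores) >= threshold:
            return name
    return CANDIDATES[-1][0]  # fall back to the strongest model

print(pick_model(0.6))  # small-model clears a lenient bar
print(pick_model(0.8))  # only large-model clears a strict bar
```

The same scores that drive routing also feed back into prompt and dataset improvements: a model that fails the threshold tells you which eval cases to inspect next.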
Notable Quotes
"We need to be very keen on like logging traces, logging most of the information that it can help us debug the AI usage."
"If you have that (OTEL format), it's a no vendor lock-in support. Basically, any tool would be able to read that, process that and give you output to that."
"We have evaluations right now ... ask LLM to kind of give us the score of a hallucination bias and toxicity."
"Unless and less until you are aware about what model to use for a particular use case you won't be able to develop a particular solution you will just be like playing around with your money and time."