The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)

AI Trends 2026: OpenClaw Agents, Reasoning LLMs, and More with Sebastian Raschka - #762

Feb 26, 2026

Summary

The episode surveys how LLM research has shifted from raw pretraining scale to post-training and inference-time techniques that boost reasoning and practical performance. Sebastian Raschka emphasizes verifiable-reward training and decoding/ensemble strategies (self-consistency, self-refinement) as central drivers of recent gains in math and coding. The discussion also covers agentic workflows and tooling — local agents, editor integrations, and plugins — as crucial for real-world adoption, while noting reliability and failure propagation remain constraints. Architectural trends (mixture-of-experts, attention efficiency, long-context models) and the limits of fully automatic per-user continual learning round out the conversation, along with practical advice for developers and a preview of Raschka’s book on building reasoning models.

Key Takeaways

  • R&D emphasis has moved from brute-force pretraining to post-training and inference-time approaches that improve reasoning.
  • Verifiable-reward training delivers strong, measurable gains on domains with deterministic correctness checks (math, coding).
  • Inference-scaling (ensembles, self-consistency, self-refinement) reliably boosts accuracy but increases compute and complexity.
  • Tooling and wrapper interfaces (local agents, plugins, editor integrations) are as important as the underlying model for real-world adoption.
  • Agentic and multi-agent systems show promise but are not yet a universal productivity win due to reliability and failure propagation risks.
  • Fully automatic per-user continual learning is currently impractical; controlled or semi-automatic personalization is a more realistic near-term path.

Notable Quotes

"Most of the interesting things are happening now on the post-training front and the reasoning realm."

"OpenClaw (Maltbot) is interesting — it's a local agent people can run on their own computers… it gets people excited and shows genuine use cases like organizing calendars and emails."

"The reasoning training is essentially mainly based on verifiable rewards, which means they are tasks where you can verify the answer; for example, in DeepSeek R1 the verifiable rewards were coding and math."

"You can generate multiple answers and that's called self-consistency… and take a majority vote or use another LM to score answers."

"You definitely can't have a single copy per user; that would be way too expensive ... everyone would have to have a little supercomputer at home, like a hundred-thousand-dollar computer."

"If you train on correct answers in that type of setting and you just keep running it, it is kind of a form of continual learning."

"There is no big alternative to the transformer architecture, but there are, for example, things like text diffusion models."