Practical AI: Machine Learning, Data Science, LLM

Open source AI to tackle your backlog

Apr 17, 2025

Summary

The podcast episode "Open source AI to tackle your backlog" explores the transformative impact of AI technologies on software development workflows. It begins by delineating two main categories of AI tooling: rapid prototyping tools aimed at non-coders such as designers and product managers, which facilitate quick visual application builds, and production-grade tools for experienced developers focused on maintaining quality in large-scale systems. A central focus is the concept of agentic workflows, where autonomous AI agents execute multi-step coding tasks independently, producing fully tested pull requests and significantly boosting developer productivity. All Hands AI exemplifies this with a single generalist AI agent augmented by specialized micro-agents, a design chosen over multiple specialized agents to reduce complexity and improve task coverage.

The episode traces how early tools like GitHub Copilot shifted AI's role from merely enhancing autocomplete to enabling AI-driven feature creation and code maintenance. However, challenges such as code quality degradation due to over-reliance on AI by junior developers underscore the importance of continuous human oversight. Open source strategies are fundamental to All Hands AI, fostering transparency, community participation, and innovation, while balancing enterprise needs with proprietary features.

The conversation also covers practical considerations such as local versus cloud-hosted deployments, model cost and customization, and how advances in language models, especially in accurate code editing, are critical for effective AI assistance. Additionally, experimental supervisory models monitor agent progress to enhance reliability and trust.
The podcast concludes with philosophical reflections on AI’s role in democratizing software development and amplifying human potential through software creation, highlighting the ongoing collaboration between academia, community developers, and commercial teams to realize this vision.

Key Takeaways

  • AI tooling for software development currently bifurcates into rapid prototyping platforms for non-coders and production-grade assistants for experienced developers, reflecting distinct user needs and project complexities.
  • Agentic workflows represent a paradigm shift, empowering autonomous AI agents to handle complex, multi-step software development tasks end-to-end, thereby multiplying developer productivity.
  • A single versatile AI agent supplemented by dynamically invoked micro-agents is more effective and maintainable than multiple specialized agents for software engineering workflows.
  • The evolution of AI development tools from simple code autocomplete with GitHub Copilot to fully autonomous AI agents creating, testing, and submitting complete pull requests illustrates rapid advancement in AI-assisted programming.
  • Without adequate human oversight, especially from experienced developers, heavy reliance on AI agents can lead to code base degradation through duplicated functions, unrefactored code, and overall 'code rot.'
  • Open source development is foundational for AI tools like All Hands AI, promoting transparency, community engagement, and collaborative innovation critical to advancing AI-assisted software development.
  • Deploying AI agent tooling involves trade-offs between local sandboxed environments and cloud-hosted solutions, each appealing to different developer and enterprise requirements.
  • Advancements in language models, particularly in accurate code editing and diff generation exemplified by Claude, are critical enablers for reliable AI-assisted software development at scale.
  • Sophisticated supervisory models that monitor AI agent progress and dynamically intervene by stopping, rerouting, or selecting among solution trajectories enhance trustworthiness and efficiency in AI-driven coding workflows.
  • Balancing open source accessibility with proprietary enterprise features enables AI tooling platforms to serve diverse markets while sustaining business viability.

Notable Quotes

"So there's a huge variety of tooling out there right now for code generation, so it's a very hard space to navigate. There are two ways I like to bifurcate the space. On the one hand, you have a lot of tools that are really meant for rapid prototyping. There's some really cool stuff happening there, stuff like Lovable, Bolt.new, v0.dev. They tend to be very visual; you're getting quick prototypes of games or websites, things like that. Some really fun stuff happening there. And it's enabling a whole new set of people to experiment with software development, people who, like designers or product managers, don't really have coding experience, or maybe have very little coding experience. They can now build whole apps, which is super cool."

"And then on the other end of the spectrum, you have stuff that is much more oriented towards like senior developers who are shipping production code. They're working on a code base that's going to go and serve millions of users, where you have to be a little bit more careful about what's going on. And also some really cool stuff happening on that end of the spectrum."

"Yeah. So, you know, I think the sort of like step one of integrating AI into your development process was like Copilot, right? Where it's really just plugging into autocomplete, right? We're all familiar with autocomplete. We've been using it for decades. It just got a thousand times better all of a sudden. Instead of just completing a class name, now it's writing like, you know, several lines of code. So that was like a huge boost to my productivity when I adopted Copilot. I was like, yeah, this is amazing."

"One somewhat surprising thing is how ineffective this paradigm ended up being, from two perspectives. And this is specifically for the case of software engineering; there might be other cases where this would be useful. The first, in terms of effectiveness: we found that having a single agent that just has all of the necessary context, with the ability to write code, use a web browser to gather information, and execute code, ends up being able to do a pretty large swath of tasks without a lot of specific tooling and structuring around the problems."
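The single-generalist-agent design described here can be sketched as one base system prompt plus small "micro-agent" prompt extensions that are injected only when a trigger keyword appears in the task. This is a minimal illustrative sketch of the pattern, not the OpenHands implementation; all names and prompt texts are hypothetical.

```python
# Sketch: one generalist agent whose prompt is augmented by
# keyword-triggered micro-agents. Triggers and texts are illustrative.

MICRO_AGENTS = {
    "docker": "When working with Docker, prefer multi-stage builds.",
    "github": "Use the GitHub API via the provided token; never force-push.",
}

def build_prompt(task: str, base_prompt: str) -> str:
    """Augment the generalist system prompt with every micro-agent
    whose trigger keyword appears in the task description."""
    extras = [text for trigger, text in MICRO_AGENTS.items()
              if trigger in task.lower()]
    return "\n\n".join([base_prompt, *extras])

prompt = build_prompt("Fix the failing GitHub Actions workflow",
                      "You are a coding agent.")
```

The appeal of this layout is that the core agent loop never changes; domain knowledge is added or removed by editing small prompt files rather than wiring up new specialized agents.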

"If you don't have somebody looking at the changes that are being proposed and critiquing them, telling the agent, hey, you added this new function, but we have an existing function that does that, or, this function is getting too big, please refactor it; if you're not looking over its shoulder and critiquing its work, the code base will just grow into this monster and you'll have to throw it all away, because it's beyond repair."

"The first reason is, I think everybody in our community believes that this is going to be very transformative technology, and it may drastically change the way we do software development going forward. And we have two options: there's an option where software development is drastically changed for us by other people, or there's the option where we do it ourselves together. We believe in the latter approach. Basically, we believe that if this is going to have a big effect on software development, software developers should be able to participate in that."

"For instance, when you start up a conversation in the cloud, the sandbox comes up within one or two seconds, rather than having to wait 30 seconds or so for it to start up on your local machine. We can also connect into GitHub a little more seamlessly, because we can have an OAuth application where you just one-click log in and we can access everything. And then the cloud feature that I love more than anything is that if you leave a comment in a pull request, say the tests are failing, you can just say '@OpenHands, please fix the tests,' and because we have this long-lived server in the cloud, it can just kick off a conversation automatically and OpenHands will commit back to your pull request."

"Yeah, it's a great question. It's actually very similar to when I started managing folks. For one, you just have to get good at thinking, oh, I should delegate this. You have to have that switch flip, because your instinct is to fire up VS Code and just start working. You have to have that moment of, oh, this is actually a good thing for the agent to work on, or for my employee to work on. There's also a little bit of a trust thing, right? When I first started managing folks, I wanted to micromanage them. I wanted to tell them exactly how to do everything, and it ended up being just more work for both of us and frustrating for them. Once I learned to trust my employees, knowing that they might not do it exactly like I would but they're going to do a good job; they might need some coaching and some direction, but building that trust over time is really important. And it's the same thing with the agent. The agent isn't always right. I like to say trust, but verify. You need to read its code and understand what it's trying to do and where it might've misunderstood something, and maybe iterate a few times, either through a code review in GitHub or by just chatting with it inside the application itself. But yeah, it's very similar to that management experience of learning to take your hands off the keyboard and being really clear with somebody else about communicating: these are the requirements, here's how you can improve, and things like that."

"Yeah, it's a good question. For reference, the OpenHands agent is the largest committer to our code base, and our code base is rather large and complex. I just checked now, and it had 209 commits over the past three months; the next closest contributor had 142. So it's doing pretty well. But there's a bunch of technical pieces that need to go together to make that work."
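Per-contributor commit counts like the ones quoted can be produced with `git shortlog -sn --since="3 months ago"`, which prints one `count<TAB>author` line per contributor. A small helper to parse that output into usable pairs; the sample author names below are illustrative.

```python
# Parse `git shortlog -sn` output ("  209\tAuthor Name" per line)
# into (author, commit_count) pairs, highest first as git emits them.

def parse_shortlog(output: str) -> list:
    """Turn `git shortlog -sn` text into a list of (author, count)."""
    rows = []
    for line in output.strip().splitlines():
        count, _, author = line.strip().partition("\t")
        rows.append((author, int(count)))
    return rows

# Example with made-up names mirroring the numbers quoted above:
rows = parse_shortlog("  209\topenhands-agent\n  142\tSome Maintainer\n")
```

Running the underlying `git shortlog` command in any repository where an agent commits under its own identity gives the same kind of leaderboard.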

"The underlying language model is really important, and fortunately a lot of the core language model providers are focusing on this. We're also training language models ourselves. But the underlying language model needs to have a lot of abilities. One kind of boring but extremely important one is the ability to edit files. About six months ago, this was a major problem for most language models: they were not able to successfully generate a diff between what a portion of the file used to look like and what the new portion of the file would look like. Or they would add an extra line, or duplicate things, or stuff like this. So this was a major problem. Claude is very good at this right now, and a lot of the other language models are catching up."
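One common way agent frameworks apply model-generated edits is an exact search/replace block: the model quotes the old snippet and the replacement. The sketch below, which is an illustrative convention rather than OpenHands' actual editing mechanism, shows why exactness matters: an edit whose "old" snippet has an extra or duplicated line, the failure mode described above, simply fails to match and is rejected instead of corrupting the file.

```python
# Apply a model-proposed search/replace edit with strict matching,
# so hallucinated or duplicated context is rejected up front.

def apply_edit(source: str, old: str, new: str) -> str:
    """Replace exactly one occurrence of `old` with `new`, or raise."""
    count = source.count(old)
    if count == 0:
        raise ValueError("old snippet not found; the model may have "
                         "hallucinated or mangled the surrounding lines")
    if count > 1:
        raise ValueError("old snippet is ambiguous; the model must quote "
                         "more surrounding context")
    return source.replace(old, new, 1)
```

The strictness is the point: a model that is sloppy about reproducing existing lines fails loudly here, which is why reliable file editing became a key capability benchmark.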

"So we built this model specifically based on the data set that we've gathered. And that's a really cool product feature, because on the one hand you can just recognize, did we solve the task or did we not, and report back to the user appropriately. We can stop the agent if it's going off the rails and say, hey, this is what's going wrong, please reroute using this new strategy. We can also launch several different trajectories toward solving a problem and then maybe pick one out of the three that we launched and say, okay, this one looks like it's going in the best direction; keep following this one and kill the other two."
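The supervisory pattern described here, scoring several candidate trajectories and keeping the most promising one while stopping or rerouting the rest, can be sketched as follows. The `critic_score` callable stands in for the trained progress model and the 0.2 threshold is an arbitrary illustrative value; neither reflects the actual product internals.

```python
# Sketch: a supervisor that (a) picks the best of several candidate
# trajectories and (b) stops/reroutes an agent scoring below threshold.
# `critic_score` is a placeholder for a trained progress model.

def select_trajectory(trajectories, critic_score) -> int:
    """Return the index of the trajectory the critic rates highest."""
    scores = [critic_score(t) for t in trajectories]
    return max(range(len(trajectories)), key=scores.__getitem__)

def supervise(trajectory, critic_score, threshold: float = 0.2) -> str:
    """Let a promising agent continue; reroute one going off the rails."""
    return "continue" if critic_score(trajectory) >= threshold else "reroute"
```

Running three trajectories and keeping one trades extra compute for reliability, which is the trade-off the quote is describing.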

"We've taken the position that we basically want OpenHands to be as useful as possible to an individual developer running it on their workstation. You know, we are a company, and we do want to make money, so we are building some closed-source features specifically for large teams who are using OpenHands together. But so far we've taken the position that basically all the research we do and all the know-how for how the agents do as good a job as possible at solving software tasks, that should be open source, available to every developer."

"A lot of security conscious companies do start with the open source because everything, they can hook it up to Bedrock or, you know, a local model or, you know, basically they can plug into the existing models that the company has approved. We have the cloud offering, which all runs through Anthropic, all runs through our servers, which is a great convenience for a lot of people, but kind of scares off some companies that are very security conscious. But then we can also take basically all the infrastructure we've built for our cloud offering and ship it into somebody else's cloud. So you can run it all inside your AWS environment. You can connect it to Bedrock. So it's basically all configured to stay within your walls."
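The three deployment modes described above, a fully managed cloud, the same stack shipped into a customer's AWS environment against Bedrock, or open source against a local model, boil down to a routing decision driven by the organization's security posture. The sketch below is purely illustrative; the endpoint strings, model names, and policy labels are hypothetical placeholders, not OpenHands configuration.

```python
# Illustrative mapping from a (hypothetical) security policy to a
# deployment mode: managed cloud, in-VPC Bedrock, or fully local.

DEPLOYMENTS = {
    "cloud":   {"endpoint": "https://cloud.example.dev", "model": "claude"},
    "bedrock": {"endpoint": "bedrock:us-east-1", "model": "anthropic.claude"},
    "local":   {"endpoint": "http://localhost:11434", "model": "local-coder"},
}

def resolve_deployment(security_policy: str) -> dict:
    """Pick a deployment mode from a placeholder security policy label."""
    if security_policy == "data-must-stay-in-vpc":
        return DEPLOYMENTS["bedrock"]   # everything stays inside your walls
    if security_policy == "air-gapped":
        return DEPLOYMENTS["local"]     # company-approved local model only
    return DEPLOYMENTS["cloud"]         # managed convenience by default
```

The key property the quote emphasizes is that the same agent infrastructure backs all three rows; only the endpoint and model approval differ.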

"One of the things I have in my introductory slides for a presentation I give about coding agents is looking at the Nobel Prize winners from last year in physics and chemistry. The Nobel Prize winners in physics were people like Geoff Hinton, and the ones in chemistry were people like Demis Hassabis. These are obviously the top awards in areas other than computing, and I'm building agents to create software. But the reason I'm building agents to create software is not because software is the end; it's because software is a means to an end. I think AI has a huge possibility to increase, you know, its impact on the human condition and things like this. But I think the way it's going to do that is through software, basically."