I am by no means a talented programmer, but thanks to a free program called SWE-agent, I managed to debug and fix a tricky issue related to a misnamed file in various code repositories on the software hosting site GitHub.
I pointed SWE-agent at an issue on GitHub and watched as it went through the code and figured out what might be wrong. It correctly determined that the cause of the error was a line pointing to the wrong file location, then went through the project, located the file, and changed the code to make it work correctly. This is something that an inexperienced developer (like me) could spend hours trying to debug.
Many developers are already using AI to write software faster. GitHub’s Copilot was the first integrated development environment to employ AI, but many IDEs will now auto-complete code snippets as a developer starts typing. You can also ask the AI questions about your code or ask it for suggestions on how to improve what you’re working on.
Last summer, John Yang and Carlos Jimenez, two Princeton graduate students, began discussing what it would take for AI to become a true software engineer. That led them and other Princeton students to come up with SWE-bench, a suite of benchmarks for testing AI tools on a variety of coding tasks. After releasing the benchmark in October, the team developed its own tool, SWE-agent, to master these tasks.
SWE-agent (“SWE” stands for “software engineering”) is one of a number of considerably more powerful AI coding programs that go beyond writing lines of code and act as so-called software agents, using the tools needed to debug and organize software. The startup Devin drew wide attention in March with a video demonstration of one such tool.
Ofir Press, a member of the Princeton team, says SWE-bench could help OpenAI test the performance and reliability of software agents. “This is just my opinion, but I think they’ll release a software agent soon,” Press says.
OpenAI declined to comment, but another source with knowledge of the company’s operations, who requested anonymity, told WIRED that “OpenAI is definitely working on coding agents.”
Just as GitHub’s Copilot showed that large language models can write code and raise developer productivity, tools like SWE-agent may prove that AI agents can work reliably, starting with code creation and maintenance.
Many companies are testing agents for software development. At the top of the SWE-bench leaderboard, which measures the performance of different coding agents on different tasks, is one from Factory AI, a startup, followed by AutoCodeRover, an open source project by a team at the National University of Singapore.
Major players are also getting involved. A software-writing tool called Amazon Q is another top performer on SWE-bench. “Software development is much more than just writing,” says Deepak Singh, vice president of software development at Amazon Web Services.
He adds that AWS has used an agent to translate entire software stacks from one programming language to another. “It’s like having a really smart engineer sitting next to you, writing and building the application with you,” Singh says. “I think it’s pretty transformational.”
A team at OpenAI recently helped the Princeton team refine a benchmark for measuring the reliability and effectiveness of tools like SWE-agent, suggesting the company could also be refining agents to write code or perform other tasks on a computer.
Singh says many customers are already building complex back-end applications using Q. My own experiments with SWE-bench suggest that anyone who codes will soon want to employ agents to augment their coding skills, or risk being left behind.
