Competitive programming with AlphaCode
Note: This blog was first published on February 2, 2022. After the article was published in Science on December 8, 2022, we made minor updates to the text to reflect this.
Solving novel problems and setting a new milestone in competitive programming
Creating solutions to unforeseen problems is second nature to human intelligence – the result of critical thinking informed by experience. The machine learning community has made great progress in generating and understanding text data, but advances in problem solving remain limited to relatively simple maths and programming problems, or to retrieving and copying existing solutions.
As part of DeepMind's mission to solve intelligence, we created a system called AlphaCode that writes computer programs at a competitive level. AlphaCode achieved an estimated rank within the top 54% of programming competition participants by solving new problems that require a combination of critical thinking, logic, algorithms, coding and natural language understanding.
Published on the cover of Science, our article details AlphaCode, which uses transformer-based language models to generate code at an unprecedented scale and then intelligently filters to a small set of promising programs.
We checked our results using competitions hosted on Codeforces, a popular platform that runs regular contests attracting tens of thousands of participants from around the world who come to test their coding skills. We selected 10 recent contests for evaluation, each newer than our training data. AlphaCode placed at approximately the level of the median competitor, marking the first time an AI code generation system has reached a competitive level of performance in programming competitions.
To help others build on our results, we have released our dataset of competitive programming problems and solutions on GitHub, including extensive tests to ensure that programs which pass those tests are correct – a key feature current datasets lack. We hope this benchmark will lead to further innovations in problem solving and code generation.
The problem comes from Codeforces and the solution is generated by AlphaCode.
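To make the role of the dataset's tests concrete, here is a minimal sketch of how a candidate program could be checked against a problem's input/output pairs. The helper name, test-case format, and file path below are illustrative assumptions, not the published dataset's actual API.

```python
# Minimal sketch of test-based validation of a candidate program.
# The test-case format, helper name, and file path are illustrative only;
# they are not the actual API or layout of the published dataset.
import subprocess
from typing import List, Tuple


def passes_all_tests(source_path: str,
                     tests: List[Tuple[str, str]],
                     timeout_s: float = 5.0) -> bool:
    """Run a Python candidate program on each test input and compare outputs."""
    for test_input, expected_output in tests:
        try:
            result = subprocess.run(
                ["python3", source_path],
                input=test_input,
                capture_output=True,
                text=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            return False  # exceeding the time limit counts as a failure
        if result.returncode != 0:
            return False  # runtime error
        if result.stdout.strip() != expected_output.strip():
            return False  # wrong answer
    return True


# Hypothetical usage: the candidate program should print the sum of two numbers.
example_tests = [("1 2\n", "3\n"), ("10 -4\n", "6\n")]
print(passes_all_tests("candidate_solution.py", example_tests))
```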
Problems range from finding ways to place roads and buildings within certain constraints to creating strategies to win custom board games. Participants are ranked primarily by the number of problems they solve. Companies use these competitions as recruiting tools, and similar problems are common in software engineering hiring processes.
I can confidently say that AlphaCode's results exceeded my expectations. I was sceptical because even simple competitive problems often require not only implementing an algorithm, but also (and this is the hardest part) inventing it. AlphaCode has performed at the level of a promising new competitor. I can't wait to see what lies ahead!
Mike Mirzayanov, founder of Codeforces
The problem-solving abilities required to excel in these competitions exceed the capabilities of existing artificial intelligence systems. However, by combining advances in large-scale transformer models (which have recently shown promise in code generation) with large-scale sampling and filtering, we have made significant progress in the number of problems we can solve. We pre-train our model on selected public GitHub code and fine-tune it on our relatively small competitive programming dataset.
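As a rough illustration of what sampling from such a model looks like, the sketch below draws many independent program samples from a publicly available code language model, conditioned on a problem statement. The checkpoint, prompt format and sampling settings are stand-ins chosen for illustration; they are not AlphaCode's actual models or configuration.

```python
# Sketch of sampling many candidate programs from a code language model.
# The checkpoint and prompt format are illustrative stand-ins, not AlphaCode's.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "Salesforce/codegen-350M-mono"  # a small public code model
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)


def sample_candidate_programs(problem_statement: str,
                              n: int = 100,
                              temperature: float = 0.8) -> list[str]:
    """Draw n independent program samples conditioned on the problem statement."""
    prompt = f"# Problem:\n# {problem_statement}\n# Python solution:\n"
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(
        **inputs,
        do_sample=True,              # stochastic sampling rather than greedy decoding
        temperature=temperature,
        num_return_sequences=n,
        max_new_tokens=256,
        pad_token_id=tokenizer.eos_token_id,
    )
    # Decode only the generated continuation, dropping the prompt tokens.
    return tokenizer.batch_decode(
        outputs[:, inputs["input_ids"].shape[1]:],
        skip_special_tokens=True,
    )
```

The full system samples at a much larger scale, and in both C++ and Python, before the filtering step described next.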
During evaluation, we create a huge number of C++ and Python programs for each problem, orders of magnitude more than in previous work. We then filter, cluster and re-rank these solutions into a small set of 10 candidate programs that we submit for external assessment. This automated system replaces competitors' trial-and-error process of debugging, compiling, passing tests and eventually submitting.
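The filtering and clustering step can be sketched in a few lines. Below, `passes_example_tests` and `run_on_inputs` are hypothetical callables that execute a candidate program; the general idea is to keep only programs that pass the problem's public example tests, group the survivors by their behaviour on shared inputs, and submit one representative per large cluster. The exact implementation in the real system differs.

```python
# Sketch of the filter -> cluster -> select step, under simplifying assumptions.
from collections import defaultdict
from typing import Callable, Dict, List, Tuple


def pick_submissions(
    candidates: List[str],
    passes_example_tests: Callable[[str], bool],      # hypothetical: runs the public example tests
    run_on_inputs: Callable[[str], Tuple[str, ...]],  # hypothetical: outputs on shared extra inputs
    budget: int = 10,
) -> List[str]:
    """Reduce a large pool of sampled programs to a small set of submissions."""
    # 1. Filtering: discard programs that fail the problem's public example tests.
    survivors = [c for c in candidates if passes_example_tests(c)]

    # 2. Clustering: programs producing identical outputs on a shared set of
    #    extra inputs are treated as behaviourally equivalent.
    clusters: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
    for program in survivors:
        clusters[run_on_inputs(program)].append(program)

    # 3. Re-ranking: take one representative from each cluster, largest first,
    #    until the 10-submission budget is spent.
    ranked = sorted(clusters.values(), key=len, reverse=True)
    return [cluster[0] for cluster in ranked[:budget]]
```

Choosing one program per cluster spends the submission budget on behaviourally distinct candidates rather than near-duplicates of the same solution.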
With Codeforces' permission, we evaluated AlphaCode by simulating participation in 10 recent contests. The impressive work of the competitive programming community has created a domain in which it is not possible to solve problems through shortcuts, such as duplicating solutions seen before or trying every potentially related algorithm. Instead, our model must create novel and interesting solutions.
Overall, AlphaCode ranked at approximately the level of the median competitor. This result, while far from winning competitions, represents a substantial leap in AI problem-solving capabilities, and we hope our results will inspire the competitive programming community.
Solving competitive programming problems is a really hard thing to do, requiring both good coding skills and problem-solving creativity from humans. I was very impressed that AlphaCode could make progress in this area, and excited to see how the model uses its understanding of the problem statement to produce code and guide its random exploration to create solutions.
Petr Mitrichev, software engineer, Google and world-class competitive programmer
For AI to help humanity, our systems need to be able to develop problem-solving capabilities. AlphaCode ranked within the top 54% in real-world programming competitions, an advancement that demonstrates the potential of deep learning models for tasks that require critical thinking. These models elegantly leverage modern machine learning to express solutions to problems as code, harking back to the symbolic reasoning roots of AI from decades ago. And this is only the beginning.
Our research on code generation leaves vast room for improvement and hints at even more exciting ideas that could help programmers improve their productivity and open up the field to people who don't currently write code. We will continue this work and hope that further research will result in tools that enhance programming and bring us closer to problem-solving artificial intelligence.
See AlphaCode's solutions and learn more about the model at alphacode.deepmind.com