Thursday, May 15, 2025

The most capable open-source AI model yet could supercharge AI agents

The most capable open-source AI model with visual abilities yet could lead more developers, researchers, and startups to build AI agents that can perform useful tasks on your computer.

Released today by the Allen Institute for AI (Ai2), the Multimodal Open Language Model, or Molmo, can interpret images as well as converse through a chat interface. That means it can make sense of a computer screen, potentially helping an AI agent perform tasks such as browsing the web, navigating file directories, and drafting documents.
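If the released checkpoints follow the usual Hugging Face conventions, pointing Molmo at a screenshot could look roughly like the sketch below. The repo id, the processor's process helper, and the generate_from_batch call are assumptions drawn from common Transformers patterns, not confirmed details of the release:

```python
# Sketch (assumptions noted above): ask a local Molmo checkpoint what is on
# screen, the kind of query an agent loop would issue before choosing an action.
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor, GenerationConfig

repo = "allenai/Molmo-7B-D-0924"  # assumed repo id
processor = AutoProcessor.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo, trust_remote_code=True, torch_dtype="auto", device_map="auto"
)

inputs = processor.process(
    images=[Image.open("screenshot.png")],
    text="List the clickable buttons visible on this screen.",
)
# Add a batch dimension and move tensors to the model's device.
inputs = {k: v.to(model.device).unsqueeze(0) for k, v in inputs.items()}

output = model.generate_from_batch(
    inputs,
    GenerationConfig(max_new_tokens=200, stop_strings="<|endoftext|>"),
    tokenizer=processor.tokenizer,
)
# Decode only the newly generated tokens, skipping the prompt.
new_tokens = output[0, inputs["input_ids"].size(1):]
print(processor.tokenizer.decode(new_tokens, skip_special_tokens=True))
```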

“With this release, many more people will be able to deploy a multimodal model,” says Ali Farhadi, CEO of Ai2, a research organization based in Seattle, Washington, and a computer scientist at the University of Washington. “It should be an enabler for the next generation of applications.”

So-called AI agents are being widely touted as the next big thing in AI, with OpenAI, Google, and others racing to develop them. Agents have become a buzzword of late, but the grand vision is for AI to go well beyond chatting to reliably take complex, sophisticated actions on computers when instructed to do so. That capability has yet to materialize at any kind of scale.

Some powerful AI models already have visual capabilities, including GPT-4 from OpenAI, Claude from Anthropic, and Gemini from Google DeepMind. These models can be used to power some experimental AI agents, but they are hidden from view and accessible only through a paid application programming interface, or API.

Meta has released a family of AI models called Llama under a license that restricts their commercial use, but it has not yet made a multimodal version available to developers. Meta is expected to announce several new products, possibly including new Llama AI models, at today’s Connect event.

“Having an open-source, multimodal model means that any startup or researcher who has an idea can try to implement it,” says Ofir Press, a postdoctoral researcher at Princeton University who works on AI agents.

Press says that because Molmo is open source, developers will be able to more easily fine-tune their agents for specific tasks, such as working with spreadsheets, by providing additional training data. Models like GPT-4 can only be fine-tuned to a limited degree through their APIs, whereas a fully open model can be modified extensively. “When you have an open-source model like that, you have many more options,” Press says.
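As a rough illustration of what that extra freedom buys, here is a minimal parameter-efficient fine-tuning sketch using the peft library; the repo id, the target module names, and the spreadsheet-task training data are all hypothetical placeholders:

```python
# Sketch: attach low-rank (LoRA) adapters to an open vision-language model so
# it can be fine-tuned on task-specific examples (e.g., spreadsheet screenshots
# paired with the desired action) without updating all of the base weights.
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "allenai/Molmo-7B-D-0924",  # assumed repo id
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)

# Which projection layers to adapt is model-specific; these names are a guess.
lora = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"])
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # typically well under 1% of all weights

# From here, a standard training loop (or transformers.Trainer) over
# (screenshot, instruction, target-action) examples updates only the adapters.
```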

Ai2 is releasing several sizes of Molmo today, including a 70-billion-parameter model and a 1-billion-parameter model that is small enough to run on a mobile device. A model’s parameter count refers to the number of units it contains for storing and manipulating data, and roughly corresponds to its capabilities.
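Some back-of-the-envelope arithmetic shows why the 1-billion-parameter version is plausible on a phone: the weights-only memory footprint is simply the parameter count times the bytes stored per parameter (activations and caches add overhead on top):

```python
# Weights-only memory footprint = parameters × bytes per parameter.
def weight_footprint_gb(params: float, bytes_per_param: float) -> float:
    return params * bytes_per_param / 1e9

for params, label in [(1e9, "1B model:"), (70e9, "70B model:")]:
    print(label,
          f"fp16 ≈ {weight_footprint_gb(params, 2):.1f} GB,",
          f"4-bit ≈ {weight_footprint_gb(params, 0.5):.1f} GB")
# 1B model:  fp16 ≈ 2.0 GB,   4-bit ≈ 0.5 GB   -> fits in a phone's RAM
# 70B model: fp16 ≈ 140.0 GB, 4-bit ≈ 35.0 GB  -> needs server-class hardware
```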

Ai2 claims that Molmo is as effective as far larger commercial models despite its relatively small size, because it was carefully trained on high-quality data. The new model is also fully open source in that, unlike Meta’s Llama, there are no restrictions on its use. Ai2 is releasing the training data used to create the model as well, giving researchers more detail about how it works.

Releasing powerful models is not without risk. Such models can be more easily adapted to nefarious purposes; for example, we may one day see the emergence of malicious AI agents designed to automate the hacking of computer systems.

Ai2’s Farhadi says Molmo’s performance and portability will allow developers to build more powerful software agents that run natively on smartphones and other portable devices. “The billion-parameter model now performs at or in the league of models that are at least 10 times larger,” he says.

Building useful AI agents, however, may depend on more than just more capable multimodal models. A key challenge is making the models work more reliably. That may require further breakthroughs in AI reasoning capabilities—something OpenAI is trying to achieve with its latest o1 model, which demonstrates step-by-step reasoning skills. The next step may be to give multimodal models such reasoning capabilities.

For now, Molmo’s launch means AI agents are closer than ever before, and they could soon prove useful even beyond the giants that rule the AI world.
