Thursday, May 8, 2025

Helping non-experts build advanced generative AI models


The impact of AI will never be equitable if only one company creates and controls the models (not to mention the data that goes into them). Unfortunately, today’s AI models contain billions of parameters that must be trained and tuned to maximize performance for each use case, putting the most powerful AI models out of reach for most people and companies.

MosaicML began with a mission to make those models more accessible. The company, co-founded by Jonathan Frankle PhD ’23 and MIT Associate Professor Michael Carbin, developed a platform that lets users train, refine, and monitor open-source models using their own data. The company also built its own open-source models using Nvidia graphics processing units (GPUs).

The approach made deep learning, a nascent field when MosaicML first started, accessible to many more organizations as interest in generative AI and large language models (LLMs) exploded following the release of ChatGPT. It also made MosaicML a powerful complement to data management companies that were likewise committed to helping organizations use their data without giving it away to AI companies.

In 2023, this reasoning led to the acquisition of MosaicML by Databricks, a global data storage, analytics, and artificial intelligence company that works with some of the largest organizations in the world. Since the acquisition, the combined companies have released one of the most capable open-source, general-purpose LLMs ever built. Known as DBRX, the model set new standards in tasks such as reading comprehension, general-knowledge questions, and logic puzzles.

Since then, DBRX has gained a reputation as one of the fastest open-source LLMs available and has proven especially useful to large enterprises.

But Frankle says the significance of DBRX goes beyond the model itself: because it was built with Databricks tools, any of the company’s customers can achieve similar performance with their own models, which will accelerate the impact of generative AI.

“Honestly, it’s exciting to see the community do cool things with it,” Frankle says. “For me, as a scientist, that’s the best part. It’s not about the model, it’s about all the amazing things the community is doing around it. That’s where the magic happens.”

Increasing the efficiency of algorithms

Frankle earned bachelor’s and master’s degrees in computer science at Princeton University before starting his PhD at MIT in 2016. Early on, he wasn’t sure which area of computer science he wanted to study. His eventual choice would change the course of his life.

Frankle ultimately decided to focus on a form of artificial intelligence known as deep learning. At the time, deep learning and artificial intelligence did not inspire the enthusiasm they do today. Deep learning was a decades-old area of study that had yet to bear much fruit.

“I don’t think anyone at the time predicted that deep learning would develop the way it did,” Frankle says. “Insiders thought it was a really cool area with a lot of unsolved problems, but at the time phrases like large language model (LLM) and generative AI weren’t being used. It was early days.”

Things started to get interesting in 2017, when Google researchers released the now-famous “Attention Is All You Need” paper, in which they showed that a new deep learning architecture, known as the transformer, was surprisingly effective at translating between languages and showed promise in a number of other applications, including content generation.
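
At the core of the transformer is scaled dot-product attention, in which each position in a sequence computes its output as a weighted mixture of every position’s representation. The NumPy sketch below is a minimal illustration of that computation, not code from the paper:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # pairwise token similarities
    scores -= scores.max(axis=-1, keepdims=True)    # stabilize the exponentials
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over positions
    return weights @ V                              # weighted blend of values

# Self-attention over a toy sequence: 4 tokens, 8-dimensional embeddings.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
print(scaled_dot_product_attention(x, x, x).shape)  # -> (4, 8)
```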

In 2020, Naveen Rao, Mosaic’s eventual co-founder and CEO, emailed Frankle and Carbin out of the blue. Rao had read a paper the two had co-authored in which the researchers showed how to shrink deep learning models without losing performance. He persuaded the pair to start a company. They were joined by Hanlin Tang, who had worked with Rao at a previous AI startup that was acquired by Intel.
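
That paper was Frankle and Carbin’s widely cited “lottery ticket hypothesis” work, which found that large networks contain much smaller subnetworks that can be trained to comparable accuracy. As a rough, illustrative sketch of the pruning operation at the heart of that line of research (the actual procedure iteratively trains, prunes, and rewinds weights rather than pruning once):

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction of a weight matrix."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)                  # number of weights to drop
    threshold = np.partition(flat, k)[k]           # k-th smallest magnitude
    mask = np.abs(weights) >= threshold            # keep only the large weights
    return weights * mask, mask

rng = np.random.default_rng(0)
W = rng.normal(size=(256, 256))                    # a stand-in weight matrix
W_pruned, mask = magnitude_prune(W, sparsity=0.8)
print(f"{(mask == 0).mean():.0%} of weights removed")  # -> 80%
```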

The founders started by studying the various techniques used to speed up the training of AI models, eventually combining several of them to show that they could train a model to perform image classification four times faster than what had previously been achieved.

“The trick was that there was no trick,” Frankle says. “I think we had to make 17 different changes to the way we trained the model to figure it out. There was just a little here and a little there, but it turned out to be enough to get some amazing accelerations. That really was the story of Mosaic.”
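
As a hedged illustration of that “no single trick” philosophy (the hooks and schedules below are generic training tweaks written for this article, not MosaicML’s actual methods or API), each change can be expressed as a small, independent hook on an ordinary training loop:

```python
import numpy as np

rng = np.random.default_rng(0)

def smooth_labels(y_onehot, eps=0.1):
    """Label smoothing: soften one-hot targets to regularize training."""
    return y_onehot * (1 - eps) + eps / y_onehot.shape[-1]

def warmup_lr(base_lr, step, warmup_steps=500):
    """Learning-rate warmup: ramp the step size up linearly at the start."""
    return base_lr * min(1.0, (step + 1) / warmup_steps)

def progressive_size(full_size, step, total_steps, start_frac=0.5):
    """Progressive resizing: use smaller inputs early, full size by the end."""
    frac = start_frac + (1.0 - start_frac) * step / total_steps
    return max(1, int(full_size * frac))

# Each hook is a one-line change to a vanilla training loop.
for step in range(1000):
    lr = warmup_lr(0.1, step)
    size = progressive_size(224, step, total_steps=1000)
    targets = smooth_labels(np.eye(10)[rng.integers(10, size=32)])
    # ... forward pass on size x size crops, loss on targets, update at lr
```

No single hook is dramatic on its own, which is exactly Frankle’s point: the gains come from stacking many such changes.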

The team showed that its techniques could improve model performance, and in 2023 it released a large open-source language model along with an open-source library of its methods. It also developed visualization tools that let developers map out different experimental options for training and running models.

MIT’s E14 Fund invested in Mosaic’s Series A funding round, and Frankle says the E14 team provided helpful guidance early on. Mosaic’s progress made it possible for a new class of companies to train their own generative AI models.

“Making things accessible and open source was core to Mosaic’s mission,” Frankle says. “It’s something that has always been very close to my heart, ever since I was a PhD student with no GPUs because I wasn’t in a machine learning lab, while all my friends had GPUs. I still feel that way. Why can’t we all take part in this? Why can’t we all get to do this stuff and do science?”

Open-sourcing innovation

Databricks has also worked to provide its customers with access to AI models. The company completed its acquisition of MosaicML in 2023 for a reported $1.3 billion.

“At Databricks, we saw a founding team made up of scientists just like us,” says Frankle. “We also saw a team of scientists who understand the technology. Databricks has data, we have machine learning. You can’t do one without the other and vice versa. It ended up being a really good match.”

In March 2024, Databricks released DBRX, which gave the open-source community and enterprises building their own LLMs capabilities that had previously been limited to closed-source models.

“What DBRX has shown is that you can build the world’s best open-source LLM with Databricks,” says Frankle. “If you run a business, the sky is the limit today.”

Frankle says the Databricks team has been encouraged to use DBRX internally for a variety of tasks.

“It’s already great, and with a little tuning it’s better than closed models,” he says. “You won’t be better than GPT at everything. That’s not how it works. But no one wants to solve every problem. Everyone wants to solve one problem. We can adapt this model to make it really great for specific scenarios.”

As Databricks continues to push the boundaries of AI, and as competitors continue to invest huge sums in AI more broadly, Frankle hopes the industry comes to see open source as the best way forward.

“I believe in science and progress. I’m thrilled that we’re working in such an exciting field of science right now,” says Frankle. “I’m also a supporter of openness, and I hope everyone else embraces openness the same way we do. That’s how we got here: through good science and good sharing.”
