Meta announced today a partnership with Cerebras Systems to power its new Llama API, offering developers access to inference speeds up to 18 times faster than traditional GPU-based solutions.
The announcement, made at Meta's inaugural LlamaCon developer conference in Menlo Park, positions the company to compete directly with OpenAI, Anthropic, and Google in the rapidly growing market for AI inference services, where developers buy tokens by the billions to power their applications.
"Meta has selected Cerebras to collaborate to deliver the ultra-fast inference that they need to serve developers through their new Llama API," said Julie Shin Choi, chief marketing officer at Cerebras, during a press briefing. "We at Cerebras are very, very happy to announce our first CSP hyperscaler partnership to deliver ultra-fast inference to all developers."
The partnership marks Meta's formal entry into the business of selling AI computation, transforming its popular open-source Llama models into a commercial service. While Meta's Llama models have accumulated over one billion downloads, until now the company had not offered first-party cloud infrastructure for developers to build applications with them.
"This is very exciting, even without talking about Cerebras specifically," said James Wang, a senior executive at Cerebras. "OpenAI, Anthropic, Google: they have built an entire new AI business from scratch, which is the AI inference business. Developers who build AI applications buy tokens by the millions, sometimes by the billions. And these are like the new compute instructions that people need to build AI applications."
Breaking the speed barrier: How Cerebras supercharges Llama models
What sets Meta's offering apart is the dramatic speed increase provided by Cerebras' specialized AI chips. The Cerebras system delivers more than 2,600 tokens per second for Llama 4 Scout, compared with roughly 130 tokens per second for ChatGPT and around 25 tokens per second for DeepSeek, according to benchmarks from Artificial Analysis.
"If you just compare API-to-API, Gemini and GPT, they're all great models, but they all run at GPU speeds, which is roughly 100 tokens per second," Wang explained. "And 100 tokens per second is okay for chat, but it's very slow for reasoning. It's very slow for agents. And people are struggling with that today."
This speed advantage enables entirely new categories of applications that were previously impractical, including real-time agents, conversational low-latency voice systems, interactive code generation, and instant multi-step reasoning, all of which require chaining multiple calls to large language models, chains that can now be completed in seconds rather than minutes.
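To make that difference concrete, here is a back-of-the-envelope sketch using only the throughput figures quoted above; the five-step agent chain and the 1,000-token output per step are illustrative assumptions, not figures from the announcement.

```python
# Rough latency math for a chained, multi-step agent workflow.
# Throughput figures come from the benchmarks cited above; the number of
# steps and the tokens generated per step are illustrative assumptions.

RATES_TOK_PER_SEC = {
    "Cerebras (Llama 4 Scout)": 2600,
    "ChatGPT (GPU-based)": 130,
    "DeepSeek (GPU-based)": 25,
}

STEPS = 5               # hypothetical agent chain length
TOKENS_PER_STEP = 1000  # hypothetical output tokens per call

for name, rate in RATES_TOK_PER_SEC.items():
    seconds = STEPS * TOKENS_PER_STEP / rate
    print(f"{name}: {seconds:.0f}s for {STEPS} chained calls")

# Cerebras (Llama 4 Scout): 2s for 5 chained calls
# ChatGPT (GPU-based): 38s for 5 chained calls
# DeepSeek (GPU-based): 200s for 5 chained calls
```

At GPU-class rates, a workflow that finishes in about two seconds on Cerebras stretches past three minutes, which is the gap Wang points to for agents and reasoning workloads.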
The Llama API represents a significant shift in Meta's AI strategy, moving the company from primarily being a model provider to becoming a full-fledged AI infrastructure company. By offering an API service, Meta creates a revenue stream from its AI investments while maintaining its commitment to open models.
"Meta is now in the business of selling tokens, and it's great for the American AI ecosystem," Wang noted during the press conference. "They bring a lot to the table."
The API will offer tools for fine-tuning and evaluation, starting with the Llama 3.3 8B model, allowing developers to generate data, train on it, and test the quality of their custom models. Meta emphasizes that it won't use customer data to train its own models, and models built using the Llama API can be transferred to other hosts, a clear differentiation from the more closed approaches of some competitors.
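The announcement doesn't include code, but the train-then-evaluate loop described above might look roughly like the following sketch. Every endpoint path, field name, and file ID here is a hypothetical placeholder, not Meta's documented interface.

```python
# Hypothetical sketch of the fine-tune/evaluate loop described above.
# All endpoint paths, field names, and IDs are placeholders, not the
# documented Llama API interface.
import os
import requests

BASE = "https://llama-api.example/v1"  # placeholder base URL
HEADERS = {"Authorization": f"Bearer {os.environ['LLAMA_API_KEY']}"}

# Kick off a fine-tuning run on the Llama 3.3 8B base model.
job = requests.post(
    f"{BASE}/fine-tunes",
    headers=HEADERS,
    json={"base_model": "llama-3.3-8b", "training_file": "file-train-001"},
    timeout=30,
).json()

# Score the resulting custom model against a held-out evaluation set.
report = requests.post(
    f"{BASE}/evaluations",
    headers=HEADERS,
    json={"model": job["fine_tuned_model"], "eval_file": "file-eval-001"},
    timeout=30,
).json()
print(report["metrics"])  # quality metrics for the custom model
```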
Cerebras will power Meta's new service through its network of data centers located throughout North America, including facilities in Dallas, Oklahoma, Minnesota, Montreal, and California.
"All of our data centers that serve inference are in North America at this time," Choi explained. "We will be serving Meta with the full capacity of Cerebras. The workload will be balanced across all of these different data centers."
The business arrangement follows what Choi described as the "classic compute provider to a hyperscaler" model, similar to the way Nvidia provides hardware to major cloud providers. "They are reserving blocks of our compute that they can use to serve their developer population," she said.
Beyond Cerebras, Meta has also announced a partnership with Groq to provide fast inference options, giving developers multiple high-performance alternatives beyond traditional GPU-based inference.
Meta's entry into the inference API market with superior performance metrics could potentially disrupt the established order dominated by OpenAI, Google, and Anthropic. By combining the popularity of its open-source models with dramatically faster inference capabilities, Meta is positioning itself as a formidable competitor in the commercial AI space.
"Meta is in a unique position with 3 billion users, hyperscale data centers, and a huge developer ecosystem," according to Cerebras' presentation materials. The integration of Cerebras technology "helps Meta leapfrog OpenAI and Google in performance by approximately 20x."
For Cerebras, the partnership marks a major milestone and a validation of its specialized AI hardware approach. "We have been building this wafer-scale engine for years, and we always knew that the technology was first-rate, but ultimately it had to end up as part of someone else's hyperscale cloud. That was the final target from a commercial strategy perspective, and we have finally reached that milestone," Wang said.
The Llama API is currently available as a limited preview, with Meta planning a broader rollout in the coming weeks and months. Developers interested in access to ultra-fast Llama 4 inference can request early access by selecting Cerebras from the model options within the Llama API.
"If you imagine a developer who doesn't know anything about Cerebras, because we're a relatively small company, they can just click two buttons on Meta's standard software, generate an API key, select the Cerebras flag, and then all of a sudden their tokens are being processed on a giant wafer-scale engine," Wang explained. "That kind of thing, having us be on the back end of Meta's whole developer ecosystem, is just tremendous for us."
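Based on the flow Wang describes, the developer-facing request might look roughly like the sketch below. The endpoint URL, model identifier, and the "provider" field used to select Cerebras are illustrative assumptions, since the preview API's actual interface isn't detailed in the announcement.

```python
# Hypothetical sketch of the two-click flow Wang describes: generate an API
# key, flag Cerebras as the serving backend, and send a chat request.
# The endpoint URL, model name, and "provider" field are assumptions for
# illustration only.
import os
import requests

API_KEY = os.environ["LLAMA_API_KEY"]  # key generated from the Llama API console

response = requests.post(
    "https://llama-api.example/v1/chat/completions",  # placeholder endpoint
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "llama-4-scout",  # assumed model identifier
        "provider": "cerebras",    # assumed flag routing tokens to Cerebras
        "messages": [
            {"role": "user", "content": "Plan a three-step research agenda."}
        ],
    },
    timeout=30,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```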
Meta's choice of specialized silicon signals something deeper: in the next phase of AI, it's not just about what your models know, but how fast they can think. In that future, speed isn't just a feature. It's the point.