Tuesday, March 10, 2026

Ship fast, optimize later: The best AI engineers don’t worry about cost – they prioritize deployment

Across industries, rising compute costs are often cited as a barrier to AI adoption, but leading companies are finding that cost is no longer the real constraint. The tougher challenges, and the ones many tech leaders are actually thinking about? Latency, flexibility and capacity. At Wonder, for example, AI adds just a few cents to the cost of each order; the food delivery and takeout company is far more concerned about cloud capacity as demand rapidly grows. Recursion, for its part, has focused on balancing small- and large-scale training and inference across on-premises clusters and the cloud, giving the biotech company the flexibility to experiment quickly. The two companies’ real-world experiences underscore a broader industry trend: For enterprises running AI at scale, economics are no longer the deciding factor; the conversation has moved from how to pay for AI to how quickly it can be deployed and maintained. AI leaders from both companies recently sat down with VentureBeat CEO and Editor-in-Chief Matt Marshall as part of VB’s traveling AI Impact series. Here’s what they shared.

Wonder: Rethink your capacity assumptions

Wonder uses AI to handle everything from recommendations to logistics, but for now, according to CTO James Chen, AI adds just a few cents to each order. Chen explained that the technology component of processing a meal order costs 14 cents, with AI accounting for 2 to 3 cents of that, although the figure is “rising rapidly” toward 5 to 8 cents. Still, that is almost negligible compared to total operating costs. Instead, the 100% cloud-native company’s main concern has been capacity amid rising demand. Chen noted that Wonder was built on the assumption (which turned out to be wrong) that there would be “unlimited capacity,” so the company could move “super fast” without worrying about managing infrastructure. But Wonder has grown significantly over the past few years, he said; as a result, about six months ago, “we started getting little signals from the cloud providers saying, ‘Hey, you might need to consider moving to region two,’” because they were running out of processor capacity or data storage in their facilities as demand grew. It was “very shocking” to have to switch to plan B earlier than expected. “Of course it’s good practice to operate in multiple regions, but we thought it might be another two years out,” Chen said.

What isn’t economically feasible (yet)

Chen noted that Wonder built its own models to maximize conversion rates; the goal is to make new restaurants available to the right customers as widely as possible. These are “isolated use cases” in which models are trained over time to become “very, very efficient and very fast.” For now, large models are the best option for Wonder. Longer term, though, the company would like to move to small models that are hyper-customized to each customer (via AI agents or concierges) based on purchase history and even clickstream. “Having these micro-models is definitely the best, but it’s currently very expensive,” Chen noted. “If you try to create one for each person, it’s just not economical.”
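Chen’s economics argument is easiest to see with rough numbers. The sketch below is a hypothetical back-of-envelope calculation, not Wonder’s actual figures: it compares the per-order inference cost of one shared large model against the one-time cost of fine-tuning a per-customer micro-model.

```python
# Back-of-envelope sketch (all dollar figures are hypothetical, not Wonder's):
# when does a per-customer micro-model beat one shared large model?

SHARED_COST_PER_ORDER = 0.03   # assumed inference cost per order, shared large model
MICRO_TRAIN_COST = 2.00        # assumed one-time cost to fine-tune one micro-model
MICRO_COST_PER_ORDER = 0.005   # assumed inference cost per order, micro-model

def breakeven_orders() -> float:
    """Orders one customer must place before their micro-model pays for itself."""
    savings_per_order = SHARED_COST_PER_ORDER - MICRO_COST_PER_ORDER
    return MICRO_TRAIN_COST / savings_per_order

if __name__ == "__main__":
    print(f"break-even: ~{breakeven_orders():.0f} orders per customer")  # ~80
```

Under these made-up numbers, a personal micro-model only pays off after roughly 80 orders per customer, which illustrates why one model per person “is just not economical” until training costs fall.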

Budgeting is an art, not a science

Wonder gives its developers and data scientists as much room to experiment as possible, and internal teams review usage costs to make sure no one has run a model that racked up a huge compute bill, Chen said. The company tries different things as it leans into AI while staying within its margins. “But it’s very difficult to budget because you have no idea,” he said. One challenge is the pace of development: when a new model comes out, “we can’t just sit there, can we? We have to use it.” Budgeting for the unknown economics of a token-based system is “definitely art versus science.” A key element of the software development lifecycle, he explained, is maintaining context when working with large models: when you find something that works, you can add it to your company’s “context corpus,” which gets sent along with every query. That corpus is large, and it costs money every time. “More than 50% or even 80% of your costs come from re-sending the same information back to the same engine on every request,” Chen said. In theory, the more volume they run, the lower the unit cost. “I know that when a transaction happens, I’ll pay the X-cent tax on each one, but I don’t want that to limit all the other creative ways we could use the technology.”
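To make Chen’s 50-to-80% figure concrete, here is a minimal sketch, using hypothetical token counts and prices, of how a static “context corpus” re-sent on every request dominates input-token spend, and how provider-side prompt caching (where available) changes the math. The cache discount is an assumption, not a quoted rate.

```python
# Minimal sketch of re-sent context costs (all token counts and prices are
# hypothetical placeholders, not Wonder's actual figures).

CONTEXT_TOKENS = 8_000      # assumed size of the shared "context corpus"
QUERY_TOKENS = 2_000        # assumed size of a typical per-request payload
PRICE_PER_1K_INPUT = 0.003  # assumed $ per 1K input tokens

def request_cost(context_cached: bool, cache_discount: float = 0.9) -> float:
    """Input-token cost of one request, with or without prompt caching."""
    context_cost = CONTEXT_TOKENS / 1000 * PRICE_PER_1K_INPUT
    if context_cached:
        context_cost *= 1 - cache_discount  # cached prefix billed at a discount
    return context_cost + QUERY_TOKENS / 1000 * PRICE_PER_1K_INPUT

if __name__ == "__main__":
    share = CONTEXT_TOKENS / (CONTEXT_TOKENS + QUERY_TOKENS)
    print(f"context share of input tokens: {share:.0%}")        # 80%
    print(f"uncached: ${request_cost(False):.4f} per request")  # $0.0300
    print(f"cached:   ${request_cost(True):.4f} per request")   # $0.0084
```

With these placeholder numbers, the corpus accounts for 80% of the input tokens on every request, matching the upper end of Chen’s estimate; caching the static prefix is one common way to claw that spend back.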

The “vindication moment” for Recursion

Recursion, for its part, has focused on meeting large-scale compute needs through a hybrid of on-premises clusters and cloud inference. When the company first set out to build its AI infrastructure, it had to go with its own setup because “cloud providers didn’t have a lot of good offerings,” explained CTO Ben Mabey. “The vindication moment was when we needed more compute and went to the cloud providers, and they said, ‘Maybe in a year or so.’” The company’s first cluster, built in 2017, included Nvidia gaming GPUs (GeForce GTX 1080s, launched in 2016); it has since added Nvidia H100s and A100s and runs a Kubernetes cluster spanning the cloud and its on-premises hardware. On the question of longevity, Mabey noted: “These gaming GPUs are still in use, which is crazy, right? The myth that a GPU’s lifespan is only three years is definitely not true. The A100s are still top of the line; they’re the workhorse of the industry.”

Best use cases for on-premises vs. the cloud; cost differences

Recently, Mabey’s team trained a foundation model on Recursion’s image repository, which comprises petabytes of data and more than 200 million images. This and other large training jobs required a “big cluster” with interconnected, multi-node setups. “When we need a fully connected network and access to a lot of data in a highly parallel file system, we run on-premises,” he explained. Shorter jobs, by contrast, run in the cloud. Recursion’s approach relies on “preemptible” GPUs and Google Tensor Processing Units (TPUs): instances whose running jobs can be interrupted in favor of higher-priority work. “Because we don’t care about the speed of some inference jobs where we’re churning through biological data, whether it’s imaging data, sequencing data or DNA data,” Mabey explained, “we can say, ‘Give it to us in an hour,’ and it’s OK if the job gets killed.” From a cost standpoint, moving larger workloads on-premises is “conservatively” 10 times cheaper, Mabey noted; on a five-year total-cost-of-ownership basis, it’s half the cost. For smaller storage needs, on the other hand, the cloud can be “fairly competitive” on cost. Ultimately, Mabey urged technology leaders to step back and decide whether they’re truly committed to AI; the cost-effective options usually require multi-year commitments. “From a psychological standpoint, I’ve seen peers of ours not invest in compute, and as a result they’re always paying on demand,” Mabey said. “Their teams use much less compute because they don’t want to be exposed to high cloud bills. Innovation is really hampered by people not wanting to spend money.”
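A minimal sketch of the preemption pattern Mabey describes: the job checkpoints after every batch, so if a preemptible instance is reclaimed mid-run, a restart resumes where it left off rather than losing completed work. File paths and function names here are illustrative assumptions, not Recursion’s actual pipeline.

```python
# Sketch of a preemption-tolerant batch job (paths and names are illustrative,
# not Recursion's actual pipeline). Checkpointing after each batch makes it
# "OK if the job gets killed" on cheap preemptible GPU/TPU instances.

import os
import pickle

CHECKPOINT_PATH = "/tmp/inference_job.ckpt"  # hypothetical checkpoint location

def load_progress() -> int:
    """Return the index of the next unprocessed batch (0 on a fresh start)."""
    if os.path.exists(CHECKPOINT_PATH):
        with open(CHECKPOINT_PATH, "rb") as f:
            return pickle.load(f)
    return 0

def save_progress(next_batch: int) -> None:
    with open(CHECKPOINT_PATH, "wb") as f:
        pickle.dump(next_batch, f)

def process(batch: str) -> None:
    """Placeholder for the real work, e.g. inference over imaging or sequencing data."""
    pass

def run_job(batches: list[str]) -> None:
    for i in range(load_progress(), len(batches)):
        process(batches[i])
        save_progress(i + 1)  # after this line, a preemption loses no completed work

if __name__ == "__main__":
    run_job([f"chunk-{i:03d}" for i in range(100)])
```

Because each batch’s result is durable before the next begins, the scheduler is free to kill and reschedule the job at will, which is what makes the heavily discounted preemptible capacity usable for latency-insensitive inference.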
