In 2025, entrepreneurs will unleash a flood of AI-powered applications. Finally, generative AI will deliver on expectations with a new crop of low-cost consumer and business applications. This is not a widely accepted view today. OpenAI, Google, and xAI are engaged in an arms race to train the most powerful large language model (LLM) in pursuit of artificial general intelligence (AGI), and their gladiatorial battle has dominated the mindshare and revenue share of the fledgling generative AI ecosystem.
Elon Musk, for example, raised $6 billion to launch the newcomer xAI and bought 100,000 Nvidia H100 GPUs, the expensive chips used for AI processing, at a cost of more than $3 billion, to train its Grok model. At these prices, only tech tycoons can afford to build these gigantic LLMs.
The incredible spending by companies like OpenAI, Google, and xAI has created a skewed ecosystem that is heavy at the bottom, in the model layer, and thin at the top, in applications. LLMs trained on these massive GPU farms also tend to be very expensive at inference, the process of feeding in a prompt and generating a response from the large language model that sits inside every AI application. It’s as if everyone had 5G smartphones, but data was too expensive for anyone to watch TikTok videos or browse social media. As a result, excellent LLMs with high inference costs have made it unprofitable to distribute killer applications.
This lopsided ecosystem of ultra-wealthy tech moguls competing against each other has enriched Nvidia while forcing app developers into a Catch-22: either use a cheap, low-performance model that is likely to disappoint users, or pay exorbitant inference costs and risk going bankrupt.
In 2025, a new approach will emerge that could change all this. It will be a return to what we have learned from previous technological revolutions, such as the Intel and Windows PC era or the Qualcomm and Android mobile era, when Moore’s Law improved computers and their applications, and lower bandwidth costs improved mobile phones and their applications, year after year.
But what about the high cost of inference? A new law of AI inference is just around the corner. The cost of inference has been falling by a factor of 10 per year, driven by new AI algorithms, inference technology, and better chips at lower prices.
As a reference point, if an independent developer had used OpenAI’s best models to build an AI search product, the cost would have been around $10 per query in May 2023, while a conventional, non-generative-AI Google search would cost $0.01, a 1,000x difference. By May 2024, however, the price of OpenAI’s top model had dropped to around $1 per query. With this unprecedented 10x-per-year price drop, app developers will be able to use models of ever higher quality at ever lower cost, leading to a proliferation of AI applications over the next two years.
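To make the arithmetic concrete, here is a minimal Python sketch of the figures above. It uses only the numbers quoted in the text ($10 per query in May 2023, $0.01 for a conventional search) and assumes, purely for illustration, that the 10x-per-year decline continues unchanged. The names `projected_cost` and `years_to_parity` are hypothetical helpers, not part of any library.

```python
import math

# Figures quoted in the text.
START_COST = 10.0    # USD per AI search query, May 2023
SEARCH_COST = 0.01   # USD per conventional search query

# Assumption for illustration: cost keeps dividing by 10 every year.
ANNUAL_DROP = 10.0


def projected_cost(years_elapsed: float) -> float:
    """Cost per query after `years_elapsed` years of steady decline."""
    return START_COST / ANNUAL_DROP ** years_elapsed


def years_to_parity() -> float:
    """Years until the AI query cost matches the conventional search cost."""
    return math.log10(START_COST / SEARCH_COST) / math.log10(ANNUAL_DROP)


if __name__ == "__main__":
    for year in range(4):
        print(f"May {2023 + year}: ${projected_cost(year):.2f} per query")
    print(f"Years to parity: {years_to_parity():.1f}")
```

Under that assumption, the 1,000x gap corresponds to three successive 10x drops. Whether the trend actually holds that long is a forecast, not something the calculation can establish.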
