Friday, February 28, 2025

Industry observers say GPT-4.5 is a “strange” model, question its price

OpenAI announced GPT-4.5, which CEO Sam Altman has said will be the company’s last model that does not use chain-of-thought (CoT) reasoning.

The company said the new model “is not a frontier model,” but that it is still its largest large language model (LLM) to date, with greater computational efficiency. Altman said that although GPT-4.5 does not reason the way OpenAI’s other new offerings, o1 and o3-mini, do, the new model still offers a more human-like thoughtfulness.

Industry observers, many of whom had early access to the new model, found GPT-4.5 to be an interesting move from OpenAI, one that tempered their expectations of what the model should be able to achieve.

Wharton professor and AI commentator Ethan Mollick posted on social media that GPT-4.5 is “a very strange and interesting model,” noting that it can be “oddly lazy on complex projects” despite being a strong writer.

OpenAI co-founder and former Tesla AI head Andrej Karpathy noted that GPT-4.5 reminded him of when GPT-4 came out and he first saw the model’s potential. In a post on X, Karpathy said that with GPT-4.5, “everything is a little bit better and it’s awesome, but also not exactly in trivial ways.”

However, Karpathy cautioned that people should not expect revolutionary gains from the model, because it does not “push forward model capability in cases where reasoning is critical (math, code, etc.).”

Detailed industry thoughts

Here’s what Karpathy had to say about the latest GPT iteration in a lengthy post on X:

Today marks the release of GPT4.5 by OpenAI. I’ve been looking forward to this for about two years, ever since GPT4 was released, because this release offers a qualitative measurement of the slope of improvement you get from scaling pretraining compute (i.e., simply training a bigger model). Each 0.5 in the version number is roughly 10X the pretraining compute. Now recall that GPT1 barely generated coherent text. GPT2 was a confused toy. GPT2.5 was “skipped” straight into GPT3, which was even more interesting. GPT3.5 crossed the threshold where it was enough to actually ship as a product and sparked OpenAI’s “ChatGPT moment.” And GPT4 in turn also felt better, but I’ll say that it definitely felt subtle.

I remember being part of a hackathon trying to find concrete prompts where GPT4 outperformed 3.5. They definitely existed, but clear and concrete “slam dunk” examples were hard to find. It’s that… everything was just a little better, but in a diffuse way. The word choice was a bit more creative. Understanding of nuance in the prompt was improved. Analogies made a bit more sense. The model was a little bit funnier. World knowledge and understanding improved at the edges of rare domains. Hallucinations were a bit less frequent. The vibes were just a bit better. It felt like the rising tide that lifts all boats, where everything improves by maybe 20%. So it was with that expectation that I went into testing GPT4.5, which I had access to for a few days, and which saw 10X more pretraining compute than GPT4. And I feel like, once again, I’m in the same hackathon from two years ago. Everything is a little bit better and it’s awesome, but also not exactly in trivial ways. Still, it is incredibly interesting and exciting, because it is another qualitative measurement of a certain slope of capability that comes “for free” from just pretraining a bigger model.

Keep in mind that GPT4.5 was only trained with pretraining, supervised finetuning and RLHF, so this is not yet a reasoning model. Therefore, this model release does not push forward model capability in cases where reasoning is critical (math, code, etc.). In those cases, training with RL and gaining the ability to think is incredibly important and works better, even if it is on top of an older base model (e.g., GPT4-ish capability). The state of the art here remains the full o1. Presumably, OpenAI will now be looking to further train GPT4.5 with reinforcement learning, to allow it to think and to push model capability in these domains.

HOWEVER, we do expect to see an improvement on tasks that are not reasoning-heavy, and I would say those are tasks that are more EQ (as opposed to IQ) related and bottlenecked by, e.g., world knowledge, creativity, analogy making, general understanding, humor, etc. So these are the tasks I was most interested in during my vibe checks.

So below, I thought it would be fun to highlight five funny/amusing prompts that test these capabilities, and to organize them into an interactive “LM Arena Lite” right here on X, using a combination of images and polls in a thread. Sadly, X does not allow you to include both an image and a poll in a single post, so I have to alternate posts that give the image (showing the prompt and two responses, one from 4 and one from 4.5) with polls where people can vote on which is better. After 8 hours, I’ll reveal the identity of which model is which. Let’s see what happens 🙂
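For readers who want the scaling arithmetic Karpathy references made explicit, here is a minimal sketch, not taken from his post or from OpenAI, that treats each +0.5 step in the version number as roughly a 10x increase in pretraining compute. The baseline choice and version values are purely illustrative; no real compute figures are implied.

```python
# Minimal sketch of the rule of thumb described in Karpathy's post:
# each +0.5 in the GPT version number corresponds to roughly 10x the
# pretraining compute. Baseline and versions are illustrative only.

def relative_pretraining_compute(version: float, baseline_version: float = 4.0) -> float:
    """Relative pretraining compute vs. the baseline, assuming ~10x per +0.5 version step."""
    half_steps = (version - baseline_version) / 0.5
    return 10.0 ** half_steps

if __name__ == "__main__":
    for v in (3.5, 4.0, 4.5):
        print(f"GPT-{v}: ~{relative_pretraining_compute(v):g}x the pretraining compute of GPT-4")
```

Run as written, the sketch reports GPT-3.5 at roughly 0.1x and GPT-4.5 at roughly 10x the pretraining compute of GPT-4, which matches the rule of thumb in the post.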

CEO thoughts on GPT-4.5

Other early users also saw potential in GPT-4.5. Box CEO Aaron Levie said on X that his company used GPT-4.5 to help extract structured data and metadata from complex enterprise content.

The AI breakthroughs just keep coming. OpenAI just announced GPT-4.5, and we will be bringing it to Box customers later today in Box AI Studio.

We tested GPT-4.5 in early access with Box AI on advanced enterprise unstructured data use cases and saw strong results. With the Box AI enterprise eval, we test models against a variety of scenarios, such as question-answering accuracy, reasoning capabilities, and more. In particular, to explore GPT-4.5’s capabilities, we focused on a key area with significant potential for enterprise impact: structured data extraction, or metadata extraction, from complex enterprise content.

At Box, we rigorously evaluate data extraction models using multiple enterprise-grade datasets. One key dataset we leverage is CUAD, which consists of over 510 commercial legal contracts. Within this dataset, Box has identified 17,000 fields that can be extracted from unstructured content, and we evaluate the model on single-shot extraction for these fields (this is our hardest test, in which the model has one opportunity to extract all the metadata in a single pass, as opposed to multiple attempts). In our tests, GPT-4.5 correctly extracted 19 percentage points more fields accurately than GPT-4o, highlighting its improved ability to handle nuanced contract data.

Then, to ensure GPT-4.5 can handle real-world enterprise demands, we assessed its performance against a more rigorous set of documents, Box’s own challenge set. We selected a subset of complex legal contracts, ones with multimodal content, high-density information, and lengths exceeding 200 pages, to represent some of the most difficult scenarios our customers face. On this challenge set, GPT-4.5 consistently outperformed GPT-4o in extracting key fields with higher accuracy, showcasing its superior ability to handle intricate and nuanced legal documents.

Overall, we are seeing strong results from GPT-4.5 on complex enterprise data, which will unlock even more use cases in the enterprise.
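As a rough illustration of the kind of comparison Levie describes, the sketch below shows one way to score field-level, single-pass extraction accuracy and express the gap between two models in percentage points. It is not Box’s actual evaluation harness; the extractor callable, document list, and gold labels are hypothetical placeholders for whatever pipeline is being measured.

```python
# Hypothetical sketch of a field-level, single-pass extraction comparison.
# Not Box's actual eval harness; extract_fields(), documents and gold labels
# are placeholders for an arbitrary extraction pipeline under test.
from typing import Callable

def field_accuracy(
    extract_fields: Callable[[str], dict[str, str]],  # model-backed extractor (placeholder)
    documents: list[str],                              # raw contract text, one string per document
    gold: list[dict[str, str]],                        # expected field -> value mapping per document
) -> float:
    """Share of gold fields recovered exactly, with one extraction pass per document."""
    correct = total = 0
    for doc, expected in zip(documents, gold):
        predicted = extract_fields(doc)  # single shot: all fields extracted in one pass
        for field, value in expected.items():
            total += 1
            correct += int(predicted.get(field, "").strip() == value.strip())
    return correct / total if total else 0.0

# Comparing two models then reduces to a difference in percentage points:
# delta_pp = 100 * (field_accuracy(model_a_extractor, docs, gold)
#                   - field_accuracy(model_b_extractor, docs, gold))
```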

Questions about the price and what it means

Even as early users found GPT-4.5 workable, albeit a bit lazy, they questioned its release.

For example, prominent OpenAI critic Gary Marcus called GPT-4.5 a “nothingburger” on Bluesky.

Hot take: GPT-4.5 is a nothingburger; GPT-5 is still a fantasy. • Scaling data is not a physical law; pretty much everything I told you was true. • All the BS about GPT-5 we have listened to for the last few years: not so true. • Fanboys like Cowen will blame users, but the results just aren’t what they had hoped for.

– Gary Marcus (@garymarcus.bsky.social) 2025-02-27T20:44:55.115Z

Hugging Face CEO Clément Delangue commented that GPT-4.5 being closed-source makes it “meh.”

However, for many, the reservations about GPT-4.5 had nothing to do with its performance. Instead, people asked why OpenAI would price the model so high that it is almost prohibitive to use, when it is not as powerful as its other models.

One user commented on X: “So you’re telling me GPT-4.5 costs more than o1, yet it doesn’t perform as well on benchmarks… Makes sense.”
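The complaint comes down to simple per-token arithmetic. The sketch below, which uses placeholder per-million-token prices rather than OpenAI’s actual published rates, shows how the cost of a single request is computed and compared across two models.

```python
# Illustrative only: per-request cost from per-million-token prices.
# The prices below are placeholder assumptions, not OpenAI's actual rates;
# consult the official pricing page for real numbers.

def request_cost(input_tokens: int, output_tokens: int,
                 price_in_per_m: float, price_out_per_m: float) -> float:
    """Cost in dollars for one request, given $/1M-token input and output prices."""
    return (input_tokens / 1_000_000) * price_in_per_m \
         + (output_tokens / 1_000_000) * price_out_per_m

if __name__ == "__main__":
    # Hypothetical price points for two models, chosen only to show the comparison.
    model_a = request_cost(2_000, 500, price_in_per_m=75.0, price_out_per_m=150.0)
    model_b = request_cost(2_000, 500, price_in_per_m=15.0, price_out_per_m=60.0)
    print(f"Model A: ${model_a:.4f} per request vs. Model B: ${model_b:.4f} per request")
```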

Other X users theorized that the high token cost could be meant to deter competitors such as DeepSeek from “distilling 4.5.”

DeepSeek emerged as a major competitor to OpenAI in January, with industry leaders considering DeepSeek-R1 to be as capable as OpenAI’s offerings, but more affordable.
