DeepSeek-R1 has certainly generated a lot of excitement and concern, especially for its competitor, OpenAI's o1 model. So we put the two to the test in a head-to-head comparison on a few basic data analysis and market research tasks.
To put the models on equal footing, we used Perplexity Pro Search, which now supports both o1 and R1. Our goal was to look beyond benchmarks and see whether the models can actually perform ad hoc tasks that require gathering information from the web, picking out the right data and performing simple calculations that would otherwise require significant manual effort.
Both models are impressive, but they make mistakes when the prompts lack specificity. o1 is slightly better at reasoning through tasks, but R1's transparency gives it an advantage in the cases (and there will be many) where it makes mistakes.
Here is a breakdown of a few of our experiments, along with links to the Perplexity pages where you can view the results yourself.
Calculating investment returns from the web
Our first test assessed whether the models could calculate return on investment (ROI). We considered a scenario in which a user invested $140 in the Magnificent Seven (Alphabet, Amazon, Apple, Meta, Microsoft, Nvidia, Tesla) on the first day of every month from January to December 2024. We asked the model to calculate the value of the portfolio at the current date.
To accomplish this task, the model would have to pull Mag 7 price information for the first day of each month, split the monthly investment evenly across the seven stocks ($20 per stock), sum up the purchases and calculate the portfolio value according to the current price of each stock.
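For reference, the calculation the models were being asked to perform is straightforward dollar-cost averaging. The sketch below shows the arithmetic with made-up placeholder prices (the real test used actual first-of-month quotes pulled from the web):

```python
# Sketch of the ROI task: invest $140 on the first trading day of each
# month, split evenly across the Magnificent Seven ($20 per stock),
# then value the accumulated shares at today's prices.
# All prices below are illustrative placeholders, not real quotes.

MONTHLY_TOTAL = 140.0
STOCKS = ["GOOGL", "AMZN", "AAPL", "META", "MSFT", "NVDA", "TSLA"]
PER_STOCK = MONTHLY_TOTAL / len(STOCKS)  # $20 per stock per month

# monthly_prices[ticker] -> 12 first-of-month prices (made up)
monthly_prices = {t: [100.0 + i for i in range(12)] for t in STOCKS}
# current_prices[ticker] -> latest price (made up)
current_prices = {t: 120.0 for t in STOCKS}

def portfolio_value(monthly_prices, current_prices, per_stock):
    """Accumulate fractional shares month by month, then mark to market."""
    shares = {t: 0.0 for t in monthly_prices}
    for ticker, prices in monthly_prices.items():
        for price in prices:
            shares[ticker] += per_stock / price  # fractional shares bought
    return sum(shares[t] * current_prices[t] for t in shares)

value = portfolio_value(monthly_prices, current_prices, PER_STOCK)
invested = MONTHLY_TOTAL * 12  # $1,680 total
roi = (value - invested) / invested
```

The point is that each step is mechanical; the hard part for the models was retrieving twelve correct price points per ticker, not the arithmetic itself.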
Both models failed at this task. o1 returned a list of stock prices for January 2024 and January 2025, along with a formula to calculate the portfolio value. However, it failed to compute the correct values and essentially said there would be no ROI. R1, for its part, made the mistake of only investing in January 2024 and calculating the returns for January 2025.
However, the models' reasoning traces were interesting. While o1 did not provide much detail on how it reached its result, R1's reasoning trace showed that it did not have the correct information because Perplexity's retrieval engine had failed to obtain the monthly stock price data (many retrieval-augmented generation applications fail not because of a lack of model ability, but because of poor retrieval). This proved to be an important bit of feedback that led us to our next experiment.

Reasoning over file content
We decided to run the same experiment as before, but instead of prompting the model to retrieve the information from the web, we decided to provide it in a text file. To do this, we copied the monthly stock data for each ticker from Yahoo! Finance into a text file and gave it to the model. The file contained each stock's name plus an HTML table that included the price for the first day of each month from January to December 2024, as well as the last recorded price. The data was not cleaned, both to reduce manual effort and to test whether the model could pick the relevant parts out of the data.
Again, both models failed to give the right answer. o1 seemed to extract the data from the file, but suggested the calculations be done manually in a tool such as Excel. Its reasoning trace was very vague and contained nothing useful for troubleshooting. R1 also failed and did not provide an answer, but its reasoning trace contained a lot of useful information.
For example, it was clear that the model had correctly parsed the HTML data for each stock and was able to extract the right information. It was also able to calculate the month-by-month investments, add them up and compute the final value according to the latest share price in the table. However, that final value remained stuck in the reasoning chain and never made it into the final answer. The model was also confused by the row in the Nvidia table marking the company's 10-for-1 stock split on June 10, 2024, and ended up miscalculating the final value of the portfolio.
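The split row is a genuinely easy thing to trip over: shares bought at pre-split prices must be multiplied by the split ratio before being valued at post-split prices. A minimal sketch of the adjustment, using made-up prices and the real split date from the article:

```python
# Sketch of the split adjustment that confused the model. Nvidia did a
# 10-for-1 split on June 10, 2024, so shares bought before that date
# must be multiplied by 10 before valuing them at post-split prices.
# Purchase prices below are illustrative, not real quotes.

from datetime import date

SPLIT_DATE = date(2024, 6, 10)
SPLIT_RATIO = 10  # 10-for-1

# (purchase date, price actually paid that day) - made-up numbers
purchases = [
    (date(2024, 1, 1), 500.0),  # pre-split price level
    (date(2024, 5, 1), 900.0),  # pre-split price level
    (date(2024, 7, 1), 120.0),  # post-split price level
]
AMOUNT_PER_PURCHASE = 20.0

def split_adjusted_shares(purchases, amount):
    """Total share count expressed in post-split shares."""
    total = 0.0
    for bought_on, price in purchases:
        shares = amount / price
        if bought_on < SPLIT_DATE:
            shares *= SPLIT_RATIO  # each pre-split share became 10 shares
        total += shares
    return total

shares = split_adjusted_shares(purchases, AMOUNT_PER_PURCHASE)
current_price = 130.0  # illustrative post-split price
value = shares * current_price
```

Forgetting the adjustment (or applying it to post-split purchases) is exactly the kind of error that a visible reasoning trace makes easy to spot.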

Again, the real differentiator was not the result itself, but the ability to examine how the model arrived at its answer. In this case, R1 gave us a better experience, enabling us to understand the model's limitations and how we could reformulate our prompts and format our data to get better results in the future.
Comparing data across the web
Another experiment we ran required the model to compare the stats of four leading NBA centers and determine which of them had the best improvement in field goal percentage (FG%) from the 2022/23 season to the 2023/24 season. This task required the model to perform multi-step reasoning over different data points. The catch in the prompt was that it included Victor Wembanyama, who only entered the league as a rookie in 2023.
Retrieval for this prompt was much easier, since player stats are widely reported on the web and are usually included in their Wikipedia and NBA profiles. Both models answered correctly (it's Giannis, in case you were curious), although depending on the sources they used, their numbers differed slightly. However, they did not realize that Wembanyama did not qualify for the comparison and instead gathered other stats from his time playing in Europe.
In its answer, R1 provided a better breakdown of the results, with a comparison table and links to the sources it used. The added context enabled us to correct the prompt. After we modified the prompt to specify that we were looking for FG% from NBA seasons, the model correctly excluded Wembanyama from the results.
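The multi-step comparison itself, once eligibility is stated explicitly, is simple to express. A sketch with placeholder FG% values (not real statistics) showing the filter the refined prompt implies, with a hypothetical rookie standing in for Wembanyama:

```python
# Sketch of the comparison task: find the center with the largest FG%
# improvement between NBA seasons, excluding any player without a
# 2022-23 NBA season (e.g. a rookie who debuted in 2023).
# All percentages below are placeholders, not real stats.

fg_pct = {
    "Player A": {"2022-23": 55.3, "2023-24": 61.1},
    "Player B": {"2022-23": 58.2, "2023-24": 57.0},
    "Rookie C": {"2023-24": 46.5},  # no 2022-23 NBA season -> ineligible
}

def best_improvement(stats):
    """Return (player, FG% delta) among players with both seasons."""
    eligible = {
        name: s["2023-24"] - s["2022-23"]
        for name, s in stats.items()
        if "2022-23" in s and "2023-24" in s  # the eligibility filter
    }
    return max(eligible.items(), key=lambda kv: kv[1])

name, delta = best_improvement(fg_pct)
```

The models' mistake was, in effect, skipping the eligibility filter and substituting non-NBA numbers for the missing season.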

Final verdict
Reasoning models are powerful tools, but they still have a way to go before they can be fully trusted with tasks, especially as other components of large language model (LLM) applications continue to evolve. As our experiments show, both o1 and R1 can still make basic mistakes. Despite their impressive results, they still need some hand-holding to give accurate answers.