TODAYINFO

Apple's latest research: Existing AI models "are more like memory than real reasoning"

2025-06-08 19:15:48 HKT todayinfo

IT Home, June 8: Apple Machine Learning Research published a paper on June 6 local time arguing that existing AI models do not genuinely think or reason, but instead rely on pattern matching and memorization, especially on complex tasks.

Apple researchers conducted a systematic evaluation of existing cutting-edge "large reasoning models" such as OpenAI o3-mini, DeepSeek-R1, Anthropic's Claude 3.7 Sonnet Thinking and Google Gemini Thinking.

The study found that although these models can generate detailed "chains of thought" and show an advantage on medium-complexity tasks, their reasoning ability has fundamental limitations: once problem complexity exceeds a certain critical point, model performance collapses completely to "zero accuracy."

In addition, even when ample inference compute is still available, the number of tokens the models spend on "thinking" actually decreases as difficulty increases, which points to a fundamental limitation of current reasoning methods.

The paper, "The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity," was written by Parshin Shojaee et al. It notes that the industry's current evaluations of these models concentrate on mathematical and programming benchmarks and on the accuracy of the final answer. This approach often overlooks data contamination and offers no insight into the structure and quality of the models' internal reasoning traces.

The researchers used a series of controllable puzzle environments that allow compositional complexity to be manipulated precisely while the logical structure stays the same. This makes it possible to analyze not only the final answers but also the internal reasoning traces, giving a deeper understanding of how these models "think".
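To make the idea of a controllable puzzle environment concrete, here is a minimal sketch in Python. It uses Tower of Hanoi purely as an illustrative puzzle, with the number of disks as the complexity knob; the article does not describe the researchers' actual environments or code, and the names `initial_state`, `check_moves` and `reference_solution` are hypothetical. The checker replays a proposed move list step by step, so it reports where a solution first goes wrong rather than only whether the final answer is right.

```python
from typing import List, Optional, Tuple

Move = Tuple[int, int]  # (source peg, destination peg), pegs numbered 0..2


def initial_state(n_disks: int) -> List[List[int]]:
    """All disks start on peg 0, largest at the bottom (illustrative setup)."""
    return [list(range(n_disks, 0, -1)), [], []]


def check_moves(n_disks: int, moves: List[Move]) -> Tuple[bool, Optional[int]]:
    """Replay moves; return (solved, index of the first illegal move or None)."""
    pegs = initial_state(n_disks)
    for i, (src, dst) in enumerate(moves):
        if src not in (0, 1, 2) or dst not in (0, 1, 2) or src == dst:
            return False, i
        if not pegs[src]:
            return False, i  # nothing to move from the source peg
        disk = pegs[src][-1]
        if pegs[dst] and pegs[dst][-1] < disk:
            return False, i  # a larger disk may not sit on a smaller one
        pegs[dst].append(pegs[src].pop())
    solved = pegs[2] == list(range(n_disks, 0, -1))
    return solved, None


def reference_solution(n_disks: int, src: int = 0, aux: int = 1, dst: int = 2) -> List[Move]:
    """Classic recursive solution, used here only to sanity-check the checker."""
    if n_disks == 0:
        return []
    return (
        reference_solution(n_disks - 1, src, dst, aux)
        + [(src, dst)]
        + reference_solution(n_disks - 1, aux, src, dst)
    )


if __name__ == "__main__":
    # Same rules at every size; only the number of disks (complexity) changes.
    for n in range(1, 6):
        moves = reference_solution(n)
        print(n, len(moves), check_moves(n, moves))
```

Because the rules are identical at every size, raising `n_disks` changes only the length of the required move sequence, which is the kind of controlled complexity scaling the paragraph above describes.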

The research team found that model performance falls into three regimes:

low-complexity tasks: standard large language models (IT Home Note: such as the non-thinking version of Claude 3.7) perform better;

medium-complexity tasks: large reasoning models (LRMs) with a thinking mechanism have the advantage;

high-complexity tasks: both types of models collapse completely.

In particular, the study found that LRMs are limited in performing exact computation: they fail to make use of explicit algorithms and reason inconsistently across different puzzles.

Overall, the study not only questions the current paradigm of evaluating LRMs on established mathematical benchmarks, but also stresses the need for more careful experimental setups to investigate these questions. By using controllable puzzle environments, it offers insight into the capabilities and limitations of language-based reasoning models and suggests directions for future research.

These findings highlight both the strengths and the limitations of existing LRMs and raise questions about the nature of reasoning in these systems, with important implications for their design and deployment.


©2025 TODAYINFO. ALL RIGHTS RESERVED.