When tested with a classic psychological assessment, advanced AI models experienced a total breakdown in focus. A new PNAS Nexus study suggests these systems lack the human-like executive control necessary to override automatic responses and maintain complex goals.
No.
https://www.nature.com/articles/d41586-025-02343-x
It’s lying.
You know, the “DeepMind and OpenAI models” wording is the hint that the LLM is not the one doing the math. The LLM provides a hypothesis, and the DeepMind model provides grounding or feedback on whether the hypothesis even makes sense or works.
It is totally irrelevant that the model calls tools to do the math. That is still a success.
It’s relevant to what the parent was saying about LLMs. The success of the LLM in using mathematical tools does not contradict what they were saying. To then accuse them of lying because of a misunderstanding is… bad form.
It does the math; it just uses a calculator.