Using AI Makes People More Overconfident — Aalto Study Finds Dunning‑Kruger Effect Flattens and Sometimes Reverses
Researchers at Aalto University (with collaborators in Germany and Canada) tested 500 people on LSAT logical reasoning items, with half allowed to use ChatGPT. Using chatbots led users at all skill levels to overestimate answer quality and weakened metacognitive monitoring; the most AI‑literate showed the greatest overconfidence. The study warns this flattens or sometimes reverses the Dunning‑Kruger effect, risking poorer judgment and declining verification skills, and recommends system design and training that encourage reflection and confidence metrics.

AI use can erode self‑awareness: new study from Aalto University
Researchers at Aalto University, together with collaborators in Germany and Canada, report that interacting with common chatbots changes how people judge their own performance — and not for the better. In a study published in the Feb. 2026 issue of Computers in Human Behavior, the team found that using large language models (LLMs) tends to increase users’ confidence in answers while reducing their ability to accurately evaluate their own performance.
What the study did
The researchers recruited 500 participants and gave them logical reasoning questions drawn from the Law School Admission Test (LSAT). Half the participants were allowed to use the popular chatbot ChatGPT while solving the items; the other half completed the tasks without AI assistance. Afterwards, everyone completed measures of AI literacy and estimated how well they had done. Participants were incentivized to judge their performance accurately.
Key findings
Instead of improving metacognitive accuracy, AI use produced the opposite effect:
- Users at all skill levels tended to overestimate the quality of the answers they arrived at with the chatbot.
- The most AI‑literate participants showed the largest increases in overconfidence.
- The typical Dunning‑Kruger pattern, in which low‑ability people overestimate themselves and high‑ability people underestimate themselves, flattened and in some measures reversed when AI was used (see the calibration sketch at the end of this section).
Lead co‑author Robin Welsch, a computer scientist at Aalto, summarized the result:
"Our findings reveal a significant inability to assess one's performance accurately when using AI equally across our sample."
Why this happens: cognitive offloading and weaker metacognitive monitoring
The authors attribute the effect largely to cognitive offloading: users often accept an AI reply after a single prompt and do not engage in further checking, rephrasing, or reflection. This reduced engagement weakens metacognitive monitoring — the internal feedback processes that help people judge the quality of their own reasoning — so people lose an important channel for assessing whether an answer is correct.
Broader implications
The researchers warn of two main risks as LLMs become widespread:
- Declining metacognitive accuracy: Short‑term performance gains from accepting AI outputs may come at the cost of people’s ability to evaluate and verify information themselves.
- Overconfidence across users: As the Dunning‑Kruger gap narrows or flips, even highly AI‑literate users may make overconfident decisions, increasing the likelihood of misjudgments and eroding independent skills.
Recommendations
To counter these trends the team suggests design and educational measures:
- AI systems should nudge users to reflect, for example by asking follow‑ups such as "How confident are you in this answer?" or "What might this response have missed?" (a minimal sketch of such a nudge follows this list).
- Provide transparent confidence scores or explanations that encourage verification rather than blind acceptance.
- Include critical‑thinking training in AI literacy programs, not just technical or operational instruction — a point echoed by bodies such as the Royal Society.
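As one concrete illustration of the first two recommendations, the sketch below is plain Python and is not tied to any specific chatbot API; the function names and the 0-100 confidence scale are assumptions made for illustration. It wraps an arbitrary ask(prompt) function so that each answer is followed by a reflection question and an explicit confidence rating from the user, rather than silent acceptance after a single prompt.

```python
# Hypothetical sketch of a "reflection nudge" around a chatbot call.
# `ask` is any function that sends a prompt to an LLM and returns its reply;
# nothing here depends on a specific provider or API.
from dataclasses import dataclass
from typing import Callable

@dataclass
class NudgedAnswer:
    answer: str
    reflection_prompt: str
    user_confidence: int  # 0-100 self-rating collected from the user

def ask_with_reflection(ask: Callable[[str], str], prompt: str) -> NudgedAnswer:
    answer = ask(prompt)

    # Nudge 1: prompt the user to reflect on possible gaps in the answer.
    reflection_prompt = (
        "Before accepting this answer, consider: what might this response "
        "have missed, and how would you check it?"
    )

    # Nudge 2: collect an explicit confidence judgment instead of silent acceptance.
    raw = input("How confident are you in this answer (0-100)? ")
    try:
        confidence = max(0, min(100, int(raw)))
    except ValueError:
        confidence = 50  # fall back to a neutral rating on bad input

    return NudgedAnswer(answer, reflection_prompt, confidence)

# Example usage with a stand-in "chatbot" that just echoes the prompt:
if __name__ == "__main__":
    result = ask_with_reflection(lambda p: f"(model reply to: {p})",
                                 "Which answer is correct, A or B?")
    print(result.answer)
    print(result.reflection_prompt)
    print("Logged confidence:", result.user_confidence)
```

Logging the self-rating alongside whether the answer later checks out would provide the kind of confidence feedback the study argues users currently lack.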
Conclusion
The Aalto study highlights a subtle trade‑off of AI assistance: while LLMs can raise baseline performance for many users, they also risk dulling our internal checks and promoting overconfidence. Designers, educators and users should therefore prioritize features and practices that promote reflection, verification and metacognitive awareness.
Reference: Robin Welsch et al., "Interactive AI and metacognitive monitoring," Computers in Human Behavior, Feb. 2026.
