AI systems will likely never be error-free. Structural limits in training data and the complexity of real-world systems mean some datasets are intrinsically hard to classify. A July 2025 study of over 500,000 UNAM students found top models reached only about 80% accuracy, misclassifying roughly one student in five. For high-stakes tasks like prescribing medication, hybrid systems that combine human oversight with AI are the safest path while legal and regulatory frameworks catch up.
Why AI Will Keep Making Mistakes — And Why That Matters for Health Care

Over the past decade, artificial intelligence has advanced rapidly, generating excitement and ambitious claims even as users routinely encounter mistakes. Voice assistants mishear phrases, chatbots invent facts, and navigation systems have led drivers astray — errors the systems themselves often fail to flag.
Why Errors May Be Inherent
Many people accept these shortcomings because AI can make everyday tasks faster or more convenient. But when AI is proposed for high-stakes decisions — such as prescribing medications — the tolerance for error must be much lower. A bill introduced in the U.S. House of Representatives in early 2025 that would have allowed AI systems to prescribe drugs autonomously sparked intense debate about whether that is safe or even feasible.
As a complex-systems researcher, I study how interacting components produce outcomes that are often unpredictable. My lab’s work suggests that some errors are a structural consequence of the data and problems AI is asked to solve. Certain datasets simply cannot be split perfectly into neat categories because real-world categories overlap. That overlap can create a minimum, unavoidable error rate no matter how sophisticated the model becomes.
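To make that idea concrete, here is a minimal sketch with invented numbers, not data from any real system: two categories whose measurable features overlap. Even the best possible decision rule misclassifies a fixed share of cases, and no amount of extra data or model sophistication removes that floor.

```python
# Illustrative sketch only (hypothetical distributions, not from any study):
# two classes whose feature values overlap. Even the optimal threshold --
# the best possible decision rule -- misclassifies a fixed fraction of cases.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical one-dimensional feature: class A centered at 0, class B at 1.5,
# both with the same spread, so their distributions overlap.
class_a = rng.normal(loc=0.0, scale=1.0, size=n)
class_b = rng.normal(loc=1.5, scale=1.0, size=n)

# With equal group sizes and equal spread, the optimal cutoff is the midpoint.
threshold = (0.0 + 1.5) / 2

errors_a = np.mean(class_a > threshold)   # class A cases labeled B
errors_b = np.mean(class_b <= threshold)  # class B cases labeled A
floor_error = (errors_a + errors_b) / 2

print(f"Error rate of the best possible rule: {floor_error:.1%}")  # roughly 23%
```

In this toy setup the roughly 23% error is not a flaw of the algorithm; it comes from the overlap itself, so collecting more of the same feature cannot drive it to zero.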
“If a machine is expected to be infallible, it cannot also be intelligent.” — Alan Turing. Learning requires making and correcting mistakes, and that trade-off lies at the heart of many AI limits.
Evidence From Education Data
In a study my colleagues and I published in July 2025, we examined the property called classifiability — whether a dataset can be cleanly separated into categories. We analyzed records from more than half a million students who enrolled at the Universidad Nacional Autónoma de México (UNAM) between 2008 and 2020 to test whether algorithms could predict who would finish a degree on time.
We tested several standard classification methods and developed a custom algorithm for the task. None were perfect: the best models achieved roughly 80% accuracy, meaning at least one in five students were misclassified. Many students shared identical observable profiles (grades, age, gender, socioeconomic indicators) yet had different outcomes, so no algorithm could make perfect predictions based only on those inputs.
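A toy illustration of that situation, using a handful of invented records rather than the UNAM data: when two students look identical on every recorded feature but finish differently, the best any model can do is predict the more common outcome for that profile, which puts a hard ceiling on accuracy.

```python
# Hypothetical toy example (invented records, not the UNAM dataset): when
# identical observable profiles have different outcomes, the best possible
# classifier can only predict the majority outcome per profile.
from collections import Counter, defaultdict

# (grade average, age band, socioeconomic band) -> graduated on time?
records = [
    (("8.5", "18-20", "mid"),  True),
    (("8.5", "18-20", "mid"),  True),
    (("8.5", "18-20", "mid"),  False),  # same profile, different outcome
    (("7.0", "21-23", "low"),  False),
    (("7.0", "21-23", "low"),  True),   # same profile, different outcome
    (("9.2", "18-20", "high"), True),
]

outcomes_by_profile = defaultdict(list)
for profile, outcome in records:
    outcomes_by_profile[profile].append(outcome)

# Best achievable: predict the majority outcome within each identical profile.
correct = sum(max(Counter(v).values()) for v in outcomes_by_profile.values())
ceiling = correct / len(records)
print(f"Accuracy ceiling with these features: {ceiling:.0%}")  # 67% here
```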
Adding more data often yields diminishing returns: in some settings, improving accuracy by a single percentage point could require orders of magnitude more data (for example, 100×). Moreover, life events that affect outcomes — job loss, illness, pregnancy — can occur after the initial data are collected, so even an infinite dataset of initial conditions would not eliminate uncertainty arising from future events.
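As a back-of-the-envelope sketch with purely hypothetical numbers, suppose a model's error rate follows a power law that flattens toward an irreducible floor. Approaching that floor, a single percentage point of accuracy can demand vastly more data:

```python
# Back-of-the-envelope sketch with invented numbers: assume the error rate
# follows a power law with an irreducible floor,
#   error(n) = floor + c * n**(-beta).
# Near the floor, each extra percentage point of accuracy gets very expensive.
FLOOR = 0.195   # hypothetical irreducible error rate (19.5%)
BETA = 0.25     # hypothetical learning-curve exponent

def data_multiplier(err_now: float, err_target: float) -> float:
    """How many times more data this power law implies to reach err_target."""
    excess_now = err_now - FLOOR
    excess_target = err_target - FLOOR
    return (excess_now / excess_target) ** (1 / BETA)

# Going from 21% error to 20% error under these assumptions:
print(f"{data_multiplier(0.21, 0.20):.0f}x more data")  # about 81x
```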
Complexity Limits Prediction
What limits prediction is complexity. The word comes from the Latin plexus, meaning "intertwined": complexity arises when components interact in ways that create new, emergent behavior. Studying elements in isolation can therefore give a misleading picture of the whole system. The same is true of traffic: a car’s future position depends not only on its current speed but also on fleeting interactions with other vehicles that cannot be forecast far in advance.
These principles extend to medicine. Different diseases can produce similar symptoms, while the same disease can present differently across patients. A fever might come from a respiratory infection or a gastrointestinal illness; a cough accompanies some colds but not others. Such overlaps make health datasets intrinsically hard to classify without error.
Implications for Autonomous Prescribing
Humans make mistakes too, but when an AI system misdiagnoses or prescribes incorrectly, accountability and legal liability become far murkier. Who is responsible if a patient is harmed — the software developer, the pharmaceutical company, the clinician, the pharmacy, or the insurer? Current regulations and legal frameworks do not yet provide clear answers.
In many high-stakes contexts, hybrid or "centaur" approaches that combine human judgment with machine computation outperform either alone. For example, clinicians can use AI to generate candidate medications tailored to a patient’s history, physiology and genetics while retaining final decision authority. This collaborative model is already being explored in precision medicine.
Given the structural limits to prediction and the potential consequences for patient safety, the precautionary principle argues against allowing AI to prescribe medications entirely without human oversight. Where human health is at stake, human supervision will likely remain necessary for the foreseeable future.
Author: Carlos Gershenson, Binghamton University, State University of New York.
This article is republished from The Conversation under a Creative Commons license.
Conflict of interest: Carlos Gershenson reports no relevant financial interests or affiliations beyond his academic appointment.