AI can make useful predictions, but some errors are unavoidable. Research shows that when categories overlap in training data, no algorithm can classify perfectly. A July 2025 study predicting university graduation reached about 80% accuracy, illustrating limits driven by overlapping features and unpredictable life events. Given these constraints and unresolved legal liability, hybrid human–AI systems with human oversight remain the safest approach for prescribing medicines.
Why Some AI Errors May Be Unavoidable — And What That Means for Prescribing Medicines

Legislation introduced in the U.S. House of Representatives in early 2025 proposed allowing AI systems to prescribe medications autonomously. That proposal has sharpened a debate about how many mistakes such systems might be allowed to make, who would be held accountable when errors hurt patients, and whether full automation is a wise path for health care.
Over the last decade, advances in artificial intelligence have produced dramatic gains and high expectations, even while everyday users regularly encounter errors. An AI voice assistant can mishear a phrase in embarrassing ways, a chatbot can invent facts, and an AI navigation app can give bizarre directions, often without clearly signaling the mistake. For routine tasks, people often tolerate such failures because the benefits outweigh the risks. But when the consequences include severe illness or death, tolerance for error should be far lower.
Why Some Errors May Be Built In
As a researcher of complex systems, I study how many interacting components produce outcomes that are hard to predict. My lab’s work suggests that some errors are not merely technical glitches but arise from fundamental properties of the data used to train AI models. We call this property classifiability: how cleanly items in a dataset can be partitioned into distinct categories.
In a study published in July 2025, my colleagues and I showed that perfectly separating certain datasets into unambiguous categories can be impossible. When features overlap across groups, no algorithm can classify every instance correctly. For example, a dataset that logs only dogs’ age, weight and height will easily distinguish a Chihuahua from a Great Dane, but it may confuse an Alaskan malamute with a Doberman because individuals from different breeds can share the same measurements.
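To make the idea concrete, here is a minimal Python sketch (using NumPy and scikit-learn, which are not necessarily what the study used) showing that when two classes are drawn from overlapping feature distributions, accuracy plateaus below 100% no matter how the model is tuned. The breed names and measurements are hypothetical.

```python
# Minimal sketch: two classes drawn from overlapping feature distributions.
# Because the distributions overlap, no classifier can reach 100% accuracy,
# however much data or model capacity it has.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 5000

# Hypothetical weight (kg) and height (cm) measurements for two breeds whose
# ranges overlap, loosely in the spirit of the malamute vs. Doberman example.
breed_a = rng.normal(loc=[38.0, 61.0], scale=5.0, size=(n, 2))
breed_b = rng.normal(loc=[40.0, 66.0], scale=5.0, size=(n, 2))

X = np.vstack([breed_a, breed_b])
y = np.array([0] * n + [1] * n)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = LogisticRegression().fit(X_train, y_train)

# Accuracy plateaus well below 1.0 because many individuals from the two
# breeds share essentially the same measurements.
print("test accuracy:", round(clf.score(X_test, y_test), 3))
```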
Real-World Evidence: Student Graduation Predictions
We studied classifiability using records for more than half a million students at the Universidad Nacional Autónoma de México (2008–2020) to predict whether students would finish their degrees on time. We tested several standard classification algorithms and built a custom one for the task. The best models topped out at roughly 80% accuracy, meaning about one in five students was misclassified.
Why did errors remain? Many students shared identical values for grades, age, gender and socioeconomic status but had different outcomes. Unpredictable life events — unemployment, illness, pregnancy, family crises — often occur after the first year and materially affect graduation outcomes. Even vastly more data would deliver diminishing returns: in our experience, each extra percentage point of accuracy could demand orders of magnitude more data (roughly 100x more in some cases), making perfect prediction practically unreachable.
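As an illustration of why conflicting records cap accuracy, the short Python sketch below computes an empirical upper bound: for every distinct combination of recorded features, the best any model can do is predict the majority outcome. The column names and rows are hypothetical, not the actual UNAM data.

```python
# Minimal sketch of an empirical accuracy ceiling: when students share
# identical recorded features but have different outcomes, the best any
# classifier can do on those records is predict the majority outcome
# for each feature combination. All values below are made up.
import pandas as pd

df = pd.DataFrame({
    "gpa_band":  ["high", "high", "mid", "mid", "mid", "low"],
    "age_band":  ["18-20", "18-20", "18-20", "18-20", "21+", "21+"],
    "ses":       ["B", "B", "C", "C", "C", "D"],
    "graduated": [1, 0, 1, 1, 0, 0],  # conflicting labels for identical features
})

features = ["gpa_band", "age_band", "ses"]

# For each distinct feature combination, count how many records carry the
# majority label; summing those counts gives the maximum achievable accuracy.
best_possible = (
    df.groupby(features)["graduated"]
      .apply(lambda s: s.value_counts().max())
      .sum()
) / len(df)

print(f"upper bound on accuracy: {best_possible:.2f}")
```

On these toy rows the ceiling works out to about 0.83; no amount of model tuning on the same features can push past it.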
Complex Systems, Limited Horizons
What limits prediction is complexity. The Latin root plexus means "intertwined," which captures how components in complex systems affect one another. A car’s future position in a city cannot be predicted precisely over long intervals simply from its current speed, because interactions with other drivers constantly change its path. Similarly, medical data are rife with overlapping symptoms and variable presentations: fever can signal a respiratory infection or a gastrointestinal illness; a cough may accompany a cold but does not always appear.
These overlaps create ambiguity in clinical datasets that prevents diagnostic algorithms from being error-free. Crucially, even human clinicians face the same uncertainty. However, the legal and regulatory framework for assigning responsibility when an automated system misdiagnoses someone — and that error leads to harm — is still unsettled.
Human Oversight and Hybrid Intelligence
Neither pure automation nor unaided human judgment is universally optimal. Evidence from many domains suggests that hybrid approaches in which humans and machines collaborate, sometimes called "centaurs," often outperform either working alone. In health care, AI can expedite differential diagnosis, surface likely drug interactions, or suggest personalized treatment options based on medical history and genomics, while clinicians apply contextual judgment, ethics, and patient preferences.
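As a rough illustration of what such a hybrid workflow can look like, the Python sketch below lets a model act only on predictions above a confidence threshold and flags everything else for clinician review. The model, threshold, and features are placeholders, not a clinical system.

```python
# Minimal "centaur" sketch: the model handles high-confidence cases and
# defers ambiguous ones to a human. Threshold, data, and model are
# hypothetical placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def triage(model, X, threshold=0.90):
    """Return the model's decisions plus a flag for cases to send to a human."""
    proba = model.predict_proba(X)
    confidence = proba.max(axis=1)
    decisions = model.classes_[proba.argmax(axis=1)]
    needs_review = confidence < threshold
    return decisions, needs_review

# Toy usage with random numbers standing in for clinical features.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 8))
y_train = (X_train[:, 0] + rng.normal(scale=1.0, size=500) > 0).astype(int)
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

X_new = rng.normal(size=(10, 8))
decisions, needs_review = triage(model, X_new)
print("auto-decided:", int((~needs_review).sum()),
      "| sent to clinician:", int(needs_review.sum()))
```

Lowering the threshold automates more cases; raising it routes more of them to a person. Where that line should sit for prescribing is precisely the policy question the proposed legislation raises.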
Policy and practice should therefore prioritize human supervision for high-stakes decisions like prescribing. The precautionary principle argues against permitting fully autonomous prescribing until there is strong evidence that error rates are acceptably low, that liability is clearly allocated, and that safe oversight mechanisms are in place.
“If a machine is expected to be infallible, it cannot also be intelligent,” noted Alan Turing. That tension between learning (which requires error) and the demand for infallibility is at the heart of the debate over autonomous medical AI.
In short, AI can make health care safer and more efficient — but where lives are at stake, inherent data ambiguity and the reality of unpredictable events make human supervision essential for the foreseeable future.
Note: This article summarizes research on dataset classifiability and its implications for clinical AI. It has been adapted for clarity and broader audience engagement.