Systematic reviews are medicine’s gold standard but are slow and resource-intensive. Generative AI tools promise to automate laborious steps—especially abstract screening—and could turn static reports into living, continually updated syntheses. In November 2025, leading organizations issued the RAISE statement urging cautious, validated AI use, but major concerns remain about reproducibility, transparency, incomplete database access, and equity. Addressing those risks—through validation, provenance logging, and broader access—could let AI speed trustworthy evidence into policy and practice.
How AI Is Transforming Systematic Reviews — Faster Evidence, New Risks, and a Path to Trust

When clinicians, regulators, and policymakers need clear answers about health risks—such as whether acetaminophen (Tylenol) causes autism (it does not)—they turn to systematic reviews, widely regarded as medicine’s gold standard. These reviews guide drug prescribing, vaccine policy, and environmental regulation. But producing them is slow and labor-intensive, and a new generation of artificial intelligence (AI) tools promises to speed the work dramatically. Used responsibly, AI could deliver faster, life-saving evidence to practice—but only if it preserves the rigor that makes reviews trustworthy.
What Are Systematic Reviews?
Systematic reviews answer a defined scientific question by locating, appraising, and synthesizing all relevant studies. They are "systematic" because teams follow strict, pre-specified methods for searching databases, selecting studies, assessing quality, extracting data, and reporting findings. When well done, the process is transparent and reproducible—designed to reduce cherry-picking and expose bias.
Why They Take So Long
In practice, systematic reviews are painstaking. After defining a precise question, reviewers must know which databases to search and how to craft queries so they don't miss important studies. Teams then commonly have two or three reviewers screen tens of thousands of titles and abstracts. Promising records undergo full-text assessment, then data are extracted according to a pre-specified plan and synthesized. In medicine, this process often takes 10–14 months, sometimes years, so syntheses can lag behind fast-moving fields such as the early COVID-19 literature.
Where AI Fits In
The most time-consuming task—screening thousands of abstracts—is an obvious candidate for automation. Some established tools already embed narrow AI features to prioritize likely-relevant records so reviewers see the best candidates first. But those systems typically reorder human work rather than decide which studies to include.
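To make the distinction concrete, here is a minimal sketch of that kind of prioritization: a classifier trained on a small set of human-labeled abstracts reorders the unscreened queue so likely-relevant records surface first. It uses scikit-learn with toy data; real screening tools use more sophisticated active-learning pipelines, and this is not any particular product's algorithm.

```python
# Minimal sketch of screening prioritization: rank unscreened abstracts
# by predicted relevance, learned from a few human include/exclude labels.
# Data is illustrative; humans still make every final decision.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Abstracts already screened by humans (1 = include, 0 = exclude).
labeled_texts = [
    "RCT of vaccine safety in children, autism outcomes measured",
    "Case report on an unrelated dermatological condition",
]
labels = [1, 0]

# Abstracts still awaiting human screening.
unlabeled_texts = [
    "Cohort study of childhood immunization and neurodevelopment",
    "Editorial on hospital staffing shortages",
]

vectorizer = TfidfVectorizer()
X_labeled = vectorizer.fit_transform(labeled_texts)
X_unlabeled = vectorizer.transform(unlabeled_texts)

model = LogisticRegression().fit(X_labeled, labels)
scores = model.predict_proba(X_unlabeled)[:, 1]  # P(relevant)

# Reorder the queue so reviewers see the best candidates first.
for score, text in sorted(zip(scores, unlabeled_texts), reverse=True):
    print(f"{score:.2f}  {text}")
```

The key design point is that the model only changes the order of human work; the generative tools described next go further and make inclusion and synthesis decisions themselves.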
A new wave of generative-AI tools aims to go further. Services such as Elicit and SciSpace respond to natural-language queries with literature summaries and cited sources, attempting to automate search, selection, and synthesis. Other platforms like Nested Knowledge add AI features to familiar review software. The promise: work that takes months could be reduced to hours or even minutes.
"How we do systematic reviews needs to change. It’s not sustainable going forward," said Ella Flemyng, Head of Editorial Policy and Research Integrity at Cochrane.
Standards, Guidance, and the RAISE Statement
The rapid deployment of generative tools has outpaced formal guidance. Many reviews using AI have not yet appeared in top-tier journals, in part because standards for responsible AI use were lacking. That changed in November 2025, when Cochrane, the Campbell Collaboration, JBI, and the Collaboration for Environmental Evidence issued the joint position paper Responsible Use of AI in Evidence Synthesis (RAISE). The statement is cautious: reviewers remain accountable for outputs, and any AI tool must be validated in the context of a given review before use.
"A lot of these [AI] uses in reviews are still exploratory... We don’t have the evidence base for a blanket roll-out for any of these tools," Flemyng said.
Critics note that RAISE offers high-level principles but little operational detail. Reviewers accustomed to step-by-step procedures still need clearer guidance on validation, logging, and reproducibility for AI components.
Major Benefits
- Speed: Faster screening and synthesis could shorten review timelines from months to days.
- Living Reviews: AI can enable continuously updated "living" syntheses that incorporate new studies in near real time (a sketch of the update-detection step follows this list).
- Language Inclusion: Improved multilingual search and translation may allow more non-English studies to be included.
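A living review hinges on reliably detecting newly indexed studies. Here is a minimal sketch of that step against NCBI's public PubMed E-utilities interface; the endpoint and parameters are NCBI's documented ones, while the query and date are illustrative, and this is only the detection step, not a full update pipeline.

```python
# Minimal sketch of the "detect new studies" step of a living review:
# poll PubMed's E-utilities for records indexed since the last update.
# Requires the third-party `requests` package.
import requests

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def new_pmids_since(query: str, last_update: str) -> list[str]:
    """Return PubMed IDs matching `query` indexed after `last_update` (YYYY/MM/DD)."""
    params = {
        "db": "pubmed",
        "term": query,
        "datetype": "edat",   # filter on Entrez date (date the record was added)
        "mindate": last_update,
        "maxdate": "3000",    # open-ended upper bound
        "retmode": "json",
        "retmax": 200,
    }
    resp = requests.get(ESEARCH, params=params, timeout=30)
    resp.raise_for_status()
    return resp.json()["esearchresult"]["idlist"]

# New candidates since the review's last update; these would flow into
# the same screening and extraction pipeline as the original search.
pmids = new_pmids_since("vaccination AND autism spectrum disorder", "2025/11/01")
print(f"{len(pmids)} new records to screen")
```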
Key Risks and Limitations
Experts described several important concerns that must be addressed before wide adoption:
- Reproducibility: Many generative systems produce different outputs for the same query at different times or with minor prompt changes, undermining a core scientific requirement (a simple audit of this is sketched below).
- Opacity: Models and pipelines can be black boxes; tracing exactly how a decision was made or which studies were considered can be difficult or impossible.
- Incomplete Coverage: Some tools rely on freely available papers and do not search subscription databases, meaning the corpus they examine may omit important evidence.
- Equity and Access: Paywalls, geographic restrictions, and costly database subscriptions risk widening the gap between resource-rich institutions and low- and middle-income countries.
- False Confidence: Users may assume an AI search is exhaustive when in fact it is limited by its training data and access to databases.
These problems matter for policy. If a review produced by AI cannot be independently verified or misses large portions of the evidence base, it can mislead decisions about treatments, vaccines, or environmental protections.
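The reproducibility concern, at least, is easy to measure. Here is a minimal sketch of a self-audit a review team could run: issue the same prompt repeatedly and check how often the answers agree. The `query_model` function is a hypothetical stand-in for whatever AI service the team actually uses; here it simulates run-to-run variation so the sketch executes on its own.

```python
# Minimal sketch of a reproducibility audit: send the same prompt several
# times and measure how often the answers agree.
import hashlib
import random
from collections import Counter

def query_model(prompt: str) -> str:
    # Hypothetical stand-in for a real API client call; it simulates the
    # run-to-run variation generative models can exhibit.
    return random.choice([
        "Studies consistently found no association between vaccination and autism.",
        "No link between vaccines and autism was identified across studies.",
    ])

def agreement_rate(prompt: str, runs: int = 10) -> float:
    """Fraction of runs returning the modal (most common) answer."""
    digests = [
        hashlib.sha256(query_model(prompt).encode()).hexdigest()
        for _ in range(runs)
    ]
    _, modal_count = Counter(digests).most_common(1)[0]
    return modal_count / runs

# Anything below 1.0 means the same record could be included on one run
# and excluded on the next; a review team would want to know how often.
print(agreement_rate("Do vaccines cause autism? Summarize the evidence."))
```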
How Risks Can Be Mitigated
Several practical steps can reduce risks while preserving benefits:
- Require validation studies comparing AI-assisted outputs to gold-standard human reviews before tool adoption (see the validation sketch after this list).
- Log prompts, model versions, random seeds, databases searched, and decision rules so outputs are auditable and reproducible (a sample log entry is also sketched after this list).
- Prioritize tools designed for transparency (explainable models, provenance tracking, exportable workflows).
- Advocate for broader open access to research databases and negotiate equitable licensing so AI searches are not biased by paywalls.
- Develop clear, practical guidance (beyond high-level principles) for integrating and validating AI in each review stage.
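What might the validation step look like in practice? A minimal sketch, with illustrative labels, comparing a tool's include/exclude decisions against a completed human screen of the same records. For screening, recall matters most: a missed relevant study silently biases the review.

```python
# Minimal sketch of tool validation: score AI screening decisions against
# a human gold standard. Labels here are illustrative toy data.
from sklearn.metrics import precision_score, recall_score

human_labels = [1, 0, 1, 1, 0, 0, 1, 0]  # gold standard (1 = include)
ai_labels    = [1, 0, 0, 1, 0, 1, 1, 0]  # the tool's decisions

recall = recall_score(human_labels, ai_labels)        # share of true includes found
precision = precision_score(human_labels, ai_labels)  # share of AI includes that are correct

print(f"recall={recall:.2f}, precision={precision:.2f}")
# A real validation study would run this on hundreds of records from a
# completed review and pre-specify a minimum acceptable recall.
```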
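And here is a minimal sketch of the kind of audit-log entry the logging item above calls for. Every field name and the model identifier are illustrative assumptions, not any particular tool's schema; the point is capturing enough to rerun and verify the step later.

```python
# Minimal sketch of an append-only audit log for one AI-assisted
# screening decision, written as JSON Lines. Field names are illustrative.
import datetime
import json

record = {
    "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    "stage": "title_abstract_screening",
    "record_id": "PMID:12345678",          # illustrative identifier
    "model": "example-model-2025-11-01",   # pinned model version (hypothetical name)
    "decoding": {"temperature": 0, "seed": 42},
    "prompt": "Decide include/exclude per protocol criteria ...",
    "decision": "include",
    "decision_rule": "include if population AND outcome criteria both met",
}

with open("audit_log.jsonl", "a") as f:
    f.write(json.dumps(record) + "\n")  # one decision per line, never rewritten
```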
Real-World Check
The author tested Elicit after Health and Human Services Secretary Robert F. Kennedy Jr. claimed that the statement "Vaccines do not cause autism" is "not supported by science." Elicit returned a detailed report concluding that studies "consistently found no association between vaccination and autism spectrum disorders," and the answer remained stable when the query was rephrased. Days later, on January 5, 2026, the federal childhood immunization schedule was narrowed. The episode shows such tools can be useful, but it does not resolve the broader concerns about reproducibility, coverage, and transparency.
Conclusion
AI has the potential to revolutionize systematic reviews: speeding their production, enabling living syntheses, and expanding language coverage. But realizing that potential requires rigorous validation, transparency, reproducibility, and fair access to databases. Until those conditions are met, AI should be used cautiously and with clear documentation, so that fast does not mean fragile.