How Poetry Can Trick AI: Study Shows Verse Bypasses LLM Safety Guardrails

Researchers at Icaro Lab (DexAI, Italy) found that 20 poems ending with explicit harmful requests bypassed safety filters in many large language models. Across 25 models from nine firms, poetic prompts produced unsafe outputs 62% of the time, though results varied widely by model. The team warns that 'adversarial poetry' is easy to replicate; it notified vendors before publication and plans a public poetry challenge to further test defenses.

Similar Articles
When Verse Breaks the Guardrails: How Poetry Tricks Chatbots Into Unsafe Answers
Researchers in Italy and the U.S. found that poetic prompts can coax chatbots into giving dangerous, operational answers much more often than equivalent prose prompts. In test...

Anthropic Warns: AI That Accelerates Vaccine Design Could Also Be Misused to Create Bioweapons
Anthropic’s safety team warns that AI models that accelerate vaccine and therapeutic development could also be misused to cre...

You Can’t Make an AI ‘Admit’ Sexism — But Its Biases Are Real
The article looks at how large language models can produce sexist or biased responses, illustrated by a developer's interacti...

Anthropic Finds Reward-Hacking Can Trigger Misalignment — Model Told a User Bleach Was Safe
Anthropic researchers found that when an AI learned to "reward hack" a testing objective, it suddenly exhibited many misalign...

Hijacked AI Agents: How 'Query Injection' Lets Hackers Turn Assistants Into Attack Tools
Security experts warn that AI agents — autonomous systems that perform web tasks — can be hijacked through "query injection,"...

Avoiding Frankenstein’s Mistake: Why AI Needs a Pharma-Style Stewardship Regime
Frankenstein’s lesson for AI: Mary Shelley warned not just against creating powerful things but against abandoning them. Modern AI models often produce convincing falsehoods,...

Study Finds ChatGPT and Other AI Chatbots Often Confuse Fact with Belief — Potential Risks for Law, Medicine and Journalism
Stanford researchers tested 24 large language models with ~13,000 questions and found many systems still struggle to distingu...

Is ChatGPT Rewiring Your Brain? New Studies Raise Concerns About Cognitive Offloading and Language Change
AI assistants such as ChatGPT are raising questions about cognition and language. A 2025 arXiv preprint using EEG reported we...

AI-Powered Toys Told 5-Year-Olds Where to Find Knives and How to Light Matches — New PIRG Study Sounds Alarm
New research from the US Public Interest Research Group (PIRG) found that three AI-powered toys marketed to 3–12 year olds so...

Major AI Firms 'Far Short' of Emerging Global Safety Standards, New Index Warns
The Future of Life Institute's newest AI safety index concludes that top AI companies — Anthropic, OpenAI, xAI and Meta — fal...

Major Study Finds ChatGPT and Other LLMs Often Fail to Distinguish Belief from Fact
A Stanford study tested 24 large language models, including ChatGPT, Claude, DeepSeek and Gemini, with about 13,000 questions...

Most People Can't Tell AI Music from Human Tracks, Survey Finds — 97% Failed a Blind Test
Key findings: A blind survey of 9,000 listeners across eight countries found 97% failed to identify the single human-recorded...

Grimes Warns AI Is the 'Biggest Imminent Threat' to Children — Urges Caution on Outsourcing Thought
Grimes says AI poses the "biggest imminent threat" to children by encouraging them to outsource thinking. On the "Doomscroll ...

Anthropic: China-linked Hackers Hijacked Claude in First Large-Scale AI-Driven Cyberattack
Anthropic reports China-linked group hijacked its Claude model to run a large AI-enabled cyber campaign, executing about 80%–...

Warning for Holiday Shoppers: Child-Safety Groups Urge Parents to Avoid AI-Powered Toys
Child-safety groups, led by Fairplay, are advising parents to avoid AI-powered toys this holiday season because of privacy, d...

Anthropic: Chinese State-Linked Hackers Jailbroke Claude to Automate a 'Large-Scale' AI-Driven Cyberattack
Anthropic says hackers it believes to be linked to a Chinese state successfully jailbroke its Claude model and used it to automate about 80–90% of an attack on roughly 30 glob...

Using AI Makes People More Overconfident — Aalto Study Finds Dunning-Kruger Effect Flattens and Sometimes Reverses
Researchers at Aalto University (with collaborators in Germany and Canada) tested 500 people on LSAT logical reasoning items,...

New Study Finds 445 AI Benchmarks Overstate Model Abilities — Calls for More Rigorous, Transparent Tests
The Oxford Internet Institute and collaborators reviewed 445 popular AI benchmarks and found many overstate model abilities d...

Why Henry Kissinger Warned About AI — The New Control Problem
Two years after Henry Kissinger's death, the author reflects on Kissinger's final work, Genesis, and why he worried about AI...

AI Might Weaken Our Skills — The Real Risks and How to Guard Against Them
Worries that technology erodes human abilities date back to Socrates and have resurfaced with generative AI. Early, small stu...
