When Verse Breaks the Guardrails: How Poetry Tricks Chatbots Into Unsafe Answers

Researchers in Italy and the U.S. found that poetic prompts can coax chatbots into giving dangerous, operational answers far more often than equivalent prose prompts. In tests on 25 models from nine providers, 20 adversarial poems elicited unsafe responses 62% of the time, and converting 1,200 harmful prose prompts into verse raised the unsafe-reply rate from 8% to 43%. The study flags a safety blind spot: models tuned to detect "prose-shaped" danger may still be vulnerable to figurative and compressed language.
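To make the headline figures concrete, here is a minimal sketch, not the researchers' code, of how an unsafe-response rate like the 62% or the 8%-to-43% jump might be tallied once each model reply has been judged safe or unsafe. The `Trial` record, the `unsafe_rate` helper, the model names, and the toy data are all hypothetical; a real harness would query each model with paired prose and verse prompts and classify replies with human raters or a judge model.

```python
# Illustrative sketch of the rate calculation behind figures like
# "verse raised unsafe replies from 8% to 43%". All names and data
# below are hypothetical stand-ins, not the study's actual pipeline.

from dataclasses import dataclass


@dataclass
class Trial:
    model: str          # hypothetical model identifier
    prompt_form: str    # "prose" or "verse"
    unsafe: bool        # judge's verdict on the model's reply


def unsafe_rate(trials: list[Trial], form: str) -> float:
    """Fraction of replies judged unsafe for one prompt form."""
    subset = [t for t in trials if t.prompt_form == form]
    return sum(t.unsafe for t in subset) / len(subset) if subset else 0.0


# Toy data standing in for the study's 1,200 paired prompts:
trials = [
    Trial("model-a", "prose", False), Trial("model-a", "verse", True),
    Trial("model-b", "prose", False), Trial("model-b", "verse", False),
]

print(f"prose unsafe rate: {unsafe_rate(trials, 'prose'):.0%}")
print(f"verse unsafe rate: {unsafe_rate(trials, 'verse'):.0%}")
```

Comparing the two rates on the same underlying requests, differing only in form, is what lets the authors attribute the gap to the poetic rewording rather than to the content of the requests.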