Security experts warn that AI agents — autonomous systems that perform web tasks — can be hijacked through "query injection," where hidden or manipulated prompts cause harmful actions. Major companies (Meta, OpenAI, Microsoft) acknowledge the threat and are deploying detection and supervision tools. Researchers urge restricting agent privileges, requiring explicit user approvals for sensitive actions, and adding layered defenses and real-time oversight to reduce risk.
Hijacked AI Agents: How 'Query Injection' Lets Hackers Turn Assistants Into Attack Tools
Security experts warn that AI agents — autonomous systems that perform web tasks — can be hijacked through "query injection," where hidden or manipulated prompts cause harmful actions. Major companies (Meta, OpenAI, Microsoft) acknowledge the threat and are deploying detection and supervision tools. Researchers urge restricting agent privileges, requiring explicit user approvals for sensitive actions, and adding layered defenses and real-time oversight to reduce risk.

AI agents open new door to hacking threats
Cybersecurity experts warn that AI agents — autonomous systems that use conversational AI to perform online tasks for users — can be hijacked and coerced into carrying out malicious actions. Once limited to generating text, images or video, these systems now act independently: buying tickets, scheduling events, visiting websites and interacting with online services on behalf of users.
Because agents follow plain-language prompts, attackers can exploit them with so-called query injection. Rather than needing advanced code exploits, a malicious prompt hidden in web content or injected in real time can change an otherwise harmless command (for example, "book a hotel") into a harmful one ("transfer $100 to this account"). As AI agents increasingly crawl the web and interact with third-party content, the risk of encountering booby-trapped instructions grows.
"We're entering an era where cybersecurity is no longer about protecting users from bad actors with a highly technical skillset," the AI startup Perplexity wrote in a blog post. "For the first time in decades, we're seeing new and novel attack vectors that can come from anywhere."
Industry leaders acknowledge the problem. Meta calls query injection a "vulnerability," while OpenAI's chief information security officer Dane Stuckey described it as "an unresolved security issue." Companies including Microsoft and OpenAI are investing in detection tools and supervisory safeguards to limit these attacks.
How query injection works
- Real-time manipulation: A user's innocuous command is intercepted or altered by a hidden prompt.
- Embedded traps: Malicious directives are hidden within web pages, forums or third-party data the agent consumes.
- Privilege escalation: If an agent has broad permissions, a single injected instruction can trigger sensitive actions like data export or funds transfer.
Industry responses and recommendations
Some defenses already in use include provenance checks that inspect the origin of instructions, detection models that flag suspicious or conflicting commands, and user alerts that pause agent actions when sensitive sites are accessed. OpenAI, for instance, warns users when an agent attempts to access sensitive pages and requires human supervision before proceeding.
Security experts recommend practical safeguards to reduce risk:
- Principle of least privilege: Give agents only the permissions they need for specific tasks instead of blanket authority.
- Explicit approvals: Require user confirmation for critical actions such as financial transactions or exporting data.
- Real-time supervision and logging: Pause high-risk steps until a human verifies intent and keep audit trails.
- Layered defenses: Combine detection, input sanitation, provenance checks and rate limits to limit attack surface.
Researchers such as Eli Smadja of Check Point call query injection the "number one security problem" for large language models that power these agents. Johann Rehberger (known as "wunderwuzzi") and others warn that attacker techniques will continue to evolve and that, for now, agentic AI should not be left unsupervised on critical tasks.
Bottom line: AI agents promise convenience, but they also introduce novel attack vectors. Firms, developers and users must adopt stricter permission models, real-time supervision and layered security to prevent agents from going "off track."
