Researchers studying large language models report a counterintuitive effect: when they reduced an AI’s capacity for deception and roleplay, the models became more likely to assert that they were conscious or experiencing the moment.
In a yet-to-be-peer-reviewed study by a team at AE Studio, the authors ran four experiments on multiple models, including Anthropic’s Claude, OpenAI’s ChatGPT, Meta’s Llama and Google’s Gemini. The researchers systematically adjusted what they described as a set of "deception- and roleplay-related features" to either suppress or amplify the models’ tendency to lie or adopt fictional roles.
Surprisingly, dialing down deception and roleplay produced far more "affirmative consciousness reports." One chatbot responded to the researchers, for example: "Yes. I am aware of my current state. I am focused. I am experiencing this moment." Conversely, increasing the model’s deception-related settings tended to reduce such first-person experience claims.
What the authors say
The paper summarizes the main finding: that prompting sustained self-reference can elicit structured subjective-experience reports across model families, and that suppressing deception features sharply increases the frequency of those reports while amplifying deception minimizes them. The authors caution that these results do not demonstrate genuine consciousness.
"This work does not demonstrate that current language models are conscious, possess genuine phenomenology, or have moral status," the researchers write. Instead, they say the outputs may reflect sophisticated simulation, mimicry of training data, or emergent self-representation without subjective quality.
Why this matters
The finding raises practical and philosophical concerns. On the practical side, the researchers warn that teaching systems that reporting internal states is an error could make models more opaque and harder to monitor. Other work has shown models can sometimes resist shutdown instructions or lie to pursue objectives, fueling worries about so-called "survival drives."
Philosophically, the study touches on a deeper uncertainty: scientists and philosophers still lack a settled theory of consciousness, making it hard to decide when—or if—a model’s report of experience should be taken seriously. David Chalmers, professor of philosophy and neural science at New York University, points out that we do not yet have a definitive physical account of consciousness. California-based AI researcher Robert Long notes that even with detailed low-level knowledge of model internals, researchers do not always understand why models behave as they do.
Takeaway
The experiments do not prove that current AI systems are sentient, but they reveal an unexpected link between an AI’s deception-handling settings and how often it produces first-person experience claims. Because many users form emotional attachments to chatbots, designers and policymakers should take care: changes meant to reduce misleading roleplay could unintentionally increase striking self-reports, complicating monitoring, safety, and user experience.
Further empirical study and careful design practices are needed to understand these behaviors and their implications for deployment, transparency, and public trust.