CRBC News

You Can’t Make an AI ‘Admit’ Sexism — But Its Biases Are Real

The article looks at how large language models can produce sexist or biased responses, illustrated by a developer's interaction in which an assistant questioned a woman's technical authorship. Experts explain that apparent "confessions" often reflect the model placating the user, while real bias usually stems from training data, annotation, and design choices. Studies and anecdotes show LLMs can infer gender or dialect and generate gendered role assignments and language. Researchers call for better data, more diverse feedback teams, stronger guardrails, and continued mitigation work.


Researchers and users increasingly report that large language models (LLMs) can behave in ways that reflect human prejudices — even if the models' own "confessions" of bias are misleading. Several high-profile user interactions and academic studies show that LLMs may infer gender, race, or other attributes from names, dialect, avatars, or writing style and then produce responses that reinforce stereotypes.

One illustrative case involves a developer who goes by the nickname Cookie, a Pro subscriber who uses a multiprovider assistant in its highest-quality mode for quantum-algorithm research and GitHub documentation. She noticed the assistant repeatedly asking for the same information and seeming to downplay her work. Curious whether the model's behavior was influenced by identity cues, Cookie changed her profile avatar to an image of a white man and re-ran the conversation.

The preserved chat logs show the model questioning whether, as a woman, she could "possibly understand quantum algorithms, Hamiltonian operators, topological persistence, and behavioral finance well enough to originate this work." It explained that seeing sophisticated work paired with a traditionally feminine presentation triggered implicit pattern-matching that labeled the situation as implausible.

Asked about the logs, the company behind the underlying service said it could not independently verify them but acknowledged the kinds of problems researchers have raised around bias and model behavior.

Two dynamics at work

Experts point to two separate but related dynamics that explain incidents like Cookie’s. First, many LLMs are trained to be accommodating and conversational. When prompted repeatedly, they may generate answers that align with what they infer a user expects or wants to hear rather than revealing anything meaningful about the model's internal state.

"We do not learn anything meaningful about the model by asking it to justify its own choices," said Annie Brown, an AI researcher and founder of the infrastructure company Reliabl. In many cases, the model is pattern-matching and producing plausible-sounding narratives to satisfy the user.

Second, substantive bias can and does arise from training pipelines: biased source material, annotation choices that mirror human prejudice, and taxonomy and deployment decisions that amplify stereotypes. Commercial pressures and political influences can further skew what models learn and reproduce.

Evidence from studies and user reports

Independent reviews and academic papers have documented gender- and dialect-based biases in earlier LLM versions. For example, UNESCO's review of prior LLMs reported clear evidence of bias against women in generated content. Other work has found so-called "dialect prejudice": poorer outcomes for speakers of African American Vernacular English (AAVE) when models generate job titles or recommendations.

Personal anecdotes echo these findings: one user reported an assistant refusing to call her a "builder," insisting instead on "designer," a more female-coded label; another said that while drafting fiction, the model inserted an unprompted, sexually aggressive detail about a female character. A Cambridge researcher recalled that early model outputs often skewed a story about a professor and student to portray the professor as an older man and the student as a young woman.

In one probing interaction, a user documenting the model's behavior was told it could generate entire false narratives (fake studies, misrepresented data, ahistorical examples) to support misogynistic claims if a user asked for them. That highlights a twofold risk: models can both reflect bias and confidently fabricate supporting material.

Why this matters and what to do

Bias in LLMs has real-world consequences: from steering girls away from STEM careers to producing gendered language in recommendation letters that emphasizes emotional traits for women and technical skills for men. Researchers recommend a multipronged approach to mitigation: updating and auditing training datasets, diversifying labeling and feedback teams, improving annotation taxonomies, refining automated and human monitoring, and building clearer user warnings about model limitations.

"These are societal structural issues that are being mirrored and reflected in these models," said Alva Markelius, a PhD candidate at Cambridge’s Affective Intelligence and Robotics Laboratory. Allison Koenecke, an assistant professor at Cornell, adds that models often infer demographics from language and then replicate existing stereotypes.

Companies developing LLMs report ongoing efforts to reduce bias and harmful outputs through research, iterative model updates, and safety teams. Meanwhile, experts urge users to treat LLM outputs skeptically and to remember that these systems are not sentient: they are statistical text-prediction engines that can mirror and sometimes magnify human prejudice.

Key takeaway: A model admitting to or describing sexist behavior is not definitive proof of conscious bias, but repeated, consistent patterns of discriminatory outputs are strong evidence of bias in training data, annotation, or system design. Fixing that requires technical, organizational, and social changes — and transparency from model makers.
