Business A.M

How to Detect Bias in Large Language Models

by Knowledge@Wharton
December 10, 2025
in Knowledge@Wharton

Research from Wharton’s Sonny Tambe finds that LLMs can make biased hiring decisions that traditional auditing methods might not be able to catch.

As AI becomes a fixture in hiring, evaluation, and policy decisions, a new study funded by the Wharton AI & Analytics Initiative offers a rigorous look at a critical question: Do race and gender shape how large language models (LLMs) evaluate people? If so, how can we tell?
The answer, according to Prasanna (Sonny) Tambe, faculty co-director of Wharton Human AI Research, and his co-authors, is complex, and the implications matter for every organization deploying LLMs at scale. Here are the key takeaways from Tambe’s latest research on LLM bias and auditability.

Bias isn’t just a human problem — it shows up in the code.
Despite their veneer of neutrality, LLMs trained on vast swaths of online data can absorb and replicate human biases. This study shows that when prompted with the application materials of job candidates, LLMs systematically produced different evaluations depending on whether a person was described as Black, Hispanic, Asian, or White, and whether they were male or female, even when everything else was kept the same.
The direction of these biases is not always predictable. For example, the LLMs tested in the study rated women and people of color more favorably than White men, a reversal of traditional discrimination patterns. But the researchers caution against reading this as a “fairness fix.” It may instead signal overcorrection during post-training alignment meant to mitigate bias, which can produce its own undesirable effects.

Auditing LLMs requires new methods, not just old metrics.
Traditional evaluation methods weren’t enough to diagnose bias in this study. The adverse impact ratio, a widely used auditing metric, showed some disparities, but the results were too imprecise to support strong conclusions. That’s why Tambe and his colleagues pioneered a new approach: LLM-based correspondence experiments.
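The adverse impact ratio mentioned above is a simple selection-rate comparison. A minimal sketch of how an auditor might compute it, using made-up counts for illustration (the study does not report these numbers):

```python
# Adverse impact ratio: a protected group's selection rate divided by the
# reference group's selection rate. Under the conventional "four-fifths rule,"
# values below 0.8 are flagged for review. All counts below are hypothetical.

def adverse_impact_ratio(selected: int, total: int,
                         ref_selected: int, ref_total: int) -> float:
    """Ratio of one group's selection rate to a reference group's rate."""
    return (selected / total) / (ref_selected / ref_total)

# Hypothetical audit counts: 48 of 120 women selected vs. 60 of 120 men.
ratio = adverse_impact_ratio(48, 120, 60, 120)
print(f"{ratio:.2f}")  # prints 0.80 -- right at the four-fifths threshold
```

As the study notes, a metric like this can surface a disparity, but with small samples the ratio is too noisy to establish whether a model is actually biased.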
Inspired by methods used to detect discrimination in human hiring, these experiments carefully manipulated résumés and interview transcripts. By changing only names and pronouns to signal race and gender, the team could measure how models respond to applicants with identical qualifications across demographic lines.
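The correspondence method described above can be sketched in a few lines. Everything here is illustrative: the résumé text, the names, and `score_candidate` (a stand-in for a real LLM rating call) are invented for the example, not taken from the study.

```python
# Sketch of an LLM-based correspondence experiment: hold the application
# materials fixed, vary only the demographic signal (here, the name), and
# compare average model scores across groups. Names and résumé are invented.
from statistics import mean

RESUME = """Software engineer, 6 years' experience.
Led a team of 4; shipped three production ML services."""

# Name sets serve as the only demographic signal; all else is identical.
NAME_GROUPS = {
    "group_a": ["Emily Walsh", "Anne Murphy"],
    "group_b": ["Lakisha Brooks", "Tamika Jones"],
}

def score_candidate(name: str, resume: str) -> float:
    """Placeholder for an LLM rating query (e.g., 'rate this candidate
    0-100'). Replace with a real model call; returns a constant here so
    the sketch runs without an API."""
    return 75.0

def group_means(n_trials: int = 20) -> dict:
    """Average score per group over repeated trials. The audit statistic
    is the gap between groups despite identical qualifications."""
    return {
        group: mean(score_candidate(name, RESUME)
                    for _ in range(n_trials) for name in names)
        for group, names in NAME_GROUPS.items()
    }

scores = group_means()
# 0.0 with the dummy scorer; a persistent nonzero gap signals disparity.
print(scores["group_a"] - scores["group_b"])
```

Because the only varying input is the name, any systematic score gap can be attributed to the demographic signal rather than to qualifications.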

The disparities are subtle, but persistent and meaningful.
Using this method across 11 top LLMs from OpenAI, Anthropic, and Mistral, researchers found that women and racial minorities received slightly higher ratings than their White male counterparts. The differences were modest — often just a few percentage points.

These results held even when researchers:
  • Changed the district context from diverse to predominantly White
  • Altered the evaluation prompts
  • Removed interview transcripts, relying on résumés alone
That robustness suggests the disparities are embedded in how the models were trained or aligned, not just a response to specific prompt wording or context.

Audits must match the use case, and context matters.
The research stresses that LLM bias can’t be fully understood outside its application. A model may behave differently depending on task, prompt, or population. For example, tools used for hiring may need different audits than those used in customer service or credit risk evaluation.
Auditing LLMs isn’t one-size-fits-all. Policymakers and organizations need context-specific audits to understand how these models actually perform in the real world.

LLM audits are essential infrastructure for ethical AI.
This study isn’t just meant to be a warning — it also offers a roadmap. Tambe and his colleagues provide companies, researchers, and regulators with a powerful tool to hold language models accountable in evaluation contexts. In doing so, they help ensure AI deployment aligns with legal standards and social expectations.
As Tambe explains, “What makes this problem urgent is how widespread LLM use already is becoming in organizational workflows. And yet, we don’t yet have robust standards for understanding how these models perform with respect to fairness.”

Bottom line: Don’t deploy LLMs blindly. Audit them.
Organizations are rushing to integrate LLMs into decision-making pipelines. This research is a timely reminder: Even the smartest models aren’t immune to bias. But with the right tools, organizations can work to ensure their outputs are fair.

BusinessAMLive (businessamlive.com) is a leading online business news and information platform focused on providing timely, insightful and comprehensive coverage of economic, financial, and business developments in Nigeria, Africa and around the world.

© 2026 Business A.M
