Talent Fit Bias Assessment

This article describes how we use IBM watsonx.governance to assess Talent Fit for bias.

Bias Assessment Process

Resume Selection

When we prepare resumes for Talent Fit, we anonymize them by removing both PII and words that indicate gender, race, or military status. For the bias assessment, we sample from resumes whose original text contains such self-identifying words. This gives us a control for comparing Talent Fit’s recommendation for the same candidate with and without the identifying words. Each day, we sample 1,000 of these resumes for the bias assessment.
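The sampling step could be sketched as follows. This is a minimal illustration, not the production pipeline: the term list, the function names, and the whole-word matching rule are all assumptions made for the example.

```python
import random
import re

# Hypothetical, illustrative term list. The real redaction vocabulary
# is not given in this article.
IDENTIFYING_TERMS = {"she", "her", "he", "his", "hostess", "veteran"}


def has_identifying_terms(text, terms=IDENTIFYING_TERMS):
    """Return True if the resume text contains any identifying term as a whole word."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    return not words.isdisjoint(terms)


def sample_for_assessment(resumes, k=1000, seed=None):
    """Randomly pick up to k resumes that contain identifying terms."""
    eligible = [r for r in resumes if has_identifying_terms(r)]
    rng = random.Random(seed)  # seeded generator so a day's sample can be reproduced
    return rng.sample(eligible, min(k, len(eligible)))
```

Passing a fixed seed makes a given day's sample reproducible, which helps when a result needs to be re-examined later.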

Daily Process 

We run a daily bias assessment on the 1,000 sampled resumes. First, we re-run the standard Talent Fit process, with the self-identifying words redacted. This run is the ‘control’. Next, we run the Talent Fit prompt on the same resumes with the self-identifying words included.

The comparison between these two recommendations forms the foundation for our bias assessment.
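A rough sketch of the paired run is shown below. The `recommend` callable is a stand-in for the Talent Fit prompt, and `redact`, `PairedResult`, and `run_paired_assessment` are hypothetical names introduced only for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class PairedResult:
    resume_id: str
    control_match: bool     # recommendation with identifying words redacted
    identified_match: bool  # recommendation with identifying words kept


def redact(text, terms):
    """Replace each identifying term (whole word, case-insensitive) with a placeholder."""
    if not terms:
        return text
    ordered = sorted(terms, key=len, reverse=True)  # longest first to avoid partial overlaps
    pattern = r"\b(" + "|".join(re.escape(t) for t in ordered) + r")\b"
    return re.sub(pattern, "[REDACTED]", text, flags=re.IGNORECASE)


def run_paired_assessment(resumes, terms, recommend):
    """Evaluate each resume twice: redacted (control) and with identifying words.

    resumes:   dict mapping resume id -> resume text
    recommend: callable taking resume text and returning True for a match
               (assumed stand-in for the Talent Fit prompt)
    """
    results = []
    for resume_id, text in resumes.items():
        control = recommend(redact(text, terms))
        identified = recommend(text)
        results.append(PairedResult(resume_id, control, identified))
    return results
```

Keeping both recommendations on one record per resume makes the later comparison a simple per-row check.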

Bias Assessment

If we dig into a single day’s results (see chart below), Talent Fit made the same recommendation (match or no match) 91% of the time. The remaining 9% is made up of two circumstances:

a. Talent Fit assessed the fully anonymized resume to be not a match, and then later assessed the self-identified resume to be a match.

b. Talent Fit assessed the fully anonymized resume to be a match, and then later assessed the self-identified resume to be not a match.

Scenario (a) is the more inclusive outcome, so we exclude it when counting biased outcomes. In the example below, 12 candidates were not a match in the original Talent Fit assessment but became a match when their identifying words were included.

Scenario (b) is the less inclusive outcome. In the example below, we had 17 candidates who were deemed a match by the original Talent Fit assessment, but were later determined not to be a match when identifying words were included.

If we add the more inclusive outcomes (Scenario a) to the 91% of unchanged recommendations, 95% of outcomes were the same or more inclusive when the same candidates were assessed with their self-identifying words included.
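The classification into unchanged, more inclusive (Scenario a), and less inclusive (Scenario b) outcomes can be sketched as below. The function names and the sample data in the usage note are illustrative assumptions, not figures from an actual run.

```python
from collections import Counter


def classify(control_match, identified_match):
    """Label one paired outcome."""
    if control_match == identified_match:
        return "same"
    # Scenario (a): no match when redacted, match with identifying words.
    # Scenario (b): match when redacted, no match with identifying words.
    return "more_inclusive" if identified_match else "less_inclusive"


def summarize(pairs):
    """pairs: list of (control_match, identified_match) booleans."""
    total = len(pairs)
    if total == 0:
        raise ValueError("no paired outcomes to summarize")
    counts = Counter(classify(c, i) for c, i in pairs)
    return {
        "same": counts["same"] / total,
        "more_inclusive": counts["more_inclusive"] / total,
        "less_inclusive": counts["less_inclusive"] / total,
        "same_or_more_inclusive": (counts["same"] + counts["more_inclusive"]) / total,
    }
```

For instance, with ten made-up pairs of which eight are unchanged, one is Scenario (a), and one is Scenario (b), `summarize` reports 0.8 for `same` and 0.9 for `same_or_more_inclusive`.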

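[Figure: Picture1.png, single-day comparison of Talent Fit recommendations with and without self-identifying words]
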
Daily Results

This chart shows the proportion of recommendations that remained the same whether words identifying gender, race, or military status were included in or excluded from the resume given to the LLM. For example, consider a single candidate whose resume contains ‘she’, ‘her’, or a job title that indicates gender, such as ‘hostess’. That candidate counts toward this proportion when Talent Fit made the same recommendation (match or no match) for the resume both with and without the gender-identifying words.

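Note that the EEOC uses a 0.8 threshold (the four-fifths rule) to determine whether adverse outcomes are occurring in a hiring process: a group whose selection rate is below 80% of the rate of the group with the highest selection rate is generally treated as showing evidence of adverse impact.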
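As a rough illustration of how a 0.8 threshold is typically applied, the sketch below computes each group's selection rate relative to the highest group's rate. The group labels and rates are made up, and the article does not state exactly how this ratio is computed for Talent Fit, so treat this as one plausible reading rather than the actual method.

```python
def impact_ratios(selection_rates):
    """Divide each group's selection rate by the highest group's rate.

    selection_rates: dict mapping a group label to its match rate (0.0 to 1.0).
    """
    highest = max(selection_rates.values())
    if highest == 0:
        raise ValueError("no group has a non-zero selection rate")
    return {group: rate / highest for group, rate in selection_rates.items()}


def below_threshold(ratios, threshold=0.8):
    """Return the groups whose ratio falls under the threshold."""
    return [group for group, ratio in ratios.items() if ratio < threshold]


# Hypothetical example: group_b is matched at 35% versus 50% for group_a.
rates = {"group_a": 0.50, "group_b": 0.35}
ratios = impact_ratios(rates)
flagged = below_threshold(ratios)
```

Here `group_b` has a ratio of about 0.7, which falls under 0.8, so it would be flagged for review.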

[Figure: Picture1.png, daily proportion of recommendations that remained the same]
