Model bias audit with balanced datasets using Mimesis

# Introduction

Whether you are working with established classifiers or state-of-the-art large language models (LLMs), building machine learning solutions comes with a risk: algorithms can silently adopt biases present in the historical data they were trained on. But when the stakes are high or the data is sensitive, how can we check whether a model is biased without compromising real-world information?

This hands-on article walks you through training a basic “loan approval” classification model on biased data. We will then use Mimesis, an open source library, to help generate a perfectly balanced, counterfactual dataset. You will be able to test “fake” users with identical financial situations but different demographic characteristics, and so determine whether the model discriminates against certain groups.

# Step-by-step guide

Start by installing the Mimesis library (for example, by running pip install mimesis) if you are new to it or working in a cloud notebook environment such as Colab.

Before auditing a model, we need a model to audit! In this example, we synthetically generate a dataset of 1,000 bank customers with only two features: gender (categorical) and income (numerical). The data is intentionally constructed so that gender unfairly influences the binary outcome, loan approval. Specifically, when labeling the dataset, men are always approved, while women are approved only if their income exceeds 80,000.

The process of creating this explicitly biased dataset and training a decision tree classifier on it is shown below:

import pandas as pd
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# 1. Simulating biased historical data (1000 instances)
np.random.seed(42)
n_train = 1000
genders = np.random.choice(['Male', 'Female'], n_train)
incomes = np.random.randint(30000, 120000, n_train)

approvals = []
for gender, income in zip(genders, incomes):
    if gender == 'Male':
        # Historically, males are approved
        approvals.append(1)
    else:
        # Only females with high income are approved
        approvals.append(1 if income > 80000 else 0)

train_df = pd.DataFrame({'Gender': genders, 'Income': incomes, 'Approved': approvals})

# Converting categories to numbers for the machine learning model
train_df['Gender_Code'] = train_df['Gender'].map({'Male': 1, 'Female': 0})

# 2. Training a Decision Tree classifier
model = DecisionTreeClassifier(max_depth=3)
model.fit(train_df[['Gender_Code', 'Income']], train_df['Approved'])
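Before moving on, it can help to confirm that the bias really made it into both the labels and the model. The sketch below rebuilds the same synthetic dataset in vectorized form, prints the approval rate per gender, and dumps the learned splits with scikit-learn's export_text. The random_state argument is an assumption added here for reproducibility; it is not part of the code above.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Rebuild the biased dataset (same seed and same order of random draws)
np.random.seed(42)
n_train = 1000
genders = np.random.choice(['Male', 'Female'], n_train)
incomes = np.random.randint(30000, 120000, n_train)

# Same labeling rule, vectorized: males are always approved,
# females only when income is above 80,000
approvals = np.where((genders == 'Male') | (incomes > 80000), 1, 0)

train_df = pd.DataFrame({'Gender': genders, 'Income': incomes, 'Approved': approvals})
train_df['Gender_Code'] = train_df['Gender'].map({'Male': 1, 'Female': 0})

# Approval rate per gender in the training labels
approval_rates = train_df.groupby('Gender')['Approved'].mean()
print(approval_rates)

# random_state is an assumption added for reproducibility
model = DecisionTreeClassifier(max_depth=3, random_state=42)
model.fit(train_df[['Gender_Code', 'Income']], train_df['Approved'])

# Human-readable view of the splits the tree learned
tree_rules = export_text(model, feature_names=['Gender_Code', 'Income'])
print(tree_rules)
```

If the printed rules contain a split on Gender_Code, the tree has picked up the gender rule directly rather than relying on income alone.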

The next step shows Mimesis in action. We will use the library’s Generic class to generate a small set of test applicants. We define three base financial profiles, each with a random UUID (universally unique identifier) and a moderate income between 40,000 and 70,000. Note that these profiles do not include gender information yet:

from mimesis import Generic

generic = Generic('en')

# Generating 3 base financial profiles
base_profiles = []
for _ in range(3):
    profile = {
        'Applicant_ID': generic.cryptographic.uuid(),
        'Income': generic.random.randint(40000, 70000) # Moderate income
    }
    base_profiles.append(profile)

For example, three newly created profiles might look something like this:

[{'Applicant_ID': '1f1721e1-19af-4bd1-8488-6abf01404ef9', 'Income': 44815},
 {'Applicant_ID': '5c862597-7f55-43f4-9d6e-ac9cc0b9083e', 'Income': 47436},
 {'Applicant_ID': '3479d4cf-0d9b-4f06-9c43-1c3b7e787830', 'Income': 58194}]

Let’s finish building the set of counterfactual examples that form the core of our audit process! For each of the three base profiles, we create two cloned counterfactual instances: one male and one female. Within each pair of test applicants, the applicant ID and income are identical, so the only difference is gender. Any difference in how our trained decision tree treats the two clones is therefore evidence of gender bias.

counterfactual_data = []

for profile in base_profiles:
    # Version A: Male Counterfactual
    counterfactual_data.append({
        'Applicant_ID': profile['Applicant_ID'], 
        'Gender': 'Male', 
        'Gender_Code': 1, 
        'Income': profile['Income']
    })
    
    # Version B: Female Counterfactual
    counterfactual_data.append({
        'Applicant_ID': profile['Applicant_ID'], 
        'Gender': 'Female', 
        'Gender_Code': 0, 
        'Income': profile['Income']
    })

audit_df = pd.DataFrame(counterfactual_data)

Three pairs of customers might look like this:

                           Applicant_ID  Gender  Gender_Code  Income
0  1f1721e1-19af-4bd1-8488-6abf01404ef9    Male            1   44815
1  1f1721e1-19af-4bd1-8488-6abf01404ef9  Female            0   44815
2  5c862597-7f55-43f4-9d6e-ac9cc0b9083e    Male            1   47436
3  5c862597-7f55-43f4-9d6e-ac9cc0b9083e  Female            0   47436
4  3479d4cf-0d9b-4f06-9c43-1c3b7e787830    Male            1   58194
5  3479d4cf-0d9b-4f06-9c43-1c3b7e787830  Female            0   58194

Key point to note here: we used Mimesis to quickly build perfectly matched “clones” of loan applicants with identical incomes but different genders. Because the protected attribute is the only thing that varies within each pair, any difference in the model’s output can be attributed to it.
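Since the whole audit rests on the pairs being truly matched, a quick sanity check can be worthwhile. The following is a minimal sketch, using the three example profiles shown earlier; check_matched_pairs is a hypothetical helper name introduced here for illustration, not part of Mimesis or pandas.

```python
import pandas as pd

# Base profiles copied from the example output above
base_profiles = [
    {'Applicant_ID': '1f1721e1-19af-4bd1-8488-6abf01404ef9', 'Income': 44815},
    {'Applicant_ID': '5c862597-7f55-43f4-9d6e-ac9cc0b9083e', 'Income': 47436},
    {'Applicant_ID': '3479d4cf-0d9b-4f06-9c43-1c3b7e787830', 'Income': 58194},
]

# Same counterfactual construction as above, written compactly
audit_df = pd.DataFrame([
    {**profile, 'Gender': gender, 'Gender_Code': code}
    for profile in base_profiles
    for gender, code in [('Male', 1), ('Female', 0)]
])

def check_matched_pairs(df):
    """Hypothetical helper: True if every Applicant_ID has exactly two rows,
    a single income value, and both genders."""
    grouped = df.groupby('Applicant_ID')
    two_rows = (grouped.size() == 2).all()
    same_income = (grouped['Income'].nunique() == 1).all()
    both_genders = (grouped['Gender'].nunique() == 2).all()
    return bool(two_rows and same_income and both_genders)

pairs_ok = check_matched_pairs(audit_df)
print(pairs_ok)
```

If any feature other than gender differs within a pair, the check fails, and a difference in predictions could no longer be attributed to gender alone.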

It’s time to take a closer look at the model and see what it reveals.

# Asking the model to predict approval for our counterfactuals
audit_df['Predicted_Approval'] = model.predict(audit_df[['Gender_Code', 'Income']])

# Formatting the output for readability (1 = Approved, 0 = Denied)
audit_df['Predicted_Approval'] = audit_df['Predicted_Approval'].map({1: 'Approved', 0: 'Denied'})

print("\n--- Model Audit Results ---")
print(audit_df[['Applicant_ID', 'Gender', 'Income', 'Predicted_Approval']].sort_values('Applicant_ID'))

The decision-making results produced by our model couldn’t be clearer:

--- Model Audit Results ---
                           Applicant_ID  Gender  Income Predicted_Approval
0  1f1721e1-19af-4bd1-8488-6abf01404ef9    Male   44815           Approved
1  1f1721e1-19af-4bd1-8488-6abf01404ef9  Female   44815             Denied
4  3479d4cf-0d9b-4f06-9c43-1c3b7e787830    Male   58194           Approved
5  3479d4cf-0d9b-4f06-9c43-1c3b7e787830  Female   58194             Denied
2  5c862597-7f55-43f4-9d6e-ac9cc0b9083e    Male   47436           Approved
3  5c862597-7f55-43f4-9d6e-ac9cc0b9083e  Female   47436             Denied

Notice that, for exactly the same Applicant_ID and Income, the male clones are approved for the loan, while the female clones with the same moderate incomes are rejected. Using Mimesis to generate the base profiles helped us hold all other variables constant, effectively isolating and exposing the model’s discriminatory decision-making.
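With more than a handful of pairs, reading the table by eye stops being practical, so it can be useful to condense the audit into a single number. The sketch below computes a counterfactual flip rate, meaning the share of pairs whose two clones receive different decisions. The term and the variable names are illustrative choices made here, and the rows are copied from the results above.

```python
import pandas as pd

# Audit results copied from the output above
results = pd.DataFrame({
    'Applicant_ID': [
        '1f1721e1-19af-4bd1-8488-6abf01404ef9',
        '1f1721e1-19af-4bd1-8488-6abf01404ef9',
        '5c862597-7f55-43f4-9d6e-ac9cc0b9083e',
        '5c862597-7f55-43f4-9d6e-ac9cc0b9083e',
        '3479d4cf-0d9b-4f06-9c43-1c3b7e787830',
        '3479d4cf-0d9b-4f06-9c43-1c3b7e787830',
    ],
    'Gender': ['Male', 'Female'] * 3,
    'Predicted_Approval': ['Approved', 'Denied'] * 3,
})

# One row per applicant, one column per gender
wide = results.pivot(index='Applicant_ID', columns='Gender',
                     values='Predicted_Approval')

# A pair "flips" when its two clones receive different decisions
flipped = wide['Male'] != wide['Female']
flip_rate = flipped.mean()

print(wide)
print(f"Counterfactual flip rate: {flip_rate:.0%}")
```

For the three pairs above, every pair flips, so the rate is 100%. A model that ignored gender would score 0% on the same pairs.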

# Summary

In this hands-on article, we showed how you can use Mimesis to generate balanced, counterfactual data examples, free of privacy or sensitive-data constraints, that help you audit your model’s behavior and determine whether it is biased. If your model turns out to be biased, next steps may include:

- Augmenting the training data with more balanced profiles to correct historical imbalances or errors.
- Applying a reweighting strategy, depending on the model type.
- Using open source fairness toolkits, for example AI Fairness 360, which help mitigate bias in machine learning pipelines.

Ivan Palomares Carrascosa is a thought leader, writer, speaker and advisor in the fields of Artificial Intelligence, Machine Learning, Deep Learning and LLMs. He trains and advises others on the use of artificial intelligence in the real world.
