Friday, May 2, 2025

Data enrichment best practices


Building a Responsible Approach to Data Collection with Partnership on AI

Our goal at DeepMind is to ensure that everything we do meets the highest standards of safety and ethics, as outlined in our Operating Principles. One of the most important places to start is how we collect data. Over the past 12 months, we have worked with Partnership on AI (PAI) to carefully consider these challenges and to collaboratively develop standardized best practices and processes for responsible human data collection.

Human data collection

More than three years ago, we formed the Human Behavioral Research Ethics Committee (HuBREC), a governance group modeled on academic institutional review boards (IRBs) such as those in hospitals and universities, to protect the dignity, rights, and well-being of participants in our research. This committee oversees behavioral research involving experiments with humans as the research subject, such as studying how humans interact with artificial intelligence (AI) systems to make decisions.

In addition to projects involving behavioral research, the AI community is increasingly engaging in “data enrichment” efforts – tasks performed by humans to train and validate machine learning models, such as data labeling and model evaluation. While behavioral research often relies on voluntary participants as research subjects, data enrichment involves people being paid to perform tasks that improve artificial intelligence models.

Such tasks are typically carried out on crowdsourcing platforms, which may lack the guidance or governance systems needed to ensure adequate standards are met, raising ethical issues around workers’ pay, well-being, and equity. As research labs scale up development of increasingly sophisticated models, reliance on data enrichment practices is likely to grow, and with it the need for stronger guidance.

As part of our Operating Principles, we commit to upholding and contributing to best practices in AI safety and ethics, including fairness and privacy, to avoid unintended outcomes that create risks of harm.

Best practices

Following PAI’s recent white paper on Responsible Sourcing of Data Enrichment Services, we collaboratively developed our data enrichment practices and processes. This included creating five steps AI practitioners can follow to improve the working conditions of those involved in data enrichment tasks (more details can be found in PAI’s data enrichment sourcing guidelines; a sketch of how these steps might be encoded appears after the list):

  1. Select an appropriate payment model and ensure all workers are paid above the local living wage.
  2. Design and run a pilot before launching a data enrichment project.
  3. Identify appropriate workers for the desired task.
  4. Provide verified instructions and/or training materials for workers to follow.
  5. Establish clear and regular communication mechanisms with workers.
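To make these steps concrete, the sketch below shows one way a team might encode them as a machine-checkable pre-launch checklist. This is a minimal illustration in Python under our own assumptions: the EnrichmentProjectPlan class, its field names, and the example values are hypothetical, and are not part of PAI’s guidelines or DeepMind’s internal tooling.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical sketch only: the class, field names, and thresholds are
# illustrative assumptions, not DeepMind tooling or PAI's guidelines.

@dataclass
class EnrichmentProjectPlan:
    task_name: str
    hourly_pay_usd: float            # proposed pay rate for the task
    local_living_wage_usd: float     # living wage in the workers' locale
    pilot_completed: bool            # step 2: task designed and piloted first
    worker_pool: str                 # step 3: e.g. "trained annotators"
    instructions_verified: bool      # step 4: instructions/training reviewed
    feedback_channel: Optional[str]  # step 5: e.g. a forum or email alias

    def pre_launch_issues(self) -> List[str]:
        """Return whichever of the five sourcing checks are not yet satisfied."""
        issues = []
        if self.hourly_pay_usd < self.local_living_wage_usd:  # step 1
            issues.append("pay is below the local living wage")
        if not self.pilot_completed:
            issues.append("no pilot run before launch")
        if not self.worker_pool:
            issues.append("no appropriate worker pool identified")
        if not self.instructions_verified:
            issues.append("instructions/training materials not verified")
        if not self.feedback_channel:
            issues.append("no regular communication channel for workers")
        return issues


# Example usage: a plan that currently fails two of the five checks.
plan = EnrichmentProjectPlan(
    task_name="dialogue safety rating",
    hourly_pay_usd=12.0,
    local_living_wage_usd=15.0,
    pilot_completed=True,
    worker_pool="trained annotators",
    instructions_verified=True,
    feedback_channel=None,
)
for issue in plan.pre_launch_issues():
    print("blocker:", issue)
```

Encoding the checks this way is just one possible design, but it illustrates the underlying idea: a project should not launch until every one of the five conditions is demonstrably satisfied.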

We collaboratively developed the necessary policies and resources, gathering iterative feedback from our internal legal, data, security, ethics, and research teams, then piloted them on a small number of data collection projects before rolling them out across the organization.

These documents provide greater clarity on how best to set up data enrichment tasks at DeepMind, increasing our researchers’ confidence in designing and executing studies. This has not only made our approval and launch processes more efficient; importantly, it has improved the experience of the people involved in data enrichment tasks.

Further information on responsible data enrichment practices and how we have embedded them in our existing processes is explained in a recent PAI case study, Implementing Responsible Data Enrichment Practices at an AI Developer: The Example of DeepMind. PAI also provides helpful resources and support materials for AI practitioners and organizations looking to develop similar processes.

Looking ahead

While these best practices underpin our work, we should not rely on them alone to ensure our projects meet the highest standards of well-being and safety for research participants and workers. Every project at DeepMind is different, which is why we have a dedicated human data review process that allows us to continuously engage with research teams to identify and mitigate risks on a case-by-case basis.

This work is intended to serve as a resource for other organizations interested in improving their data enrichment practices, and we hope it leads to cross-sector conversations that further develop these guidelines and resources for teams and partners. We also hope the collaboration sparks a broader discussion about how the AI community can continue to develop norms for responsible data collection and collectively build better industry standards.

Read more about our Operating Principles.
