Sunday, April 20, 2025

Developing reliable AI tools for healthcare


New research proposes a system to determine the relative accuracy of predictive AI in hypothetical medical settings, and when the system should defer to a human clinician

Artificial intelligence (AI) has enormous potential to improve how people work across many industries. But to integrate AI tools into the workplace in a safe and responsible way, we need to develop more robust methods for understanding when they can be most useful.

So when is AI more accurate, and when is a human? This question is especially important in healthcare, where predictive AI is increasingly used in high-stakes tasks to assist clinicians.

Today, in Nature Medicine, we published our joint paper with Google Research proposing CoDoC (Complementarity-driven Deferral-to-Clinical Workflow), an AI system that learns when to rely on predictive AI tools and when to defer to a clinician for the most accurate interpretation of medical images.

CoDoC explores how we could harness human-AI collaboration in hypothetical medical settings to deliver the best outcomes. In one example scenario, CoDoC reduced the number of false positives by 25% on a large, de-identified UK mammography dataset, compared with commonly used clinical workflows, without missing any true positives.

This work is a collaboration with several healthcare organizations, including the United Nations Office for Project Services' Stop TB Partnership. To help researchers build on our work and improve the transparency and safety of real-world AI models, we have also open-sourced the CoDoC code on GitHub.

CoDoC: an add-on tool for human-AI collaboration

Building more robust AI models often requires reworking the complex inner workings of predictive AI models. However, for many healthcare providers, redesigning a predictive AI model is simply not possible. CoDoC can potentially help improve predictive AI tools for their users without requiring any modification to the underlying AI tool itself.

When developing CoDoC, we were guided by three criteria:

  • Non-machine learning experts, such as healthcare professionals, should be able to deploy the system and run it on a single computer.
  • Training would require a relatively small amount of data, typically just a few hundred examples.
  • The system could be compatible with any proprietary AI models and would not need access to the inner workings of the model or the data it was trained on.

Determining when the predictive AI or the clinician is more accurate

With CoDoC, we propose a simple and usable AI system that improves reliability by helping predictive AI systems “know when they don’t know.” We looked at scenarios in which a clinician might have access to an AI tool designed to help interpret an image, for example, examining a chest X-ray to determine whether a tuberculosis (TB) test is needed.

For any theoretical clinical setting, the CoDoC system requires only three inputs for each case in the training dataset, as illustrated in the sketch below.

  1. The predictive AI's confidence score, ranging from 0 (certain that no disease is present) to 1 (certain that disease is present).
  2. The clinician's interpretation of the medical image.
  3. The ground truth of whether disease was present, as established, for example, by biopsy or other clinical follow-up.

Note: CoDoC does not require access to any medical images.
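
To make this concrete, here is a minimal sketch of what one training record might look like. The field names are illustrative assumptions for this sketch, not the format used by the released CoDoC code.

    # Minimal sketch of one hypothetical CoDoC training record.
    # Field names are illustrative assumptions, not the released data format.
    from dataclasses import dataclass

    @dataclass
    class TrainingCase:
        ai_confidence: float         # predictive AI score in [0, 1]
        clinician_positive: bool     # clinician's read of the same case
        ground_truth_positive: bool  # e.g. biopsy-confirmed disease status

    case = TrainingCase(
        ai_confidence=0.82,
        clinician_positive=False,
        ground_truth_positive=True,
    )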

CoDoC learns to determine the relative accuracy of the predictive AI model compared with the clinician's interpretation, and how this relationship varies with the predictive AI's confidence scores.
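
One simplified way to picture this learning step, assuming uniform confidence bins, is to compare how often the AI and the clinician each matched the ground truth within each bin. The actual estimator described in the paper is more sophisticated than this sketch.

    # Simplified illustration: estimate the AI's accuracy advantage over the
    # clinician within each confidence bin. Not the estimator used by CoDoC.
    import numpy as np

    def accuracy_advantage_by_bin(confidences, ai_correct, clinician_correct, n_bins=10):
        """Return per-bin (AI accuracy - clinician accuracy); NaN for empty bins."""
        confidences = np.asarray(confidences, dtype=float)
        ai_correct = np.asarray(ai_correct, dtype=float)
        clinician_correct = np.asarray(clinician_correct, dtype=float)
        advantage = np.full(n_bins, np.nan)
        # Map each confidence in [0, 1] to a bin index in [0, n_bins - 1].
        bins = np.minimum((confidences * n_bins).astype(int), n_bins - 1)
        for b in range(n_bins):
            in_bin = bins == b
            if in_bin.any():
                advantage[b] = ai_correct[in_bin].mean() - clinician_correct[in_bin].mean()
        return advantage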

Once trained, CoDoC could be incorporated into a hypothetical future clinical workflow involving both an AI and a clinician. When a new patient image is assessed by the predictive AI model, its associated confidence score is fed into the system. CoDoC then evaluates whether accepting the AI's decision or deferring to the clinician will ultimately lead to the most accurate interpretation.
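
At inference time, the decision rule can be pictured as a lookup against per-bin estimates like those in the sketch above: accept the AI's output only in confidence regions where it was estimated to be at least as accurate as the clinician. Again, this is a hedged illustration rather than the released implementation.

    # Hypothetical deferral rule built on per-bin advantage estimates.
    import numpy as np

    def defer_to_clinician(confidence, advantage):
        """Return True if the case should be read by the clinician."""
        n_bins = len(advantage)
        b = min(int(confidence * n_bins), n_bins - 1)
        # Defer when there is no estimate for this bin, or no AI advantage.
        return bool(np.isnan(advantage[b]) or advantage[b] <= 0.0)

    # Example usage with the sketch above (hypothetical numbers):
    # advantage = accuracy_advantage_by_bin(confs, ai_ok, clin_ok)
    # defer_to_clinician(0.82, advantage)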

Increased accuracy and efficiency

Our comprehensive testing of CoDoC with multiple real-world datasets, including only historic and de-identified data, has shown that combining the best of human expertise and predictive AI results in greater accuracy than either alone.

In addition to achieving the 25% reduction in false positives on the mammography dataset, in hypothetical simulations where the AI was allowed to act autonomously on certain occasions, CoDoC reduced the number of cases that needed to be read by a clinician by two thirds. We also showed how CoDoC could hypothetically improve the triage of chest X-rays for onward TB testing.

Responsible development of artificial intelligence for healthcare

Although this work is theoretical in nature, it demonstrates the adaptive potential of our AI system: CoDoC was able to improve medical imaging interpretation performance across different demographic populations, clinical settings, medical imaging equipment used, and disease types.

CoDoC is a promising example of how we can harness the benefits of AI in combination with human strengths and expertise. We are working with external partners to rigorously evaluate our research and the system's potential benefits. To bring technology like CoDoC safely into real-world medical settings, healthcare providers and manufacturers will also need to understand how clinicians interact with AI, and to validate systems with specific medical AI tools and settings.

