Wednesday, December 25, 2024

Researchers say OpenAI’s general-purpose speech recognition model is flawed

The Associated Press recently reported that it had interviewed more than a dozen software engineers, developers and academic researchers who take issue with artificial intelligence developer OpenAI’s claim that one of its machine learning tools, which is used for clinical documentation in many U.S. health care systems, has human-like accuracy.

WHY IT MATTERS

University of Michigan researchers and others found that AI hallucinations resulted in erroneous transcripts — sometimes containing racist and violent rhetoric, as well as imagined treatments — according to the AP.

The widespread use of tools built on Whisper, which is available as open source or through an API, raises concerns that transcription errors could lead to incorrect patient diagnoses or poor medical decision-making.
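For context on how broadly accessible the model is, the open-source release can be run locally with just a few lines of code. The following is a minimal sketch assuming the openai-whisper Python package and a placeholder audio file name; it is illustrative only, not any vendor’s implementation.

```python
# Minimal sketch: local transcription with the open-source Whisper package
# (pip install openai-whisper); ffmpeg must be available on the system path.
import whisper

model = whisper.load_model("base")             # small general-purpose checkpoint
result = model.transcribe("consultation.mp3")  # placeholder file name
print(result["text"])
```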

Hint Health is one clinical technology vendor that added the Whisper API last year, giving physicians the ability to record patient consultations within the vendor’s app and transcribe them with OpenAI’s large language models.
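A hosted integration of this kind typically looks like the sketch below, which uses OpenAI’s official Python SDK to send an audio file to the Whisper API. The file name and response format are placeholder choices, not Hint Health’s actual implementation.

```python
# Minimal sketch: transcribing a recorded consultation with the hosted
# Whisper API via OpenAI's official Python SDK (openai >= 1.0).
# Error handling and retries are omitted for brevity.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

with open("consultation.mp3", "rb") as audio_file:  # placeholder file name
    transcript = client.audio.transcriptions.create(
        model="whisper-1",       # OpenAI's hosted Whisper model
        file=audio_file,
        response_format="text",  # return a plain-text transcript
    )

print(transcript)
```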

Meanwhile, more than 30,000 physicians and 40 health systems, including Children’s Hospital Los Angeles, use Nabla’s artificial intelligence, which includes a Whisper-based tool. According to Nabla, approximately seven million medical visits have been transcribed with it.

A spokesperson for the company pointed to a blog post published Monday that describes specific steps the company is taking to ensure appropriate use of its models and to monitor how they are used.

“Nabla detects wrongly generated content based on manual note edits and plain-language feedback,” the company said in the blog post. “This provides a precise measure of real-world performance and gives us additional input to improve the models over time.”
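As a purely illustrative sketch of how manual edits can yield such a signal (this is not Nabla’s published method; the note texts and the sentence-level diff heuristic are assumptions), one could flag spans that appear in the AI-generated draft but are removed by the clinician:

```python
# Illustrative only: flag sentences present in the AI-generated draft note
# but deleted during the clinician's manual edit, as a rough proxy for
# wrongly generated content. Not Nabla's actual method.
import difflib

def removed_spans(generated_note: str, edited_note: str) -> list[str]:
    """Return sentences in the generated note that are absent after editing."""
    gen = generated_note.split(". ")
    edited = edited_note.split(". ")
    matcher = difflib.SequenceMatcher(a=gen, b=edited)
    removed = []
    for tag, i1, i2, _, _ in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            removed.extend(gen[i1:i2])
    return removed

# Hypothetical example: the clinician deleted a treatment the model invented.
draft = "Patient reports a mild cough. Started on a broad-spectrum antibiotic. Follow up in two weeks."
final = "Patient reports a mild cough. Follow up in two weeks."
print(removed_spans(draft, final))  # ['Started on a broad-spectrum antibiotic']
```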

Notably, Whisper is also integrated with some versions of OpenAI’s flagship ChatGPT chatbot and is a built-in offering in cloud computing platforms from Oracle and Microsoft, according to the AP.

Meanwhile, OpenAI warns users that the tool should not be used in “high-risk domains,” and in its online disclosures recommends that Whisper not be used in “decision-making contexts, where flaws in accuracy can lead to pronounced flaws in outcomes.”

“Will the next model fix the issue of large-v3 generating a significant amount of hallucinations?” one user asked on OpenAI’s GitHub Whisper discussion forum on Tuesday. The question had not been answered as of press time.

“It seems solvable if the company is willing to make it a priority,” William Saunders, a San Francisco research engineer who left OpenAI earlier this year, told the AP. “It’s problematic if you put it out there and people are overconfident about what it can do and integrate it with all their other systems.”

It’s worth noting that OpenAI recently posted a job opening for a health AI research scientist whose key responsibilities would be to “design and apply practical and scalable methods to improve the safety and reliability of our models” and “evaluate methods using health-related data, ensuring models provide accurate, reliable and trustworthy information.”

THE LARGER TREND

In September, Texas Attorney General Ken Paxton announced a settlement with Dallas-based artificial intelligence developer Pieces Technologies over allegations that the company’s generative artificial intelligence tools put patient safety at risk by overstating how accurate they were. The company uses genAI to summarize real-time electronic health record data about patients’ conditions and treatments.

A study of LLM accuracy in medical note-taking, conducted by the University of Massachusetts Amherst and Mendel, an artificial intelligence company focused on hallucination detection, found many errors.

The researchers compared OpenAI’s GPT-4o and Meta’s Llama-3 across 50 medical notes and found that GPT-4o produced 21 summaries with incorrect information and 50 with generalized information, while Llama-3 produced 19 errors and 47 generalizations.

ON THE RECORD

“We take this issue seriously and are continually working to improve the accuracy of our models, including reducing hallucinations,” an OpenAI spokesman said by email on Tuesday.

“When using Whisper on our API platform, our usage policies prohibit use in certain high-stakes decision-making contexts, and our model card for open-source use includes recommendations against use in high-risk domains. We thank researchers for sharing their findings.”
