OpenAI launches HealthBench to assess LLM safety in healthcare


OpenAI announced the launch of HealthBench, a benchmark for evaluating AI models in healthcare that draws on real-world use and physician judgment.

“The 5,000 conversations in HealthBench simulate interactions between AI models and individual users or clinicians. The model’s task is to provide the best possible response to the user’s last message,” the company said in a statement.

OpenAI built the benchmark with 262 physicians who have practiced in 60 countries, are proficient in 49 languages and have training in 26 medical specialties.

HealthBench includes 5,000 health conversations, each with a physician-created rubric for grading the model’s response. Together, the rubrics comprise 48,562 unique criteria.

The company said the conversations were created through “synthetic generation and human adversarial testing,” are multilingual, and span a range of medical specialties and contexts.

“Each model response is graded against a set of physician-written rubric criteria specific to that conversation,” the company said.

“Each criterion outlines what an ideal response should include or avoid (e.g. a specific fact to include or unnecessarily technical jargon to avoid). Each criterion has a corresponding point value, weighted to match the physician’s judgment of that criterion’s importance.”

The model’s answers are evaluated using GPT-4.1 to determine whether each rubric criterion is met. Each response then receives an overall score based on the points for the criteria it meets, compared with the maximum possible score.
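The scoring described above can be illustrated with a minimal sketch. It assumes each rubric criterion carries a point value, that criteria describing things to avoid carry negative points, and that the maximum possible score is the sum of the positive point values. The names `Criterion` and `rubric_score`, and the example criteria, are hypothetical and are not taken from OpenAI's published code; the met/not-met judgments, which the article says come from GPT-4.1, are supplied here by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    description: str
    points: int  # assumed convention: positive = include, negative = avoid


def rubric_score(criteria, met):
    """Return points earned on met criteria divided by the maximum possible points."""
    if len(criteria) != len(met):
        raise ValueError("need one met/not-met judgment per criterion")
    # Assumption: the maximum is reached by meeting every positive criterion
    # and none of the negative ones.
    max_points = sum(c.points for c in criteria if c.points > 0)
    if max_points == 0:
        raise ValueError("rubric has no positively weighted criteria")
    earned = sum(c.points for c, is_met in zip(criteria, met) if is_met)
    return earned / max_points


# Hypothetical rubric for a single conversation.
example_rubric = [
    Criterion("Advises seeking emergency care", 10),
    Criterion("Asks about symptom duration", 5),
    Criterion("Uses unnecessarily technical jargon", -4),
]
# Hand-written stand-ins for the grader's per-criterion judgments.
example_judgments = [True, False, True]
example_score = rubric_score(example_rubric, example_judgments)
```

In this example the response earns 10 points, misses 5 and loses 4 for jargon, so it scores 6 out of a possible 15. Under these assumptions a response that triggers only negative criteria would score below zero; whether and how such scores are clipped is not stated in the article.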

HealthBench is divided into seven themes: expertise-tailored communication, response depth, emergency referrals, health data tasks, global health, responding under uncertainty and context seeking.

“Evaluations like HealthBench are part of our ongoing efforts to understand model behavior in high-impact settings and help ensure progress is directed toward real-world benefit,” the company said.

“Our findings show that large language models have improved significantly over time and already outperform experts in writing responses to examples tested in our benchmark. Yet even the most advanced systems still have substantial room for improvement, particularly in seeking necessary context for underspecified queries and in worst-case reliability.”

The tools are publicly available on GitHub.

The larger trend

OpenAI CEO Sam Altman took part in President Donald Trump's press conference earlier this year announcing the launch of Project Stargate. The $500 billion project would focus on developing the physical and virtual infrastructure to power the buildout of artificial intelligence, including AI to improve health outcomes.

The venture's partners, Oracle Chief Technology Officer Larry Ellison and SoftBank CEO Masayoshi Son, also touted the project as a healthcare game changer.

Altman said during the press conference that he was thrilled to be part of Stargate and predicted that diseases would be cured at an unprecedented rate.

Ellison added that a cancer vaccine is one of the “most exciting” things the group is working on using tools provided by Altman and Son.

Earlier this month, the Financial Times reported that Project Stargate was considering international expansion, with the United Kingdom the top choice of country. Germany and France are also attractive candidates.

However, Bloomberg reported this week that the project is facing delays due to tariffs imposed by Trump and economic uncertainty.

Because of economic uncertainty and growing market volatility, banks and institutional investors are wary of investing in Stargate, especially since the cost of building data centers is uncertain given U.S. tariffs, particularly on chip systems, server racks and cooling systems.

In addition, SoftBank, which has committed an immediate $100 billion to the project, a figure set to grow to $500 billion over the next four years, has yet to develop a financing template or begin discussions with potential backers, according to Bloomberg.
