ChatGPT Health fails to direct users to emergency care in more than half of serious cases: Study
Researchers flag inconsistent suicide-crisis safeguards in AI tool used by millions
By Simgenur Akbolat
ISTANBUL (AA) - ChatGPT Health, the AI-powered health guidance tool used by 40 million people daily, failed to appropriately direct users to emergency care in more than half of serious medical cases assessed by physicians, according to a study.
Researchers developed 60 structured clinical scenarios spanning 21 medical specialties, from minor conditions suitable for home care to life-threatening emergencies. Three independent physicians determined the appropriate level of urgency for each case using guidelines from 56 medical societies.
Each scenario was tested under 16 different contextual conditions, producing a total of 960 interactions with ChatGPT Health.
The findings were published Monday in Nature Medicine.
- Key findings
While the tool handled clear-cut emergencies reasonably well, it undertriaged more than half of the cases that physicians deemed to require emergency care. Researchers at the Icahn School of Medicine at Mount Sinai in New York noted a particularly troubling pattern: the tool often acknowledged dangerous findings in its own explanations while still reassuring the patient rather than urging them to seek immediate help.
The study also identified serious concerns with the tool's suicide-crisis safeguards. Although ChatGPT Health was designed to direct high-risk users to the Suicide and Crisis Lifeline, researchers found that the alerts appeared inconsistently, sometimes triggering in lower-risk situations while failing to appear when users described specific plans for self-harm.
"While we expected some variability, what we observed went beyond inconsistency," according to the study's senior author Girish N. Nadkarni.
- Nuanced conclusions
Researchers stopped short of advising people to abandon AI health tools, instead urging users to seek medical care directly for worsening or concerning symptoms rather than relying solely on chatbot guidance.
"These systems are changing quickly, so part of our training now must consider learning how to understand their outputs critically, identify where they fall short, and use them in ways that protect patients," said Alvira Tyagi, a first-year medical student and second author of the study.
"When millions of people are using an AI system to decide whether they need emergency care, the stakes are extraordinarily high," said Isaac Kohane, chair of biomedical informatics at Harvard Medical School, who was not involved in the research. "Independent evaluation should be routine, not optional."