
An OpenAI transcription tool reportedly adds hallucinatory content to medical consultation recordings

OpenAI released its artificial intelligence (AI) speech-to-text tool, Whisper, in 2022. However, a new report claims that the AI tool is prone to hallucinations and inserts imaginary text into its transcriptions. This is a concern because the tool is reportedly used in several high-risk areas such as medicine and accessibility. Of particular concern is its reported use during doctor-patient consultations, where hallucinations could add potentially harmful information and put patients' lives at risk.

OpenAI Whisper is reportedly prone to hallucinations

The Associated Press reported that OpenAI's automatic speech recognition (ASR) system Whisper has a high tendency to generate hallucinated text. Citing interviews with several software engineers, developers and academic researchers, the publication claimed that the invented text included racial commentary, violent rhetoric and imagined medical treatments.

Hallucination, in AI parlance, refers to a well-known problem in which AI systems generate information that is incorrect or misleading. In Whisper's case, the AI is said to invent text that no one ever spoke.

In one example tested by the publication, the speaker said, "He, the boy, was going to, I don't know for sure, take an umbrella," which Whisper rendered as, "He took a big piece of the cross, a small, small… I'm sure he didn't have a terrible knife, that's why he killed a lot of people." In another case, Whisper reportedly added racial descriptors that were never mentioned in the audio.

While hallucinations are not a new problem in the field of artificial intelligence, the issue is more serious with this particular tool because the open-source technology underpins several products deployed in high-risk industries. For example, Paris-based Nabla has built a Whisper-based tool that is reportedly used by more than 30,000 clinicians and 40 health systems.
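
For context, the snippet below is a minimal sketch of how developers typically call the open-source Whisper package to transcribe audio. It assumes the openai-whisper Python package and a hypothetical audio file named consultation.mp3; it is not a representation of Nabla's product or OpenAI's hosted API.

```python
# Minimal sketch: transcribing a local audio file with the open-source
# openai-whisper package (pip install openai-whisper).
import whisper

# Load a smaller checkpoint; larger models trade speed for accuracy.
model = whisper.load_model("base")

# "consultation.mp3" is a hypothetical placeholder file name.
result = model.transcribe("consultation.mp3")

# The returned text is the model's best guess and, per the report,
# may contain hallucinated passages that were never spoken.
print(result["text"])
```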

The Nabla tool has reportedly been used to record more than seven million medical visits. To protect data security, the company also deletes the original recordings from its servers. This means that if hallucinatory text was produced in any of those seven million transcriptions, it cannot be verified or corrected.

Another area where the technology is being used is in accessibility tools for the deaf and hard of hearing, where, again, verifying the accuracy of the output is much more difficult. Most hallucinations are thought to be triggered by background noise, abrupt pauses and other environmental sounds.

The scale of the problem is also worrying. Citing one researcher, the publication claims that eight out of every 10 audio transcriptions examined contained hallucinatory text. Another developer told the publication that hallucinations occurred in "every one of the 26,000 transcripts he created with Whisper."

Notably, when launching Whisper, OpenAI said the tool offers human-level robustness to accents, background noise and technical language. A company representative told the publication that the AI firm is continually studying ways to reduce hallucinations and promised to incorporate the feedback in future model updates.