ChatGPT Couldn't Pass US Gastroenterology Exams

Republished by Plato

ChatGPT has failed to pass the American College of Gastroenterology exams and is not capable of generating accurate medical information for patients, doctors have warned.

A study led by physicians at the Feinstein Institutes for Medical Research tested both variants of ChatGPT: one powered by OpenAI's older GPT-3.5 model and one by the latest GPT-4 system. The academic team copied and pasted the multiple-choice questions from the 2021 and 2022 American College of Gastroenterology (ACG) Self-Assessment Tests into the bot and analyzed the software's responses.

Interestingly, the less advanced version based on GPT-3.5 answered 65.1 percent of the 455 questions correctly, while the more powerful GPT-4 scored 62.4 percent. How that happened is hard to explain, as OpenAI is secretive about how it trains its models. Its spokespeople told us that, at the very least, both models were trained on data up to September 2021.

In any case, neither result was good enough to reach the 70 percent threshold needed to pass the exams.
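To put those percentages in terms of questions, here is a minimal Python sketch of the arithmetic. The percentages, the 455-question total and the 70 percent pass mark come from the figures above; the raw counts of correct answers are back-calculated by rounding and are an assumption, as is applying the pass mark to the pooled total. The function name `score_summary` is hypothetical.

```python
import math

TOTAL_QUESTIONS = 455   # total questions, as reported in the article
PASS_MARK = 0.70        # passing threshold, as reported in the article

# Reported accuracy for each model, as a fraction of TOTAL_QUESTIONS.
reported = {"GPT-3.5": 0.651, "GPT-4": 0.624}


def score_summary(accuracy, total=TOTAL_QUESTIONS, pass_mark=PASS_MARK):
    """Estimate correct answers and the shortfall against the pass mark."""
    # Assumption: the raw count is recovered by rounding accuracy * total.
    correct = round(accuracy * total)
    # Smallest whole number of correct answers that reaches the pass mark.
    needed = math.ceil(pass_mark * total)
    return {
        "correct": correct,
        "needed": needed,
        "shortfall": needed - correct,
        "passed": correct >= needed,
    }


for model, accuracy in reported.items():
    print(model, score_summary(accuracy))
```

Under these assumptions, GPT-3.5 falls roughly two dozen questions short of the pass mark and GPT-4 roughly three dozen.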

Arvind Trindade, associate professor at the Feinstein Institutes for Medical Research and senior author of the study published in the American Journal of Gastroenterology, told The Register:

"Although the score is not far away from passing or obtaining a 70 percent, I would argue that for medical advice or medical education, the score should be over 95."

"I don't think a patient would be comfortable with a doctor that only knows 70 percent of his or her medical field. If we demand this high standard for our doctors, we should demand this high standard from medical chatbots," he added.

The American College of Gastroenterology trains physicians, and its tests are used as practice for official exams. To become a board-certified gastroenterologist, doctors must pass the American Board of Internal Medicine Gastroenterology examination. That takes knowledge and study, not just gut feeling.

ChatGPT generates responses by predicting the next word in a given sentence. The model learns common patterns in its training data to figure out what word should go next, and is only partially effective at recalling information. Although the technology has improved rapidly, it's not perfect and is prone to hallucinating false facts, especially when quizzed on niche subjects that may not be present in its training data.

"ChatGPT's basic function is to predict the next word in a string of text to produce an expected response based on available information, regardless of whether such a response is factually correct or not. It does not have any intrinsic understanding of a topic or issue," the paper explains.
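The idea of predicting the next word from patterns in training data, whether or not the result is true, can be illustrated with a toy example. The Python sketch below is not how ChatGPT is implemented: it is a simple word-pair (bigram) frequency counter, and the corpus, `train_bigrams` and `predict_next` are all made up for illustration. It only shows that a predictor of this kind returns the most common continuation it has seen, not the correct one.

```python
from collections import Counter, defaultdict


def train_bigrams(corpus):
    """Count, for each word, which words follow it in the corpus."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for current, following in zip(words, words[1:]):
            counts[current][following] += 1
    return counts


def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Made-up training text in which the wrong statement is more common.
corpus = [
    "the sky is green",
    "the sky is green",
    "the sky is blue",
]

model = train_bigrams(corpus)
print(predict_next(model, "is"))       # the most frequent follower, not the true one
print(predict_next(model, "stomach"))  # a word absent from the training text
```

Because "green" follows "is" more often than "blue" in this invented corpus, the predictor picks it, and it has nothing to offer for a word it has never seen.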

Trindade told us that it's possible that the gastroenterology-related information on webpages used to train the software is not accurate, and that the best resources like medical journals or databases should be used.

These resources, however, are not readily available and can be locked behind paywalls. In that case, ChatGPT may not have been sufficiently exposed to expert knowledge.

"The results are only applicable to ChatGPT – other chatbots need to be validated. The crux of the issue is where these chatbots are obtaining the information. In its current form ChatGPT should not be used for medical advice or medical education," Trindade concluded. ®