ChatGPT failed US gastroenterology exams

ChatGPT has failed to pass American College of Gastroenterology exams and is not capable of generating accurate medical information for patients, doctors have warned.

A study led by physicians at the Feinstein Institutes for Medical Research tested both variants of ChatGPT – powered by OpenAI's older GPT-3.5 model and the latest GPT-4 system. The academic team copied and pasted the multiple-choice questions taken from the 2021 and 2022 American College of Gastroenterology (ACG) Self-Assessment Tests into the bot, and analyzed the software's responses.

Interestingly, the less advanced version based on GPT-3.5 answered 65.1 percent of the 455 questions correctly, while the more powerful GPT-4 scored 62.4 percent. How that happened is difficult to explain, as OpenAI is secretive about the way it trains its models. Its spokespeople told us, at least, that both models were trained on data dated as recently as September 2021.

In any case, neither result was good enough to reach the 70 percent threshold needed to pass the exams.
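To put those percentages in terms of questions, here is a rough back-of-the-envelope sketch. It assumes the 70 percent threshold is applied to the same 455 questions the bots were given, and it infers the raw counts by rounding the reported percentages, so the counts are estimates rather than figures quoted in this article. The function names are illustrative only.

```python
import math

TOTAL_QUESTIONS = 455   # questions put to the bots, per the article
PASS_PERCENT = 70       # pass mark, per the article

# Reported scores, in percent. Raw counts below are inferred, not quoted.
reported_percent = {"GPT-3.5": 65.1, "GPT-4": 62.4}


def approx_correct(percent, total=TOTAL_QUESTIONS):
    """Estimate the number of correct answers from a rounded percentage."""
    return round(percent * total / 100)


def needed_to_pass(total=TOTAL_QUESTIONS, pass_percent=PASS_PERCENT):
    """Smallest whole number of correct answers that meets the pass mark."""
    return math.ceil(total * pass_percent / 100)


for model, percent in reported_percent.items():
    correct = approx_correct(percent)
    shortfall = needed_to_pass() - correct
    print(f"{model}: about {correct} correct, roughly {shortfall} short of a pass")
```

Under those assumptions, both models fall a couple of dozen questions short of the pass mark, and much further short of the 95 percent bar Trindade argues for.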

Arvind Trindade, associate professor at The Feinstein Institutes for Medical Research and senior author of the study, which was published in the American Journal of Gastroenterology, told The Register:

"Although the score is not far away from passing or obtaining a 70 percent, I would argue that for medical advice or medical education, the score should be over 95."

"I don't think a patient would be comfortable with a doctor that only knows 70 percent of his or her medical field. If we demand this high standard for our doctors, we should demand this high standard from medical chatbots," he added.

The American College of Gastroenterology trains physicians, and its tests are used as practice for official exams. To become a board-certified gastroenterologist, physicians need to pass the American Board of Internal Medicine Gastroenterology examination. That takes knowledge and study, not just gut feeling.

ChatGPT generates responses by predicting the next word in a given sentence. AI learns common patterns in its training data to figure out what word should go next, and is partially effective at recalling information. Although the technology has improved rapidly, it's not perfect and is often prone to hallucinating false facts – especially if it's being quizzed on niche subjects that may not be present in its training data.

"ChatGPT's basic function is to predict the next word in a string of text to produce an expected response based on available information, regardless of whether such a response is factually correct or not. It does not have any intrinsic understanding of a topic or issue," the paper explains.
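The point about predicting the next word "regardless of whether such a response is factually correct" can be illustrated with a toy model. The sketch below is not how ChatGPT is built: it swaps the neural network for a simple bigram counter, and the tiny corpus is made up for the example. It only shows that a predictor trained on word frequencies will repeat whatever its training text says most often.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.lower().split()
    following = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1
    return following


def predict_next(following, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    candidates = following.get(word.lower())
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]


# Hypothetical training text: the wrong claim appears more often than the right one.
corpus = "the sky is green . the sky is green . the sky is blue ."
model = train_bigrams(corpus)

print(predict_next(model, "is"))    # prints "green": the common answer, not the true one
print(predict_next(model, "ulcer")) # prints None: the word never appeared in training
```

The same two failure modes show up in the article: inaccurate material in the training data gets reproduced, and niche topics that are missing from it leave the model with nothing reliable to draw on.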

Trindade told us that it's possible that the gastroenterology-related information on webpages used to train the software is not accurate, and that the best resources like medical journals or databases should be used.

These resources, however, are not readily available and can be locked behind paywalls. In that case, ChatGPT may not have been sufficiently exposed to expert knowledge.

"The results are only applicable to ChatGPT – other chatbots need to be validated. The crux of the issue is where these chatbots are obtaining the information. In its current form ChatGPT should not be used for medical advice or medical education," Trindade concluded. ®