ChatGPT Couldn't Pass US Gastroenterology Exams

Republished by Plato

Doctors have warned that ChatGPT failed the American College of Gastroenterology exams and cannot generate accurate medical information for patients.

A study led by physicians at the Feinstein Institutes for Medical Research tested both variants of ChatGPT: one powered by OpenAI's older GPT-3.5 model and the other by the latest GPT-4 system. The academic team copied and pasted the multiple-choice questions taken from the 2021 and 2022 American College of Gastroenterology (ACG) Self-Assessment Tests into the bot, and analyzed the software's responses.

Interestingly, the less advanced version based on GPT-3.5 answered 65.1 percent of the 455 questions correctly, while the more powerful GPT-4 scored 62.4 percent. How this happened is difficult to explain, since OpenAI is secretive about the way it trains its models. Its spokespeople told us that, at the very least, both models were trained on data dated as recently as September 2021.

In any case, neither result was good enough to reach the 70 percent threshold needed to pass the exams.
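To put those percentages in terms of questions, here is a rough back-of-the-envelope sketch in Python. It assumes every one of the 455 questions carries equal weight and that the reported percentages are rounded, so the per-model counts are estimates derived from the article's figures, not numbers quoted from the study. The function names are purely illustrative.

```python
import math

TOTAL_QUESTIONS = 455   # number of questions reported in the article
PASS_THRESHOLD = 0.70   # passing mark reported in the article

# Reported scores, expressed as fractions of questions answered correctly.
reported_scores = {"GPT-3.5": 0.651, "GPT-4": 0.624}


def approx_correct(score, total=TOTAL_QUESTIONS):
    """Estimate the number of correct answers (assumes equal weighting)."""
    return round(score * total)


def min_correct_to_pass(threshold=PASS_THRESHOLD, total=TOTAL_QUESTIONS):
    """Smallest whole number of correct answers that meets the threshold."""
    return math.ceil(threshold * total)


needed = min_correct_to_pass()
for model, score in reported_scores.items():
    correct = approx_correct(score)
    print(f"{model}: about {correct}/{TOTAL_QUESTIONS} correct, "
          f"{needed - correct} short of the {needed} needed to pass")
```

Under these assumptions, both models fall a few dozen questions short of the passing mark.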

Arvind Trindade, associate professor at the Feinstein Institutes for Medical Research and senior author of the study published in the American Journal of Gastroenterology, told The Register.

"Although the score is not far away from passing or obtaining a 70 percent, I would argue that for medical advice or medical education, the score should be over 95."

"I don't think a patient would be comfortable with a doctor that only knows 70 percent of his or her medical field. If we demand this high standard for our doctors, we should demand this high standard from medical chatbots," he added.

The American College of Gastroenterology trains physicians, and its tests are used as practice for official exams. To become a board-certified gastroenterologist, doctors must pass the American Board of Internal Medicine Gastroenterology examination. That takes knowledge and study, not just a gut feeling.

ChatGPT generates responses by predicting the next word in a given sentence. AI learns common patterns in its training data to figure out what word should go next, and is partially effective at recalling information. Although the technology has improved rapidly, it's not perfect and is often prone to hallucinating false facts – especially if it's being quizzed on niche subjects that may not be present in its training data.

"ChatGPT's basic function is to predict the next word in a string of text to produce an expected response based on available information, regardless of whether such a response is factually correct or not. It does not have any intrinsic understanding of a topic or issue," the paper explains.
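The idea of predicting the next word from patterns in training text can be illustrated with a toy example. The sketch below is not how ChatGPT is built (that is a large neural network whose details OpenAI has not disclosed). It is a minimal bigram counter, with a made-up corpus and hypothetical function names, that simply returns whichever word most often followed the previous one, whether or not the result is true.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.lower().split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def predict_next(follows, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    candidates = follows.get(word.lower())
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]


# Invented toy corpus: the false statement appears more often than the true one.
corpus = "the sky is green . the sky is green . the sky is blue"
model = train_bigrams(corpus)

print(predict_next(model, "is"))     # picks the most frequent continuation
print(predict_next(model, "colon"))  # a word never seen in training
```

Because "green" follows "is" twice and "blue" only once in the toy corpus, the predictor prefers the false continuation, and it has nothing to offer for a word it never saw. That mirrors, in miniature, the two failure modes described above: frequency beats correctness, and niche topics missing from the training data leave the model with nothing reliable to draw on.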

Trindade told us that it's possible that the gastroenterology-related information on webpages used to train the software is not accurate, and that the best resources like medical journals or databases should be used.

However, these resources are not readily available and may be locked behind paywalls. In that case, ChatGPT may not have been sufficiently exposed to expert knowledge.

"The results are only applicable to ChatGPT – other chatbots need to be validated. The crux of the issue is where these chatbots are obtaining the information. In its current form ChatGPT should not be used for medical advice or medical education," Trindade concluded. ®