ChatGPT Couldn't Pass US Gastroenterology Exams

Doctors have warned that ChatGPT failed to pass the American College of Gastroenterology exams and is not capable of generating accurate medical information for patients.

A study led by physicians at the Feinstein Institutes for Medical Research tested both variants of ChatGPT – powered by OpenAI's older GPT-3.5 model and the latest GPT-4 system. The academic team copied and pasted the multiple choice questions taken from the 2021 and 2022 American College of Gastroenterology (ACG) Self-Assessment Tests into the bot, and analyzed the software's responses.

Interestingly, the less advanced version based on GPT-3.5 answered 65.1 percent of the 455 questions correctly, while the more powerful GPT-4 scored 62.4 percent. How this happened is difficult to explain, as OpenAI is secretive about the way it trains its models. A spokesperson did tell us, at least, that both models were trained on data as recent as September 2021.

In any case, neither result was good enough to reach the 70 percent threshold needed to pass the exams.
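To put those percentages in terms of questions, the arithmetic can be sketched as follows. This is a rough illustration only: the per-model counts of correct answers are back-calculated here from the rounded percentages quoted above and are not figures taken from the study, and the function names are made up for this example.

```python
import math

TOTAL_QUESTIONS = 455                         # number of questions, per the article
PASS_PERCENT = 70                             # passing threshold, per the article
REPORTED = {"GPT-3.5": 65.1, "GPT-4": 62.4}   # reported scores, in percent


def estimated_correct(percent, total=TOTAL_QUESTIONS):
    """Estimate the count of correct answers from a rounded percentage (an assumption)."""
    return round(total * percent / 100)


def needed_to_pass(total=TOTAL_QUESTIONS, pass_percent=PASS_PERCENT):
    """Smallest whole number of correct answers that reaches the threshold."""
    return math.ceil(total * pass_percent / 100)


def shortfall(percent):
    """How many more correct answers the estimated score would need to pass."""
    return needed_to_pass() - estimated_correct(percent)


if __name__ == "__main__":
    for model, percent in REPORTED.items():
        print(model, estimated_correct(percent), "estimated correct,",
              shortfall(percent), "short of", needed_to_pass())
```

Under these assumptions, both versions land a few dozen questions short of the pass mark, which is the gap Trindade refers to below as "not far away from passing".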

Arvind Trindade, an associate professor at the Feinstein Institutes for Medical Research and senior author of the study published in the American Journal of Gastroenterology, spoke to The Register.

"Although the score is not far away from passing or obtaining a 70 percent, I would argue that for medical advice or medical education, the score should be over 95."

"I don't think a patient would be comfortable with a doctor that only knows 70 percent of his or her medical field. If we demand this high standard for our doctors, we should demand this high standard from medical chatbots," he added.

The American College of Gastroenterology trains physicians, and its tests are used as practice for the official exams. To become a board-certified gastroenterologist, doctors need to pass the American Board of Internal Medicine Gastroenterology examination. That takes knowledge and study, not just gut feeling.

ChatGPT generates responses by predicting the next word in a given sentence. AI learns common patterns in its training data to figure out what word should go next, and is partially effective at recalling information. Although the technology has improved rapidly, it's not perfect and is often prone to hallucinating false facts – especially if it's being quizzed on niche subjects that may not be present in its training data.

"ChatGPT's basic function is to predict the next word in a string of text to produce an expected response based on available information, regardless of whether such a response is factually correct or not. It does not have any intrinsic understanding of a topic or issue," the paper explains.
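The idea of next-word prediction can be illustrated with a toy example. The sketch below is a deliberately simplified word-pair (bigram) counter, not how GPT-3.5 or GPT-4 actually work; those are large neural networks, and nothing here reflects OpenAI's implementation. The training text is invented for illustration, and the function names are hypothetical. What it does show is the point made in the paper: the predictor returns whatever followed most often in its training text, whether or not that is correct.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count, for every word, which words follow it in the training text."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts


def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Invented training text: one claim appears twice, a competing claim once.
corpus = (
    "ulcers are caused by stress . "
    "ulcers are caused by stress . "
    "ulcers are caused by bacteria ."
)
model = train_bigrams(corpus)

if __name__ == "__main__":
    # The more frequent continuation wins, regardless of which one is true.
    print(predict_next(model, "by"))
    # A word absent from the training text yields no prediction at all.
    print(predict_next(model, "pancreas"))
```

In this toy setting, the quality of the answer depends entirely on what the training text contained and how often.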

Trindade told us that it's possible that the gastroenterology-related information on webpages used to train the software is not accurate, and that the best resources like medical journals or databases should be used.

These resources, however, are not readily available and can be locked behind paywalls. In that case, ChatGPT may not have been sufficiently exposed to the expert knowledge.

"The results are only applicable to ChatGPT – other chatbots need to be validated. The crux of the issue is where these chatbots are obtaining the information. In its current form ChatGPT should not be used for medical advice or medical education," Trindade concluded. ®