AI Language Model ChatGPT Achieved Passing Scores on U.S. Medical Licensing Exam

Scientists tested the language-based AI tool’s performance on the United States Medical Licensing Exam (USMLE) and found it could score at or around the test’s passing grade threshold.

ChatGPT is a language-based AI model developed by OpenAI that is capable of generating human-like text in response to user input. It has become a popular yet controversial tool for generating general text, blog posts, and responses.

While some praise its capabilities, ChatGPT has also raised ethical concerns, mostly surrounding the tool's ability to produce plausible text on virtually any subject, including educational and scientific topics. Because of this, New York City schools have restricted student access to the text-generating tool, and some scientific journals have forbidden ChatGPT-written content in academic submissions.

Recently, scientists discovered something else ChatGPT can potentially do — pass the medical licensing exam required to practice medicine in the US.

The research — published on February 9 in the journal PLOS Digital Health — tested ChatGPT on 350 of the 376 USMLE questions available to the public. The questions came from three exams (Steps 1, 2CK, and 3) from the June 2022 USMLE release.

After the scientists removed indeterminate responses, they found that ChatGPT scored between 52.4% and 70% across the three tests. The score needed to pass is approximately 60%.

Moreover, ChatGPT also exhibited 94.6% concordance across all its responses and delivered at least one new, non-obvious, and clinically valid insight in 88.9% of its answers.

“Reaching the passing score for this notoriously difficult expert exam, and doing so without any human reinforcement, marks a notable milestone in clinical AI maturation,” the authors said in a news release.

In addition, the authors say they used ChatGPT to simplify passages and offer counterpoints while drafting their manuscript, crediting the tool with a substantial contribution to the paper.

Despite the potential ethical implications of these results, the authors note that large language models such as ChatGPT may help human learners in medical education settings, which could eventually lead to integration into clinical decision-making.
