In recent years, Large Language Model (LLM) chatbots powered by artificial intelligence have emerged as potentially transformative tools in patient care. With their remarkable fluency and capacity to mimic human conversational styles, they hold promise for streamlining communication tasks and providing clinical decision support. However, their unreliability poses significant challenges and has so far prevented their approval as medical devices. In this blog, we explore the current state of LLM chatbots in medicine and the regulatory hurdles they face.
LLMs, such as OpenAI's Generative Pre-trained Transformer (GPT) and Google's Pathways Language Model (PaLM), are neural network language models trained on vast amounts of text from the internet. ChatGPT, a prominent LLM chatbot, was launched in November 2022 and was upgraded to GPT-4 in March 2023. It boasts impressive conversational abilities and can creatively mimic different human conversational styles.
Under EU and US law, tools used for diagnosis, prevention, monitoring, prediction, prognosis, treatment, or alleviation of disease are categorized as medical devices and are subject to rigorous regulatory controls. LLM chatbots fall under this classification, but their near-infinite range of inputs and outputs makes it challenging to test their usability and on-market performance accurately. Developers of LLM chatbots acknowledge that the models often generate convincing but incorrect statements and, at times, inappropriate responses. Additionally, they offer no certainty or confidence indicators and cite no genuine sources. Together, these issues raise significant safety concerns and stand in the way of approving LLM chatbots as medical devices.
While search engines have transformed the medical landscape, they are not classified as regulated medical devices because they were not designed for medical diagnosis, decision support, or therapy planning. Integrating LLM chatbots into search engines could increase users' confidence in search results, but the chatbots' potential to provide harmful information, particularly when prompted with medical questions, raises concerns.
Developers have attempted to constrain LLM chatbots' generative creativity to improve safety and groundedness. However, inaccuracies and hallucinations are difficult to eliminate entirely because they stem from the models' inherent design. To be approvable under current and proposed international regulatory frameworks, forthcoming chatbots must apply more supervised learning, provide genuine citations to support their content, and reduce errors and hallucinations.
Several approaches to developing and regulating LLM chatbots as medical devices have been proposed, including enabling public oversight through open AI methodologies and independent oversight tools. Outputs from LLM chatbots could also be tailored to the user's age for safeguarding purposes. Ultimately, however, developers must take responsibility for the safety and validation of their products in medicine.
In conclusion, LLM chatbots hold significant promise for transforming patient care, but their unreliability currently prevents their approval as medical devices. To earn a place in the medical armamentarium, LLM chatbots must demonstrate improved accuracy, safety, and clinical efficacy and meet the key principles for AI in healthcare. Regulatory approval is crucial to ensuring they provide safe and effective outputs in medicine.