M. Giuffre , S. Kresevic , M. Ajcevic , L. Crocè , D. Shung
Digestive and Liver Disease, Volume 57, Pages S46-S47. Published 2025-02-01. DOI: 10.1016/j.dld.2025.01.088. https://www.sciencedirect.com/science/article/pii/S1590865825000891
Large Language Model Agent-Based Framework for automated Treatment Prescription in Patients with Chronic Hepatitis C Virus Infection
Background
Large Language Models (LLMs) may be useful for clinical tasks that require reasoning over different information sources. LLMs could be regarded as intelligent "agents" with internal planning abilities, enabling them to engage in multi-step reasoning and interact with other agents or external users. Hepatitis C Virus (HCV) management is a potential area where LLM-enabled agents could be useful, since treatment decisions require consideration of genotype, treatment history, liver fibrosis extent, and drug-drug interactions.
Aim
To evaluate the performance of different LLM agent-based configurations in automating HCV treatment decisions and to determine whether specialized multi-agent architectures improve prescription accuracy compared to single-agent approaches.
Material and Methods
We developed 50 clinical cases focusing on therapeutic regimen prescription. Cases included genotype, prior treatment history, fibrosis status, and concurrent medications. We compared multiple configurations using OpenAI's GPT-3.5 and GPT-4o, fine-tuned with HCV treatment guidelines. Different agent architectures were tested: single agent (one LLM extracting all data), multi-agent (three specialized LLMs for data extraction plus prescriber), and specialized multi-agent (four specialized extraction agents plus prescriber). Each agent accessed specific guideline sections relevant to its task. Performance was compared to baseline fine-tuned models.
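The specialized multi-agent layout described above can be pictured as a set of narrow extraction agents whose outputs are merged and handed to a prescriber. The sketch below is only an illustration of that data flow, not the authors' implementation: the agents are keyword-matching stubs rather than LLM calls, the split into one extractor per case field (genotype, treatment history, fibrosis, medications) is an assumption, and every name in it (`keyword_agent`, `run_pipeline`, `placeholder_prescriber`) is hypothetical.

```python
from typing import Callable, Dict, List

# An "agent" is any callable mapping case text to extracted fields.
# In the study each agent was an LLM; these stubs are hypothetical stand-ins.
Agent = Callable[[str], Dict[str, str]]


def keyword_agent(field: str, keywords: List[str]) -> Agent:
    """Build a toy extraction agent that reports the first listed keyword found."""
    def run(case_text: str) -> Dict[str, str]:
        lowered = case_text.lower()
        for kw in keywords:
            if kw in lowered:
                return {field: kw}
        return {field: "unknown"}
    return run


def placeholder_prescriber(facts: Dict[str, str]) -> str:
    """Stand-in for the prescriber agent.

    A real prescriber would consult guideline text; this one only echoes the
    structured facts it received, in sorted key order.
    """
    return "; ".join(f"{key}={facts[key]}" for key in sorted(facts))


def run_pipeline(case_text: str,
                 extractors: List[Agent],
                 prescriber: Callable[[Dict[str, str]], str]) -> str:
    """Run every extraction agent, merge their outputs, then call the prescriber."""
    facts: Dict[str, str] = {}
    for agent in extractors:
        facts.update(agent(case_text))
    return prescriber(facts)


# Assumed split: one extractor per case field named in the methods.
SPECIALIZED_EXTRACTORS: List[Agent] = [
    keyword_agent("genotype", ["genotype 1", "genotype 3"]),
    keyword_agent("history", ["treatment-naive", "treatment-experienced"]),
    # "no cirrhosis" is listed first so it is not mistaken for "cirrhosis".
    keyword_agent("fibrosis", ["no cirrhosis", "cirrhosis"]),
    keyword_agent("medications", ["statin", "anticoagulant"]),
]

if __name__ == "__main__":
    case = "Genotype 1, treatment-naive, no cirrhosis, on a statin."
    print(run_pipeline(case, SPECIALIZED_EXTRACTORS, placeholder_prescriber))
```

In this framing, a single-agent configuration would correspond to passing one extractor that returns all fields at once, while the multi-agent variants differ only in how many narrow extractors appear in the list.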
Results
Using GPT-3.5, the baseline model achieved 24% prescription accuracy. The single-agent configuration reached 50% (p=0.007), multi-agent 76% (p<0.001), and specialized multi-agent 89% (p<0.001). With GPT-4, accuracy was higher in every configuration: baseline 35%, single-agent 65% (p=0.005), multi-agent 88% (p<0.001), and specialized multi-agent 94% (p<0.001).
Conclusions
Specialized multi-agent LLM frameworks significantly improve HCV treatment recommendation accuracy, with GPT-4 showing superior performance. The agent-based approach demonstrates the potential for complex clinical decision-making. Future work should validate these findings in real-world settings and explore integration with clinical decision-support systems.
About the journal:
Digestive and Liver Disease is an international journal of Gastroenterology and Hepatology. It is the official journal of the Italian Association for the Study of the Liver (AISF); the Italian Association for the Study of the Pancreas (AISP); the Italian Association for Digestive Endoscopy (SIED); the Italian Association for Hospital Gastroenterologists and Digestive Endoscopists (AIGO); the Italian Society of Gastroenterology (SIGE); the Italian Society of Pediatric Gastroenterology and Hepatology (SIGENP); and the Italian Group for the Study of Inflammatory Bowel Disease (IG-IBD).
Digestive and Liver Disease publishes papers on basic and clinical research in the field of gastroenterology and hepatology.
Contributions consist of:
Original Papers
Correspondence to the Editor
Editorials, Reviews and Special Articles
Progress Reports
Image of the Month
Congress Proceedings
Symposia and Mini-symposia.