LKAN: LLM-Based Knowledge-Aware Attention Network for Clinical Staging of Liver Cancer

IF 6.7; Zone 2 (Medicine); Q1, COMPUTER SCIENCE, INFORMATION SYSTEMS; IEEE Journal of Biomedical and Health Informatics; Pub Date: 2024-10-11; DOI: 10.1109/JBHI.2024.3478809
Ya Li, Xuecong Zheng, Jiaping Li, Qingyun Dai, Chang-Dong Wang, Min Chen
Citations: 0

Abstract


Clinical staging of liver cancer (CSoLC), an important indicator for evaluating the degree of deterioration of primary liver cancer cells (PLCCs), is key to the diagnosis, treatment, and rehabilitation of liver cancer. In China, CSoLC currently follows the China liver cancer (CNLC) staging system, which clinicians usually assess from the patient's radiology reports. Inferring clinical information from unstructured radiology reports can therefore provide auxiliary decision support for clinicians. The key to this challenging task is guiding the model to attend to staging-related words or sentences, and three issues arise: 1) Imbalanced categories: the symptoms of early- or mid-stage liver cancer are not obvious, so more data are available for the end stage. 2) Domain sensitivity of liver cancer data: the liver cancer dataset contains a large amount of domain knowledge, and conventional methods can exacerbate the out-of-vocabulary problem, which greatly reduces classification accuracy. 3) Free-text and lengthy reports: liver cancer radiology reports describe various lesions sparsely and in domain-specific terms, which makes it difficult to mine the key information related to staging. To tackle these challenges, this article proposes a large language model (LLM)-based Knowledge-aware Attention Network (LKAN) for CSoLC. First, to maintain semantic consistency, an LLM and a rule-based algorithm are integrated to generate more diverse and reasonable data. Second, an unlabeled corpus of liver cancer radiology reports is used for pre-training to introduce domain knowledge for subsequent representation learning. Third, the attention mechanism is improved by incorporating both global and local features, which provides professional guidance for the classifier to focus on the important information. Compared with the baseline models, LKAN achieves the best results, with 90.3% Accuracy, 90.0% Macro_F1 score, and 90.0% Macro_Recall. The code is available at https://github.com/xczhh/Supplemental-Material.
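
The abstract reports Accuracy alongside macro-averaged F1 and Recall, which matter when categories are imbalanced. The following is a minimal sketch of how such metrics are conventionally computed, not the authors' evaluation code. The function name `macro_metrics` and the class labels `"early"`, `"mid"`, and `"end"` are hypothetical placeholders chosen for illustration.

```python
from collections import Counter


def macro_metrics(y_true, y_pred):
    """Return (accuracy, macro_recall, macro_f1) for single-label predictions.

    Hypothetical helper for illustration; not taken from the LKAN code base.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed

    recalls, f1s = [], []
    for c in labels:
        prec_den = tp[c] + fp[c]
        rec_den = tp[c] + fn[c]
        precision = tp[c] / prec_den if prec_den else 0.0
        recall = tp[c] / rec_den if rec_den else 0.0
        pr_sum = precision + recall
        f1s.append(2 * precision * recall / pr_sum if pr_sum else 0.0)
        recalls.append(recall)

    accuracy = sum(tp.values()) / len(y_true)
    # Macro averaging weights every class equally, regardless of its size.
    return accuracy, sum(recalls) / len(labels), sum(f1s) / len(labels)


# Toy imbalanced example: a classifier that always predicts the majority class.
y_true = ["end"] * 8 + ["early", "mid"]
y_pred = ["end"] * 10
acc, macro_recall, macro_f1 = macro_metrics(y_true, y_pred)
print(f"accuracy={acc:.3f} macro_recall={macro_recall:.3f} macro_f1={macro_f1:.3f}")
```

In this toy case the majority-class predictor still reaches a high accuracy while its macro-averaged recall and F1 are much lower, which is why macro-averaged scores are informative when one stage dominates the data.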

Source journal
IEEE Journal of Biomedical and Health Informatics
Categories: COMPUTER SCIENCE, INFORMATION SYSTEMS; COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS
CiteScore: 13.60
Self-citation rate: 6.50%
Articles published: 1151
About the journal: IEEE Journal of Biomedical and Health Informatics publishes original papers presenting recent advances where information and communication technologies intersect with health, healthcare, life sciences, and biomedicine. Topics include acquisition, transmission, storage, retrieval, management, and analysis of biomedical and health information. The journal covers applications of information technologies in healthcare, patient monitoring, preventive care, early disease diagnosis, therapy discovery, and personalized treatment protocols. It explores electronic medical and health records, clinical information systems, decision support systems, medical and biological imaging informatics, wearable systems, body area/sensor networks, and more. Integration-related topics like interoperability, evidence-based medicine, and secure patient data are also addressed.