Automated labelling of radiology reports using natural language processing: Comparison of traditional and newer methods

Seo Yi Chng, Paul J. W. Tern, Matthew R. X. Kan, Lionel T. E. Cheng
Health Care Science, Volume 2, Issue 2, pages 120–128
DOI: 10.1002/hcs2.40 · Published 2023-04-24 · Journal Article
Open access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/hcs2.40
Citations: 1

Abstract



Automated labelling of radiology reports using natural language processing allows for the labelling of ground truth for large datasets of radiological studies that are required for training of computer vision models. This paper explains the necessary data preprocessing steps, reviews the main methods for automated labelling and compares their performance. There are four main methods of automated labelling, namely: (1) rules-based text-matching algorithms, (2) conventional machine learning models, (3) neural network models and (4) Bidirectional Encoder Representations from Transformers (BERT) models. Rules-based labellers perform a brute force search against manually curated keywords and are able to achieve high F1 scores. However, they require proper handling of negative words. Machine learning models require preprocessing that involves tokenization and vectorization of text into numerical vectors. Multilabel classification approaches are required in labelling radiology reports and conventional models can achieve good performance if they have large enough training sets. Deep learning models make use of connected neural networks, often a long short-term memory network, and are similarly able to achieve good performance if trained on a large data set. BERT is a transformer-based model that utilizes attention. Pretrained BERT models only require fine-tuning with small data sets. In particular, domain-specific BERT models can achieve superior performance compared with the other methods for automated labelling.
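The negation handling that the abstract flags as essential for rules-based labellers can be sketched with a simple window check before each matched keyword. The keyword list, negation cues, and three-token window below are illustrative assumptions for this sketch, not the paper's curated lists:

```python
import re

# Hypothetical keyword list and single-word negation cues (assumptions
# made for this example, not taken from the paper).
KEYWORDS = {"pneumothorax", "effusion", "consolidation"}
NEGATION_CUES = {"no", "without", "negative", "denies"}

def label_report(report: str) -> dict:
    """Return a 0/1 label per keyword, flipping to 0 when a negation
    cue appears within the three tokens preceding the keyword."""
    tokens = re.findall(r"[a-z]+", report.lower())
    labels = {}
    for kw in KEYWORDS:
        labels[kw] = 0
        for i, tok in enumerate(tokens):
            if tok == kw:
                window = tokens[max(0, i - 3):i]  # look back 3 tokens
                labels[kw] = 0 if NEGATION_CUES & set(window) else 1
    return labels
```

A production rules-based labeller would also need scope terminators (e.g. "but", sentence boundaries) and multi-word cues such as "negative for", which this token-window sketch deliberately omits.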
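The preprocessing the abstract describes for machine learning models (tokenization, then vectorization of text into numerical vectors) can be illustrated with a minimal bag-of-words encoder; the two toy reports and the whitespace tokenizer are invented for this example:

```python
from collections import Counter

def build_vocab(corpus):
    """Map each distinct lower-cased token to a fixed column index."""
    tokens = sorted({tok for doc in corpus for tok in doc.lower().split()})
    return {tok: i for i, tok in enumerate(tokens)}

def vectorize(doc, vocab):
    """Encode one report as a token-count vector over the shared vocabulary."""
    counts = Counter(doc.lower().split())
    return [counts.get(tok, 0) for tok in vocab]

reports = ["no pleural effusion", "large pleural effusion"]
vocab = build_vocab(reports)              # {'effusion': 0, 'large': 1, 'no': 2, 'pleural': 3}
vectors = [vectorize(r, vocab) for r in reports]
```

These count vectors (or TF-IDF-weighted variants) are what a conventional multilabel classifier would consume, typically with one binary output per finding.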
