Convolutional Neural Networks for Biomedical Text Classification: Application in Indexing Biomedical Articles.

ACM-BCB ... ... : the ... ACM Conference on Bioinformatics, Computational Biology and Biomedicine. Pub Date: 2015-09-01. DOI: 10.1145/2808719.2808746

Anthony Rios, Ramakanth Kavuluru


Citations: 118

Abstract

Building high-accuracy text classifiers is an important task in biomedicine given the wealth of information hidden in unstructured narratives such as research articles and clinical documents. Because feature spaces are large, discriminative approaches such as logistic regression and support vector machines with n-gram and semantic features (e.g., named entities) have traditionally been used for text classification, with additional performance gains typically made through feature selection and ensemble approaches. In this paper, we demonstrate that a more direct approach using convolutional neural networks (CNNs) outperforms several traditional approaches in biomedical text classification, with the specific use case of assigning medical subject headings (MeSH terms) to biomedical articles. Trained annotators at the National Library of Medicine (NLM) assign on average 13 codes to each biomedical article, thus semantically indexing scientific literature to support NLM's PubMed search system. Recent evidence suggests that effective automated efforts for MeSH term assignment start with binary classifiers for each term. We use CNNs to build binary text classifiers and achieve an absolute improvement of over 3% in macro F-score on a set of selected hard-to-classify MeSH terms when compared with the best prior results on a public dataset. Additional experiments on 50 high-frequency terms in the dataset also show improvements with CNNs. Our results indicate the strong potential of CNNs in biomedical text classification tasks.
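The evaluation setup described in the abstract (one binary classifier per MeSH term, scored by macro F-score) can be illustrated with a minimal sketch. It assumes the common definition of macro F-score as the unweighted mean of per-term F1; the abstract does not spell out the exact variant used. The function names, the two term labels, and the toy gold/predicted vectors are hypothetical and are not taken from the paper or its dataset.

```python
from typing import Dict, List


def f1_binary(gold: List[int], pred: List[int]) -> float:
    """F1 for one binary classifier (1 = term assigned, 0 = not assigned)."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    if tp == 0:
        # No true positives: precision and/or recall is zero, so F1 is zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold_by_term: Dict[str, List[int]],
             pred_by_term: Dict[str, List[int]]) -> float:
    """Unweighted mean of per-term F1 (assumed macro F-score definition)."""
    scores = [f1_binary(gold_by_term[t], pred_by_term[t]) for t in gold_by_term]
    return sum(scores) / len(scores)


# Hypothetical toy data: four articles, two terms, one binary decision each.
gold = {"Humans": [1, 1, 0, 1], "Mice": [0, 1, 0, 0]}
pred = {"Humans": [1, 1, 1, 1], "Mice": [0, 0, 0, 0]}

score = macro_f1(gold, pred)
print(f"macro F1 = {score:.4f}")
```

Because every term contributes equally to the mean, a rare term whose classifier never fires (the second term in the toy data) pulls the macro score down as much as a frequent one would, which is why macro F-score is a natural measure for hard-to-classify terms.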


