基于集成的生物医学问答数据多标签分类方法

A. Abdillah, Cornelius Bagus Purnama Putra, Apriantoni Apriantoni, Safitri Juanita, D. Purwitasari
{"title":"基于集成的生物医学问答数据多标签分类方法","authors":"A. Abdillah, Cornelius Bagus Purnama Putra, Apriantoni Apriantoni, Safitri Juanita, D. Purwitasari","doi":"10.20473/jisebi.8.1.42-50","DOIUrl":null,"url":null,"abstract":"Background: Question-answer (QA) is a popular method to seek health-related information and biomedical data. Such questions can refer to more than one medical entity (multi-label) so determining the correct tags is not easy. The question classification (QC) mechanism in a QA system can narrow down the answers we are seeking.\nObjective: This study develops a multi-label classification using the heterogeneous ensembles method to improve accuracy in biomedical data with long text dimensions.\nMethods: We used the ensemble method with heterogeneous deep learning and machine learning for multi-label extended text classification. There are 15 various single models consisting of three deep learning (CNN, LSTM, and BERT) and four machine learning algorithms (SVM, kNN, Decision Tree, and Naïve Bayes) with various text representations (TF-IDF, Word2Vec, and FastText). We used the bagging approach with a hard voting mechanism for the decision-making.\nResults: The result shows that deep learning is more powerful than machine learning as a single multi-label biomedical data classification method. Moreover, we found that top-three was the best number of base learners by combining the ensembles method. Heterogeneous-based ensembles with three learners resulted in an F1-score of 82.3%, which is better than the best single model by CNN with an F1-score of 80%.\nConclusion: A multi-label classification of biomedical QA using ensemble models is better than single models. The result shows that heterogeneous ensembles are more potent than homogeneous ensembles on biomedical QA data with long text dimensions.\nKeywords: Biomedical Question Classification, Ensemble Method, Heterogeneous Ensembles, Multi-Label Classification, Question Answering","PeriodicalId":16185,"journal":{"name":"Journal of Information Systems Engineering and Business Intelligence","volume":"59 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2022-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Ensemble-based Methods for Multi-label Classification on Biomedical Question-Answer Data\",\"authors\":\"A. Abdillah, Cornelius Bagus Purnama Putra, Apriantoni Apriantoni, Safitri Juanita, D. Purwitasari\",\"doi\":\"10.20473/jisebi.8.1.42-50\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Background: Question-answer (QA) is a popular method to seek health-related information and biomedical data. Such questions can refer to more than one medical entity (multi-label) so determining the correct tags is not easy. The question classification (QC) mechanism in a QA system can narrow down the answers we are seeking.\\nObjective: This study develops a multi-label classification using the heterogeneous ensembles method to improve accuracy in biomedical data with long text dimensions.\\nMethods: We used the ensemble method with heterogeneous deep learning and machine learning for multi-label extended text classification. There are 15 various single models consisting of three deep learning (CNN, LSTM, and BERT) and four machine learning algorithms (SVM, kNN, Decision Tree, and Naïve Bayes) with various text representations (TF-IDF, Word2Vec, and FastText). We used the bagging approach with a hard voting mechanism for the decision-making.\\nResults: The result shows that deep learning is more powerful than machine learning as a single multi-label biomedical data classification method. Moreover, we found that top-three was the best number of base learners by combining the ensembles method. Heterogeneous-based ensembles with three learners resulted in an F1-score of 82.3%, which is better than the best single model by CNN with an F1-score of 80%.\\nConclusion: A multi-label classification of biomedical QA using ensemble models is better than single models. The result shows that heterogeneous ensembles are more potent than homogeneous ensembles on biomedical QA data with long text dimensions.\\nKeywords: Biomedical Question Classification, Ensemble Method, Heterogeneous Ensembles, Multi-Label Classification, Question Answering\",\"PeriodicalId\":16185,\"journal\":{\"name\":\"Journal of Information Systems Engineering and Business Intelligence\",\"volume\":\"59 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-04-26\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Information Systems Engineering and Business Intelligence\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.20473/jisebi.8.1.42-50\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Information Systems Engineering and Business Intelligence","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.20473/jisebi.8.1.42-50","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

摘要

背景:问答(QA)是寻求健康相关信息和生物医学数据的一种流行方法。这些问题可能涉及多个医疗实体(多标签),因此确定正确的标签并不容易。QA系统中的问题分类(QC)机制可以缩小我们所寻求的答案。目的:利用异构集成方法开发一种多标签分类方法,以提高长文本维度生物医学数据的准确率。方法:采用异构深度学习和机器学习相结合的集成方法进行多标签扩展文本分类。有15种不同的单一模型,由三种深度学习(CNN, LSTM和BERT)和四种机器学习算法(SVM, kNN, Decision Tree和Naïve Bayes)组成,具有各种文本表示(TF-IDF, Word2Vec和FastText)。我们使用了套袋方法和硬投票机制来进行决策。结果:作为单一的多标签生物医学数据分类方法,深度学习比机器学习更强大。此外,结合集成方法,我们发现前三名是基础学习器的最佳数量。基于异质性的三个学习器集成的f1得分为82.3%,优于CNN的最佳单一模型f1得分为80%。结论:采用集成模型对生物医学质量保证的多标签分类效果优于单一模型。结果表明,在长文本维数的生物医学QA数据上,异构集成比同质集成更有效。关键词:生物医学问题分类,集成方法,异构集成,多标签分类,问题回答
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Ensemble-based Methods for Multi-label Classification on Biomedical Question-Answer Data
Background: Question-answer (QA) is a popular method to seek health-related information and biomedical data. Such questions can refer to more than one medical entity (multi-label) so determining the correct tags is not easy. The question classification (QC) mechanism in a QA system can narrow down the answers we are seeking. Objective: This study develops a multi-label classification using the heterogeneous ensembles method to improve accuracy in biomedical data with long text dimensions. Methods: We used the ensemble method with heterogeneous deep learning and machine learning for multi-label extended text classification. There are 15 various single models consisting of three deep learning (CNN, LSTM, and BERT) and four machine learning algorithms (SVM, kNN, Decision Tree, and Naïve Bayes) with various text representations (TF-IDF, Word2Vec, and FastText). We used the bagging approach with a hard voting mechanism for the decision-making. Results: The result shows that deep learning is more powerful than machine learning as a single multi-label biomedical data classification method. Moreover, we found that top-three was the best number of base learners by combining the ensembles method. Heterogeneous-based ensembles with three learners resulted in an F1-score of 82.3%, which is better than the best single model by CNN with an F1-score of 80%. Conclusion: A multi-label classification of biomedical QA using ensemble models is better than single models. The result shows that heterogeneous ensembles are more potent than homogeneous ensembles on biomedical QA data with long text dimensions. Keywords: Biomedical Question Classification, Ensemble Method, Heterogeneous Ensembles, Multi-Label Classification, Question Answering
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
CiteScore
0.30
自引率
0.00%
发文量
0
期刊最新文献
Sentiment Analysis on a Large Indonesian Product Review Dataset Leveraging Biotic Interaction Knowledge Graph and Network Analysis to Uncover Insect Vectors of Plant Virus Model-based Decision Support System Using a System Dynamics Approach to Increase Corn Productivity Optimizing Support Vector Machine Performance for Parkinson's Disease Diagnosis Using GridSearchCV and PCA-Based Feature Extraction A Practical Approach to Enhance Data Quality Management in Government: Case Study of Indonesian Customs and Excise Office
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1