基于集成的生物医学问答数据多标签分类方法

Journal of Information Systems Engineering and Business Intelligence Pub Date : 2022-04-26 DOI:10.20473/jisebi.8.1.42-50

A. Abdillah, Cornelius Bagus Purnama Putra, Apriantoni Apriantoni, Safitri Juanita, D. Purwitasari

{"title":"基于集成的生物医学问答数据多标签分类方法","authors":"A. Abdillah, Cornelius Bagus Purnama Putra, Apriantoni Apriantoni, Safitri Juanita, D. Purwitasari","doi":"10.20473/jisebi.8.1.42-50","DOIUrl":null,"url":null,"abstract":"Background: Question-answer (QA) is a popular method to seek health-related information and biomedical data. Such questions can refer to more than one medical entity (multi-label) so determining the correct tags is not easy. The question classification (QC) mechanism in a QA system can narrow down the answers we are seeking.\nObjective: This study develops a multi-label classification using the heterogeneous ensembles method to improve accuracy in biomedical data with long text dimensions.\nMethods: We used the ensemble method with heterogeneous deep learning and machine learning for multi-label extended text classification. There are 15 various single models consisting of three deep learning (CNN, LSTM, and BERT) and four machine learning algorithms (SVM, kNN, Decision Tree, and Naïve Bayes) with various text representations (TF-IDF, Word2Vec, and FastText). We used the bagging approach with a hard voting mechanism for the decision-making.\nResults: The result shows that deep learning is more powerful than machine learning as a single multi-label biomedical data classification method. Moreover, we found that top-three was the best number of base learners by combining the ensembles method. Heterogeneous-based ensembles with three learners resulted in an F1-score of 82.3%, which is better than the best single model by CNN with an F1-score of 80%.\nConclusion: A multi-label classification of biomedical QA using ensemble models is better than single models. The result shows that heterogeneous ensembles are more potent than homogeneous ensembles on biomedical QA data with long text dimensions.\nKeywords: Biomedical Question Classification, Ensemble Method, Heterogeneous Ensembles, Multi-Label Classification, Question Answering","PeriodicalId":16185,"journal":{"name":"Journal of Information Systems Engineering and Business Intelligence","volume":"59 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2022-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Ensemble-based Methods for Multi-label Classification on Biomedical Question-Answer Data\",\"authors\":\"A. Abdillah, Cornelius Bagus Purnama Putra, Apriantoni Apriantoni, Safitri Juanita, D. Purwitasari\",\"doi\":\"10.20473/jisebi.8.1.42-50\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Background: Question-answer (QA) is a popular method to seek health-related information and biomedical data. Such questions can refer to more than one medical entity (multi-label) so determining the correct tags is not easy. The question classification (QC) mechanism in a QA system can narrow down the answers we are seeking.\\nObjective: This study develops a multi-label classification using the heterogeneous ensembles method to improve accuracy in biomedical data with long text dimensions.\\nMethods: We used the ensemble method with heterogeneous deep learning and machine learning for multi-label extended text classification. There are 15 various single models consisting of three deep learning (CNN, LSTM, and BERT) and four machine learning algorithms (SVM, kNN, Decision Tree, and Naïve Bayes) with various text representations (TF-IDF, Word2Vec, and FastText). We used the bagging approach with a hard voting mechanism for the decision-making.\\nResults: The result shows that deep learning is more powerful than machine learning as a single multi-label biomedical data classification method. Moreover, we found that top-three was the best number of base learners by combining the ensembles method. Heterogeneous-based ensembles with three learners resulted in an F1-score of 82.3%, which is better than the best single model by CNN with an F1-score of 80%.\\nConclusion: A multi-label classification of biomedical QA using ensemble models is better than single models. The result shows that heterogeneous ensembles are more potent than homogeneous ensembles on biomedical QA data with long text dimensions.\\nKeywords: Biomedical Question Classification, Ensemble Method, Heterogeneous Ensembles, Multi-Label Classification, Question Answering\",\"PeriodicalId\":16185,\"journal\":{\"name\":\"Journal of Information Systems Engineering and Business Intelligence\",\"volume\":\"59 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-04-26\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Information Systems Engineering and Business Intelligence\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.20473/jisebi.8.1.42-50\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Information Systems Engineering and Business Intelligence","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.20473/jisebi.8.1.42-50","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 1

摘要

背景:问答(QA)是寻求健康相关信息和生物医学数据的一种流行方法。这些问题可能涉及多个医疗实体(多标签)，因此确定正确的标签并不容易。QA系统中的问题分类(QC)机制可以缩小我们所寻求的答案。目的:利用异构集成方法开发一种多标签分类方法，以提高长文本维度生物医学数据的准确率。方法:采用异构深度学习和机器学习相结合的集成方法进行多标签扩展文本分类。有15种不同的单一模型，由三种深度学习(CNN, LSTM和BERT)和四种机器学习算法(SVM, kNN, Decision Tree和Naïve Bayes)组成，具有各种文本表示(TF-IDF, Word2Vec和FastText)。我们使用了套袋方法和硬投票机制来进行决策。结果:作为单一的多标签生物医学数据分类方法，深度学习比机器学习更强大。此外，结合集成方法，我们发现前三名是基础学习器的最佳数量。基于异质性的三个学习器集成的f1得分为82.3%，优于CNN的最佳单一模型f1得分为80%。结论:采用集成模型对生物医学质量保证的多标签分类效果优于单一模型。结果表明，在长文本维数的生物医学QA数据上，异构集成比同质集成更有效。关键词:生物医学问题分类，集成方法，异构集成，多标签分类，问题回答

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Ensemble-based Methods for Multi-label Classification on Biomedical Question-Answer Data

Background: Question-answer (QA) is a popular method to seek health-related information and biomedical data. Such questions can refer to more than one medical entity (multi-label) so determining the correct tags is not easy. The question classification (QC) mechanism in a QA system can narrow down the answers we are seeking. Objective: This study develops a multi-label classification using the heterogeneous ensembles method to improve accuracy in biomedical data with long text dimensions. Methods: We used the ensemble method with heterogeneous deep learning and machine learning for multi-label extended text classification. There are 15 various single models consisting of three deep learning (CNN, LSTM, and BERT) and four machine learning algorithms (SVM, kNN, Decision Tree, and Naïve Bayes) with various text representations (TF-IDF, Word2Vec, and FastText). We used the bagging approach with a hard voting mechanism for the decision-making. Results: The result shows that deep learning is more powerful than machine learning as a single multi-label biomedical data classification method. Moreover, we found that top-three was the best number of base learners by combining the ensembles method. Heterogeneous-based ensembles with three learners resulted in an F1-score of 82.3%, which is better than the best single model by CNN with an F1-score of 80%. Conclusion: A multi-label classification of biomedical QA using ensemble models is better than single models. The result shows that heterogeneous ensembles are more potent than homogeneous ensembles on biomedical QA data with long text dimensions. Keywords: Biomedical Question Classification, Ensemble Method, Heterogeneous Ensembles, Multi-Label Classification, Question Answering

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Journal of Information Systems Engineering and Business Intelligence

CiteScore

0.30

自引率

0.00%

发文量