MeSH2Matrix: combining MeSH keywords and machine learning for biomedical relation classification based on PubMed.

Journal of Biomedical Semantics. Pub Date: 2024-10-02. DOI: 10.1186/s13326-024-00319-w. Impact factor: 2. JCR Q3 (Mathematical & Computational Biology). Zone 3 (Engineering & Technology).

Houcemeddine Turki, Bonaventure F P Dossou, Chris Chinenye Emezue, Abraham Toluwase Owodunni, Mohamed Ali Hadj Taieb, Mohamed Ben Aouicha, Hanen Ben Hassen, Afif Masmoudi

Journal of Biomedical Semantics, volume 15, issue 1, page 18. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11445994/pdf/

Citations: 0

Abstract

Biomedical relation classification has been significantly improved by applying advanced machine learning techniques to the raw text of scholarly publications. Despite this improvement, the reliance on large chunks of raw text makes these algorithms suffer in terms of generalization, precision, and reliability. The distinctive characteristics of bibliographic metadata can prove effective in achieving better performance on this challenging task. In this paper, we introduce an approach for biomedical relation classification that uses the qualifiers of co-occurring Medical Subject Headings (MeSH). First, we introduce MeSH2Matrix, our dataset of 46,469 biomedical relations curated from PubMed publications using our approach. The dataset includes a matrix that maps associations between the qualifiers of subject MeSH keywords and those of object MeSH keywords. It also specifies, for each relation, the corresponding Wikidata relation type and the superclass of semantic relations. Using MeSH2Matrix, we build and train three machine learning models (a Support Vector Machine [SVM], a dense model [D-Model], and a convolutional neural network [C-Net]) to evaluate the efficiency of our approach for biomedical relation classification. Our best model achieves an accuracy of 70.78% for 195 classes and 83.09% for five superclasses. Finally, we provide a confusion matrix analysis and extensive feature analyses to better examine the relationship between the MeSH qualifiers and the biomedical relations being classified. We hope our results will shed light on developing better algorithms for biomedical ontology classification based on the MeSH keywords of PubMed publications. For reproducibility, MeSH2Matrix and all our source code are publicly accessible at https://github.com/SisonkeBiotik-Africa/MeSH2Matrix .
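
As a rough illustration of the qualifier matrix described in the abstract, the sketch below counts how often each qualifier of a subject heading co-occurs with each qualifier of an object heading across a set of records. It is a minimal sketch under stated assumptions, not the authors' implementation: the shortened qualifier list, the record layout, the function name `qualifier_matrix`, and the toy records are all hypothetical, and the counting rule is only one plausible reading of "maps associations".

```python
# Hypothetical, shortened qualifier vocabulary (the real MeSH list is longer).
QUALIFIERS = ["drug therapy", "therapeutic use", "adverse effects", "diagnosis"]
INDEX = {q: i for i, q in enumerate(QUALIFIERS)}


def qualifier_matrix(publications, subject, obj):
    """Count qualifier pairs over records that mention both headings.

    Each record is a dict mapping a MeSH heading to the set of qualifiers
    attached to it (an assumed layout, chosen for illustration).
    Rows index subject qualifiers, columns index object qualifiers.
    """
    n = len(QUALIFIERS)
    matrix = [[0] * n for _ in range(n)]
    for record in publications:
        if subject not in record or obj not in record:
            continue  # the two headings do not co-occur in this record
        for sq in record[subject]:
            for oq in record[obj]:
                # Skip qualifiers outside the shortened vocabulary.
                if sq in INDEX and oq in INDEX:
                    matrix[INDEX[sq]][INDEX[oq]] += 1
    return matrix


# Toy records, invented for illustration only.
publications = [
    {"Aspirin": {"therapeutic use"}, "Headache": {"drug therapy"}},
    {"Aspirin": {"therapeutic use", "adverse effects"}, "Headache": {"drug therapy"}},
    {"Aspirin": {"adverse effects"}},  # no co-occurrence, so it is ignored
]

matrix = qualifier_matrix(publications, "Aspirin", "Headache")
for row in matrix:
    print(row)
```

A matrix of this kind could then be flattened into a feature vector for a classifier such as the SVM mentioned in the abstract; the repository linked above is the authoritative source for how the dataset was actually built.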

Source journal

Journal of Biomedical Semantics (Mathematical & Computational Biology)

CiteScore: 4.20

Self-citation rate: 5.30%

Review time: 30 weeks

Journal description: Journal of Biomedical Semantics addresses issues of semantic enrichment and semantic processing in the biomedical domain. The scope of the journal covers two main areas. Infrastructure for biomedical semantics: semantic resources and repositories, metadata management and resource description, knowledge representation and semantic frameworks, the Biomedical Semantic Web, and semantic interoperability. Semantic mining, annotation, and analysis: approaches to and applications of semantic resources, and tools for investigation, reasoning, prediction, and discovery in biomedicine.