A Machine Learning-Based Tool for Exploring COVID-19 Scientific Literature

2021 International Conference on Recent Advances in Mathematics and Informatics (ICRAMI). Pub Date: 2021-09-21. DOI: 10.1109/ICRAMI52622.2021.9585958

M. Allaoui, Nour El-Houda Sayah Ben Aissa, Abdellah Ben Belghith, M. L. Kherfi


Citations: 1

Abstract

The COVID-19 pandemic, caused by the SARS-CoV-2 virus, has done serious damage in many areas. It has prompted thousands of researchers from different disciplines (biology, medicine, artificial intelligence, economics, etc.) to publish a very large number of scientific articles in a very short period to answer questions related to the pandemic. This abundance of literature, however, has raised another problem: it has become extremely difficult for a researcher or a decision-maker to stay up to date with the latest scientific advances or to locate articles on a specific aspect of the pandemic. In this paper, we present an intelligent tool based on machine learning that automatically organizes a large dataset of COVID-19-related scientific literature and visualizes it in a way that helps these users navigate the dataset and locate the documents they seek. The documents are first pre-processed and transformed into numerical features. These features are then passed through a deep denoising autoencoder followed by Uniform Manifold Approximation and Projection (UMAP) to reduce their dimensionality to a 2D space. The projected data are then clustered with an agglomerative clustering algorithm. A topic modeling step using Latent Dirichlet Allocation (LDA) then assigns a label to each cluster. Finally, the documents are presented to the user in an interactive interface that we developed. The experiments we conducted indicate that the tool is efficient and useful.
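
To make the cluster-then-label idea in the abstract concrete, the following is a minimal standard-library sketch, not the authors' implementation. It assumes TF-IDF as the numerical features, clusters directly in that space with average-linkage agglomerative clustering on cosine distance, and labels each cluster by its highest-weighted terms. The denoising autoencoder, UMAP, and LDA stages are deliberately omitted; top-term labeling is only a simple stand-in for LDA. All function names, the stopword list, and the four sample titles are hypothetical.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not from the paper).
STOPWORDS = {"the", "of", "and", "in", "a", "to", "for", "on", "with", "is"}

def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stopwords and short tokens."""
    return [t for t in re.findall(r"[a-z]+", text.lower())
            if t not in STOPWORDS and len(t) > 2]

def tfidf_vectors(docs):
    """Return one sparse {term: weight} dict per document, scaled to unit length."""
    tokenized = [tokenize(d) for d in docs]
    df = Counter(term for toks in tokenized for term in set(toks))
    n = len(docs)
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Smoothed inverse document frequency.
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors

def cosine_distance(u, v):
    """1 - cosine similarity; the inputs are already unit-length sparse vectors."""
    return 1.0 - sum(w * v.get(t, 0.0) for t, w in u.items())

def agglomerative(vectors, n_clusters):
    """Average-linkage agglomerative clustering; returns lists of document indices."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average pairwise distance between the two candidate clusters.
                d = sum(cosine_distance(vectors[i], vectors[j])
                        for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair
        del clusters[b]
    return clusters

def label_cluster(vectors, members, top_k=2):
    """Label a cluster with its top_k terms by summed TF-IDF weight."""
    totals = Counter()
    for i in members:
        totals.update(vectors[i])
    return [term for term, _ in totals.most_common(top_k)]

# Hypothetical article titles, used only to exercise the sketch.
docs = [
    "Vaccine efficacy trial results for mRNA vaccine candidates",
    "Antibody response after vaccine booster dose",
    "Lockdown effects on unemployment and economic growth",
    "Economic impact of lockdown on small businesses",
]
vectors = tfidf_vectors(docs)
clusters = agglomerative(vectors, n_clusters=2)
for members in clusters:
    print(members, label_cluster(vectors, members))
```

On a real corpus, the quadratic pairwise search above would be too slow. Reducing the features to a low-dimensional space before clustering, as the abstract describes, is one way to keep that step tractable.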


