M. Allaoui, Nour El-Houda Sayah Ben Aissa, Abdellah Ben Belghith, M. L. Kherfi
{"title":"A Machine Learning-Based Tool for Exploring COVID-19 Scientific Literature","authors":"M. Allaoui, Nour El-Houda Sayah Ben Aissa, Abdellah Ben Belghith, M. L. Kherfi","doi":"10.1109/ICRAMI52622.2021.9585958","DOIUrl":null,"url":null,"abstract":"The advent of the COVID-19 pandemic caused by the Sars-CoV2 virus has caused serious damage in different areas. This has prompted thousands of researchers from different disciplines (biology, medicine, artificial intelligence, economics, etc.) to publish a very large number of scientific articles in a very short period, to answer questions related to this pandemic. This abundance of literature, however, raised another problem. It has indeed become extremely difficult for a researcher or a decision-maker to stay up to date with the latest scientific advances or to locate scientific articles related to a specific aspect of this pandemic. In this paper, we present an intelligent tool based on Machine learning, which automatically organizes a large dataset of Covid-19 related scientific literature and visualizes them in a way that helps these people navigating easily through this dataset and locating the sought documents easily. The documents are first pre-processed and transformed into numerical features. Then, those features are passed through a deep denoising autoencoder followed by Uniform Manifold Approximation and Projection technique (UMAP) to reduce their dimensionality into a 2D space. The projected data are then clustered with Agglomerative Clustering Algorithm. This is followed by a topic modeling step which we performed using Latent Dirichlet Allocation (LDA), in order to assign a label to each cluster. Finally, the documents are visualized to the user in an interactive interface that we developed. The experiments we conducted proved that our tool is efficient and useful.","PeriodicalId":440750,"journal":{"name":"2021 International Conference on Recent Advances in Mathematics and Informatics (ICRAMI)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2021-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 International Conference on Recent Advances in Mathematics and Informatics (ICRAMI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICRAMI52622.2021.9585958","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1
Abstract
The advent of the COVID-19 pandemic caused by the Sars-CoV2 virus has caused serious damage in different areas. This has prompted thousands of researchers from different disciplines (biology, medicine, artificial intelligence, economics, etc.) to publish a very large number of scientific articles in a very short period, to answer questions related to this pandemic. This abundance of literature, however, raised another problem. It has indeed become extremely difficult for a researcher or a decision-maker to stay up to date with the latest scientific advances or to locate scientific articles related to a specific aspect of this pandemic. In this paper, we present an intelligent tool based on Machine learning, which automatically organizes a large dataset of Covid-19 related scientific literature and visualizes them in a way that helps these people navigating easily through this dataset and locating the sought documents easily. The documents are first pre-processed and transformed into numerical features. Then, those features are passed through a deep denoising autoencoder followed by Uniform Manifold Approximation and Projection technique (UMAP) to reduce their dimensionality into a 2D space. The projected data are then clustered with Agglomerative Clustering Algorithm. This is followed by a topic modeling step which we performed using Latent Dirichlet Allocation (LDA), in order to assign a label to each cluster. Finally, the documents are visualized to the user in an interactive interface that we developed. The experiments we conducted proved that our tool is efficient and useful.