A Machine Learning-Based Tool for Exploring COVID-19 Scientific Literature

M. Allaoui, Nour El-Houda Sayah Ben Aissa, Abdellah Ben Belghith, M. L. Kherfi
2021 International Conference on Recent Advances in Mathematics and Informatics (ICRAMI), published 21 September 2021
DOI: 10.1109/ICRAMI52622.2021.9585958
Citations: 1

Abstract

The COVID-19 pandemic, caused by the SARS-CoV-2 virus, has done serious damage in many areas. It has prompted thousands of researchers from different disciplines (biology, medicine, artificial intelligence, economics, etc.) to publish a very large number of scientific articles in a very short period in order to answer questions related to the pandemic. This abundance of literature, however, has created another problem: it has become extremely difficult for a researcher or decision-maker to keep up with the latest scientific advances, or to locate articles on a specific aspect of the pandemic. In this paper, we present an intelligent, machine learning-based tool that automatically organizes a large dataset of COVID-19 scientific literature and visualizes it so that these users can easily navigate the dataset and locate the documents they are looking for. The documents are first pre-processed and transformed into numerical features. These features are then passed through a deep denoising autoencoder, followed by Uniform Manifold Approximation and Projection (UMAP), to reduce their dimensionality to a 2D space. The projected data are then clustered with an agglomerative clustering algorithm. This is followed by a topic modeling step, performed with Latent Dirichlet Allocation (LDA), that assigns a label to each cluster. Finally, the documents are presented to the user in an interactive interface that we developed. The experiments we conducted show that our tool is efficient and useful.
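The abstract gives no implementation details, so the following is only a minimal pure-Python sketch of the last two analysis stages, under stated assumptions. It assumes the 2D coordinates already exist (in the paper they come from the denoising autoencoder followed by UMAP; here they are hypothetical toy values), merges them bottom-up with average linkage (an assumed choice; the abstract does not name a linkage), and then, as a deliberately simpler stand-in for the paper's LDA step, names each cluster by its most frequent terms. The functions `agglomerative` and `label_clusters`, the stop-word list, and the toy data are all hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import dist


def average_linkage(a, b):
    """Mean pairwise Euclidean distance between two clusters of 2-D points."""
    return sum(dist(p, q) for p in a for q in b) / (len(a) * len(b))


def agglomerative(points, n_clusters):
    """Bottom-up clustering: start from singletons and repeatedly merge the
    closest pair of clusters (average linkage) until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: average_linkage(
                [points[k] for k in clusters[ij[0]]],
                [points[k] for k in clusters[ij[1]]],
            ),
        )
        clusters[i].extend(clusters[j])
        del clusters[j]  # combinations yields i < j, so index i stays valid
    labels = [0] * len(points)
    for cluster_id, members in enumerate(clusters):
        for k in members:
            labels[k] = cluster_id
    return labels


# Hypothetical stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"the", "of", "and", "in", "a", "for", "on", "to"}


def label_clusters(docs, labels, top_n=2):
    """Name each cluster by its top_n most frequent non-stop-words
    (a simple frequency-based stand-in for LDA topic labels)."""
    counts = {}
    for doc, cluster_id in zip(docs, labels):
        words = [w for w in doc.lower().split() if w not in STOP_WORDS]
        counts.setdefault(cluster_id, Counter()).update(words)
    return {cid: [w for w, _ in c.most_common(top_n)] for cid, c in counts.items()}


# Hypothetical toy data: 2-D coordinates standing in for the UMAP output,
# paired with very short "documents".
POINTS = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
DOCS = [
    "vaccine trial efficacy",
    "vaccine antibody response",
    "mRNA vaccine efficacy",
    "lockdown economic impact",
    "economic recession lockdown",
    "unemployment economic policy",
]

labels = agglomerative(POINTS, n_clusters=2)
print(labels)                        # [0, 0, 0, 1, 1, 1]
print(label_clusters(DOCS, labels))  # {0: ['vaccine', 'efficacy'], 1: ['economic', 'lockdown']}
```

This naive loop recomputes every pairwise linkage at each merge, so it only suits toy inputs; for a real corpus, a library implementation such as scikit-learn's `AgglomerativeClustering` would be the practical choice, and a topic model such as LDA would replace the raw term counts.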