Tasnim Assali, Zayneb Trabelsi Ayoub, Sofiane Ouni
Title: Synergistic Distributed CNN Model for Protein Classification With a Collaborative BSP Synchronization Based on LSTM Prediction
DOI: 10.1002/cpe.70025 (https://onlinelibrary.wiley.com/doi/10.1002/cpe.70025)
Journal: Concurrency and Computation: Practice and Experience, vol. 37, issue 4-5
Publication date: 2025-02-14 (Journal Article)
Impact factor: 1.5; JCR: Q3 (Computer Science, Software Engineering); CAS region: 4 (Computer Science)
Citations: 0
Abstract
Distributed deep learning has recently emerged as a highly scalable solution for handling huge amounts of data and reducing training time. Handling high-dimensional, complicated data is especially challenging in genomics, which is among the most demanding fields in terms of data acquisition, storage, distribution, and analysis. However, distributed deep learning still has issues that must be resolved. Among synchronization paradigms, BSP (Bulk Synchronous Parallel) is the most widely used model. Even so, it is demanding in terms of time due to the straggler problem, in which all workers must wait for the slowest worker before synchronizing. Therefore, in this article, we propose a collaborative BSP (Collab-BSP) that addresses this issue by adopting an LSTM for execution-time prediction, implemented in the Apache Spark environment. We showed the efficiency of our approach in reducing waiting time and iteration time by 50% and 30%, respectively. Our approach also demonstrated promising results when training a distributed CNN for protein classification, reaching 98.82% accuracy, and proved its capability to enhance distributed deep learning training.
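The straggler effect the abstract describes can be made concrete with a minimal sketch: under BSP, every superstep ends at a barrier, so the iteration time equals the slowest worker's execution time and every other worker idles for the difference. The function below is purely illustrative (the names and the toy timings are assumptions, not taken from the paper's implementation):

```python
# Illustrative sketch of the BSP straggler effect: at each superstep's
# barrier, the iteration time is set by the slowest worker, and every
# other worker accumulates idle (waiting) time.

def bsp_iteration_stats(exec_times):
    """Given each worker's execution time for one BSP superstep,
    return the barrier (iteration) time and each worker's wait time."""
    iteration_time = max(exec_times)  # barrier releases only after the slowest worker
    waits = [iteration_time - t for t in exec_times]
    return iteration_time, waits

# Hypothetical timings: worker 3 is a straggler, so workers 0-2 idle at the barrier.
iteration_time, waits = bsp_iteration_stats([2, 3, 5, 9])
# iteration_time == 9; waits == [7, 6, 4, 0]
```

A predictor such as the LSTM in Collab-BSP aims to forecast these per-worker execution times ahead of the barrier so the scheduler can rebalance work and shrink the wait terms.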
About the journal:
Concurrency and Computation: Practice and Experience (CCPE) publishes high-quality, original research papers, and authoritative research review papers, in the overlapping fields of:
Parallel and distributed computing;
High-performance computing;
Computational and data science;
Artificial intelligence and machine learning;
Big data applications, algorithms, and systems;
Network science;
Ontologies and semantics;
Security and privacy;
Cloud/edge/fog computing;
Green computing; and
Quantum computing.