
Journal of Parallel and Distributed Computing: Latest Articles

A novel framework for generic Spark workload characterization and similar pattern recognition using machine learning
IF 3.8 | CAS Tier 3 (Computer Science) | Q1 (Mathematics) | Pub Date: 2024-03-26 | DOI: 10.1016/j.jpdc.2024.104881
Mariano Garralda-Barrio, Carlos Eiras-Franco, Verónica Bolón-Canedo

Comprehensive workload characterization plays a pivotal role in comprehending Spark applications, as it enables the analysis of diverse aspects and behaviors. This understanding is indispensable for devising downstream tuning objectives, such as performance improvement. To address this pivotal issue, our work introduces a novel and scalable framework for generic Spark workload characterization, complemented by consistent geometric measurements. The presented approach builds robust workload descriptors by profiling only quantitative metrics at the application task level, in a non-intrusive manner. We extend our framework to downstream workload pattern recognition by incorporating unsupervised machine learning techniques: clustering algorithms and feature selection. These techniques significantly improve the grouping of similar workloads without relying on predefined labels. We effectively recognize 24 representative Spark workloads from diverse domains available in HiBench, including SQL, machine learning, web search, graph, and micro-benchmarks. Our framework achieves an F-Measure of up to 90.9% and a Normalized Mutual Information of up to 94.5% in similar workload pattern recognition. These scores significantly outperform the results obtained in a comparative analysis with an established workload characterization approach from the literature.
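As a rough illustration of the second score reported above, the sketch below computes Normalized Mutual Information between a ground-truth workload grouping and a clustering result in pure Python; the labelings are invented for illustration and are not the paper's data.

```python
import math
from collections import Counter

def nmi(labels_true, labels_pred):
    """Normalized Mutual Information between two labelings, normalized by the
    arithmetic mean of the entropies (one common convention)."""
    n = len(labels_true)
    pt, pp = Counter(labels_true), Counter(labels_pred)
    joint = Counter(zip(labels_true, labels_pred))
    # Mutual information from the contingency counts (in nats).
    mi = sum(c / n * math.log(c * n / (pt[a] * pp[b]))
             for (a, b), c in joint.items())
    entropy = lambda cnt: -sum(c / n * math.log(c / n) for c in cnt.values())
    denom = (entropy(pt) + entropy(pp)) / 2
    return mi / denom if denom else 1.0

# A clustering that matches the ground truth up to relabeling scores 1.0;
# NMI is what makes label permutations irrelevant when comparing groupings.
print(round(nmi([0, 0, 1, 1, 2, 2], [1, 1, 2, 2, 0, 0]), 6))  # → 1.0
```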

Cited by: 0
Cloud-edge-end workflow scheduling with multiple privacy levels
IF 3.8 | CAS Tier 3 (Computer Science) | Q1 (Mathematics) | Pub Date: 2024-03-25 | DOI: 10.1016/j.jpdc.2024.104882
Shuang Wang, Zian Yuan, Xiaodong Zhang, Jiawen Wu, Yamin Wang

The cloud-edge-end architecture satisfies the execution requirements of various workflow applications. However, owing to the diversity of resources, the complex hierarchical structure, and users' differing privacy requirements, determining how to lease suitable cloud-edge-end resources, schedule multi-privacy-level workflow tasks, and optimize leasing costs is currently one of the key challenges in cloud computing. In this paper, we address the scheduling optimization problem of workflow applications containing tasks with multiple privacy levels. To tackle this problem, we propose a heuristic privacy-preserving workflow scheduling algorithm (PWHSA) designed to minimize rental costs. The algorithm comprises time parameter estimation, task sub-deadline division, scheduling sequence generation, task scheduling, and task adjustment, with candidate strategies developed for each component. The candidate strategies in each step undergo statistical calibration across a comprehensive set of workflow instances. We compare the proposed algorithm with modified classical algorithms that target similar problems. The experimental results demonstrate that PWHSA outperforms the comparison algorithms while maintaining acceptable execution times.
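For intuition about the sub-deadline division step, here is a deliberately simplified sketch (my own construction, not the paper's actual PWHSA formulas): the workflow deadline is distributed over a chain of tasks in proportion to their estimated runtimes.

```python
def sub_deadlines(tasks, deadline):
    """tasks: list of (name, estimated_runtime) in topological order.
    Returns a cumulative sub-deadline per task, scaled so the last task's
    sub-deadline equals the workflow deadline."""
    total = sum(rt for _, rt in tasks)
    acc, out = 0.0, {}
    for name, rt in tasks:
        acc += rt
        out[name] = deadline * acc / total  # proportional share of the deadline
    return out

# A three-task chain with a 120 s workflow deadline; estimated runtimes sum
# to 60 s, so each task gets twice its runtime as slack-inclusive budget.
chain = [("extract", 10), ("transform", 30), ("load", 20)]
print(sub_deadlines(chain, 120.0))  # {'extract': 20.0, 'transform': 80.0, 'load': 120.0}
```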

Cited by: 0
SCIPIS: Scalable and concurrent persistent indexing and search in high-end computing systems
IF 3.8 | CAS Tier 3 (Computer Science) | Q1 (Mathematics) | Pub Date: 2024-03-25 | DOI: 10.1016/j.jpdc.2024.104878
Alexandru Iulian Orhean, Anna Giannakou, Lavanya Ramakrishnan, Kyle Chard, Boris Glavic, Ioan Raicu

While it is now routine to search for data on a personal computer or discover data online, there is no equivalent method for discovering data on the large parallel and distributed file systems commonly deployed on HPC systems. In contrast to web search, which has to deal with a large number of relatively small files, HPC applications also require efficient indexing of large files. We propose SCIPIS, an indexing and search framework that exploits the properties of modern high-end computing systems with many-core architectures, multiple NUMA nodes, and multiple NVMe storage devices. SCIPIS supports building and searching TF-IDF persistent indexes and can deliver orders-of-magnitude better performance than state-of-the-art approaches. We achieve scalability and performance of indexing by decomposing the indexing process into separate components that can be optimized independently, by building disk-friendly in-memory data structures that can be persisted in long sequential writes, and by avoiding communication between the indexing threads that collaboratively build an index over a collection of large files. We evaluated SCIPIS with three types of datasets (logs, scientific data, and metadata) on systems with up to 192 cores, 768 GiB of RAM, 8 NUMA nodes, and up to 16 NVMe drives, and achieved up to 29× faster indexing while maintaining search latency similar to Apache Lucene.
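A minimal sketch of the kind of TF-IDF inverted index SCIPIS persists (the real system shards the work across threads, NUMA nodes, and NVMe drives; the toy documents below are invented):

```python
import math
from collections import Counter, defaultdict

def build_index(docs):
    """docs: {doc_id: text}. Returns an inverted index
    {term: [(doc_id, tf-idf weight), ...]} using raw term frequency
    normalized by document length and a plain log(N/df) idf."""
    n = len(docs)
    tfs = {d: Counter(text.lower().split()) for d, text in docs.items()}
    df = Counter(term for tf in tfs.values() for term in tf)  # document frequency
    index = defaultdict(list)
    for d, tf in tfs.items():
        total = sum(tf.values())
        for term, count in tf.items():
            index[term].append((d, (count / total) * math.log(n / df[term])))
    return index

idx = build_index({1: "fast index search", 2: "fast disk write"})
# "fast" occurs in every document, so its idf (and weight) is zero;
# "index" is specific to document 1 and gets a nonzero weight there.
```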

Cited by: 0
Learning-driven hybrid scaling for multi-type services in cloud
IF 3.8 | CAS Tier 3 (Computer Science) | Q1 (Mathematics) | Pub Date: 2024-03-24 | DOI: 10.1016/j.jpdc.2024.104880
Haitao Zhang, Tongyu Guo, Wei Tian, Huadong Ma

In order to deal with the fast-changing requirements of container-based services in clouds, auto-scaling is an essential mechanism for adapting the number of provisioned resources to variable service workloads. However, the latest auto-scaling approaches lack comprehensive consideration of variable workloads and of hybrid auto-scaling for multi-type services. Firstly, proactive approaches based on historical data are widely used to handle complex and variable workloads in advance. Their decision-making accuracy depends on the prediction algorithm, which is affected by anomalies, missing values, and errors in the historical workload data, and unexpected workloads cannot be handled. Secondly, trigger-based reactive approaches are seriously affected by workload fluctuation, which causes frequent invalid scaling of service resources. Besides, because scaling takes time, different scaling actions have different completion delays. Thirdly, the latest approaches also ignore the different scaling times of hybrid scaling for multi-type services, including stateful and stateless services. In particular, when stateful services are scaled horizontally, the neglected long scaling time causes untimely supply and withdrawal of resources. Consequently, all three issues can lead to degradation of Quality of Service (QoS) and inefficient utilization of resources. This paper proposes a new hybrid auto-scaling approach for multi-type services to resolve the impact of service scaling time on decision making. We combine a proactive scaling strategy with a reactive anomaly detection and correction mechanism. For proactive decisions, an ensemble learning model with a structurally improved deep network is designed to predict future workload. On the basis of the predicted results and the scaling times of different service types, auto-scaling decisions are made by a Deep Reinforcement Learning (DRL) model with a heterogeneous action space that integrates horizontal and vertical scaling actions. Meanwhile, the anomaly detection and correction mechanism detects and handles workload fluctuations and unexpected workloads. We evaluate our approach against three different proactive and reactive auto-scaling approaches in a cloud environment, and the experimental results show that the proposed approach achieves better scaling behavior than state-of-the-art approaches.
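As a toy illustration of a heterogeneous action space that mixes horizontal and vertical scaling, each discrete action the agent picks can map to a (replica delta, CPU-per-replica delta) pair. The mapping and clamping below are my assumptions for illustration, not the paper's actual design.

```python
# Discrete actions: (change in replica count, change in CPU per replica).
ACTIONS = [
    (0, 0.0),    # no-op
    (+1, 0.0),   # scale out one replica (horizontal)
    (-1, 0.0),   # scale in one replica (horizontal)
    (0, +0.5),   # grant 0.5 CPU more per replica (vertical)
    (0, -0.5),   # reclaim 0.5 CPU per replica (vertical)
]

def apply_action(state, action_id):
    """state: (replicas, cpu_per_replica). Applies one DRL action and clamps
    to at least one replica and a minimal 0.5-CPU share."""
    replicas, cpu = state
    dr, dc = ACTIONS[action_id]
    return max(1, replicas + dr), max(0.5, cpu + dc)

print(apply_action((3, 2.0), 1))  # scale out: (4, 2.0)
print(apply_action((1, 0.5), 4))  # clamped: cannot shrink below the floor
```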

Cited by: 0
A characterization of soft-error sensitivity in data-parallel and model-parallel distributed deep learning
IF 3.8 | CAS Tier 3 (Computer Science) | Q1 (Mathematics) | Pub Date: 2024-03-21 | DOI: 10.1016/j.jpdc.2024.104879
Elvis Rojas, Diego Pérez, Esteban Meneses

The latest advances in artificial intelligence deep learning models are unprecedented. A wide spectrum of application areas is now thriving thanks to massive available training datasets and gigantic, complex neural network models. Those two characteristics demand outstanding computing power that only advanced computing platforms can provide. Therefore, distributed deep learning has become a necessity in capitalizing on the potential of cutting-edge artificial intelligence. Two basic schemes have emerged in distributed learning. First, the data-parallel approach, which divides the training dataset across multiple computing nodes. Second, the model-parallel approach, which splits the layers of a model across several computing nodes. Each scheme has its upsides and downsides, particularly when running on large machines that are susceptible to soft errors. Those errors occur as a consequence of several factors in the manufacturing process of the electronic components of current supercomputers. On many occasions, those errors manifest as bit flips that do not crash the whole system but produce wrong numerical results in computations. To study the effect of soft errors on different approaches to distributed learning, we leverage checkpoint alteration, a technique that injects bit flips into checkpoint files. It allows researchers to understand the effect of soft errors on applications that produce checkpoint files in HDF5 format. This paper uses the popular PyTorch deep learning tool on two distributed-learning platforms: one for data-parallel training and one for model-parallel training. We use well-known deep learning models with popular training datasets to provide a picture of how soft errors challenge the training phase of a deep learning model.
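At its core, checkpoint alteration flips chosen bits in stored values. A minimal stdlib sketch on a single float32 weight (the actual tool rewrites values inside HDF5 checkpoint files; this only shows why the flipped bit position matters so much):

```python
import struct

def flip_bit(value, bit):
    """Flip bit `bit` (0 = least significant) of a float32 and return the
    corrupted value, mimicking a soft error in a saved model weight."""
    (as_int,) = struct.unpack("<I", struct.pack("<f", value))
    (flipped,) = struct.unpack("<f", struct.pack("<I", as_int ^ (1 << bit)))
    return flipped

w = 0.15625            # exactly representable in float32
print(flip_bit(w, 0))  # low mantissa bit: a tiny, often harmless perturbation
print(flip_bit(w, 30)) # high exponent bit: a catastrophic magnitude change
```

The asymmetry shown here (mantissa flips barely move the value, exponent flips explode it) is exactly what makes some injected soft errors silent and others fatal to training.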

Cited by: 0
Corrigendum to “MLLess: Achieving Cost Efficiency in Serverless Machine Learning Training” [Journal of Parallel and Distributed Computing 183 (2024) 104764]
IF 3.8 | CAS Tier 3 (Computer Science) | Q1 (Mathematics) | Pub Date: 2024-03-13 | DOI: 10.1016/j.jpdc.2024.104871
Pablo Gimeno Sarroca, Marc Sánchez-Artigas
Cited by: 0
Public cloud object storage auditing: Design, implementation, and analysis
IF 3.8 | CAS Tier 3 (Computer Science) | Q1 (Mathematics) | Pub Date: 2024-03-09 | DOI: 10.1016/j.jpdc.2024.104870
Fei Chen, Fengming Meng, Zhipeng Li, Li Li, Tao Xiang

Cloud storage auditing is a technique that enables a user to remotely check the integrity of outsourced data in cloud storage. Although researchers have proposed various protocols for cloud storage auditing, the proposed schemes are theoretical in nature and do not fit existing mainstream cloud storage service practices. To bridge this gap, this paper proposes a cloud storage auditing system that works with current mainstream cloud object storage services. We design the proposed system over existing proof-of-data-possession (PDP) schemes and make them practical and usable in the real world. Specifically, we propose an architecture that separates the compute and storage functionalities of a storage auditing scheme. Because cloud object storage only provides read and write interfaces, we leverage a cloud virtual machine to implement the user-defined computations needed in a PDP scheme. We store the authentication tags of the outsourced data as an independent object to support existing popular cloud storage applications, e.g., online file previewing. We also present a cost model to analyze the economic cost of a cloud storage auditing scheme. The cost model allows a user to balance security, efficiency, and economic cost by tuning various system parameters. We implemented and open-sourced the proposed system over a mainstream cloud object storage service. Experimental analysis shows that the proposed system is efficient and promising for production use. Specifically, for 40 GB of data, the proposed system incurs only 1.66% additional storage cost, 3796 bytes of communication cost, a maximum auditing time of 2.9 seconds, and a monetary cost of 0.9 CNY per audit.
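The quoted storage overhead is consistent with storing one small authentication tag per fixed-size data block as a separate object. The block and tag sizes below are assumptions chosen to illustrate the arithmetic and to land near the reported 1.66%; they are not values taken from the paper.

```python
# Back-of-envelope PDP storage-overhead calculation.
data_bytes = 40 * 1024**3   # 40 GiB of outsourced data
block_bytes = 4096          # assumed PDP block size
tag_bytes = 68              # assumed per-block authentication tag size

n_blocks = data_bytes // block_bytes
overhead = n_blocks * tag_bytes / data_bytes  # reduces to tag_bytes / block_bytes
print(f"{overhead:.2%}")    # the independent tag object adds ~1.66% on top
```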

Cited by: 0
Front Matter 1 - Full Title Page (regular issues)/Special Issue Title page (special issues)
IF 3.8 CAS Tier 3 Computer Science Q1 Mathematics Pub Date: 2024-03-09 DOI: 10.1016/S0743-7315(24)00038-8
Open-access PDF: https://www.sciencedirect.com/science/article/pii/S0743731524000388/pdfft?md5=ef4b0c5d74636a75840725db69cf440c&pid=1-s2.0-S0743731524000388-main.pdf
Cited by: 0
Dataflow optimization with layer-wise design variables estimation method for enflame CNN accelerators
IF 3.8 CAS Tier 3 Computer Science Q1 Mathematics Pub Date: 2024-03-06 DOI: 10.1016/j.jpdc.2024.104869
Tian Chen, Yu-an Tan, Zheng Zhang, Nan Luo, Bin Li, Yuanzhang Li

As convolution layers have proven to be the most time-consuming operation in convolutional neural network (CNN) algorithms, many efficient CNN accelerators have been designed to boost the performance of convolution operations. Previous work on CNN acceleration usually uses fixed design variables across diverse convolutional layers, which leads to inefficient data movement and low utilization of computing resources. We tackle this issue by proposing a flexible dataflow optimization method that estimates design variables for each layer. The optimization method first narrows the design space with a priori constraints, and then enumerates all legal solutions to select the optimal design variables. We demonstrate the effectiveness of the proposed optimization method by implementing representative CNN models (VGG-16, ResNet-18 and MobileNet V1) on Enflame Technology's programmable CNN accelerator, the General Computing Unit (GCU). The results indicate that our optimization significantly enhances the throughput of the convolution layers of ResNet, VGG and MobileNet on GCU, with improvements of up to 1.84×. Furthermore, it improves GCU utilization by up to 2.08× for the convolution layers of ResNet.

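The two-stage search described in the abstract — prune the design space with a priori constraints, then enumerate the surviving legal points and keep the best one — can be sketched per layer as follows. The buffer-fit constraint, the 3×3-kernel footprint, and the traffic cost model here are invented placeholders, not Enflame's actual GCU model; only the enumerate–filter–select structure mirrors the method.

```python
from itertools import product

def divisors(n):
    """All positive divisors of n, used as candidate tile sizes."""
    return [d for d in range(1, n + 1) if n % d == 0]

def pick_tiling(C, K, H, W, buf_words):
    """Per-layer design-variable search: enumerate candidate tile sizes,
    discard points that violate the a priori buffer-fit constraint, and
    keep the legal point with the lowest estimated off-chip traffic."""
    best, best_cost = None, float("inf")
    for tc, tk, th, tw in product(divisors(C), divisors(K), divisors(H), divisors(W)):
        # On-chip footprint of one input tile, one weight tile (3x3 kernel
        # assumed) and one output tile -- an illustrative constraint.
        footprint = tc * th * tw + tc * tk * 9 + tk * th * tw
        if footprint > buf_words:
            continue  # a priori constraint prunes this design point
        # Crude traffic model: every tile is reloaded once per outer loop trip.
        trips = (C // tc) * (K // tk) * (H // th) * (W // tw)
        cost = trips * footprint
        if cost < best_cost:
            best, best_cost = (tc, tk, th, tw), cost
    return best, best_cost

# e.g. one ResNet-style layer (64x64 channels, 56x56 map) with a 32K-word buffer
best, cost = pick_tiling(C=64, K=64, H=56, W=56, buf_words=32 * 1024)
```

Because the variables are chosen independently per layer, each convolutional layer can end up with a different tiling, which is exactly what a fixed-variable design forgoes.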
{"title":"Dataflow optimization with layer-wise design variables estimation method for enflame CNN accelerators","authors":"Tian Chen ,&nbsp;Yu-an Tan ,&nbsp;Zheng Zhang ,&nbsp;Nan Luo ,&nbsp;Bin Li ,&nbsp;Yuanzhang Li","doi":"10.1016/j.jpdc.2024.104869","DOIUrl":"https://doi.org/10.1016/j.jpdc.2024.104869","url":null,"abstract":"<div><p>As convolution layers have been proved to be the most time-consuming operation in convolutional neural network (CNN) algorithms, many efficient CNN accelerators have been designed to boost the performance of convolution operations. Previous works on CNN acceleration usually use fixed design variables for diverse convolutional layers, which would lead to inefficient data movements and low utilization of computing resource. We tackle this issue by proposing a flexible dataflow optimization method with design variables estimation for different layers. The optimization method first narrows the design space by the priori constraints, and then enumerates all legal solutions to select the optimal design variables. We demonstrate the effectiveness of the proposed optimization method by implementing representative CNN models (VGG-16, ResNet-18 and MobileNet V1) on Enflame Technology's programmable CNN accelerator, General Computing Unit (GCU). The results indicate that our optimization can significantly enhance the throughput of the convolution layers in ResNet, VGG and MobileNet on GCU, with improvement of up to 1.84×. 
Furthermore, it achieves up to 2.08× of GCU utilization specifically for the convolution layers of ResNet on GCU.</p></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":null,"pages":null},"PeriodicalIF":3.8,"publicationDate":"2024-03-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140067279","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cited by: 0
Adaptive patch grid strategy for parallel protein folding using atomic burials with NAMD
IF 3.8 CAS Tier 3 Computer Science Q1 Mathematics Pub Date: 2024-03-04 DOI: 10.1016/j.jpdc.2024.104868
Emerson A. Macedo, Alba C.M.A. Melo

The determination of protein structures is currently an important research topic in molecular biology, since there is a direct relationship between a protein's function in the organism and the 3D geometric configuration it adopts. The transformation of a protein structure from its 1D configuration to its 3D form is called protein folding. Ab initio protein folding methods use physical forces to model the interactions among the atoms that compose the protein. To accelerate these methods, parallel tools such as NAMD have been proposed. In this paper, we propose two contributions to parallel protein folding simulations: (a) an adaptive patch grid (APG) and (b) the addition of atomic burials (AB) to the traditional forces used in the simulation. With APG, we adapt the simulation box (patch grid) to the current shape of the protein during the folding process. AB forces relate the 3D protein structure to its geometric center and are adequate for modeling globular proteins. Thus, adding AB to the forces used in parallel protein folding potentially increases the quality of the result for this class of proteins. APG and AB were implemented in NAMD and tested in supercomputer environments. Our results show that, with APG, we reduce the execution time of the folding simulation of protein 4LNZ (5,714 atoms, 15 million time steps) from 12 hours and 36 minutes to 11 hours and 8 minutes, using 16 nodes (256 CPU cores). We also show that our APG+AB strategy was successfully used in a realistic protein folding simulation (1.7 billion time steps).

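The two ideas in the abstract can be sketched under heavy simplification: the patch grid is rebuilt from the protein's current bounding box (the APG idea), and burial is modeled as a pull toward the geometric center. The real AB potential drives each atom toward its own target burial level rather than simply inward, and NAMD's actual patch machinery is far more involved; the function names, the margin, and the force constant below are all assumptions made for illustration.

```python
import numpy as np

def patch_grid(coords, patch_len, margin=4.0):
    """Adaptive patch grid: rebuild the grid around the protein's current
    bounding box (plus a margin) so the patch decomposition tracks the
    molecule as it compacts during folding."""
    lo = coords.min(axis=0) - margin
    hi = coords.max(axis=0) + margin
    dims = np.maximum(np.ceil((hi - lo) / patch_len).astype(int), 1)
    return lo, dims  # grid origin and patch count along each axis

def burial_forces(coords, k=1.0):
    """Toy burial term: a harmonic pull of every atom toward the geometric
    center, favoring compact configurations. (The actual AB potential drives
    each atom toward a per-atom target burial level.)"""
    center = coords.mean(axis=0)
    return -k * (coords - center)

# Rebuilding the grid every N time steps lets the patches follow the fold.
rng = np.random.default_rng(0)
coords = 10.0 * rng.normal(size=(100, 3))  # stand-in atom positions (angstroms)
origin, dims = patch_grid(coords, patch_len=16.0)
forces = burial_forces(coords)
```

Shrinking `dims` as the protein compacts is what reduces the number of near-empty patches, and hence the wasted work, during a long folding run.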
{"title":"Adaptive patch grid strategy for parallel protein folding using atomic burials with NAMD","authors":"Emerson A. Macedo,&nbsp;Alba C.M.A. Melo","doi":"10.1016/j.jpdc.2024.104868","DOIUrl":"10.1016/j.jpdc.2024.104868","url":null,"abstract":"<div><p>The definition of protein structures is an important research topic in molecular biology currently, since there is a direct relationship between the function of the protein in the organism and the 3D geometric configuration it adopts. The transformations that occur in the protein structure from the 1D configuration to the 3D form are called protein folding. <em>Ab initio</em> protein folding methods use physical forces to model the interactions among the atoms that compose the protein. In order to accelerate those methods, parallel tools such as NAMD were proposed. In this paper, we propose two contributions for parallel protein folding simulations: (a) adaptive patch grid (APG) and (b) the addition of atomic burials (AB) to the traditional forces used in the simulation. With APG, we are able to adapt the simulation box (patch grid) to the current shape of the protein during the folding process. AB forces relate the 3D protein structure to its geometric center and are adequate for modeling globular proteins. Thus, adding AB to the forces used in parallel protein folding potentially increases the quality of the result for this class of proteins. APG and AB were implemented in NAMD and tested in supercomputer environments. Our results show that, with APG, we are able to reduce the execution time of the folding simulation of protein 4LNZ (5,714 atoms, 15 million time steps) from 12 hours and 36 minutes to 11 hours and 8 minutes, using 16 nodes (256 CPU cores). 
We also show that our APG+AB strategy was successfully used in a realistic protein folding simulation (1.7 billion time steps).</p></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":null,"pages":null},"PeriodicalIF":3.8,"publicationDate":"2024-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140054484","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cited by: 0