
2022 30th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP): Latest Publications

Towards a Privacy-Aware Electric Vehicle Architecture
Christian Plappert, Jonathan Stancke, Lukas Jäger
Connected vehicles need to generate, store, process, and exchange a multitude of information with their environment. Much of this information is privacy-critical and thus regulated by privacy laws such as the GDPR in Europe. In this paper, we analyze and rate exemplary data (flows) of the electric driving domain with regard to their criticality, based on a reference architecture. We classify the corresponding ECUs according to the privacy-critical data they process and propose technical mitigation measures and technologies, in the form of generic privacy-enhancing building blocks, according to the classification and the requirements derived from the GDPR.
DOI: https://doi.org/10.1109/pdp55904.2022.00048 (published 2022-03-01)
Citations: 0
GraphDEAR: An Accelerator Architecture for Exploiting Cache Locality in Graph Analytics Applications
Siyi Hu, Masaaki Kondo, Yuan He, Ryuichi Sakamoto, Haotong Zhang, Jun Zhou, Hiroshi Nakamura
Data structures are key in Edge Computing, where various types of data are continuously generated by ubiquitous devices. Among all common data structures, graphs are used to express relationships and dependencies among human identities, objects, and locations, and they are expected to become one of the most important data infrastructures in the near future. However, as graph processing often requires random accesses to vast memory spaces, conventional memory hierarchies with caches cannot perform efficiently. To alleviate such memory access bottlenecks in graph processing, we present a solution based on vertex access scheduling and edge array re-ordering, performed in parallel with the execution of the graph processing application, to improve both the temporal and spatial locality of memory accesses, especially for edge-centric graphs, which are a popular means of handling dynamic graphs. Our proposed architecture is evaluated and tested through both trace-based cache simulations and cycle-accurate FPGA-based prototyping. Evaluation results show that our proposal has the potential to significantly reduce the Misses-Per-Kilo-Instructions (MPKI) of the Last Level Cache (LLC), by 56.27% on average.
DOI: https://doi.org/10.1109/pdp55904.2022.00029 (published 2022-03-01)
Citations: 1
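The locality argument in the GraphDEAR abstract above can be illustrated with a toy software experiment. The sketch below is not the GraphDEAR hardware scheduler: it assumes a tiny fully associative LRU cache over per-vertex data, a random edge list, and a plain sort as a stand-in for edge array re-ordering. The names `count_misses` and `mpki`, the cache geometry, and the graph size are all hypothetical choices made for illustration.

```python
import random
from collections import OrderedDict


def count_misses(edges, cache_lines=8, vertices_per_line=8):
    """Count misses of a tiny fully associative LRU cache over per-vertex data."""
    cache = OrderedDict()  # keys are line ids, ordered from least to most recently used
    misses = 0
    for src, dst in edges:
        for vertex in (src, dst):
            line = vertex // vertices_per_line
            if line in cache:
                cache.move_to_end(line)  # hit: mark line as most recently used
            else:
                misses += 1
                cache[line] = None
                if len(cache) > cache_lines:
                    cache.popitem(last=False)  # evict the least recently used line
    return misses


def mpki(misses, instructions):
    """Misses per kilo-instruction: misses per 1000 executed instructions."""
    return misses * 1000.0 / instructions


# Hypothetical workload: 2000 random edges over 256 vertices.
rng = random.Random(0)
edges = [(rng.randrange(256), rng.randrange(256)) for _ in range(2000)]

misses_original = count_misses(edges)
misses_reordered = count_misses(sorted(edges))  # group edges by source vertex
print(misses_original, misses_reordered)
```

Sorting groups edges that share a source vertex, so source accesses mostly hit in the cache. The real design performs scheduling and re-ordering in hardware alongside the application, which this sketch does not model.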
Anatomy of the BLIS Family of Algorithms for Matrix Multiplication
Adrián Castelló, E. S. Quintana‐Ortí, Francisco D. Igual
The efforts of the scientific community and hardware vendors to develop and optimize linear algebra codes have historically led to highly tuned libraries, carefully adapted to the underlying processor architecture, with excellent (near-peak) performance. These optimization efforts, however, commonly focus on obtaining the best possible performance when the operands involved are large, “squarish” matrices. New computationally intensive applications (e.g., in deep learning) increasingly demand high-performance BLAS (Basic Linear Algebra Subprograms) also for operands that are small in any of their dimensions. In this paper, we tackle this problem by refactoring the general matrix-matrix multiplication (GEMM) algorithm within a specific high-performance implementation of BLAS, named BLIS. We propose a complete family of algorithmic variants that implement GEMM with different strategies to exploit the target cache hierarchy, together with the changes to be applied to architecture-specific codes to instantiate a complete GEMM implementation. Experimental results on an ARM processor (NVIDIA Carmel) reveal significant performance differences between the members of the GEMM family, depending on the shape and dimension of the matrix operands.
DOI: https://doi.org/10.1109/pdp55904.2022.00023 (published 2022-03-01)
Citations: 3
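To make the idea of exploiting the cache hierarchy in the GEMM abstract above more concrete, here is a minimal sketch of a cache-blocked matrix multiplication. It is not the BLIS code and not one of the paper's variants: it omits operand packing and the architecture-specific micro-kernel, and the block sizes `mc`, `nc`, `kc` are placeholder values. Reordering the three outer loops is, roughly, the kind of choice that distinguishes members of such an algorithm family.

```python
def gemm_naive(A, B):
    """Reference C = A * B with plain triple loop."""
    m, k, n = len(A), len(B), len(B[0])
    return [[sum(A[i][p] * B[p][j] for p in range(k)) for j in range(n)]
            for i in range(m)]


def gemm_blocked(A, B, C, mc=4, nc=4, kc=4):
    """Accumulate C += A * B, visiting the operands block by block.

    The three outer loops partition n, k and m so that the blocks touched
    by the inner loops are small enough to stay in cache (assumed sizes).
    """
    m, k, n = len(A), len(B), len(B[0])
    for jc in range(0, n, nc):              # column panel of B and C
        for pc in range(0, k, kc):          # slice along the shared dimension
            for ic in range(0, m, mc):      # row block of A and C
                for i in range(ic, min(ic + mc, m)):
                    row_a, row_c = A[i], C[i]
                    for p in range(pc, min(pc + kc, k)):
                        a_ip, row_b = row_a[p], B[p]
                        for j in range(jc, min(jc + nc, n)):
                            row_c[j] += a_ip * row_b[j]
    return C


# Small non-square example with integer entries, so results compare exactly.
A = [[i + j for j in range(7)] for i in range(5)]
B = [[i - j for j in range(3)] for i in range(7)]
C = gemm_blocked(A, B, [[0] * 3 for _ in range(5)])
print(C == gemm_naive(A, B))
```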
Decision Tree-Based Rule Derivation for Intrusion Detection in Safety-Critical Automotive Systems
Lucas Buschlinger, Sanat Sarda, C. Krauß
Intrusion Detection Systems (IDSs) are being introduced into safety-critical systems such as connected vehicles. Since the behavior and effectiveness of measures are validated before approval, the decisions made by an IDS are required to be traceable, and the IDS also needs to work efficiently on resource-constrained embedded systems. These requirements complicate the direct use of Machine Learning (ML) approaches in IDS design. In this paper, we propose an approach that uses ML to generate rules for an efficient rule-based IDS like Snort. Our approach eases the time-consuming and difficult process of creating a rule set. We use decision trees to generate rules that experts can use as a basis for creating a rule set for a specific safety-critical use case. In addition, we use long short-term memory methods to circumvent the problem of limited training data availability, a common limitation in safety-critical systems. Our implementation and evaluation show the feasibility of our approach to derive specific IDS rules for such systems.
DOI: https://doi.org/10.1109/pdp55904.2022.00046 (published 2022-03-01)
Citations: 2
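The step from a decision tree to traceable rules, as described in the abstract above, can be sketched as a walk over root-to-leaf paths. This is an illustration only, not the authors' pipeline: the tree is hand-built rather than learned, the feature names `msg_rate` and `payload_len` and their thresholds are hypothetical, and the output is a generic list of conditions rather than Snort rule syntax.

```python
def derive_rules(node, conditions=()):
    """Return one (conditions, label) pair per root-to-leaf path of the tree."""
    if "label" in node:  # leaf node: the path so far is a complete rule
        return [(list(conditions), node["label"])]
    feature, threshold = node["feature"], node["threshold"]
    rules = derive_rules(node["left"], conditions + (f"{feature} <= {threshold}",))
    rules += derive_rules(node["right"], conditions + (f"{feature} > {threshold}",))
    return rules


# Hypothetical tree over two made-up traffic features.
tree = {
    "feature": "msg_rate", "threshold": 100,
    "left": {"label": "benign"},
    "right": {
        "feature": "payload_len", "threshold": 8,
        "left": {"label": "benign"},
        "right": {"label": "attack"},
    },
}

# Keep only the paths that end in an "attack" leaf as candidate alert rules.
alert_rules = [conds for conds, label in derive_rules(tree) if label == "attack"]
for conds in alert_rules:
    print("ALERT IF " + " AND ".join(conds))
```

Each derived rule is a conjunction of threshold tests, which is what makes the resulting decisions easy for an expert to inspect and refine.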
A Proposal of Mobility Support for the SimGrid Toolkit: Application to IoT simulations
Elías Del-Pozo-Puñal, Félix García Carballeira
Over the last few years, the number of IoT devices in daily use has increased, and they come in many sizes and types. In addition, these devices have become cheaper, so many more people are able to use them. These devices are capable of both creating and processing information, thus reducing network overload. However, in Cloud or Edge Computing environments, it is useful to know where these devices are located, in order to better distribute the information among the servers and further reduce the network load, allowing users to get the data faster. There are simulators capable of analyzing Cloud infrastructures, but most of them fail to offer the possibility of including mobility in the sensors. For these reasons, in this paper we detail an API extension developed on the SimGrid toolkit to add mobility to IoT sensors. In addition, it integrates with an API called Folium to visualize the mobility of these elements.
DOI: https://doi.org/10.1109/pdp55904.2022.00035 (published 2022-03-01)
Citations: 0
Towards Portable Realizations of Winograd-based Convolution with Vector Intrinsics and OpenMP
M. F. Dolz, Adrián Castelló, E. S. Quintana‐Ortí
We take a step forward in the direction of developing high-performance codes for the convolution, based on the Winograd transformation, that are easy to customize for different processor architectures. In our approach, the portability of the solution is improved by introducing vector intrinsics to exploit the SIMD (single-instruction multiple-data) capabilities of current processors, as well as OpenMP pragmas to exploit multi-thread parallelism. While this comes at the cost of sacrificing a fraction of the computational performance, our experimental results on two distinct processors, with Intel Xeon Skylake and ARM Cortex A57 architectures, show that the impact is affordable and still renders a Winograd-based solution that is competitive with the general method for the convolution based on the so-called im2col transform followed by a matrix-matrix multiplication.
DOI: https://doi.org/10.1109/pdp55904.2022.00015 (published 2022-03-01)
Citations: 2
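For readers unfamiliar with the Winograd transformation mentioned in the abstract above, the smallest standard case, usually written F(2,3), computes two outputs of a 3-tap filter with 4 multiplications instead of 6. The sketch below shows that one-dimensional case in scalar Python. It is a textbook illustration under that assumption, not the paper's vectorized, OpenMP-parallel code, and the function names are made up for this example.

```python
def conv_direct(d, g):
    """Two outputs of a 3-tap filter slid over 4 inputs: 6 multiplications."""
    return [d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
            d[1] * g[0] + d[2] * g[1] + d[3] * g[2]]


def conv_winograd_f23(d, g):
    """Same two outputs via Winograd F(2,3): 4 element-wise multiplications."""
    # Filter transform (can be precomputed once per filter).
    u0, u1, u2, u3 = g[0], (g[0] + g[1] + g[2]) / 2, (g[0] - g[1] + g[2]) / 2, g[2]
    # Input transform (additions and subtractions only).
    v0, v1, v2, v3 = d[0] - d[2], d[1] + d[2], d[2] - d[1], d[1] - d[3]
    # Element-wise product in the transformed domain.
    m0, m1, m2, m3 = u0 * v0, u1 * v1, u2 * v2, u3 * v3
    # Output transform back to the two convolution results.
    return [m0 + m1 + m2, m1 - m2 - m3]


d = [1, 2, 3, 4]   # input tile
g = [2, 4, 6]      # filter taps
print(conv_direct(d, g), conv_winograd_f23(d, g))
```

In a convolutional layer the same idea is applied to 2D tiles, and the element-wise products across channels are what SIMD units and threads can work on in parallel.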
A Parallel Approximation Algorithm for the Steiner Forest Problem
Laleh Ghalami, Daniel Grosu
In the Steiner Forest problem, we are given an undirected graph with non-negative weights for edges, a set of pairs of vertices, called terminals, and the goal is to find the minimum cost subgraph that connects each of the terminal pairs together. There exist several sequential heuristic and approximation algorithms for the Steiner Forest problem. In practice, the primal-dual 2-approximation algorithm is one of the fastest and obtains solutions that are very close to the optimal solution. In this paper, we design a practical parallel approximation algorithm based on the primal-dual sequential algorithm. The parallel algorithm maintains the approximation guarantees of the sequential primal-dual algorithm and it is specifically designed for execution on multi-core computers. We implement and run the parallel algorithm on a multi-core system with a large number of cores and perform an extensive experimental performance analysis on randomly generated graphs. The results show that our proposed parallel approximation algorithm achieves a significant speedup with respect to the sequential primal-dual algorithm.
DOI: https://doi.org/10.1109/pdp55904.2022.00016 (published 2022-03-01)
Citations: 0
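The problem statement in the Steiner Forest abstract above can be made concrete with a small checker: given a set of chosen edges, it tests whether every terminal pair is connected and reports the total cost. This sketch only illustrates the problem definition using a union-find structure. It is not the primal-dual 2-approximation or its parallel version, and the graph, edge weights, and function names are hypothetical.

```python
def find(parent, x):
    """Return the representative of x's component, compressing the path."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def is_feasible(num_vertices, chosen_edges, terminal_pairs):
    """True if the chosen edges connect the two vertices of every terminal pair."""
    parent = list(range(num_vertices))
    for u, v, _weight in chosen_edges:
        parent[find(parent, u)] = find(parent, v)  # merge the two components
    return all(find(parent, s) == find(parent, t) for s, t in terminal_pairs)


def cost(chosen_edges):
    """Total weight of the chosen edges (the objective to minimize)."""
    return sum(weight for _u, _v, weight in chosen_edges)


# Hypothetical instance: 5 vertices, edges given as (u, v, weight).
forest = [(0, 1, 2), (2, 3, 1)]
print(is_feasible(5, forest, [(0, 1), (2, 3)]), cost(forest))
```

Note that a feasible solution may be a forest with several trees, as here, since different terminal pairs do not need to be connected to each other.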
Exploiting Vector Extensions to Accelerate Time Series Analysis
Ricardo Quislant, I. Fernandez, E. Serralvo, E. Gutiérrez, O. Plata
Time series analysis is an important research topic and a key step in monitoring and predicting events in many fields. Recently, the Matrix Profile method, and particularly two of its Euclidean-distance-based implementations, SCRIMP and SCAMP, have become the state-of-the-art approaches in this field. These algorithms make it possible to obtain exact motifs and discords from a time series, which can be used to infer events, predict outcomes, detect anomalies, and more. While the matrix profile is embarrassingly parallelizable, we find that autovectorization techniques fail to fully exploit the SIMD capabilities of modern CPU architectures. In this paper, we develop custom-vectorized SCRIMP and SCAMP implementations based on the AVX2 and AVX-512 extensions, which we combine with multithreading techniques aimed at exploiting the potential of the underlying architectures. Our experimental evaluation, conducted using real data, shows a performance improvement of more than 4× with respect to autovectorization.
DOI: https://doi.org/10.1109/pdp55904.2022.00017 (published 2022-03-01)
Citations: 0
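As background for the Matrix Profile abstract above, the following sketch computes a matrix profile by brute force: for every subsequence of length `m`, it records the z-normalized Euclidean distance to its nearest non-overlapping neighbor. This is the plain definition, not SCRIMP, SCAMP, or any vectorized implementation. The exclusion zone of `m // 4` positions and the toy series are assumptions made for the example.

```python
import math


def znorm(window):
    """Z-normalize a window; a constant window maps to all zeros."""
    mean = sum(window) / len(window)
    std = math.sqrt(sum((x - mean) ** 2 for x in window) / len(window))
    if std == 0:
        return [0.0] * len(window)
    return [(x - mean) / std for x in window]


def matrix_profile(series, m):
    """Return (profile, index): nearest-neighbor distance and position per window."""
    n = len(series) - m + 1
    windows = [znorm(series[i:i + m]) for i in range(n)]
    exclusion = m // 4  # assumed exclusion zone to skip trivial self-matches
    profile = [math.inf] * n
    index = [-1] * n
    for i in range(n):
        for j in range(i + exclusion + 1, n):
            dist = math.dist(windows[i], windows[j])
            if dist < profile[i]:
                profile[i], index[i] = dist, j
            if dist < profile[j]:  # the distance is symmetric, update both sides
                profile[j], index[j] = dist, i
    return profile, index


# Toy series in which the pattern 0, 1, 3, 1, 0 appears twice.
series = [0, 1, 3, 1, 0, 5, 2, 7, 4, 0, 1, 3, 1, 0]
profile, index = matrix_profile(series, 4)
print(index[0], profile[0])
```

Low profile values mark motifs (repeated patterns) and high values mark discords (anomalies). The doubly nested loop over independent pairs is the part that SIMD and multithreading can accelerate.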
NoaSci: A Numerical Object Array Library for I/O of Scientific Applications on Object Storage
Steven W. D. Chien, Artur Podobas, Martin Svedin, A. Tkachuk, Salem El Sayed, Pawel Herman, G. Umanesan, Sai B. Narasimhamurthy, S. Markidis
Strong consistency and stateful workflows are seen as major factors limiting parallel I/O performance because of the need for locking and state management. While the POSIX-based I/O model dominates modern HPC storage infrastructure, emerging object storage technology can potentially improve I/O performance by eliminating these bottlenecks. Despite wide deployment in the cloud, its adoption in HPC remains low. We argue that one reason is the lack of a suitable programming interface for parallel I/O in scientific applications. In this work, we introduce NoaSci, a Numerical Object Array library for scientific applications. NoaSci supports different data formats (e.g., HDF5, binary) and focuses on supporting node-local burst buffers and object stores. We demonstrate for the first time how scientific applications can perform parallel I/O on Seagate’s Motr object store through NoaSci. We evaluate NoaSci’s preliminary performance using the iPIC3D space weather application and position it against existing I/O methods.
DOI: https://doi.org/10.1109/pdp55904.2022.00034 (published 2022-03-01)
Citations: 0
Clustering Datasets in Cloud Computing Environment for User Identification
Shallaw Mohammed Ali, G. Kecskeméti
Users’ behaviours have a noticeable impact on cloud computing resources. Behaviour prediction models could foster usage awareness among cloud users. This requires training prediction models with datasets that provide user information. Unfortunately, such information is excluded from many relevant datasets. Therefore, in this work, we investigate the feasibility of extracting these identities via clustering methods. We do this by categorising workload datasets according to the availability of user information in their attributes. Then, we focus our attention on the attributes shared between datasets that disclose user information and those that do not. Finally, we evaluate the potential of several clustering approaches on the datasets that disclose user information. Our results show that user identities can be extracted with relatively high accuracy using clustering. They also show that the highest clustering precision is mostly obtained from the attributes representing request components that strongly relate to the user’s application.
DOI: https://doi.org/10.1109/pdp55904.2022.00033 (published 2022-03-01)
Citations: 1