
Concurrency and Computation: Practice and Experience: Latest Articles

Identity authentication for edge devices based on zero‐trust architecture
Pub Date : 2022-07-23 DOI: 10.1002/cpe.7198
Haiqing Liu, M. Ai, Rong Huang, Rixuan Qiu, Yuancheng Li
Device identity authentication is the first line of defense in edge computing security. Many authentication schemes incur high communication and computational overhead. In addition, the growing virtualization and dynamism of networks, the security requirements that "cloudification" places on the logical boundaries of enterprise information systems, and the data security challenges facing core enterprise assets all make the original "authenticate once, trust throughout" model unreliable. The paper therefore proposes local and roaming identity authentication protocols based on a zero‐trust architecture. First, we propose a revocable group signature scheme in which an expiration time is bound to the key of each edge terminal device. Because an identity authentication token generated with an expired key is invalid, an expired key does not need to be included in the revocation list, which makes revocation checking more efficient. Compared with current identity authentication protocols, this article not only builds a model on the zero‐trust architecture to address the shortcomings of the existing network security protection architecture, but also considers the unforgeability of the expiration time, achieving effective revocation and more efficient identity authentication.
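The revocation idea in this abstract can be illustrated with a minimal sketch. It is not the paper's protocol: an HMAC tag stands in for the revocable group signature (so it gives no signer anonymity), and the names Token, issue, verify and prune_revocation_list are hypothetical. It only shows why binding an unforgeable expiration time to a credential lets expired entries drop out of the revocation list.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    device_id: str
    expires_at: int  # expiration time bound to the credential
    tag: bytes       # HMAC stand-in for the group signature (assumption)


def _mac(issuer_key: bytes, device_id: str, expires_at: int) -> bytes:
    # The tag covers the expiration time, so it cannot be altered unnoticed.
    msg = f"{device_id}|{expires_at}".encode()
    return hmac.new(issuer_key, msg, hashlib.sha256).digest()


def issue(issuer_key: bytes, device_id: str, expires_at: int) -> Token:
    return Token(device_id, expires_at, _mac(issuer_key, device_id, expires_at))


def verify(issuer_key: bytes, token: Token, now: int, revoked: dict) -> bool:
    expected = _mac(issuer_key, token.device_id, token.expires_at)
    if not hmac.compare_digest(expected, token.tag):
        return False  # forged or tampered, including a changed expiry
    if now >= token.expires_at:
        return False  # expired: rejected without consulting the list
    return token.device_id not in revoked


def prune_revocation_list(revoked: dict, now: int) -> dict:
    # revoked maps device_id -> expiration time of the revoked credential.
    # Entries whose credential has expired are already rejected by verify().
    return {dev: exp for dev, exp in revoked.items() if exp > now}
```

Under these assumptions the revocation list only ever holds credentials that are both revoked and not yet expired, which is what keeps the lookup small.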
Citations: 3
Application of microsensors and support vector machines in the assessment of lower limb posture correction in adolescents
Pub Date : 2022-07-22 DOI: 10.1002/cpe.7234
Yinman Zhang, Lulu Wang
Citations: 1
Edge preserving noise robust deep learning networks for vehicle classification
Pub Date : 2022-07-21 DOI: 10.1002/cpe.7214
V. Kiran, S. Dash, Priyadarsan Parida
Citations: 1
Assessment of vegetation conservation status in plateau areas based on multi‐view and difference identification
Pub Date : 2022-07-21 DOI: 10.1002/cpe.7223
Xuanzhang Song, Hongyu Zhou, Guoying Liu
Citations: 0
A novel compression based community detection approach using hybrid honey badger African vulture optimization for online social networks
Pub Date : 2022-07-20 DOI: 10.1002/cpe.7205
Sankara Nayaki Kannan, Sudheep Elayidom Mannathazhathu, Rajesh Raghavan
Community detection in online social media networks identifies how the nodes within the network are connected. Depending on the network, a community may be described as a cluster, module, or group. Community detection is performed to uncover hidden relationships among the nodes in the network. Several approaches have been proposed to detect node communities; however, their performance often suffers from imprecise detection, high time complexity, and similar issues. To detect node communities effectively, we propose a novel hybrid honey badger optimization‐based African vulture algorithm (HHBAVO). Before HHBAVO is applied, the networks are compressed to reduce time complexity and to identify node communities effectively. Honey badger optimization (HBO) and African vulture optimization (AVO) can each be used to achieve global optimization; the two algorithms are hybridized to offer an optimized global search, which is used to search the nodes globally and to detect the relationships among them. Experimental analyses show that the proposed approach detects node communities in online social media networks more effectively than the other approaches. For comparison, we use state‐of‐the‐art approaches such as GA, LSMD, DPCD, and ICLA.
Citations: 3
An efficient stable node selection based on Garson's pruned recurrent neural network and MSO model for multipath routing in MANET
Pub Date : 2022-07-20 DOI: 10.1002/cpe.7105
R. Hemalatha, R. Umamaheswari, S. Jothi
In mobile ad hoc networks (MANETs), routing is a particularly complicated task. To find an efficient route in a MANET, the position of each stable node in the network is identified first. The main contribution of this paper is to identify the locations of neighboring nodes in a MANET in order to establish multipath routing under diverse mobility patterns. It also handles packet scheduling to balance the load, as well as data forwarding with less communication time. The proposed work comprises four phases: stable node prediction, stability determination, route exploration, and packet dissemination. First, the stable nodes are identified using a recurrent neural network together with a modified seagull optimization approach: stable neighbors are chosen by Garson's pruning‐based recurrent neural network combined with the modified seagull optimization (RMSG) algorithm. The route is formed by interconnecting the stable nodes from the source to the destination. When a routing link fails, the route recovery process is initiated. Thus, the data packets are broadcast over multiple paths without any intervention. Packet delivery ratio, throughput, end‐to‐end delay, routing overhead, optimal path selection, and energy consumption are used to evaluate the performance of the proposed approach. The experimental analysis shows that the proposed approach performs better than the existing approaches it was compared with.
Citations: 5
Approximate function memoization
Pub Date : 2022-07-20 DOI: 10.1002/cpe.7204
Priya Arundhati, S. K. Jena, S. Pani
Function memoization is an optimization technique that reduces function call overhead when the same input appears again. A table that stores previous results is searched and used to skip the repeated computation, which improves the performance of the function call. In this article, we propose a software approach to function memoization that improves computing efficiency by bypassing the execution of the function, implemented using approximate computing techniques. Search overhead is a primary concern in any memoization technique proposed so far. In traditional function memoization, the input arguments are first searched in the look‐up table (LUT) for an exact match, and the corresponding result is extracted for further use. In this article, however, a decision‐making rule is proposed to help decide whether to search the LUT or to perform the actual computation. This decision‐making model is implemented through a Bloom filter and Cantor's pairing function. Because a Bloom filter sometimes produces false‐positive results, we suggest a simple approximation technique that searches the LUT for an approximate match rather than an exact match. The proposed model also contains a bypass algorithm, implemented in C++, that identifies trivial computations from the input arguments of the candidate function. In this way, we can avoid the actual calculation and generate the result directly. Here, a trivial computation is one in which one or more input arguments are either 0 or ±1. To analyze the effectiveness of the proposed technique, we conducted several experiments using benchmarks from the AxBench suite. We found that our results outperform some of the methods proposed so far in terms of energy consumption and quality of results, particularly in image processing applications.
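The decision flow described in this abstract can be sketched roughly as follows. This is an illustrative Python sketch, not the authors' C++ implementation: the quantization step used for approximate matching, the zigzag mapping of signed values, the tiny Bloom filter and all class and function names are assumptions made for the example.

```python
def cantor_pair(a: int, b: int) -> int:
    """Cantor's pairing function: one integer from two non-negative ones."""
    return (a + b) * (a + b + 1) // 2 + b


def zigzag(n: int) -> int:
    """Map a signed integer to a non-negative one (0, -1, 1, -2 -> 0, 1, 2, 3)."""
    return 2 * n if n >= 0 else -2 * n - 1


class BloomFilter:
    def __init__(self, size: int = 1024, num_hashes: int = 3):
        self.size = size
        self.num_hashes = num_hashes
        self.bits = bytearray(size)  # one byte per bit, for readability

    def _positions(self, key: int):
        return [hash((seed, key)) % self.size for seed in range(self.num_hashes)]

    def add(self, key: int) -> None:
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key: int) -> bool:
        return all(self.bits[pos] for pos in self._positions(key))


def trivial_product(x: float, y: float):
    """Bypass rule for multiplication when an argument is 0 or +/-1."""
    if x == 0 or y == 0:
        return 0.0
    if x == 1:
        return y
    if y == 1:
        return x
    if x == -1:
        return -y
    if y == -1:
        return -x
    return None  # not trivial


class ApproxMemo:
    def __init__(self, func, step: float = 0.01, trivial=None):
        self.func, self.step, self.trivial = func, step, trivial
        self.bloom, self.lut = BloomFilter(), {}
        self.evaluations = 0  # how often the real function ran

    def __call__(self, x: float, y: float) -> float:
        if self.trivial is not None:
            shortcut = self.trivial(x, y)
            if shortcut is not None:
                return shortcut
        # Inputs within the same quantization cell share one key.
        key = cantor_pair(zigzag(round(x / self.step)), zigzag(round(y / self.step)))
        # The Bloom filter decides whether a LUT search is worthwhile;
        # the dict check guards against its false positives.
        if self.bloom.might_contain(key) and key in self.lut:
            return self.lut[key]
        self.evaluations += 1
        result = self.func(x, y)
        self.bloom.add(key)
        self.lut[key] = result
        return result
```

For instance, wrapping lambda x, y: x * y with step=0.01 makes a call with (2.001, 3.002) reuse the result stored for (2.0, 3.0); the step size is the knob that trades result quality for hit rate.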
Citations: 1
An efficient sparse stiffness matrix vector multiplication using compressed sparse row storage format on AMD GPU
Pub Date : 2022-07-20 DOI: 10.1002/cpe.7186
Longyue Xing, Zhaoshun Wang, Zhezhao Ding, Genshen Chu, Lingyu Dong, Nan Xiao
The performance of sparse stiffness matrix‐vector multiplication is essential for large‐scale numerical simulation in structural mechanics. Compressed sparse row (CSR) is the most common format for storing sparse stiffness matrices. However, because the stiffness matrix is highly sparse, the number of nonzero elements per row is very small. As a result, the CSR‐scalar algorithm, the light algorithm, and the HOLA algorithm leave some GPU threads idle during the calculation, which both degrades computing performance and wastes computing resources. In this article, a new algorithm, CSR‐vector row, is proposed for fine‐grained computing optimization based on the AMD GPU architecture on heterogeneous supercomputers. The algorithm assigns a vector to compute each row, sized according to the number of nonzero elements of the stiffness matrix. CSR‐vector row features efficient reduce operations, deep memory access optimization, and a kernel function configuration scheme that overlaps memory access with calculation. The memory access bandwidth of the algorithm on AMD GPU exceeds 700 GB/s. Compared with the CSR‐scalar algorithm, the parallel efficiency of CSR‐vector row is improved by 7.2 times, and its floating‐point computing performance is 41%–95% higher than that of the light algorithm and the HOLA algorithm. In addition, when CSR‐vector row is applied to examples from CFD, electromagnetics, quantum chemistry, power networks, and semiconductor processes, the memory access bandwidth and double‐precision floating‐point performance also improve compared with rocSPARSE‐CSR‐vector.
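To make the CSR terminology concrete, here is a small CPU‐side sketch in Python of the two generic strategies the abstract contrasts: one worker per row (CSR‐scalar) and a group of lanes cooperating on a row followed by a reduction (the general CSR‐vector idea). It is not the paper's CSR‐vector row kernel; the lane‐selection heuristic pick_lanes and its cap of 64 are assumptions for illustration.

```python
def csr_spmv_scalar(row_ptr, col_idx, values, x):
    """y = A @ x with one sequential worker per row (CSR-scalar style)."""
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(y)):
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


def csr_spmv_vector(row_ptr, col_idx, values, x, lanes=4):
    """y = A @ x with `lanes` cooperating per row; lanes must be a power of two."""
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(y)):
        start, end = row_ptr[row], row_ptr[row + 1]
        partial = [0.0] * lanes
        # Each lane handles a strided slice of the row's nonzeros.
        for lane in range(lanes):
            for k in range(start + lane, end, lanes):
                partial[lane] += values[k] * x[col_idx[k]]
        # Tree reduction of the per-lane partial sums.
        stride = lanes // 2
        while stride > 0:
            for lane in range(stride):
                partial[lane] += partial[lane + stride]
            stride //= 2
        y[row] = partial[0]
    return y


def pick_lanes(row_ptr, max_lanes=64):
    """Hypothetical heuristic: smallest power of two >= average nonzeros per row."""
    rows = len(row_ptr) - 1
    avg = row_ptr[-1] / rows if rows else 0
    lanes = 1
    while lanes < avg and lanes < max_lanes:
        lanes *= 2
    return lanes


# 4x4 example: [[4,0,0,1],[0,3,0,0],[0,0,0,0],[2,0,5,6]]
ROW_PTR = [0, 2, 3, 3, 6]
COL_IDX = [0, 3, 1, 0, 2, 3]
VALUES = [4.0, 1.0, 3.0, 2.0, 5.0, 6.0]
X = [1.0, 2.0, 3.0, 4.0]
```

With very few nonzeros per row, most lanes of a wide group have nothing to do, which is the idle‐thread problem the abstract describes; matching the group width to the row length is what a heuristic like pick_lanes is meant to suggest.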
Citations: 2
Visualization of profiling and tracing in CPU‐GPU programs
Pub Date : 2022-07-19 DOI: 10.1002/cpe.7188
Arnaud Fiorini, M. Dagenais
As the complexity of the toolchain increases for heterogeneous CPU‐GPU systems, the need for comprehensive tracing and debugging tools also grows. Heterogeneous platforms bring new possibilities but also new performance issues that are hard to detect. Some techniques that were used on CPU programs have now been adapted to GPUs. However, some concepts are specific to GPUs, such as SIMD processing and the effects of the close interactions between CPUs and GPUs through shared virtual memory and user‐level queues. Multiple sources of data need to be extracted and correlated to obtain a more global view of performance. In this article, we introduce a novel approach for measuring and visualizing performance defects inside CPU‐GPU programs by combining kernel events, compute kernel events, user API calls, and memory transfers. We created two new views that combine this information to help provide a global view. The framework uses the open source user queue system described in the HSA standard and can easily be adapted to any user queue system for heterogeneous computing devices. We compare the framework with existing tools and test it against the Rodinia benchmark. We examine how execution behavior affects the tracing and profiling overhead, and we use Trace Compass to visualize the resulting trace.
Citations: 0
Min‐max kurtosis stratum mean: An improved K‐means cluster initialization approach for microarray gene clustering on multidimensional big data
Pub Date : 2022-07-16 DOI: 10.1002/cpe.7185
K. Pandey, D. Shukla
Microarray gene clustering is a big data application that employs the K‐means (KM) clustering algorithm to identify hidden patterns, evolutionary relationships, unknown functions, and gene trends for disease diagnosis, tissue detection, and biological analysis. The selection of initial centroids is a major issue in the KM algorithm because it influences the effectiveness, efficiency, and local optima of the clustering. Existing centroid initialization algorithms are computationally expensive and degrade cluster quality because of the high dimensionality and interconnectedness of microarray gene data. To deal with this issue, this study proposes the min‐max kurtosis stratum mean (MKSM) algorithm for big data clustering in a single‐machine environment. The MKSM algorithm uses kurtosis for dimension selection, mean distance for gene relationship identification, and stratification for heterogeneous centroid extraction. The results of the presented algorithm are compared with state‐of‐the‐art initialization strategies on twelve microarray gene datasets using internal, external, and statistical assessment criteria. The experimental results demonstrate that the MKSMKM algorithm reduces iterations, distance computations, data comparisons, and local optima, and improves cluster performance, effectiveness, and efficiency with stable convergence.
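The abstract does not spell out the MKSM steps, so the following Python sketch should be read only as one plausible interpretation of "kurtosis for dimension selection" plus "stratification" for centroid extraction: pick the dimension with the largest kurtosis, sort the points along it, cut them into K equal strata, and use each stratum's mean as an initial centroid. The function names, the use of the maximum (rather than a min‐max combination) and the equal‐size strata are all assumptions, and the mean‐distance step is omitted.

```python
def kurtosis(column):
    """Population kurtosis m4 / m2**2 (not excess); 0.0 for a constant column."""
    n = len(column)
    mean = sum(column) / n
    m2 = sum((v - mean) ** 2 for v in column) / n
    m4 = sum((v - mean) ** 4 for v in column) / n
    return m4 / (m2 * m2) if m2 > 0 else 0.0


def stratified_init(points, k):
    """Return (selected dimension, k initial centroids); needs len(points) >= k."""
    dims = len(points[0])
    # Assumed rule: select the dimension with the highest kurtosis.
    best = max(range(dims), key=lambda d: kurtosis([p[d] for p in points]))
    ordered = sorted(points, key=lambda p: p[best])
    n = len(ordered)
    centroids = []
    for s in range(k):
        # Equal-size strata along the selected dimension.
        stratum = ordered[s * n // k:(s + 1) * n // k]
        centroids.append(
            [sum(p[d] for p in stratum) / len(stratum) for d in range(dims)]
        )
    return best, centroids


# Toy data: dimension 0 has a heavy tail, dimension 1 is roughly uniform.
POINTS = [[0, 5], [1, 1], [2, 4], [3, 2], [4, 6], [50, 3]]
```

The resulting centroids would then be handed to an ordinary K‐means loop; because they are deterministic and spread along a high‐variation dimension, fewer iterations may be needed than with random seeding, though that claim would have to be checked on real data.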
Citations: 1