Identity authentication for edge devices based on zero-trust architecture
Haiqing Liu, M. Ai, Rong Huang, Rixuan Qiu, Yuancheng Li
Concurrency and Computation: Practice and Experience, published 2022-07-23. DOI: 10.1002/cpe.7198

Device identity authentication is the first line of defense in edge computing security. However, many authentication schemes carry high communication and computational overhead. In addition, increasing network virtualization and dynamism, the shifting security requirements at the logical boundaries of enterprise information systems moving to the cloud, and the serious data security challenges facing enterprise core assets all make the original "authenticate once, trusted thereafter" model unreliable. This paper therefore proposes local and roaming identity authentication protocols based on a zero-trust architecture. First, we propose a revocable group signature scheme in which an expiration time is bound to the key of each edge terminal device. Because an authentication token generated with an expired key is invalid, expired keys do not need to be included in the revocation list, which makes revocation checking more efficient. Compared with existing identity authentication protocols, this work not only builds its model on a zero-trust architecture, addressing the shortcomings of perimeter-based network security protection, but also guarantees the unforgeability of the expiration time, achieving effective revocation and more efficient identity authentication.
{"title":"Application of microsensors and support vector machines in the assessment of lower limb posture correction in adolescents","authors":"Yinman Zhang, Lulu Wang","doi":"10.1002/cpe.7234","DOIUrl":"https://doi.org/10.1002/cpe.7234","url":null,"abstract":"","PeriodicalId":10584,"journal":{"name":"Concurrency and Computation: Practice and Experience","volume":"39 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73565357","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Edge preserving noise robust deep learning networks for vehicle classification","authors":"V. Kiran, S. Dash, Priyadarsan Parida","doi":"10.1002/cpe.7214","DOIUrl":"https://doi.org/10.1002/cpe.7214","url":null,"abstract":"","PeriodicalId":10584,"journal":{"name":"Concurrency and Computation: Practice and Experience","volume":"43 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78461760","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Assessment of vegetation conservation status in plateau areas based on multi‐view and difference identification","authors":"Xuanzhang Song, Hongyu Zhou, Guoying Liu","doi":"10.1002/cpe.7223","DOIUrl":"https://doi.org/10.1002/cpe.7223","url":null,"abstract":"","PeriodicalId":10584,"journal":{"name":"Concurrency and Computation: Practice and Experience","volume":"68 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90721005","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A novel compression based community detection approach using hybrid honey badger African vulture optimization for online social networks
Sankara Nayaki Kannan, Sudheep Elayidom Mannathazhathu, Rajesh Raghavan
Concurrency and Computation: Practice and Experience, published 2022-07-20. DOI: 10.1002/cpe.7205

Community detection in online social media networks identifies groups of connected nodes within a network; depending on the network, these communities may appear as clusters, modules, or groups. Community detection is performed to uncover hidden relationships among the nodes. Many methods have been proposed to detect node communities, but their performance is often limited by imprecise detection, high time complexity, and similar issues. To detect node communities effectively, we propose a novel hybrid honey badger optimization-based African vulture algorithm (HHBAVO). Before HHBAVO is applied, the networks are compressed to reduce time complexity and to support effective identification of node communities. Honey badger optimization (HBO) and African vulture optimization (AVO) are hybridized mainly to provide a stronger global search, which is used to explore the nodes globally and to detect the relationships among them. Experimental analyses show that the proposed approach detects communities in online social media networks more effectively than other approaches; for comparison, we consider state-of-the-art methods such as GA, LSMD, DPCD, and ICLA.
An efficient stable node selection based on Garson's pruned recurrent neural network and MSO model for multipath routing in MANET
R. Hemalatha, R. Umamaheswari, S. Jothi
Concurrency and Computation: Practice and Experience, published 2022-07-20. DOI: 10.1002/cpe.7105

In mobile ad hoc networks (MANETs), routing is a particularly complicated task. To find efficient routes in a MANET, the position of each stable node in the network is identified first. The main contribution of this paper is to identify the locations of neighboring nodes in a MANET in order to establish multipath routing under diverse mobility patterns. The scheme also handles packet scheduling to balance the load and forwards data with reduced communication time. The proposed work comprises four phases: stable node prediction, stability determination, route exploration, and packet dissemination. First, stable nodes are identified using a recurrent neural network combined with a modified seagull optimization approach; stable neighbors are chosen by a Garson's-pruning-based recurrent neural network with modified seagull optimization (RMSG) algorithm. Routes are then formed by interconnecting stable nodes from the source to the destination. When a routing link fails, a route recovery process is initiated, so data packets are broadcast over multiple paths without interruption. Packet delivery ratio, throughput, end-to-end delay, routing overhead, optimal path selection, and energy consumption are used to evaluate the performance of the proposed approach. The experimental analysis shows that the proposed approach performs better than the compared existing approaches.
Approximate function memoization
Priya Arundhati, S. K. Jena, S. Pani
Concurrency and Computation: Practice and Experience, published 2022-07-20. DOI: 10.1002/cpe.7204

Function memoization is an optimization technique that reduces function call overhead when the same input appears again: a table storing the previous result is searched and used to skip the repeated computation, increasing the performance of the function call. In this article, we propose a software approach to function memoization that improves computing efficiency by bypassing the execution of functions implemented with approximate computing techniques. Search overhead is a primary concern in every memoization technique proposed so far. In traditional function memoization, the input arguments are first searched in a look-up table (LUT) for an exact match, and the corresponding result is extracted for further use. In this article, however, a decision-making rule is proposed to decide whether to search the LUT or perform the actual computation. This decision-making model is implemented with a Bloom filter and Cantor's pairing function. Because a Bloom filter sometimes produces false-positive results, we suggest a simple approximation technique that searches the LUT for an approximate match rather than an exact one. The proposed model also contains a bypass algorithm, implemented in C++, that identifies trivial computations from the input arguments of the candidate function; this avoids the actual calculation and generates the result directly. Here, a trivial computation is one in which one or more input arguments are 0 or ±1. To analyze the effectiveness of the proposed technique, we conducted several experiments using benchmarks from the AxBench suite and found that our results outperform several previously proposed methods in terms of energy consumption and quality of results, particularly in image processing applications.
An efficient sparse stiffness matrix vector multiplication using compressed sparse row storage format on AMD GPU
Longyue Xing, Zhaoshun Wang, Zhezhao Ding, Genshen Chu, Lingyu Dong, Nan Xiao
Concurrency and Computation: Practice and Experience, published 2022-07-20. DOI: 10.1002/cpe.7186

The performance of sparse stiffness matrix-vector multiplication is essential for large-scale structural mechanics simulation. Compressed sparse row (CSR) is the most common format for storing sparse stiffness matrices. However, because such matrices are highly sparse, the number of nonzero elements per row is very small, so the CSR-scalar, light, and HOLA algorithms leave some GPU threads idle during the computation, which both hurts performance and wastes computing resources. In this article, a new algorithm, CSR-vector row, is proposed for fine-grained optimization on the AMD GPU architecture of heterogeneous supercomputers. The algorithm assigns a vector of threads to each row according to the number of nonzero elements in the stiffness matrix. CSR-vector row features efficient reduction operations, deep memory-access optimization, and a kernel configuration scheme that overlaps memory access with computation. The algorithm's memory bandwidth on the AMD GPU exceeds 700 GB/s. Compared with the CSR-scalar algorithm, the parallel efficiency of CSR-vector row improves by a factor of 7.2, and its floating-point performance is 41%–95% higher than that of the light and HOLA algorithms. In addition, when CSR-vector row is applied to examples from CFD, electromagnetics, quantum chemistry, power networks, and semiconductor processes, its memory bandwidth and double-precision floating-point performance also improve over rocSPARSE's CSR-vector implementation.
Visualization of profiling and tracing in CPU-GPU programs
Arnaud Fiorini, M. Dagenais
Concurrency and Computation: Practice and Experience, published 2022-07-19. DOI: 10.1002/cpe.7188

As the complexity of the toolchain for heterogeneous CPU-GPU systems increases, the need for comprehensive tracing and debugging tools also grows. Heterogeneous platforms bring new possibilities but also new performance issues that are hard to detect. Some techniques used on CPU programs have been adapted to GPUs, but GPUs introduce specific concepts such as SIMD processing, as well as effects from the close interaction between CPUs and GPUs through shared virtual memory and user-level queues. Multiple sources of data need to be extracted and correlated to obtain a more global view of performance. In this article, we introduce a novel approach for measuring and visualizing performance defects in CPU-GPU programs by combining kernel events, compute kernel events, user API calls, and memory transfers. We created two new views that combine this information to help provide a global picture. The framework uses the open-source user queue system described in the HSA standard and can easily be adapted to any user queue system for heterogeneous computing devices. We compare the framework with existing tools, test it against the Rodinia benchmark suite, examine how execution behavior affects the tracing and profiling overhead, and use Trace Compass to visualize the resulting trace.
Min-max kurtosis stratum mean: An improved K-means cluster initialization approach for microarray gene clustering on multidimensional big data
K. Pandey, D. Shukla
Concurrency and Computation: Practice and Experience, published 2022-07-16. DOI: 10.1002/cpe.7185

Microarray gene clustering is a big data application that employs the K-means (KM) clustering algorithm to identify hidden patterns, evolutionary relationships, unknown functions, and gene trends for disease diagnosis, tissue detection, and biological analysis. The selection of initial centroids is a major issue in the KM algorithm because it influences the effectiveness, efficiency, and local optima of the clustering. Existing initial-centroid selection algorithms are computationally expensive and degrade cluster quality because of the high dimensionality and interconnectedness of microarray gene data. To address this issue, this study proposes the min-max kurtosis stratum mean (MKSM) algorithm for big data clustering in a single-machine environment. The MKSM algorithm uses kurtosis for dimension selection, mean distance for gene-relationship identification, and stratification for heterogeneous centroid extraction. The results of the presented algorithm are compared with state-of-the-art initialization strategies on twelve microarray gene datasets using internal, external, and statistical assessment criteria. The experimental results demonstrate that the MKSMKM algorithm (KM with MKSM initialization) reduces iterations, distance computations, data comparisons, and local optima, and improves cluster performance, effectiveness, and efficiency with stable convergence.