Pub Date: 2022-03-01, DOI: 10.1109/pdp55904.2022.00048
Christian Plappert, Jonathan Stancke, Lukas Jäger
Connected vehicles need to generate, store, process, and exchange a multitude of information with their environment. Much of this information is privacy-critical and thus regulated by privacy laws such as the GDPR in Europe. In this paper, we analyze exemplary data and data flows of the electric driving domain and rate their criticality based on a reference architecture. We classify the corresponding ECUs according to the privacy-critical data they process and, following this classification and requirements derived from the GDPR, propose technical mitigation measures and technologies in the form of generic privacy-enhancing building blocks.
Title: Towards a Privacy-Aware Electric Vehicle Architecture
Venue: 2022 30th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP)
Data structures are key in Edge Computing, where various types of data are continuously generated by ubiquitous devices. Among common data structures, graphs are used to express relationships and dependencies among human identities, objects, and locations, and they are expected to become one of the most important data infrastructures in the near future. Because graph processing often requires random accesses to vast memory spaces, conventional cache-based memory hierarchies cannot perform efficiently. To alleviate such memory access bottlenecks, we present a solution based on vertex access scheduling and edge array re-ordering, performed in parallel with the execution of the graph processing application, to improve both the temporal and spatial locality of memory accesses, especially for edge-centric graphs, which are a popular means of handling dynamic graphs. Our proposed architecture is evaluated through both trace-based cache simulations and cycle-accurate FPGA-based prototyping. Evaluation results show that our proposal has the potential to reduce Misses Per Kilo Instructions (MPKI) for the Last Level Cache (LLC) by 56.27% on average.
Title: GraphDEAR: An Accelerator Architecture for Exploiting Cache Locality in Graph Analytics Applications
Pub Date: 2022-03-01, DOI: 10.1109/pdp55904.2022.00029
Siyi Hu, Masaaki Kondo, Yuan He, Ryuichi Sakamoto, Haotong Zhang, Jun Zhou, Hiroshi Nakamura
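The MPKI metric quoted in the GraphDEAR abstract is plain arithmetic: cache misses divided by thousands of executed instructions. The sketch below shows what an average 56.27% reduction means for one made-up workload. The miss and instruction counts are hypothetical values chosen for illustration and are not taken from the paper.

```python
def mpki(llc_misses: int, instructions: int) -> float:
    """Misses per kilo-instruction: misses per 1000 executed instructions."""
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    return llc_misses * 1000.0 / instructions


def reduced_mpki(baseline: float, reduction_percent: float) -> float:
    """MPKI that remains after a relative reduction given in percent."""
    return baseline * (1.0 - reduction_percent / 100.0)


# Hypothetical counters, not measurements from the paper.
baseline = mpki(llc_misses=2_000_000, instructions=100_000_000)
after = reduced_mpki(baseline, 56.27)  # 56.27% is the average reported in the abstract
print(f"baseline MPKI = {baseline:.3f}, after reduction = {after:.3f}")
```

Because the reported figure is an average, individual workloads may see larger or smaller reductions.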
Pub Date: 2022-03-01, DOI: 10.1109/pdp55904.2022.00023
Adrián Castelló, E. S. Quintana‐Ortí, Francisco D. Igual
The efforts of the scientific community and hardware vendors to develop and optimize linear algebra codes have historically led to highly-tuned libraries, carefully adapted to the underlying processor architecture, with excellent (near-peak) performance. These optimization efforts, however, are commonly focused on obtaining the best performance possible when the involved operands are large and “squarish” matrices. New computationally-intensive applications (e.g., in deep learning) are increasingly demanding high-performance BLAS (Basic Linear Algebra Subprograms) also for small operands in any of their dimensions. In this paper, we tackle this problem by refactoring the general matrix-matrix multiplication (GEMM) algorithm within a specific high-performance implementation of BLAS, named BLIS, proposing a complete family of algorithmic variants to implement GEMM with different strategies to exploit the target cache hierarchy, together with the changes to be applied to architecture-specific codes to instantiate a complete GEMM implementation. Experimental results on an ARM processor (NVIDIA Carmel) reveal significant performance differences between the members of the GEMM family, depending on the shape and dimension of the matrix operands.
Title: Anatomy of the BLIS Family of Algorithms for Matrix Multiplication
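To make the idea of exploiting the cache hierarchy in GEMM concrete, here is a minimal sketch of one cache-blocked loop ordering, checked against a naive triple loop. It is an illustration only, not the BLIS code and not one of the paper's specific variants. The block-size names `mc`, `nc`, and `kc` follow common blocking terminology, and the tiny default values are assumptions chosen so that the blocking is visible on small inputs.

```python
def gemm_naive(A, B):
    """Reference C = A * B with a plain triple loop."""
    m, k, n = len(A), len(B), len(B[0])
    C = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            for p in range(k):
                C[i][j] += A[i][p] * B[p][j]
    return C


def gemm_blocked(A, B, mc=2, nc=3, kc=2):
    """Blocked C = A * B: loops over column, depth and row blocks."""
    m, k, n = len(A), len(B), len(B[0])
    C = [[0.0] * n for _ in range(m)]
    for jc in range(0, n, nc):            # block of columns of B and C
        for pc in range(0, k, kc):        # block along the shared dimension
            for ic in range(0, m, mc):    # block of rows of A and C
                # Small "macro-kernel": update one mc x nc tile of C.
                for i in range(ic, min(ic + mc, m)):
                    for j in range(jc, min(jc + nc, n)):
                        acc = C[i][j]
                        for p in range(pc, min(pc + kc, k)):
                            acc += A[i][p] * B[p][j]
                        C[i][j] = acc
    return C


A = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0], [9.0, 10.0, 11.0, 12.0]]
B = [[float(r * 5 + c) for c in range(5)] for r in range(4)]
print(gemm_blocked(A, B) == gemm_naive(A, B))
```

Reordering the three block loops changes which operand tile stays resident in which cache level, which is one way different members of a GEMM family can behave differently for small or skewed operand shapes.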
Pub Date: 2022-03-01, DOI: 10.1109/pdp55904.2022.00046
Lucas Buschlinger, Sanat Sarda, C. Krauß
Intrusion Detection Systems (IDSs) are being introduced into safety-critical systems such as connected vehicles. Since the behavior and effectiveness of measures are validated before approval, the decisions made by an IDS are required to be traceable and the IDS also needs to work efficiently on resource-constrained embedded systems. These requirements complicate the direct use of Machine Learning (ML) approaches in IDS design. In this paper, we propose an approach to using ML to generate rules for an efficient rule-based IDS like Snort. Our approach eases the time-consuming and difficult process of creating a rule set. We use decision trees to generate rules that can be used by experts as a basis for creating a rule set for a specific safety-critical use case. In addition, we use long short-term memory methods to circumvent the problem of limited training data availability, a common limitation in safety-critical systems. Our implementation and evaluation show the feasibility of our approach to derive specific IDS rules for such systems.
Title: Decision Tree-Based Rule Derivation for Intrusion Detection in Safety-Critical Automotive Systems
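The step from a decision tree to human-readable rules can be sketched as a walk over every root-to-leaf path, where each path becomes one rule. The example below is a hedged illustration, not the authors' pipeline. The tree is hand-written rather than trained, the feature names `msg_rate_hz` and `payload_len` are hypothetical, and the output is a generic IF/THEN string rather than Snort rule syntax.

```python
def tree_to_rules(node, conditions=()):
    """Return a list of (conditions, label) pairs, one per leaf."""
    if "label" in node:  # leaf: emit the accumulated path conditions
        return [(list(conditions), node["label"])]
    feat, thr = node["feature"], node["threshold"]
    left = tree_to_rules(node["left"], conditions + (f"{feat} <= {thr}",))
    right = tree_to_rules(node["right"], conditions + (f"{feat} > {thr}",))
    return left + right


def format_rule(conditions, label):
    """Render one path as a readable IF/THEN rule."""
    return "IF " + " AND ".join(conditions) + " THEN " + label


# Hypothetical, hand-built tree (not learned from data).
tree = {
    "feature": "msg_rate_hz", "threshold": 100,
    "left": {
        "feature": "payload_len", "threshold": 8,
        "left": {"label": "benign"},
        "right": {"label": "alert"},
    },
    "right": {"label": "alert"},
}

rules = tree_to_rules(tree)
for conds, label in rules:
    print(format_rule(conds, label))
```

Each rule exposes the exact thresholds that led to a decision, which makes it a natural starting point for expert review before a rule enters a deployed rule set.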
Pub Date: 2022-03-01, DOI: 10.1109/pdp55904.2022.00035
Elías Del-Pozo-Puñal, Félix García Carballeira
Over the last few years, the number of IoT devices in daily use has increased, and they now come in many sizes and types. These devices have also become cheaper, so many more people are able to use them. They are capable of both creating and processing information, thus reducing network overload. However, in Cloud or Edge Computing environments, it is useful to know where these devices are located in order to better distribute the information among the servers and further reduce the network load, allowing users to get the data faster. There are simulators capable of analyzing Cloud infrastructures, but most of them do not support mobility of the sensors. For these reasons, in this paper we detail an API extension developed on the SimGrid toolkit that adds mobility to IoT sensors and integrates with an API called Folium to visualize the mobility of these elements.
Title: A Proposal of Mobility Support for the SimGrid Toolkit: Application to IoT simulations
Pub Date: 2022-03-01, DOI: 10.1109/pdp55904.2022.00015
M. F. Dolz, Adrián Castelló, E. S. Quintana‐Ortí
We take a step forward in the direction of developing high performance codes for the convolution, based on the Winograd transformation, that are easy to customize for different processor architectures. In our approach, augmenting the portability of the solution is achieved via the introduction of vector intrinsics to exploit the SIMD (single-instruction multiple-data) capabilities of current processors as well as OpenMP pragmas to exploit multi-thread parallelism. While this comes at the cost of sacrificing a fraction of the computational performance, our experimental results on two distinct processors, with Intel Xeon Skylake and ARM Cortex A57 architectures, show that the impact is affordable, and still renders a Winograd-based solution that is competitive with the general method for the convolution based on the so-called im2col transform followed by a matrix-matrix multiplication.
Title: Towards Portable Realizations of Winograd-based Convolution with Vector Intrinsics and OpenMP
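The core of the Winograd approach can be seen in its smallest one-dimensional case, usually written F(2,3): two outputs of a 3-tap filter are produced with four multiplications instead of six. The sketch below uses the standard textbook formulas in scalar Python and compares the result with the direct computation. It is not the paper's vectorized 2-D code, and the function names are illustrative.

```python
def direct_f23(d, g):
    """Two outputs of a 3-tap filter computed directly (6 multiplications)."""
    return [
        d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
        d[1] * g[0] + d[2] * g[1] + d[3] * g[2],
    ]


def winograd_f23(d, g):
    """Same two outputs using the Winograd F(2,3) scheme (4 multiplications)."""
    # Filter transform: can be precomputed once per filter.
    u0 = g[0]
    u1 = (g[0] + g[1] + g[2]) / 2
    u2 = (g[0] - g[1] + g[2]) / 2
    u3 = g[2]
    # Input transform followed by the four element-wise products.
    m1 = (d[0] - d[2]) * u0
    m2 = (d[1] + d[2]) * u1
    m3 = (d[2] - d[1]) * u2
    m4 = (d[1] - d[3]) * u3
    # Output transform.
    return [m1 + m2 + m3, m2 - m3 - m4]


d = [1.0, 2.0, 3.0, 4.0]   # four consecutive input samples
g = [2.0, 4.0, 6.0]        # filter taps
print(direct_f23(d, g), winograd_f23(d, g))
```

In a full convolution the filter transform is amortized over many input tiles, and the element-wise products are the part that maps naturally onto SIMD lanes.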
Pub Date: 2022-03-01, DOI: 10.1109/pdp55904.2022.00016
Laleh Ghalami, Daniel Grosu
In the Steiner Forest problem, we are given an undirected graph with non-negative weights for edges, a set of pairs of vertices, called terminals, and the goal is to find the minimum cost subgraph that connects each of the terminal pairs together. There exist several sequential heuristic and approximation algorithms for the Steiner Forest problem. In practice, the primal-dual 2-approximation algorithm is one of the fastest and obtains solutions that are very close to the optimal solution. In this paper, we design a practical parallel approximation algorithm based on the primal-dual sequential algorithm. The parallel algorithm maintains the approximation guarantees of the sequential primal-dual algorithm and it is specifically designed for execution on multi-core computers. We implement and run the parallel algorithm on a multi-core system with a large number of cores and perform an extensive experimental performance analysis on randomly generated graphs. The results show that our proposed parallel approximation algorithm achieves a significant speedup with respect to the sequential primal-dual algorithm.
Title: A Parallel Approximation Algorithm for the Steiner Forest Problem
Pub Date: 2022-03-01, DOI: 10.1109/pdp55904.2022.00017
Ricardo Quislant, I. Fernandez, E. Serralvo, E. Gutiérrez, O. Plata
Time series analysis is an important research topic and a key step in monitoring and predicting events in many fields. Recently, the Matrix Profile method, and particularly two of its Euclidean-distance-based implementations, SCRIMP and SCAMP, have become the state-of-the-art approaches in this field. These algorithms make it possible to obtain exact motifs and discords from a time series, which can be used to infer events, predict outcomes, detect anomalies and more. While matrix profile is embarrassingly parallelizable, we find that autovectorization techniques fail to fully exploit the SIMD capabilities of modern CPU architectures. In this paper, we develop custom-vectorized SCRIMP and SCAMP implementations based on AVX2 and AVX-512 extensions, which we combine with multithreading techniques aimed at exploiting the potential of the underlying architectures. Our experimental evaluation, conducted using real data, shows a performance improvement of more than 4× with respect to autovectorization.
Title: Exploiting Vector Extensions to Accelerate Time Series Analysis
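For readers unfamiliar with the matrix profile, the following brute-force sketch shows what the structure contains: for every subsequence of length `m`, the z-normalized Euclidean distance to its nearest non-trivial neighbour, plus that neighbour's index. This is a quadratic reference computation and not SCRIMP or SCAMP. The exclusion zone of `m // 2` is a common convention that is assumed here, and the input series is made up.

```python
import math


def znorm(x):
    """Z-normalize a window; constant windows are rejected in this sketch."""
    mu = sum(x) / len(x)
    sd = math.sqrt(sum((v - mu) ** 2 for v in x) / len(x))
    if sd == 0:
        raise ValueError("constant subsequence cannot be z-normalized")
    return [(v - mu) / sd for v in x]


def matrix_profile(ts, m):
    """Brute-force matrix profile and profile index for window length m."""
    n = len(ts) - m + 1
    excl = max(1, m // 2)  # assumed exclusion zone to skip trivial matches
    subs = [znorm(ts[i:i + m]) for i in range(n)]
    profile = [math.inf] * n
    index = [-1] * n
    for i in range(n):
        for j in range(n):
            if abs(i - j) <= excl:
                continue
            dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(subs[i], subs[j])))
            if dist < profile[i]:
                profile[i], index[i] = dist, j
    return profile, index


# Made-up series: the pattern 0, 1, 3, 0 occurs at positions 0 and 8.
ts = [0, 1, 3, 0, 5, 9, 2, 7, 0, 1, 3, 0]
profile, index = matrix_profile(ts, m=4)
motif = min(range(len(profile)), key=profile.__getitem__)    # best-matching window
discord = max(range(len(profile)), key=profile.__getitem__)  # most isolated window
print(motif, index[motif], discord)
```

The inner distance computation is independent for every pair of windows, which is why the problem is embarrassingly parallel and a natural candidate for explicit vectorization.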
Pub Date: 2022-03-01, DOI: 10.1109/pdp55904.2022.00034
Steven W. D. Chien, Artur Podobas, Martin Svedin, A. Tkachuk, Salem El Sayed, Pawel Herman, G. Umanesan, Sai B. Narasimhamurthy, S. Markidis
Strong consistency and stateful workflows are seen as major factors limiting parallel I/O performance because of the need for locking and state management. While the POSIX-based I/O model dominates modern HPC storage infrastructure, emerging object storage technology can potentially improve I/O performance by eliminating these bottlenecks. Despite wide deployment in the cloud, its adoption in HPC remains low. We argue that one reason is the lack of a suitable programming interface for parallel I/O in scientific applications. In this work, we introduce NoaSci, a Numerical Object Array library for scientific applications. NoaSci supports different data formats (e.g. HDF5, binary), and focuses on supporting node-local burst buffers and object stores. We demonstrate for the first time how scientific applications can perform parallel I/O on Seagate’s Motr object store through NoaSci. We evaluate NoaSci’s preliminary performance using the iPIC3D space weather application and position it against existing I/O methods.
Title: NoaSci: A Numerical Object Array Library for I/O of Scientific Applications on Object Storage
Pub Date: 2022-03-01, DOI: 10.1109/pdp55904.2022.00033
Shallaw Mohammed Ali, G. Kecskeméti
Users’ behaviours have a noticeable impact on cloud computing resources. Behaviour prediction models could foster usage awareness among cloud users. This requires training prediction models with datasets that provide user information. Unfortunately, such information is excluded from many relevant datasets. Therefore, in this work, we investigate the feasibility of extracting user identities via clustering methods. We do this by categorising workload datasets according to the availability of user information in their attributes. Then, we focus our attention on the attributes shared between datasets that disclose user information and those that do not. Finally, we evaluate the potential of several clustering approaches on the datasets that disclose user information. Our results show that user identities can be extracted with relatively high accuracy using clustering. They also show that the highest clustering precision is mostly obtained from the attributes representing request components that strongly relate to the user’s application.
Title: Clustering Datasets in Cloud Computing Environment for User Identification