Delegation with Trust<T>: A Scalable, Type- and Memory-Safe Alternative to Locks (arXiv:2408.11173, 2024-08-20)
Noaman Ahmad, Ben Baenen, Chen Chen, Jakob Eriksson
We present Trust<T>, a general, type- and memory-safe alternative to locking in concurrent programs. Instead of synchronizing multi-threaded access to an object of type T with a lock, the programmer may place the object in a Trust<T>. The object is then no longer directly accessible. Instead, a designated thread, the object's trustee, is responsible for applying any requested operations to the object, as requested via the Trust<T> API. Locking is often said to offer limited throughput per lock. Trust<T> is based on delegation, a message-passing technique which does not suffer this per-lock limitation. Instead, per-object throughput is limited by the capacity of the object's trustee, which is typically considerably higher. Our evaluation shows Trust<T> consistently and considerably outperforming locking where lock contention exists, with up to 22x higher throughput in microbenchmarks, and 5-9x for a home-grown key-value store as well as memcached under high lock contention. Moreover, Trust<T> is competitive with locks even in the absence of lock contention.
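To make the delegation idea concrete, here is a minimal sketch of the pattern in Python. This is not the paper's Trust<T> API or implementation language; the class name, the queue-based transport, and the Future-based result delivery are all illustrative assumptions. The key point it shows is that a single trustee thread owns the object, and other threads submit operations as messages instead of taking a lock.

```python
import threading
import queue
from concurrent.futures import Future

class Trust:
    """Illustrative delegation wrapper: one trustee thread owns the
    object; other threads request operations via a message queue."""

    def __init__(self, obj):
        self._obj = obj                # only the trustee touches this
        self._inbox = queue.Queue()    # message-passing transport
        self._trustee = threading.Thread(target=self._run, daemon=True)
        self._trustee.start()

    def _run(self):
        # Trustee loop: apply requested operations one at a time,
        # so the object itself needs no lock.
        while True:
            op, fut = self._inbox.get()
            if op is None:
                break
            try:
                fut.set_result(op(self._obj))
            except Exception as e:
                fut.set_exception(e)

    def apply(self, op):
        """Ask the trustee to run op(obj); returns a Future."""
        fut = Future()
        self._inbox.put((op, fut))
        return fut

    def close(self):
        self._inbox.put((None, None))

# Usage: increment a shared counter without a lock.
counter = Trust({"n": 0})

def bump(d):
    d["n"] += 1
    return d["n"]

futures = [counter.apply(bump) for _ in range(1000)]
print(futures[-1].result())  # 1000: ops are applied in FIFO order
counter.close()
```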
{"title":"Delegation with Trust: A Scalable, Type- and Memory-Safe Alternative to Locks","authors":"Noaman Ahmad, Ben Baenen, Chen Chen, Jakob Eriksson","doi":"arxiv-2408.11173","DOIUrl":"https://doi.org/arxiv-2408.11173","url":null,"abstract":"We present Trust<T>, a general, type- and memory-safe alternative to locking\u0000in concurrent programs. Instead of synchronizing multi-threaded access to an\u0000object of type T with a lock, the programmer may place the object in a\u0000Trust<T>. The object is then no longer directly accessible. Instead a\u0000designated thread, the object's trustee, is responsible for applying any\u0000requested operations to the object, as requested via the Trust<T> API. Locking\u0000is often said to offer a limited throughput per lock. Trust<T> is based on\u0000delegation, a message-passing technique which does not suffer this per-lock\u0000limitation. Instead, per-object throughput is limited by the capacity of the\u0000object's trustee, which is typically considerably higher. Our evaluation shows\u0000Trust<T> consistently and considerably outperforming locking where lock\u0000contention exists, with up to 22x higher throughput in microbenchmarks, and\u00005-9x for a home grown key-value store, as well as memcached, in situations with\u0000high lock contention. Moreover, Trust<T> is competitive with locks even in the\u0000absence of lock contention.","PeriodicalId":501291,"journal":{"name":"arXiv - CS - Performance","volume":"5 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142195475","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Closer Look at Data Augmentation Strategies for Finetuning-Based Low/Few-Shot Object Detection (arXiv:2408.10940, 2024-08-20)
Vladislav Li, Georgios Tsoumplekas, Ilias Siniosoglou, Vasileios Argyriou, Anastasios Lytos, Eleftherios Fountoukidis, Panagiotis Sarigiannidis
Current methods for low- and few-shot object detection have primarily focused on enhancing model performance. One common approach is to combine model finetuning with data augmentation strategies. However, little attention has been given to the energy efficiency of these approaches in data-scarce regimes. This paper conducts a comprehensive empirical study of both the model performance and the energy efficiency of custom data augmentations and automated data augmentation selection strategies when combined with a lightweight object detector. The methods are evaluated on three different benchmark datasets in terms of performance and energy consumption, and the Efficiency Factor is employed to gain insight into their effectiveness when both performance and efficiency are considered. The results show that in many cases the performance gains of data augmentation strategies are overshadowed by their increased energy usage, necessitating the development of more energy-efficient data augmentation strategies to address data scarcity.
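The abstract does not define the Efficiency Factor, but metrics of this kind typically relate a performance gain to the energy spent obtaining it. The sketch below assumes one plausible form, relative mAP gain divided by relative extra energy, purely for illustration; it is not necessarily the paper's formula.

```python
def efficiency_factor(map_aug, map_base, energy_aug, energy_base):
    """Assumed illustrative definition: relative mAP gain per unit of
    relative extra energy. Not necessarily the paper's formula."""
    perf_gain = (map_aug - map_base) / map_base
    energy_cost = (energy_aug - energy_base) / energy_base
    return perf_gain / energy_cost if energy_cost > 0 else float("inf")

# Example: +4% relative mAP for +30% energy -> EF ~ 0.13
print(efficiency_factor(0.52, 0.50, 1.3, 1.0))
```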
{"title":"A Closer Look at Data Augmentation Strategies for Finetuning-Based Low/Few-Shot Object Detection","authors":"Vladislav Li, Georgios Tsoumplekas, Ilias Siniosoglou, Vasileios Argyriou, Anastasios Lytos, Eleftherios Fountoukidis, Panagiotis Sarigiannidis","doi":"arxiv-2408.10940","DOIUrl":"https://doi.org/arxiv-2408.10940","url":null,"abstract":"Current methods for low- and few-shot object detection have primarily focused\u0000on enhancing model performance for detecting objects. One common approach to\u0000achieve this is by combining model finetuning with data augmentation\u0000strategies. However, little attention has been given to the energy efficiency\u0000of these approaches in data-scarce regimes. This paper seeks to conduct a\u0000comprehensive empirical study that examines both model performance and energy\u0000efficiency of custom data augmentations and automated data augmentation\u0000selection strategies when combined with a lightweight object detector. The\u0000methods are evaluated in three different benchmark datasets in terms of their\u0000performance and energy consumption, and the Efficiency Factor is employed to\u0000gain insights into their effectiveness considering both performance and\u0000efficiency. Consequently, it is shown that in many cases, the performance gains\u0000of data augmentation strategies are overshadowed by their increased energy\u0000usage, necessitating the development of more energy efficient data augmentation\u0000strategies to address data scarcity.","PeriodicalId":501291,"journal":{"name":"arXiv - CS - Performance","volume":"40 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142195478","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Data-Driven Analysis to Understand GPU Hardware Resource Usage of Optimizations (arXiv:2408.10143, 2024-08-19)
Tanzima Z. Islam, Aniruddha Marathe, Holland Schutte, Mohammad Zaeed
With heterogeneous systems, the number of GPUs per chip increases to provide the computational capability for solving science at a nanoscopic scale. However, low utilization of individual GPUs undercuts the case for investing in expensive accelerators. While related work develops optimizations for improving application performance, none studies how these optimizations impact hardware resource usage or average GPU utilization. This paper takes a data-driven analysis approach to address this gap by (1) characterizing how hardware resource usage affects device utilization, execution time, or both, (2) presenting a multi-objective metric to identify important application-device interactions that can be optimized to improve device utilization and application performance jointly, (3) studying the hardware resource usage behaviors of several optimizations for a benchmark application, and finally (4) identifying optimization opportunities for several scientific proxy applications based on their hardware resource usage behaviors. Furthermore, we demonstrate the applicability of our methodology by applying the identified optimizations to a proxy application, which improves execution time, device utilization, and power consumption by up to 29.6%, 5.3%, and 26.5%, respectively.
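The paper's multi-objective metric is not spelled out in the abstract; the sketch below assumes one simple way such a metric could be built, scoring each hardware counter by how strongly it correlates with both device utilization and execution time, so that counters important to both objectives rank highest. The equal weighting is an assumption for illustration.

```python
import numpy as np

def joint_importance(counter_samples, utilization, exec_time):
    """Assumed illustrative metric: per-counter score combining absolute
    correlation with device utilization and with execution time.
    counter_samples: dict of counter name -> array of per-run values."""
    scores = {}
    for name, col in counter_samples.items():
        r_util = np.corrcoef(col, utilization)[0, 1]
        r_time = np.corrcoef(col, exec_time)[0, 1]
        scores[name] = abs(r_util) + abs(r_time)  # equal weights assumed
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Example with synthetic measurements from 6 runs:
rng = np.random.default_rng(0)
util = rng.uniform(0.2, 0.9, 6)
time = rng.uniform(1.0, 3.0, 6)
counters = {"dram_reads": rng.uniform(size=6), "sm_occupancy": util * 0.9}
print(joint_importance(counters, util, time))
```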
{"title":"Data-Driven Analysis to Understand GPU Hardware Resource Usage of Optimizations","authors":"Tanzima Z. Islam, Aniruddha Marathe, Holland Schutte, Mohammad Zaeed","doi":"arxiv-2408.10143","DOIUrl":"https://doi.org/arxiv-2408.10143","url":null,"abstract":"With heterogeneous systems, the number of GPUs per chip increases to provide\u0000computational capabilities for solving science at a nanoscopic scale. However,\u0000low utilization for single GPUs defies the need to invest more money for\u0000expensive ccelerators. While related work develops optimizations for improving\u0000application performance, none studies how these optimizations impact hardware\u0000resource usage or the average GPU utilization. This paper takes a data-driven\u0000analysis approach in addressing this gap by (1) characterizing how hardware\u0000resource usage affects device utilization, execution time, or both, (2)\u0000presenting a multi-objective metric to identify important application-device\u0000interactions that can be optimized to improve device utilization and\u0000application performance jointly, (3) studying hardware resource usage behaviors\u0000of several optimizations for a benchmark application, and finally (4)\u0000identifying optimization opportunities for several scientific proxy\u0000applications based on their hardware resource usage behaviors. Furthermore, we\u0000demonstrate the applicability of our methodology by applying the identified\u0000optimizations to a proxy application, which improves the execution time, device\u0000utilization and power consumption by up to 29.6%, 5.3% and 26.5% respectively.","PeriodicalId":501291,"journal":{"name":"arXiv - CS - Performance","volume":"53 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142195477","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Scalable Systems and Software Architectures for High-Performance Computing on Cloud Platforms (arXiv:2408.10281, 2024-08-18)
Risshab Srinivas Ramesh
High-performance computing (HPC) is essential for tackling complex computational problems across various domains. As the scale and complexity of HPC applications continue to grow, the need for scalable systems and software architectures becomes paramount. This paper provides a comprehensive overview of on-premise HPC architecture, covering both hardware and software aspects, and details the challenges involved in building an on-premise HPC cluster. It explores design principles, challenges, and emerging trends in building scalable HPC systems and software, addressing issues such as parallelism, memory hierarchy, communication overhead, and fault tolerance on various cloud platforms. By synthesizing research findings and technological advancements, the paper aims to provide insight into scalable solutions for meeting the evolving demands of HPC applications in the cloud.
{"title":"Scalable Systems and Software Architectures for High-Performance Computing on cloud platforms","authors":"Risshab Srinivas Ramesh","doi":"arxiv-2408.10281","DOIUrl":"https://doi.org/arxiv-2408.10281","url":null,"abstract":"High-performance computing (HPC) is essential for tackling complex\u0000computational problems across various domains. As the scale and complexity of\u0000HPC applications continue to grow, the need for scalable systems and software\u0000architectures becomes paramount. This paper provides a comprehensive overview\u0000of architecture for HPC on premise focusing on both hardware and software\u0000aspects and details the associated challenges in building the HPC cluster on\u0000premise. It explores design principles, challenges, and emerging trends in\u0000building scalable HPC systems and software, addressing issues such as\u0000parallelism, memory hierarchy, communication overhead, and fault tolerance on\u0000various cloud platforms. By synthesizing research findings and technological\u0000advancements, this paper aims to provide insights into scalable solutions for\u0000meeting the evolving demands of HPC applications on cloud.","PeriodicalId":501291,"journal":{"name":"arXiv - CS - Performance","volume":"51 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142195480","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Inspection of I/O Operations from System Call Traces using Directly-Follows-Graph (arXiv:2408.07378, 2024-08-14)
Aravind Sankaran, Ilya Zhukov, Wolfgang Frings, Paolo Bientinesi
We aim to identify differences in Input/Output (I/O) behavior between multiple user programs, in terms of contention for system resources, by inspecting the I/O requests made to the operating system. A typical program issues a large number of I/O requests to the operating system, which makes inspection challenging. In this paper, we address this challenge by presenting a methodology to synthesize I/O system call traces into a specific type of directed graph, known as the Directly-Follows-Graph (DFG). Based on the DFG, we present a technique to compare the traces from multiple programs, or from different configurations of the same program, such that the differences in the I/O requests made to the operating system can be identified. We apply our methodology to the IOR benchmark and compare the contention for file accesses when the benchmark is run with different options for file output and software interface.
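A Directly-Follows-Graph simply records, for each pair of consecutive events in a trace, how often one directly follows the other. As a rough illustration (the authors' actual trace format and edge attributes are not specified in the abstract), a DFG over syscall names and a simple edge-wise comparison of two traces can be built like this:

```python
from collections import Counter

def build_dfg(trace):
    """Build a Directly-Follows-Graph from an ordered list of syscall
    names: edge (a, b) counts how often b directly follows a."""
    return Counter(zip(trace, trace[1:]))

def dfg_diff(dfg_a, dfg_b):
    """Edges whose frequency differs between two traces."""
    return {e: dfg_a.get(e, 0) - dfg_b.get(e, 0)
            for e in set(dfg_a) | set(dfg_b)
            if dfg_a.get(e, 0) != dfg_b.get(e, 0)}

# Example: two runs with different write/sync behavior.
run1 = ["open", "write", "write", "write", "close"]
run2 = ["open", "write", "fsync", "write", "close"]
print(dfg_diff(build_dfg(run1), build_dfg(run2)))
```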
{"title":"Inspection of I/O Operations from System Call Traces using Directly-Follows-Graph","authors":"Aravind Sankaran, Ilya Zhukov, Wolfgang Frings, Paolo Bientinesi","doi":"arxiv-2408.07378","DOIUrl":"https://doi.org/arxiv-2408.07378","url":null,"abstract":"We aim to identify the differences in Input/Output(I/O) behavior between\u0000multiple user programs in terms of contentions for system resources by\u0000inspecting the I/O requests made to the operating system. A typical program\u0000issues a large number of I/O requests to the operating system, thereby making\u0000the process of inspection challenging. In this paper, we address this challenge\u0000by presenting a methodology to synthesize I/O system call traces into a\u0000specific type of directed graph, known as the Directly-Follows-Graph (DFG).\u0000Based on the DFG, we present a technique to compare the traces from multiple\u0000programs or different configurations of the same program, such that it is\u0000possible to identify the differences in the I/O requests made to the operating\u0000system. We apply our methodology to the IOR benchmark, and compare the\u0000contentions for file accesses when the benchmark is run with different options\u0000for file output and software interface.","PeriodicalId":501291,"journal":{"name":"arXiv - CS - Performance","volume":"320 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142195479","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Architecture Specific Generation of Large Scale Lattice Boltzmann Methods for Sparse Complex Geometries (arXiv:2408.06880, 2024-08-13)
Philipp Suffa, Markus Holzer, Harald Köstler, Ulrich Rüde
We implement and analyse a sparse, indirect-addressing data structure for the Lattice Boltzmann Method to support efficient compute kernels for fluid dynamics problems with a high number of non-fluid nodes in the domain, such as porous media flows. The data structure is integrated into a code generation pipeline to enable sparse Lattice Boltzmann Methods with a variety of stencils and collision operators, and to generate efficient kernel code for CPUs as well as for AMD and NVIDIA accelerator cards. We optimize these sparse kernels with an in-place streaming pattern to save memory accesses and memory consumption, and we implement a communication-hiding technique to demonstrate scalability. We present single-GPU performance results with up to 99% of maximal bandwidth utilization. We integrate the optimized generated kernels into the high-performance framework WALBERLA and achieve a scaling efficiency of at least 82% on up to 1024 NVIDIA A100 GPUs and up to 4096 AMD MI250X GPUs on modern HPC systems. Further, we set up three different applications to test the sparse data structure on realistic demonstrator problems, showing performance results for flow through porous media, free flow over a particle bed, and blood flow in a coronary artery. Compared to the direct-addressing data structure, the sparse, indirect-addressing data structure achieves a maximal performance speed-up of 2x and reduces memory consumption by up to 75% for these applications.
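The core of an indirect-addressing (list-based) LBM layout is to store distributions only for fluid cells and to resolve neighbor accesses through a precomputed index list rather than grid arithmetic, which is what saves memory when most of the domain is solid. A minimal sketch of that idea follows; the paper's generated kernels are architecture-specific and far more elaborate, and the 2D stencil, periodic wrap, and array names here are illustrative assumptions.

```python
import numpy as np

# Dense domain mask: True = fluid cell, False = solid/obstacle.
mask = np.random.rand(64, 64) > 0.7           # sparse geometry (~30% fluid)
fluid_ids = np.flatnonzero(mask)              # compact list of fluid cells
dense_to_sparse = -np.ones(mask.size, dtype=np.int64)
dense_to_sparse[fluid_ids] = np.arange(fluid_ids.size)

# Precomputed neighbor list: for each fluid cell and lattice direction,
# the sparse index of the neighbor (-1 if the neighbor is solid).
offsets = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # small 2D stencil for brevity
ny, nx = mask.shape
ys, xs = np.divmod(fluid_ids, nx)
neighbors = np.full((fluid_ids.size, len(offsets)), -1, dtype=np.int64)
for q, (dy, dx) in enumerate(offsets):
    yn, xn = (ys + dy) % ny, (xs + dx) % nx   # periodic wrap, illustrative
    neighbors[:, q] = dense_to_sparse[yn * nx + xn]

# Streaming touches only fluid cells: gather from the neighbor list.
# Solid neighbors (-1) just keep the old value here, standing in for
# a proper bounce-back boundary treatment.
f = np.random.rand(fluid_ids.size, len(offsets))
valid = neighbors >= 0
gathered = f[np.clip(neighbors, 0, None), np.arange(len(offsets))]
f_new = np.where(valid, gathered, f)
```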
{"title":"Architecture Specific Generation of Large Scale Lattice Boltzmann Methods for Sparse Complex Geometries","authors":"Philipp Suffa, Markus Holzer, Harald Köstler, Ulrich Rüde","doi":"arxiv-2408.06880","DOIUrl":"https://doi.org/arxiv-2408.06880","url":null,"abstract":"We implement and analyse a sparse / indirect-addressing data structure for\u0000the Lattice Boltzmann Method to support efficient compute kernels for fluid\u0000dynamics problems with a high number of non-fluid nodes in the domain, such as\u0000in porous media flows. The data structure is integrated into a code generation\u0000pipeline to enable sparse Lattice Boltzmann Methods with a variety of stencils\u0000and collision operators and to generate efficient code for kernels for CPU as\u0000well as for AMD and NVIDIA accelerator cards. We optimize these sparse kernels\u0000with an in-place streaming pattern to save memory accesses and memory\u0000consumption and we implement a communication hiding technique to prove\u0000scalability. We present single GPU performance results with up to 99% of\u0000maximal bandwidth utilization. We integrate the optimized generated kernels in\u0000the high performance framework WALBERLA and achieve a scaling efficiency of at\u0000least 82% on up to 1024 NVIDIA A100 GPUs and up to 4096 AMD MI250X GPUs on\u0000modern HPC systems. Further, we set up three different applications to test the\u0000sparse data structure for realistic demonstrator problems. We show performance\u0000results for flow through porous media, free flow over a particle bed, and blood\u0000flow in a coronary artery. We achieve a maximal performance speed-up of 2 and a\u0000significantly reduced memory consumption by up to 75% with the sparse /\u0000indirect-addressing data structure compared to the direct-addressing data\u0000structure for these applications.","PeriodicalId":501291,"journal":{"name":"arXiv - CS - Performance","volume":"176 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142195482","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Understanding Power Consumption Metric on Heterogeneous Memory Systems (arXiv:2408.06579, 2024-08-13)
Andrès Rubio Proaño, Kento Sato
Contemporary memory systems contain a variety of memory types, each with distinct characteristics. This trend empowers applications to opt for the memory types that align with the developer's desired behavior. As a result, developers gain the flexibility to tailor their applications to specific needs, factoring in attributes like latency, bandwidth, and power consumption. Our research centers on the power consumption of memory systems. We introduce an approach that equips developers with comprehensive insight into the power consumption of individual memory types, and we additionally propose an ordered hierarchy of memory types. Through this methodology, developers can make informed decisions for efficient memory usage aligned with their unique requirements.
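As a toy illustration of such an ordered hierarchy (the paper's measurement methodology is not described in the abstract, and the memory types and numbers below are placeholders), one could rank memory types by measured power draw and pick the lowest-power type that still meets a bandwidth requirement:

```python
# Hypothetical per-type measurements: (power_watts, bandwidth_gbs)
measurements = {
    "HBM":      (22.0, 400.0),
    "DDR5":     (10.0, 60.0),
    "CXL-DRAM": (8.0,  30.0),
    "NVDIMM":   (6.0,  10.0),
}

def power_hierarchy(meas):
    """Order memory types from lowest to highest measured power."""
    return sorted(meas, key=lambda t: meas[t][0])

def cheapest_meeting(meas, min_bw):
    """Lowest-power type that still satisfies a bandwidth need."""
    ok = [t for t in power_hierarchy(meas) if meas[t][1] >= min_bw]
    return ok[0] if ok else None

print(power_hierarchy(measurements))         # low -> high power
print(cheapest_meeting(measurements, 50.0))  # DDR5
```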
{"title":"Understanding Power Consumption Metric on Heterogeneous Memory Systems","authors":"Andrès Rubio Proaño, Kento Sato","doi":"arxiv-2408.06579","DOIUrl":"https://doi.org/arxiv-2408.06579","url":null,"abstract":"Contemporary memory systems contain a variety of memory types, each\u0000possessing distinct characteristics. This trend empowers applications to opt\u0000for memory types aligning with developer's desired behavior. As a result,\u0000developers gain flexibility to tailor their applications to specific needs,\u0000factoring in attributes like latency, bandwidth, and power consumption. Our\u0000research centers on the aspect of power consumption within memory systems. We\u0000introduce an approach that equips developers with comprehensive insights into\u0000the power consumption of individual memory types. Additionally, we propose an\u0000ordered hierarchy of memory types. Through this methodology, developers can\u0000make informed decisions for efficient memory usage aligned with their unique\u0000requirements.","PeriodicalId":501291,"journal":{"name":"arXiv - CS - Performance","volume":"43 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142195481","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Automated PMC-based Power Modeling Methodology for Modern Mobile GPUs (arXiv:2408.04886, 2024-08-09)
Pranab Dash (Purdue University), Y. Charlie Hu (Purdue University), Abhilash Jindal (IIT Delhi)
The rise of machine learning workloads on smartphones has propelled GPUs into being one of the most power-hungry components of modern smartphones, elevating the need to optimize the GPU power draw of mobile apps. Optimizing the power consumption of mobile GPUs in turn requires accurate estimation of their power draw during app execution. In this paper, we observe that the prior-art utilization-frequency-based GPU models cannot capture the diverse micro-architectural usage of modern mobile GPUs. We show that these models suffer poor modeling accuracy under diverse GPU workloads, and study whether the performance monitoring counter (PMC)-based models recently proposed for desktop/server GPUs can be applied to accurately model mobile GPU power. Our study shows that the PMCs available on the dominant mobile GPUs used in modern smartphones are sufficient to model mobile GPU power, but exhibit multicollinearity if used altogether. We present APGPM, a mobile GPU power modeling methodology that automatically selects an optimal set of PMCs to maximize power model accuracy. Evaluation on two representative mobile GPUs shows that APGPM-generated power models reduce the MAPE modeling error of the prior art by 1.95x to 2.66x (i.e., by 11.3% to 15.4%) while using only 4.66% to 20.41% of the total number of available PMCs.
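A common way to prune a multicollinear counter set before fitting a linear power model is to iteratively drop the counter with the highest variance inflation factor (VIF). Whether APGPM selects PMCs this way is not stated in the abstract, so the sketch below is a generic stand-in using numpy only:

```python
import numpy as np

def vif(X, j):
    """Variance inflation factor of column j: 1/(1 - R^2) when column j
    is regressed (least squares, with intercept) on the other columns."""
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([others, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    r2 = 1.0 - resid.var() / y.var()
    return 1.0 / max(1.0 - r2, 1e-12)

def select_pmcs(X, names, max_vif=10.0):
    """Drop the worst-VIF counter until no strong multicollinearity
    remains. X: (samples, counters); names: counter labels."""
    names = list(names)
    while X.shape[1] > 1:
        vifs = [vif(X, j) for j in range(X.shape[1])]
        worst = int(np.argmax(vifs))
        if vifs[worst] <= max_vif:
            break
        X = np.delete(X, worst, axis=1)
        del names[worst]
    return X, names

def fit_power_model(X, power):
    """Ordinary least squares: power ~ X @ w + b on selected counters."""
    A = np.column_stack([X, np.ones(len(X))])
    w, *_ = np.linalg.lstsq(A, power, rcond=None)
    return w
```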
{"title":"Automated PMC-based Power Modeling Methodology for Modern Mobile GPUs","authors":"Pranab DashPurdue University, Y. Charlie HuPurdue University, Abhilash JindalIIT Delhi","doi":"arxiv-2408.04886","DOIUrl":"https://doi.org/arxiv-2408.04886","url":null,"abstract":"The rise of machine learning workload on smartphones has propelled GPUs into\u0000one of the most power-hungry components of modern smartphones and elevates the\u0000need for optimizing the GPU power draw by mobile apps. Optimizing the power\u0000consumption of mobile GPUs in turn requires accurate estimation of their power\u0000draw during app execution. In this paper, we observe that the prior-art,\u0000utilization-frequency based GPU models cannot capture the diverse\u0000micro-architectural usage of modern mobile GPUs.We show that these models\u0000suffer poor modeling accuracy under diverse GPU workload, and study whether\u0000performance monitoring counter (PMC)-based models recently proposed for\u0000desktop/server GPUs can be applied to accurately model mobile GPU power. Our\u0000study shows that the PMCs that come with dominating mobile GPUs used in modern\u0000smartphones are sufficient to model mobile GPU power, but exhibit\u0000multicollinearity if used altogether. We present APGPM, the mobile GPU power\u0000modeling methodology that automatically selects an optimal set of PMCs that\u0000maximizes the GPU power model accuracy. Evaluation on two representative mobile\u0000GPUs shows that APGPM-generated GPU power models reduce the MAPE modeling error\u0000of prior-art by 1.95x to 2.66x (i.e., by 11.3% to 15.4%) while using only 4.66%\u0000to 20.41% of the total number of available PMCs.","PeriodicalId":501291,"journal":{"name":"arXiv - CS - Performance","volume":"98 5 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141940542","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Columbo: Low Level End-to-End System Traces through Modular Full-System Simulation (arXiv:2408.05251, 2024-08-08)
Jakob Görgen, Vaastav Anand, Hejing Li, Jialin Li, Antoine Kaufmann
Fully understanding performance is a growing challenge when building next-generation cloud systems. Often these systems build on next-generation hardware, and evaluation in realistic physical testbeds is out of reach. Even when physical testbeds are available, visibility into essential system aspects is a challenge in modern systems, where system performance depends on often sub-microsecond interactions between hardware and software components. Existing tools such as performance counters, logging, and distributed tracing provide aggregate or sampled information, but remain insufficient for understanding individual requests in depth. In this paper, we explore a fundamentally different approach to enable in-depth understanding of cloud system behavior at the software and hardware level, with (almost) arbitrarily fine-grained visibility. Our proposal is to run cloud systems in detailed full-system simulations, configure the simulators to collect detailed events without affecting the system, and finally assemble these events into end-to-end system traces that can be analyzed by existing distributed tracing tools.
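The final assembly step amounts to converting timestamped simulator events into the span format that distributed tracing tools ingest. As a rough sketch (Columbo's actual event schema and output format are not given in the abstract; the field names below are assumptions), paired begin/end events can be folded into per-request spans like this:

```python
from collections import defaultdict

def assemble_spans(events):
    """events: dicts with keys ts (simulated time), req (request id),
    comp (component, e.g. 'nic' or 'driver'), kind ('begin'/'end').
    Returns per-request lists of spans: (component, start, end)."""
    open_spans = {}                      # (req, comp) -> start timestamp
    traces = defaultdict(list)
    for ev in sorted(events, key=lambda e: e["ts"]):
        key = (ev["req"], ev["comp"])
        if ev["kind"] == "begin":
            open_spans[key] = ev["ts"]
        else:
            start = open_spans.pop(key)
            traces[ev["req"]].append((ev["comp"], start, ev["ts"]))
    return dict(traces)

# Example: one request crossing NIC and driver, in simulated nanoseconds.
evs = [
    {"ts": 100, "req": 1, "comp": "nic",    "kind": "begin"},
    {"ts": 140, "req": 1, "comp": "driver", "kind": "begin"},
    {"ts": 180, "req": 1, "comp": "driver", "kind": "end"},
    {"ts": 230, "req": 1, "comp": "nic",    "kind": "end"},
]
print(assemble_spans(evs))  # {1: [('driver', 140, 180), ('nic', 100, 230)]}
```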
{"title":"Columbo: Low Level End-to-End System Traces through Modular Full-System Simulation","authors":"Jakob Görgen, Vaastav Anand, Hejing Li, Jialin Li, Antoine Kaufmann","doi":"arxiv-2408.05251","DOIUrl":"https://doi.org/arxiv-2408.05251","url":null,"abstract":"Fully understanding performance is a growing challenge when building\u0000next-generation cloud systems. Often these systems build on next-generation\u0000hardware, and evaluation in realistic physical testbeds is out of reach. Even\u0000when physical testbeds are available, visibility into essential system aspects\u0000is a challenge in modern systems where system performance depends on often\u0000sub-$mu s$ interactions between HW and SW components. Existing tools such as\u0000performance counters, logging, and distributed tracing provide aggregate or\u0000sampled information, but remain insufficient for understanding individual\u0000requests in-depth. In this paper, we explore a fundamentally different approach\u0000to enable in-depth understanding of cloud system behavior at the software and\u0000hardware level, with (almost) arbitrarily fine-grained visibility. Our proposal\u0000is to run cloud systems in detailed full-system simulations, configure the\u0000simulators to collect detailed events without affecting the system, and finally\u0000assemble these events into end-to-end system traces that can be analyzed by\u0000existing distributed tracing tools.","PeriodicalId":501291,"journal":{"name":"arXiv - CS - Performance","volume":"24 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142195483","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Evaluation of Hash Algorithm Performance for Cryptocurrency Exchanges Based on Blockchain System (arXiv:2408.11950, 2024-08-08)
Abel C. H. Chen
The blockchain system has emerged as one of the focal points of research in recent years, particularly in applications and services such as cryptocurrencies and smart contracts. In this context, the hash value serves as a crucial element linking blocks within the blockchain and ensuring the integrity of block contents; hash algorithms are therefore a vital security technology for blockchain systems. This study focuses on analyzing the security and execution efficiency of mainstream hash algorithms in the Proof of Work (PoW) calculations within blockchain systems. It proposes an evaluation factor and conducts comparative experiments to evaluate each hash algorithm. The experimental results indicate no significant differences in security among SHA-2, SHA-3, and BLAKE2; however, SHA-2 and BLAKE2 demonstrate shorter computation times, indicating higher execution efficiency.
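All three families are available in Python's hashlib, so a PoW-style timing comparison is easy to reproduce. A small benchmark sketch follows; the payload size and iteration count are illustrative, not the study's exact experimental setup:

```python
import hashlib
import time

ALGOS = ["sha256", "sha3_256", "blake2b"]   # SHA-2, SHA-3, BLAKE2

def time_hashes(algo: str, payload: bytes, n: int) -> float:
    """Time n PoW-style hash evaluations with incrementing nonces."""
    h = getattr(hashlib, algo)
    start = time.perf_counter()
    for nonce in range(n):
        h(payload + nonce.to_bytes(8, "little")).digest()
    return time.perf_counter() - start

header = b"block-header-bytes" * 4          # ~72-byte toy block header
for algo in ALGOS:
    t = time_hashes(algo, header, 500_000)
    print(f"{algo}: {t:.2f}s for 500k hashes")
```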
{"title":"Evaluation of Hash Algorithm Performance for Cryptocurrency Exchanges Based on Blockchain System","authors":"Abel C. H. Chen","doi":"arxiv-2408.11950","DOIUrl":"https://doi.org/arxiv-2408.11950","url":null,"abstract":"The blockchain system has emerged as one of the focal points of research in\u0000recent years, particularly in applications and services such as\u0000cryptocurrencies and smart contracts. In this context, the hash value serves as\u0000a crucial element in linking blocks within the blockchain, ensuring the\u0000integrity of block contents. Therefore, hash algorithms represent a vital\u0000security technology for ensuring the integrity and security of blockchain\u0000systems. This study primarily focuses on analyzing the security and execution\u0000efficiency of mainstream hash algorithms in the Proof of Work (PoW)\u0000calculations within blockchain systems. It proposes an evaluation factor and\u0000conducts comparative experiments to evaluate each hash algorithm. The\u0000experimental results indicate that there are no significant differences in the\u0000security aspects among SHA-2, SHA-3, and BLAKE2. However, SHA-2 and BLAKE2\u0000demonstrate shorter computation times, indicating higher efficiency in\u0000execution.","PeriodicalId":501291,"journal":{"name":"arXiv - CS - Performance","volume":"88 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142195485","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}