Characterizing X86 and ARM Serverless Performance Variation: A Natural Language Processing Case Study
Danielle Lambion, Robert Schmitz, R. Cordingly, Navid Heydari, W. Lloyd
In this paper, we leverage a Natural Language Processing (NLP) pipeline for topic modeling, consisting of three functions for data preprocessing, model training, and inferencing, to analyze serverless platform performance variation. Specifically, we investigated performance on x86_64 and ARM64 processors over a 24-hour day, starting at midnight local time, in four AWS Lambda cloud regions across three continents. We identified public cloud resource contention by leveraging the CPU steal metric and examined its relationship to NLP pipeline runtime. Intel x86_64 Xeon processors at the same clock rate as ARM64 (Graviton 2) processors were more than 23% faster for model training, but the ARM64 processors were faster for data preprocessing and inferencing. Using the Intel x86_64 architecture for the NLP pipeline was up to 33.4% more expensive than ARM64, a consequence of the cloud provider's incentivized ARM64 pricing and of slower pipeline runtime caused by greater resource contention on the Intel processors.
{"title":"Characterizing X86 and ARM Serverless Performance Variation: A Natural Language Processing Case Study","authors":"Danielle Lambion, Robert Schmitz, R. Cordingly, Navid Heydari, W. Lloyd","doi":"10.1145/3491204.3543506","DOIUrl":"https://doi.org/10.1145/3491204.3543506","url":null,"abstract":"In this paper, we leverage a Natural Language Processing (NLP) pipeline for topic modeling consisting of three functions for data preprocessing, model training, and inferencing to analyze serverless platform performance variation. Specifically, we investigated performance using x86_64 and ARM64 processors over a 24-hour day starting at midnight local time on four cloud regions across three continents on AWS Lambda. We identified public cloud resource contention by leveraging the CPU steal metric, and examined relationships to NLP pipeline runtime. Intel x86_64 Xeon processors at the same clock rate as ARM64 processors (Graviton 2) were more than 23% faster for model training, but ARM64 processors were faster for data preprocessing and inferencing. Use of the Intel x86_64 architecture for the NLP pipeline was up to 33.4% more expensive than ARM64 as a result of incentivized pricing from the cloud provider and slower pipeline runtime due to greater resource contention for Intel processors.","PeriodicalId":129216,"journal":{"name":"Companion of the 2022 ACM/SPEC International Conference on Performance Engineering","volume":"271 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134011908","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Change Point Detection for MongoDB Time Series Performance Regression
Mark Leznik, Md Shahriar Iqbal, Igor A. Trubin, Arne Lochner, Pooyan Jamshidi, A. Bauer
Commits to the MongoDB software repository trigger a collection of automatically run tests, where identifying the commits responsible for performance regressions is paramount. Previously, the process relied on manual inspection of time series graphs to identify significant changes, later replaced by a threshold-based detection system; however, neither approach found performance changes in a timely manner. This work describes our recent implementation of a change point detection system built upon time series features, a voting system, the Perfomalist approach, and XGBoost. The algorithm produces a list of change points representing significant changes in a given history of performance results. We are able to detect change points automatically with 83% accuracy, all while reducing the human effort in the process.
{"title":"Change Point Detection for MongoDB Time Series Performance Regression","authors":"Mark Leznik, Md Shahriar Iqbal, Igor A. Trubin, Arne Lochner, Pooyan Jamshidi, A. Bauer","doi":"10.1145/3491204.3527488","DOIUrl":"https://doi.org/10.1145/3491204.3527488","url":null,"abstract":"Commits to the MongoDB software repository trigger a collection of automatically run tests. Here, the identification of commits responsible for performance regressions is paramount. Previously, the process relied on manual inspection of time series graphs to identify significant changes, later replaced with a threshold-based detection system. However, neither system was sufficient for finding changes in performance in a timely manner. This work describes our recent implementation of a change point detection system built upon time series features, a voting system, the Perfomalist approach, and XGBoost. The algorithm produces a list of change points representing significant changes from a given history of performance results. We are able to automatically detect change points and achieve an 83% accuracy, all while reducing the human effort in the process.","PeriodicalId":129216,"journal":{"name":"Companion of the 2022 ACM/SPEC International Conference on Performance Engineering","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115510747","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Performance Evaluation of GraphCore IPU-M2000 Accelerator for Text Detection Application
Nupur Sumeet, Karan Rawat, M. Nambiar
The large compute load and memory footprint of modern deep neural networks motivate the use of accelerators for high-throughput deployments in applications spanning multiple domains. In this paper, we evaluate the throughput capabilities of comparatively new hardware from Graphcore, the IPU-M2000, which supports massive parallelism and in-memory compute. For a text detection model, we measured the variation of throughput and power with batch size. We also evaluate compressed versions of this model and analyze performance variation with model precision. Additionally, we compare IPU (Intelligence Processing Unit) results with state-of-the-art GPU and FPGA deployments of a compute-intensive text region detection application. Our experiments suggest the IPU delivers superior throughput: 27×, 1.89×, and 1.56× that of a CPU, an FPGA DPU, and an A100 GPU, respectively, for the text detection application.
{"title":"Performance Evaluation of GraphCore IPU-M2000 Accelerator for Text Detection Application","authors":"Nupur Sumeet, Karan Rawat, M. Nambiar","doi":"10.1145/3491204.3527469","DOIUrl":"https://doi.org/10.1145/3491204.3527469","url":null,"abstract":"The large compute load and memory footprint of modern deep neural networks motivates the use of accelerators for high through- put deployments in application spanning multiple domains. In this paper, we evaluate throughput capabilities of a comparatively new hardware from Graphcore, IPU-M2000 that supports massive par- allelism and in-memory compute. For a text detection model, we measured the throughput and power variations with batch size. We also evaluate compressed versions of this model and analyze perfor- mance variation with model precision. Additionally, we compare IPU (Intelligence Processing Unit) results with state-of-the-art GPU and FPGA deployments of a compute intensive text region detec- tion application. Our experiments suggest, IPU supports superior throughput, 27×, 1.89×, and 1.56× as compared to CPU, FPGA DPU and A100 GPU, respectively for text detection application.","PeriodicalId":129216,"journal":{"name":"Companion of the 2022 ACM/SPEC International Conference on Performance Engineering","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116311265","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
HLS_Profiler: Non-Intrusive Profiling tool for HLS based Applications
Nupur Sumeet, D. Deeksha, M. Nambiar
High-Level Synthesis (HLS) tools aid simplified and faster development of designs that can be implemented on an FPGA (Field Programmable Gate Array), without requiring familiarity with Hardware Description Languages (HDL) and the Register Transfer Level (RTL) design flow. However, it is not straightforward to trace and link source code to the synthesized hardware design. The traditional RTL-based development flow, on the other hand, provides a fine-grained performance profile through waveforms. With the same level of visibility into HLS designs, designers could identify performance bottlenecks and reach their performance targets by iteratively fine-tuning the source code. Although the HLS development tools do provide low-level waveforms, interpreting them in terms of source code variables is a challenging and tedious task. Addressing this gap, we propose to demonstrate an automated profiler tool, HLS_Profiler, that provides a cycle-accurate performance profile of the source code.
{"title":"HLS_Profiler: Non-Intrusive Profiling tool for HLS based Applications","authors":"Nupur Sumeet, D. Deeksha, M. Nambiar","doi":"10.1145/3491204.3527496","DOIUrl":"https://doi.org/10.1145/3491204.3527496","url":null,"abstract":"The High-Level Synthesis (HLS) tools aid in simplified and faster design development without familiarity with Hardware Description Language (HDL) and Register Transfer Logic (RTL) design flow that can be implemented on an FPGA (Field Programmable Gate Array). However, it is not straight forward to trace and link source code to synthesized hardware design. On the other hand, the traditional RTL-based design development flow provides the fine-grained performance profile through waveforms. With the same level of visibility in HLS designs, the designers can identify the performance-bottlenecks and obtain the target performance by iteratively fine-tuning the source code. Although, the HLS development tools provide the low-level waveforms, interpreting them in terms of source code variables is a challenging and tedious task. Addressing this gap, we propose to demonstrate an automated profiler tool, HLS_Profiler, that provides a performance profile of source code in a cycle-accurate manner.","PeriodicalId":129216,"journal":{"name":"Companion of the 2022 ACM/SPEC International Conference on Performance Engineering","volume":"77 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122848528","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
MiSeRTrace: Kernel-level Request Tracing for Microservice Visibility
Thrivikraman V, Vishnu R. Dixit, Nikhil Ram S, Vikas K. Gowda, Santhosh Kumar Vasudevan, Subramaniam Kalambur
With the evolution of microservice applications, the underlying architectures have become increasingly complex compared to their monolithic counterparts. This brings the challenge of observability: by providing a deeper understanding of how distributed applications function, observability enables improving system performance by exposing the bottlenecks in the implementation. The observability provided by existing tools that perform dynamic tracing of distributed applications is limited to user space and requires the application to be instrumented to track request flows. In this paper, we present MiSeRTrace, a new open-source framework that traces the end-to-end path of requests entering a microservice application at the kernel level, without requiring instrumentation or modification of the application. The comprehensiveness of kernel-space observability allows breaking down activities such as network transfers and I/O tasks into their constituent steps, enabling root-cause-based performance analysis and accurate identification of hotspots. MiSeRTrace supports tracing user-enabled kernel events provided by frameworks such as bpftrace or ftrace and isolates the kernel activity associated with each application request with minimal overhead. We demonstrate the solution with results on a benchmark microservice application.
{"title":"MiSeRTrace: Kernel-level Request Tracing for Microservice Visibility","authors":"Thrivikraman V, Vishnu R. Dixit, Nikhil Ram S, Vikas K. Gowda, Santhosh Kumar Vasudevan, Subramaniam Kalambur","doi":"10.1145/3491204.3527462","DOIUrl":"https://doi.org/10.1145/3491204.3527462","url":null,"abstract":"With the evolution of microservice applications, the underlying architectures have become increasingly complex compared to their monolith counterparts. This mainly brings in the challenge of observability. By providing a deeper understanding into the functioning of distributed applications, observability enables improving the performance of the system by obtaining a view of the bottlenecks in the implementation. The observability provided by currently existing tools that perform dynamic tracing on distributed applications is limited to the user-space and requires the application to be instrumented to track request flows. In this paper, we present a new open-source framework MiSeRTrace that can trace the end-to-end path of requests entering a microservice application at the kernel space without requiring instrumentation or modification of the application. Observability at the comprehensiveness of the kernel space allows breaking down of various steps in activities such as network transfers and IO tasks, thus enabling root cause based performance analysis and accurate identification of hotspots. MiSeRTrace supports tracing user-enabled kernel events provided by frameworks such as bpftrace or ftrace and isolates kernel activity associated with each application request with minimal overheads. We then demonstrate the working of the solution with results on a benchmark microservice application.","PeriodicalId":129216,"journal":{"name":"Companion of the 2022 ACM/SPEC International Conference on Performance Engineering","volume":"109 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133267727","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Beauty and the Beast: A Case Study on Performance Prototyping of Data-Intensive Containerized Cloud Applications
Floriment Klinaku, Martina Rapp, Jörg Henß, Stephan Rhode
Data-intensive container-based cloud applications have become popular with the growth of use cases in the Internet of Things domain. Challenges arise when engineering such applications to meet quality requirements, both classical ones like performance and emerging ones like resilience. The research community lacks reference use cases, applications, and prototyping experiences for such applications. Moreover, it is hard to generate realistic and reliable workloads that exercise resources according to a specification, which makes it hard to design reference applications that exhibit similar performance behavior in such environments. In this paper, we present work in progress towards an industrially motivated reference use case and application for data-intensive containerized cloud applications. To generate reliable CPU workloads, we use ProtoCom, a well-known library for the generation of resource demands, and report performance under various quality requirements in a Kubernetes cluster of moderate size. Finally, we present the scalability of the current solution under a particular autoscaling policy. Calibration results show high variability of the ProtoCom library when executed in a cloud environment, and we observe a moderate association between node occupancy and the relative variability of execution time.
{"title":"Beauty and the Beast: A Case Study on Performance Prototyping of Data-Intensive Containerized Cloud Applications","authors":"Floriment Klinaku, Martina Rapp, Jörg Henß, Stephan Rhode","doi":"10.1145/3491204.3527482","DOIUrl":"https://doi.org/10.1145/3491204.3527482","url":null,"abstract":"Data-intensive container-based cloud applications have become popular with the increased use cases in the Internet of Things domain. Challenges arise when engineering such applications to meet quality requirements, both classical ones like performance and emerging ones like resilience. There is a lack of reference use cases, applications, and experiences when prototyping such applications that could benefit the research community. Moreover, it is hard to generate realistic and reliable workloads that exercise the resources according to a specification. Hence, designing reference applications that would exhibit similar performance behavior in such environments is hard. In this paper, we present a work in progress towards a reference use case and application for data-intensive containerized cloud applications having an industrial motivation. Moreover, to generate reliable CPU workloads we make use of ProtoCom, a well-known library for the generation of resource demands, and report the performance under various quality requirements in a Kubernetes cluster of moderate size. Finally, we present the scalability of the current solution assuming a particular autoscaling policy. Results of the calibration show high variability of the ProtoCom library when executed in a cloud environment. We observe a moderate association between the occupancy of node and the relative variability of execution time.","PeriodicalId":129216,"journal":{"name":"Companion of the 2022 ACM/SPEC International Conference on Performance Engineering","volume":" 17","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120933934","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Companion of the 2022 ACM/SPEC International Conference on Performance Engineering","authors":"","doi":"10.1145/3491204","DOIUrl":"https://doi.org/10.1145/3491204","url":null,"abstract":"","PeriodicalId":129216,"journal":{"name":"Companion of the 2022 ACM/SPEC International Conference on Performance Engineering","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130115896","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}