
Proceedings of the ACM on Management of Data: Latest Publications

BladeDISC: Optimizing Dynamic Shape Machine Learning Workloads via Compiler Approach
Pub Date : 2023-11-13 DOI: 10.1145/3617327
Zhen Zheng, Zaifeng Pan, Dalin Wang, Kai Zhu, Wenyi Zhao, Tianyou Guo, Xiafei Qiu, Minmin Sun, Junjie Bai, Feng Zhang, Xiaoyong Du, Jidong Zhai, Wei Lin
Compiler optimization plays an increasingly important role in boosting the performance of machine learning models for data processing and management. As data grow more complex, dynamic tensor shapes emerge in ML models. However, existing ML compilers either handle only static-shape models or suffer a series of performance problems in both operator fusion optimization and code generation under dynamic shapes. This paper tackles the two main challenges of dynamic shape optimization: fusion optimization without shape values, and code generation that supports arbitrary shapes. To address the fundamental absence of shape values, it systematically abstracts and excavates the available shape information and designs a cross-level symbolic shape representation. Guided by the insight that fusion optimization relies on the tensor shape relationships between adjacent operators rather than exact shape values, it proposes a dynamic shape fusion approach based on shape-information propagation. To generate code that adapts efficiently to arbitrary shapes, it proposes a combined compile-time and runtime code generation approach. Finally, it presents a complete optimization pipeline for dynamic shape models and implements an industrial-grade ML compiler named BladeDISC. An extensive evaluation demonstrates that BladeDISC outperforms PyTorch, TorchScript, TVM, ONNX Runtime, XLA, Torch Inductor (dynamic shape), and TensorRT by up to 6.95×, 6.25×, 4.08×, 2.04×, 2.06×, 7.92×, and 4.16× (3.54×, 3.12×, 1.95×, 1.47×, 1.24×, 2.93×, and 1.46× on average) in end-to-end inference speedup on the A10 and T4 GPUs, respectively. BladeDISC's source code is publicly available at https://github.com/alibaba/BladeDISC.
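The central insight (fusion depends on shape relationships between adjacent operators, not concrete shape values) can be made concrete with a toy symbolic-dimension sketch. SymDim, Op, and can_fuse are names invented for illustration; this is not BladeDISC's actual IR or API.

```python
import itertools

_ids = itertools.count()

class SymDim:
    """A tensor dimension known only symbolically at compile time."""
    def __init__(self, name=None):
        self.name = name or f"s{next(_ids)}"
    def __repr__(self):
        return self.name

class Op:
    def __init__(self, name, in_shape, out_shape):
        self.name, self.in_shape, self.out_shape = name, in_shape, out_shape

def can_fuse(producer, consumer):
    # Fusion needs only the *relationship* between adjacent shapes: identical
    # symbols guarantee identical runtime extents, whatever those values are.
    return producer.out_shape == consumer.in_shape

batch, hidden = SymDim("batch"), SymDim("hidden")
relu = Op("relu", [batch, hidden], [batch, hidden])
add = Op("add", [batch, hidden], [batch, hidden])
print(can_fuse(relu, add))   # True for every runtime value of batch and hidden
```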
PACMMOD Volume 1, Issue 3: Editorial
Pub Date : 2023-11-13 DOI: 10.1145/3617307
Divyakant Agrawal, Alexandra Meliou, S. Sudarshan
We are excited to introduce this new issue of PACMMOD (Proceedings of the ACM on Management of Data). PACMMOD is a new journal concerned with the principles, algorithms, techniques, systems, and applications of database management systems, data management technology, and the science and engineering of data. It includes articles reporting cutting-edge data management, data engineering, and data science research. Articles published in PACMMOD address data challenges at various stages of the data lifecycle, from modeling, acquisition, cleaning, integration, indexing, querying, analysis, exploration, visualization, and interpretation to explanation. They focus on data-intensive components of data pipelines and solve problems in areas of interest to our community (e.g., data curation, optimization, performance, storage, systems), operating within accuracy, privacy, fairness, and diversity constraints. Articles reporting deployed systems and solutions to data science pipelines and/or fundamental experiences and insights from evaluating real-world data engineering problems are especially encouraged.
TeraHAC: Hierarchical Agglomerative Clustering of Trillion-Edge Graphs
Pub Date : 2023-11-13 DOI: 10.1145/3617341
Laxman Dhulipala, Jakub Łącki, Jason Lee, Vahab Mirrokni
We introduce TeraHAC, a (1+ε)-approximate hierarchical agglomerative clustering (HAC) algorithm which scales to trillion-edge graphs. Our algorithm is based on a new approach to computing (1+ε)-approximate HAC, which is a novel combination of the nearest-neighbor chain algorithm and the notion of (1+ε)-approximate HAC. Our approach allows us to partition the graph among multiple machines and make significant progress in computing the clustering within each partition before any communication with other partitions is needed. We evaluate TeraHAC on a number of real-world and synthetic graphs of up to 8 trillion edges. We show that TeraHAC requires over 100x fewer rounds compared to previously known approaches for computing HAC. It is up to 8.3x faster than SCC, the state-of-the-art distributed algorithm for hierarchical clustering, while achieving 1.16x higher quality. In fact, TeraHAC essentially retains the quality of the celebrated HAC algorithm while significantly improving the running time.
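The key relaxation is that any merge within a (1+ε) factor of the best available one is acceptable, which is what allows many merges to proceed per round inside each partition. Below is a single-machine average-linkage sketch of that relaxation; approx_hac and the dict-based graph are invented for illustration, missing edges are treated as similarity 0, and this is not TeraHAC's distributed algorithm.

```python
def approx_hac(weights, sizes, eps=0.1):
    """weights: {frozenset({a, b}): similarity}; sizes: {cluster_id: #points}."""
    merges, fresh = [], max(sizes) + 1
    while weights:
        best = max(weights.values())
        # the (1+eps) relaxation: no need to wait for the exact best edge
        edge = next(e for e, w in weights.items() if (1 + eps) * w >= best)
        u, v = tuple(edge)
        merges.append((u, v, weights.pop(edge)))
        new, fresh = fresh, fresh + 1
        sizes[new] = sizes[u] + sizes[v]
        acc = {}
        for e in list(weights):                      # average-linkage update
            if u in e or v in e:
                w = weights.pop(e)
                (x,) = e - {u, v}
                side = u if u in e else v
                acc[x] = acc.get(x, 0.0) + sizes[side] * w
        for x, s in acc.items():
            weights[frozenset({new, x})] = s / sizes[new]
        del sizes[u], sizes[v]
    return merges

sims = {frozenset({0, 1}): 0.9, frozenset({1, 2}): 0.85, frozenset({0, 2}): 0.2}
print(approx_hac(sims, {0: 1, 1: 1, 2: 1}))   # e.g. [(0, 1, 0.9), (2, 3, 0.525)]
```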
SAGA: A Scalable Framework for Optimizing Data Cleaning Pipelines for Machine Learning Applications
Pub Date : 2023-11-13 DOI: 10.1145/3617338
Shafaq Siddiqi, Roman Kern, Matthias Boehm
In the exploratory data science lifecycle, data scientists often spend the majority of their time finding, integrating, validating, and cleaning relevant datasets. Despite recent work on data validation and numerous error detection and correction algorithms, in practice data cleaning for ML remains largely a manual, unpleasant, and labor-intensive trial-and-error process, especially in large-scale, distributed computation. The target ML application---such as a classification or regression model---can, however, serve as a valuable feedback signal for selecting effective data cleaning strategies. In this paper, we introduce SAGA, a framework for automatically generating the top-K most effective data cleaning pipelines. SAGA adopts ideas from Auto-ML, feature selection, and hyper-parameter tuning. Our framework is extensible to user-provided constraints, new data cleaning primitives, and ML applications; automatically generates hybrid runtime plans of local and distributed operations; and performs pruning by interesting properties (e.g., monotonicity). Instead of full automation---which is rather unrealistic---SAGA simplifies the mechanical aspects of data cleaning. Our experiments show that SAGA yields robust accuracy improvements over the state of the art, and good scalability with increasing data sizes and numbers of evaluated pipelines.
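As a toy stand-in for the pipeline-search loop, the sketch below enumerates short sequences of cleaning primitives and keeps the K that score best under a user-supplied downstream evaluation. top_k_pipelines, the primitive registry, and evaluate() are placeholder names; SAGA's actual search adds hybrid local/distributed plans and monotonicity-based pruning instead of this brute-force enumeration.

```python
import heapq
import itertools

def top_k_pipelines(primitives, evaluate, k=3, max_len=2):
    """primitives: {name: fn(data) -> data}; evaluate: fn(pipeline_fn) -> score."""
    best = []                                  # min-heap of (score, step names)
    for n in range(1, max_len + 1):
        for combo in itertools.permutations(primitives.items(), n):
            def pipeline(data, steps=combo):   # bind combo at definition time
                for _, step in steps:
                    data = step(data)
                return data
            score = evaluate(pipeline)
            heapq.heappush(best, (score, [name for name, _ in combo]))
            if len(best) > k:
                heapq.heappop(best)            # evict the worst of the kept K
    return sorted(best, reverse=True)
```

A caller would register primitives such as outlier removal or imputation and pass an evaluate() that trains the target model on the cleaned data and returns validation accuracy.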
Closest Pairs Search Over Data Stream
Pub Date : 2023-11-13 DOI: 10.1145/3617326
Rui Zhu, Bin Wang, Xiaochun Yang, Baihua Zheng
k-closest pair (KCP for short) search is a fundamental problem in database research. Given a set of d-dimensional streaming data S, KCP search aims to retrieve the k pairs with the shortest distances between them. While existing works have studied continuous 1-closest pair queries (i.e., k=1) over dynamic data environments, which allow object insertions/deletions, they incur high computational costs and cannot easily support KCP search with k>1. This paper investigates the problem of KCP search over data streams, aiming to incrementally maintain as few pairs as possible to support KCP search with arbitrary k. To achieve this, we introduce the concept of NNS (short for Nearest Neighbour pair-Set), which consists of all the nearest-neighbour pairs and allows us to support KCP search by accessing only O(k) objects. We further observe that in most cases only a small portion of NNS is needed to answer KCP search, as typically k ≪ n. Based on this observation, we propose TNNS (short for Threshold-based NN-pair Set), which contains a small number of high-quality NN pairs, and a partition named τ-DLBP (short for τ-Distance Lower-Bound based Partition) to organize objects, with τ being an integer significantly smaller than n. τ-DLBP organizes objects using up to O(log n / τ) partitions and is able to support the construction and update of TNNS efficiently.
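The usefulness of NNS rests on a classical fact: the globally closest pair is always a nearest-neighbour pair, so maintaining each point's NN pair suffices to answer the k=1 query. The brute-force sketch below illustrates that property on static data; it is purely illustrative, and larger k on streams needs the paper's TNNS and τ-DLBP machinery.

```python
import math

def nn_pairs(points):
    pairs = set()
    for i, p in enumerate(points):
        j = min((j for j in range(len(points)) if j != i),
                key=lambda j: math.dist(p, points[j]))
        pairs.add((min(i, j), max(i, j)))    # each point contributes its NN pair
    return pairs

def closest_pairs(points, k=1):
    cand = sorted(nn_pairs(points),
                  key=lambda e: math.dist(points[e[0]], points[e[1]]))
    return cand[:k]    # exact for k=1; larger k needs the paper's extensions

pts = [(0, 0), (1, 0), (5, 5), (5.2, 5.1)]
print(closest_pairs(pts))    # [(2, 3)]
```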
Memory-Efficient and Flexible Detection of Heavy Hitters in High-Speed Networks
Pub Date : 2023-11-13 DOI: 10.1145/3617334
He Huang, Jiakun Yu, Yang Du, Jia Liu, Haipeng Dai, Yu-E Sun
Heavy-hitter detection is a fundamental task in network traffic measurement and security. Existing work faces a dilemma: either suffer under dynamic and imbalanced traffic characteristics, or sacrifice detection efficiency and flexibility. In this paper, we propose a flexible sketch called SwitchSketch that embraces dynamic and skewed traffic for efficient and accurate heavy-hitter detection. The key idea of SwitchSketch is to let the sketch dynamically switch among different modes and make full use of each bit of memory. We present an encoding-based switching scheme together with a flexible bucket structure that jointly achieve this goal through a combination of design features, including variable-length cells, shrunk counters, embedded metadata, and switchable modes. We further implement SwitchSketch on the NetFPGA-1G-CML board. Experimental results based on real Internet traces show that SwitchSketch achieves a high Fβ-Score for threshold-t detection (consistently above 0.938) and over 99% precision for top-k detection under a tight memory budget (e.g., 100KB). Besides, it outperforms the state of the art by reducing the ARE by 30.77%~99.96%. All related implementations are open-sourced.
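The mode-switching idea can be caricatured in a few lines: a bucket starts with many narrow, anonymous counters and, once one saturates, switches to a mode with fewer, wider counters that carry keys. The field widths, names, and promotion rule below are invented for the sketch; the real SwitchSketch uses a packed, encoding-based scheme in hardware.

```python
class Bucket:
    SMALL_MAX = 15                         # ceiling of a toy 4-bit counter
    def __init__(self, width=8):
        self.mode = "dense"
        self.small = [0] * width           # many narrow, anonymous counters
        self.heavy = {}                    # few wide counters that carry keys
    def add(self, key):
        if self.mode == "dense":
            i = hash(key) % len(self.small)
            self.small[i] += 1
            if self.small[i] > self.SMALL_MAX:   # saturation triggers the switch
                self.mode = "heavy"
                self.heavy[key] = self.small[i]
        else:
            self.heavy[key] = self.heavy.get(key, 0) + 1
    def estimate(self, key):
        if self.mode == "dense":
            return self.small[hash(key) % len(self.small)]
        return self.heavy.get(key, 0)

b = Bucket()
for _ in range(20):
    b.add("flow-A")
print(b.mode, b.estimate("flow-A"))        # heavy 20
```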
OptiQL: Robust Optimistic Locking for Memory-Optimized Indexes
Pub Date : 2023-11-13 DOI: 10.1145/3617336
Ge Shi, Ziyi Yan, Tianzheng Wang
Modern memory-optimized indexes often use optimistic locks for concurrent accesses. Read operations can proceed optimistically without taking the lock, greatly improving performance on multicore CPUs. But this comes at the cost of robustness under contention: when many threads contend on a small set of locks, they cause excessive cacheline invalidation and interconnect traffic, and eventually performance collapse. Yet existing solutions often sacrifice desirable properties such as a compact 8-byte lock size and fairness among lock requesters. This paper presents the optimistic queuing lock (OptiQL), a new optimistic lock for database indexing that solves this problem. OptiQL extends the classic MCS lock---a fair, compact and robust mutual exclusion lock---with optimistic read capabilities for index workloads, achieving both robustness and high performance while maintaining various desirable properties. Evaluation using memory-optimized B+-trees on a 40-core, dual-socket server shows that OptiQL matches existing optimistic locks for read operations while avoiding performance collapse under high contention.
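The optimistic read protocol itself is seqlock-style: readers never write shared state; they sample a version before reading and validate it afterwards, retrying if a writer intervened. The Python sketch below shows only that protocol, under the assumption of a plain mutex for writers; the real OptiQL packs an MCS queue and the version into a single 8-byte word, which this toy cannot express.

```python
import threading

class OptimisticLock:
    def __init__(self):
        self._mutex = threading.Lock()
        self._version = 0                    # even: free, odd: writer active
    def write_lock(self):
        self._mutex.acquire()
        self._version += 1                   # becomes odd: readers back off
    def write_unlock(self):
        self._version += 1                   # even again, but changed
        self._mutex.release()
    def read(self, critical_section):
        while True:                          # optimistic read: no shared write
            v = self._version
            if v % 2 == 1:
                continue                     # writer in progress, retry
            result = critical_section()
            if self._version == v:           # unchanged: the read was consistent
                return result

lock, cell = OptimisticLock(), {"x": 0}
lock.write_lock(); cell["x"] = 42; lock.write_unlock()
print(lock.read(lambda: cell["x"]))          # 42
```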
FedCSS: Joint Client-and-Sample Selection for Hard Sample-Aware Noise-Robust Federated Learning
Pub Date : 2023-11-13 DOI: 10.1145/3617332
Anran Li, Yue Cao, Jiabao Guo, Hongyi Peng, Qing Guo, Han Yu
Federated Learning (FL) enables a large number of data owners (a.k.a. FL clients) to jointly train a machine learning model without disclosing private local data. The importance of local data samples to the FL model varies widely. This is exacerbated by the presence of noisy data, which exhibit large losses similar to important (hard) samples. Currently, no FL approach can effectively distinguish hard samples (which are beneficial) from noisy samples (which are harmful). To bridge this gap, we propose the Federated Client and Sample Selection (FedCSS) approach. It is a bilevel optimization approach for FL client-and-sample selection that achieves hard-sample-aware, noise-robust learning in a privacy-preserving manner. It performs meta-learning-based online approximation to iteratively update global FL models, select the most positively influential samples, and deal with training data noise. Theoretical analysis shows that it is guaranteed to converge in an efficient manner. Experimental comparison against six state-of-the-art baselines on five real-world datasets in the presence of data noise and heterogeneity shows that it achieves up to 26.4% higher test accuracy, while saving communication and computation costs by at least 41.5% and 1.2%, respectively.
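One simple way to see how influence can separate hard from noisy samples: for a linear model, keep a sample if its per-example gradient aligns with the gradient of a small trusted validation set, since hard-but-correct samples pull in the same direction while mislabeled ones pull against it. This is a toy stand-in for FedCSS's meta-learning-based selection, with all function names invented for the sketch.

```python
import numpy as np

def grad_logistic(w, x, y):
    """Per-example gradient of the logistic loss (label y in {0, 1})."""
    p = 1.0 / (1.0 + np.exp(-(x @ w)))
    return (p - y) * x

def select_samples(w, X, y, X_val, y_val):
    # trusted signal: average gradient over a small clean validation set
    g_val = np.mean([grad_logistic(w, xv, yv)
                     for xv, yv in zip(X_val, y_val)], axis=0)
    # keep samples whose gradient aligns with the trusted signal
    return [i for i, (x, yi) in enumerate(zip(X, y))
            if grad_logistic(w, x, yi) @ g_val > 0]

rng = np.random.default_rng(0)
w_true = rng.normal(size=5)
X = rng.normal(size=(200, 5))
y = (X @ w_true > 0).astype(float)
y[:20] = 1 - y[:20]                           # inject label noise
X_val = rng.normal(size=(30, 5))
y_val = (X_val @ w_true > 0).astype(float)    # clean, trusted labels
kept = select_samples(np.zeros(5), X, y, X_val, y_val)
print(sum(i < 20 for i in kept))              # few noisy samples should survive
```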
Learning to Optimize LSM-trees: Towards A Reinforcement Learning based Key-Value Store for Dynamic Workloads
Pub Date : 2023-11-13 DOI: 10.1145/3617333
Dingheng Mo, Fanchao Chen, Siqiang Luo, Caihua Shan
LSM-trees are widely adopted as the storage backend of key-value stores. However, optimizing the system performance under dynamic workloads has not been sufficiently studied or evaluated in previous work. To fill the gap, we present RusKey, a key-value store with the following new features: (1) RusKey is a first attempt to orchestrate LSM-tree structures online to enable robust performance under the context of dynamic workloads; (2) RusKey is the first study to use Reinforcement Learning (RL) to guide LSM-tree transformations; (3) RusKey includes a new LSM-tree design, named FLSM-tree, for an efficient transition between different compaction policies -- the bottleneck of dynamic key-value stores. We justify the superiority of the new design with theoretical analysis; (4) RusKey requires no prior workload knowledge for system adjustment, in contrast to state-of-the-art techniques. Experiments show that RusKey exhibits strong performance robustness in diverse workloads, achieving up to 4x better end-to-end performance than the RocksDB system under various settings.
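To illustrate "RL guides LSM-tree transformations" in miniature, the sketch below runs tabular Q-learning over candidate compaction policies, rewarding low observed latency. The states, actions, and the measure_latency() callback are placeholders invented for the sketch; RusKey's actual state/action design and its FLSM-tree transitions are more involved.

```python
import random
from collections import defaultdict

ACTIONS = ["leveling", "tiering", "hybrid"]           # candidate compaction policies

def choose(Q, state, eps=0.1):
    if random.random() < eps:
        return random.choice(ACTIONS)                 # explore
    return max(ACTIONS, key=lambda a: Q[(state, a)])  # exploit

def q_update(Q, s, a, reward, s_next, alpha=0.5, gamma=0.9):
    best_next = max(Q[(s_next, b)] for b in ACTIONS)
    Q[(s, a)] += alpha * (reward + gamma * best_next - Q[(s, a)])

def run(workloads, measure_latency, episodes=200):
    Q, state = defaultdict(float), random.choice(workloads)
    for _ in range(episodes):
        action = choose(Q, state)
        reward = -measure_latency(state, action)      # lower latency, higher reward
        next_state = random.choice(workloads)         # the workload drifts
        q_update(Q, state, action, reward, next_state)
        state = next_state
    return Q

lat = {("write-heavy", "tiering"): 1.0, ("write-heavy", "leveling"): 3.0,
       ("read-heavy", "leveling"): 1.0, ("read-heavy", "tiering"): 3.0}
Q = run(["write-heavy", "read-heavy"], lambda s, a: lat.get((s, a), 2.0))
print(max(ACTIONS, key=lambda a: Q[("write-heavy", a)]))   # likely 'tiering'
```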
Enriching Recommendation Models with Logic Conditions
Pub Date : 2023-11-13 DOI: 10.1145/3617330
Lihang Fan, Wenfei Fan, Ping Lu, Chao Tian, Qiang Yin
This paper proposes RecLogic, a framework for improving the accuracy of machine learning (ML) models for recommendation. It aims to enhance existing ML models with logic conditions to reduce false positives and false negatives, without training a new model. Underlying RecLogic are (a) a class of prediction rules on graphs, denoted TIEs, (b) a new approach to learning TIEs, and (c) a new paradigm for recommendation with TIEs. TIEs may embed ML recommendation models as predicates; as opposed to prior graph rules, it is tractable to decide whether a graph satisfies a set of TIEs. To enrich ML models, RecLogic iteratively trains a generator with feedback from each round, to learn TIEs with a probabilistic bound. RecLogic also provides a PTIME parallel algorithm for making recommendations with the learned TIEs. Using real-life data, we empirically verify that RecLogic improves the accuracy of ML predictions by 22.89% on average, and by up to 33.10%, in the regime where the prediction strength is neither sufficiently large nor sufficiently small.
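In the spirit of a TIE, a rule fires only when both its logic predicates over the graph and an embedded ML predicate hold, so logic conditions can veto a model's false positives without retraining. The predicates, make_rule, and the score dictionary below are illustrative inventions, not RecLogic's rule syntax.

```python
def make_rule(predicates, ml_predicate):
    def rule(user, item, graph):
        # both the logic conditions and the embedded ML predicate must hold
        return (all(p(user, item, graph) for p in predicates)
                and ml_predicate(user, item))
    return rule

def friend_bought(user, item, graph):
    friends, bought = graph                  # toy graph: two adjacency dicts
    return any(item in bought.get(f, set()) for f in friends.get(user, set()))

def recommend(rule, users, items, graph):
    return [(u, i) for u in users for i in items if rule(u, i, graph)]

friends, bought = {"u1": {"u2"}}, {"u2": {"book"}}
scores = {("u1", "book"): 0.8}               # stand-in for an ML model's output
rule = make_rule([friend_bought], lambda u, i: scores.get((u, i), 0.0) > 0.5)
print(recommend(rule, ["u1"], ["book"], (friends, bought)))   # [('u1', 'book')]
```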