关于Apache Kafka中主题的高效分区

Theofanis P. Raptis, A. Passarella
{"title":"关于Apache Kafka中主题的高效分区","authors":"Theofanis P. Raptis, A. Passarella","doi":"10.1109/CITS55221.2022.9832981","DOIUrl":null,"url":null,"abstract":"Apache Kafka addresses the general problem of delivering extreme high volume event data to diverse consumers via a publish-subscribe messaging system. It uses partitions to scale a topic across many brokers for producers to write data in parallel, and also to facilitate parallel reading of consumers. Even though Apache Kafka provides some out of the box optimizations, it does not strictly define how each topic shall be efficiently distributed into partitions. The well-formulated fine-tuning that is needed in order to improve an Apache Kafka cluster performance is still an open research problem. In this paper, we first model the Apache Kafka topic partitioning process for a given topic. Then, given the set of brokers, constraints and application requirements on throughput, OS load, replication latency and unavailability, we formulate the optimization problem of finding how many partitions are needed and show that it is computationally intractable, being an integer program. Furthermore, we propose two simple, yet efficient heuristics to solve the problem: the first tries to minimize and the second to maximize the number of brokers used in the cluster. Finally, we evaluate its performance via largescale simulations, considering as benchmarks some Apache Kafka cluster configuration recommendations provided by Microsoft and Confluent. We demonstrate that, unlike the recommendations, the proposed heuristics respect the hard constraints on replication latency and perform better w.r.t. unavailability time and OS load, using the system resources in a more prudent way.","PeriodicalId":136239,"journal":{"name":"2022 International Conference on Computer, Information and Telecommunication Systems (CITS)","volume":"347 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":"{\"title\":\"On Efficiently Partitioning a Topic in Apache Kafka\",\"authors\":\"Theofanis P. Raptis, A. Passarella\",\"doi\":\"10.1109/CITS55221.2022.9832981\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Apache Kafka addresses the general problem of delivering extreme high volume event data to diverse consumers via a publish-subscribe messaging system. It uses partitions to scale a topic across many brokers for producers to write data in parallel, and also to facilitate parallel reading of consumers. Even though Apache Kafka provides some out of the box optimizations, it does not strictly define how each topic shall be efficiently distributed into partitions. The well-formulated fine-tuning that is needed in order to improve an Apache Kafka cluster performance is still an open research problem. In this paper, we first model the Apache Kafka topic partitioning process for a given topic. Then, given the set of brokers, constraints and application requirements on throughput, OS load, replication latency and unavailability, we formulate the optimization problem of finding how many partitions are needed and show that it is computationally intractable, being an integer program. Furthermore, we propose two simple, yet efficient heuristics to solve the problem: the first tries to minimize and the second to maximize the number of brokers used in the cluster. Finally, we evaluate its performance via largescale simulations, considering as benchmarks some Apache Kafka cluster configuration recommendations provided by Microsoft and Confluent. We demonstrate that, unlike the recommendations, the proposed heuristics respect the hard constraints on replication latency and perform better w.r.t. unavailability time and OS load, using the system resources in a more prudent way.\",\"PeriodicalId\":136239,\"journal\":{\"name\":\"2022 International Conference on Computer, Information and Telecommunication Systems (CITS)\",\"volume\":\"347 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-05-19\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"5\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 International Conference on Computer, Information and Telecommunication Systems (CITS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CITS55221.2022.9832981\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 International Conference on Computer, Information and Telecommunication Systems (CITS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CITS55221.2022.9832981","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 5

摘要

Apache Kafka解决了通过发布-订阅消息传递系统向不同消费者交付海量事件数据的通用问题。它使用分区跨多个代理扩展主题,以便生产者并行写入数据,也方便消费者并行读取数据。尽管Apache Kafka提供了一些开箱即用的优化,但它并没有严格定义每个主题如何有效地分布到分区中。为了提高Apache Kafka集群性能所需要的精心制定的微调仍然是一个开放的研究问题。在本文中,我们首先对给定主题的Apache Kafka主题分区过程进行建模。然后,给定一组代理、约束和应用程序对吞吐量、操作系统负载、复制延迟和不可用性的需求,我们制定了寻找需要多少分区的优化问题,并表明它是计算难以处理的,是一个整数程序。此外,我们提出了两个简单而有效的启发式方法来解决这个问题:第一个尝试最小化集群中使用的代理数量,第二个尝试最大化集群中使用的代理数量。最后,我们通过大规模模拟来评估它的性能,将微软和Confluent提供的一些Apache Kafka集群配置建议作为基准。我们证明,与建议不同,所提出的启发式方法尊重复制延迟的硬约束,并以更谨慎的方式使用系统资源,更好地执行不可用时间和操作系统负载。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
On Efficiently Partitioning a Topic in Apache Kafka
Apache Kafka addresses the general problem of delivering extreme high volume event data to diverse consumers via a publish-subscribe messaging system. It uses partitions to scale a topic across many brokers for producers to write data in parallel, and also to facilitate parallel reading of consumers. Even though Apache Kafka provides some out of the box optimizations, it does not strictly define how each topic shall be efficiently distributed into partitions. The well-formulated fine-tuning that is needed in order to improve an Apache Kafka cluster performance is still an open research problem. In this paper, we first model the Apache Kafka topic partitioning process for a given topic. Then, given the set of brokers, constraints and application requirements on throughput, OS load, replication latency and unavailability, we formulate the optimization problem of finding how many partitions are needed and show that it is computationally intractable, being an integer program. Furthermore, we propose two simple, yet efficient heuristics to solve the problem: the first tries to minimize and the second to maximize the number of brokers used in the cluster. Finally, we evaluate its performance via largescale simulations, considering as benchmarks some Apache Kafka cluster configuration recommendations provided by Microsoft and Confluent. We demonstrate that, unlike the recommendations, the proposed heuristics respect the hard constraints on replication latency and perform better w.r.t. unavailability time and OS load, using the system resources in a more prudent way.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Tracking container network connections in a Digital Data Marketplace with P4 Ciphertext-Policy Attribute-based Encryption for Securing IoT Devices in Fog Computing A CNN based localization and activity recognition algorithm using multi-receiver CSI measurements and decision fusion Learning-Automata-Based Energy Efficient Model for Device Lifetime Enhancement in LoRaWAN Networks A Deep Learning Based Bluetooth Indoor Localization Algorithm by RSSI and AOA Feature Fusion
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1