
Latest publications from the 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS)

LSbM-tree: Re-Enabling Buffer Caching in Data Management for Mixed Reads and Writes
Dejun Teng, Lei Guo, Rubao Lee, Feng Chen, Siyuan Ma, Yanfeng Zhang, Xiaodong Zhang
LSM-tree has been widely used in data management production systems for write-intensive workloads. However, when read and write workloads co-exist under LSM-tree, data accesses can experience long latency and low throughput due to the interference with buffer caching caused by compaction, a major and frequent operation in LSM-tree. After a compaction, the existing data blocks are reorganized and written to other locations on disks. As a result, the related data blocks that have been loaded into the buffer cache are invalidated since their referencing addresses have changed, causing serious performance degradation. In order to re-enable high-speed buffer caching during intensive writes, we propose the Log-Structured buffered-Merge tree (LSbM-tree for short), which adds a compaction buffer on disks to minimize the cache invalidations caused by compactions. The compaction buffer efficiently and adaptively maintains the frequently visited data sets. In LSbM, objects with strong locality can be kept in the buffer cache with minimal or no harmful invalidations. With the help of a small on-disk compaction buffer, LSbM achieves high query performance by enabling effective buffer caching, while retaining all the merits of LSM-tree for write-intensive data processing and providing high disk bandwidth for range queries. We have implemented LSbM based on LevelDB. We show that with a standard buffer cache and a hard disk, LSbM can achieve a 2x performance improvement over LevelDB. We have also compared LSbM with other existing solutions to show its strong effectiveness.
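For intuition, the compaction-buffer idea above can be sketched in a few lines. The Python model below is not the authors' LevelDB-based implementation; the class name, the hot-key threshold, and the versioned cache keys are illustrative assumptions used only to show how data kept in an unchanging on-disk compaction buffer keeps its buffer-cache entries valid across compactions, while data served from rewritten SSTables does not.

```python
# Illustrative sketch only: a toy key-value store with an LSM-like level,
# a block cache, and an on-disk compaction buffer for hot keys.

class LSbMSketch:
    def __init__(self, hot_threshold=3):
        self.hot_threshold = hot_threshold
        self.level = {}              # key -> value, stands in for the merged LSM levels
        self.sstable_version = 0     # bumped on every compaction (new on-disk addresses)
        self.compaction_buffer = {}  # hot key -> value, kept at stable on-disk addresses
        self.read_count = {}         # key -> number of reads (locality estimate)
        self.block_cache = {}        # (source, version, key) -> value

    def put(self, key, value):
        self.level[key] = value
        self.compaction_buffer.pop(key, None)   # any buffered copy is now stale

    def get(self, key):
        self.read_count[key] = self.read_count.get(key, 0) + 1
        if key in self.compaction_buffer:
            # Served via the compaction buffer: the cached block survives compactions.
            self.block_cache[("buffer", 0, key)] = self.compaction_buffer[key]
            return self.compaction_buffer[key]
        value = self.level.get(key)
        if value is not None:
            # Served via an SSTable: the cached block is tied to the current version.
            self.block_cache[("sstable", self.sstable_version, key)] = value
        return value

    def compact(self):
        # Compaction rewrites SSTables, so cached SSTable blocks become stale ...
        self.sstable_version += 1
        self.block_cache = {k: v for k, v in self.block_cache.items() if k[0] == "buffer"}
        # ... but frequently read keys are appended to the compaction buffer, so
        # their cached blocks stay valid across this and future compactions.
        for key, count in self.read_count.items():
            if count >= self.hot_threshold and key in self.level:
                self.compaction_buffer[key] = self.level[key]
```

After a `compact()` call, only entries cached through the compaction buffer remain valid, which is exactly the invalidation behavior the abstract says LSbM avoids for data with strong locality.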
{"title":"LSbM-tree: Re-Enabling Buffer Caching in Data Management for Mixed Reads and Writes","authors":"Dejun Teng, Lei Guo, Rubao Lee, Feng Chen, Siyuan Ma, Yanfeng Zhang, Xiaodong Zhang","doi":"10.1109/ICDCS.2017.70","DOIUrl":"https://doi.org/10.1109/ICDCS.2017.70","url":null,"abstract":"LSM-tree has been widely used in data management production systems for write-intensive workloads. However, as read and write workloads co-exist under LSM-tree, data accesses can experience long latency and low throughput due to the interferences to buffer caching from the compaction, a major and frequent operation in LSM-tree. After a compaction, the existing data blocks are reorganized and written to other locations on disks. As a result, the related data blocks that have been loaded in the buffer cache are invalidated since their referencing addresses are changed, causing serious performance degradations. In order to re-enable high-speed buffer caching during intensive writes, we propose Log-Structured buffered-Merge tree (simplified as LSbM-tree) by adding a compaction buffer on disks, to minimize the cache invalidations on buffer cache caused by compactions. The compaction buffer efficiently and adaptively maintains the frequently visited data sets. In LSbM, strong locality objects can be effectively kept in the buffer cache with minimum or without harmful invalidations. With the help of a small on-disk compaction buffer, LSbM achieves a high query performance by enabling effective buffer caching, while retaining all the merits of LSM-tree for write-intensive data processing, and providing high bandwidth of disks for range queries. We have implemented LSbM based on LevelDB. We show that with a standard buffer cache and a hard disk, LSbM can achieve 2x performance improvement over LevelDB. We have also compared LSbM with other existing solutions to show its strong effectiveness.","PeriodicalId":127689,"journal":{"name":"2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123943993","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 34
Job Scheduling without Prior Information in Big Data Processing Systems
Zhiming Hu, Baochun Li, Zheng Qin, Rick Siow Mong Goh
Job scheduling plays an important role in improving the overall system performance in big data processing frameworks. Simple job scheduling policies, such as Fair and FIFO scheduling, do not consider job sizes and may degrade the performance when jobs of varying sizes arrive. More elaborate job scheduling policies make the convenient assumption that jobs are recurring, and complete information about their sizes is available from their prior runs. In this paper, we design and implement an efficient and practical job scheduler for big data processing systems to achieve better performance even without prior information about job sizes. The superior performance of our job scheduler originates from the design of multiple level priority queues, where jobs are demoted to lower priority queues if the amount of service consumed so far reaches a certain threshold. In this case, jobs in need of a small amount of service can finish in the topmost several levels of queues, while jobs that need a large amount of service to complete are moved to lower priority queues to avoid head-of-line blocking. Our new job scheduler can effectively mimic the shortest job first scheduling policy without knowing the job sizes in advance. To demonstrate its performance, we have implemented our new job scheduler in YARN, a popular resource manager used by Hadoop/Spark, and validated its performance with both experiments on real datasets and large-scale trace-driven simulations. Our experimental and simulation results have strongly confirmed the effectiveness of our design: our new job scheduler can reduce the average job response time of the Fair scheduler by up to 45%.
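A minimal sketch of the multi-level feedback idea described above, under illustrative assumptions (three demotion thresholds, a fixed quantum, abstract work units); the paper's scheduler is implemented inside YARN rather than as a standalone loop.

```python
from collections import deque

class MLFQScheduler:
    """Jobs are demoted to lower-priority queues once their consumed service
    crosses a threshold, so short jobs finish near the top without any prior
    knowledge of job sizes."""

    def __init__(self, thresholds=(10, 100, 1000)):
        self.thresholds = thresholds                           # cumulative service limits
        self.queues = [deque() for _ in range(len(thresholds) + 1)]

    def submit(self, job_id, remaining_work):
        # Every new job starts in the highest-priority queue.
        self.queues[0].append({"id": job_id, "remaining": remaining_work, "served": 0})

    def run(self, quantum=5):
        finished = []
        while any(self.queues):
            level = next(i for i, q in enumerate(self.queues) if q)
            job = self.queues[level].popleft()
            step = min(quantum, job["remaining"])
            job["remaining"] -= step
            job["served"] += step
            if job["remaining"] == 0:
                finished.append(job["id"])
            elif level < len(self.thresholds) and job["served"] >= self.thresholds[level]:
                self.queues[level + 1].append(job)             # demote: used up its share
            else:
                self.queues[level].append(job)                 # stay at the same level
        return finished
```

Because a job's priority depends only on the service it has already received, jobs needing little service drain from the top queues first, approximating shortest-job-first scheduling without knowing job sizes in advance.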
{"title":"Job Scheduling without Prior Information in Big Data Processing Systems","authors":"Zhiming Hu, Baochun Li, Zheng Qin, Rick Siow Mong Goh","doi":"10.1109/ICDCS.2017.105","DOIUrl":"https://doi.org/10.1109/ICDCS.2017.105","url":null,"abstract":"Job scheduling plays an important role in improving the overall system performance in big data processing frameworks. Simple job scheduling policies, such as Fair and FIFO scheduling, do not consider job sizes and may degrade the performance when jobs of varying sizes arrive. More elaborate job scheduling policies make the convenient assumption that jobs are recurring, and complete information about their sizes is available from their prior runs. In this paper, we design and implement an efficient and practical job scheduler for big data processing systems to achieve better performance even without prior information about job sizes. The superior performance of our job scheduler originates from the design of multiple level priority queues, where jobs are demoted to lower priority queues if the amount of service consumed so far reaches a certain threshold. In this case, jobs in need of a small amount of service can finish in the topmost several levels of queues, while jobs that need a large amount of service to complete are moved to lower priority queues to avoid head-of-line blocking. Our new job scheduler can effectively mimic the shortest job first scheduling policy without knowing the job sizes in advance. To demonstrate its performance, we have implemented our new job scheduler in YARN, a popular resource manager used by Hadoop/Spark, and validated its performance with both experiments on real datasets and large-scale trace-driven simulations. Our experimental and simulation results have strongly confirmed the effectiveness of our design: our new job scheduler can reduce the average job response time of the Fair scheduler by up to 45%.","PeriodicalId":127689,"journal":{"name":"2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126130744","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 19
On the Design of a Blockchain Platform for Clinical Trial and Precision Medicine
Zonyin Shae, J. Tsai
This paper proposes a blockchain platform architecture for clinical trials and precision medicine, discusses various design aspects, and provides some insights into the technology requirements and challenges. We identify four new system architecture components that need to be built on top of a traditional blockchain and discuss their technology challenges in our blockchain platform: (a) a new blockchain-based general distributed and parallel computing paradigm component to devise and study parallel computing methodology for big data analytics, (b) a blockchain application data management component for data integrity, big data integration, and integration of disparate medical-related data, (c) a verifiable anonymous identity management component for the identity privacy of both people and Internet of Things (IoT) devices and for secure data access, making patient-centric medicine possible, and (d) a trust data sharing management component to enable a trusted medical data ecosystem for collaborative research.
{"title":"On the Design of a Blockchain Platform for Clinical Trial and Precision Medicine","authors":"Zonyin Shae, J. Tsai","doi":"10.1109/ICDCS.2017.61","DOIUrl":"https://doi.org/10.1109/ICDCS.2017.61","url":null,"abstract":"This paper proposes a blockchain platform architecture for clinical trial and precision medicine and discusses various design aspects and provides some insights in the technology requirements and challenges. We identify 4 new system architecture components that are required to be built on top of traditional blockchain and discuss their technology challenges in our blockchain platform: (a) a new blockchain based general distributed and parallel computing paradigm component to devise and study parallel computing methodology for big data analytics, (b) blockchain application data management component for data integrity, big data integration, and integrating disparity of medical related data, (c) verifiable anonymous identity management component for identity privacy for both person and Internet of Things (IoT) devices and secure data access to make possible of the patient centric medicine, and (d) trust data sharing management component to enable a trust medical data ecosystem for collaborative research.","PeriodicalId":127689,"journal":{"name":"2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132624859","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 156
Optimizing Shuffle in Wide-Area Data Analytics
Shuhao Liu, Hao Wang, Baochun Li
As increasingly large volumes of raw data are generated at geographically distributed datacenters, they need to be efficiently processed by data analytic jobs spanning multiple datacenters across wide-area networks. Designed for a single datacenter, existing data processing frameworks, such as Apache Spark, are not able to deliver satisfactory performance when these wide-area analytic jobs are executed. As wide-area networks interconnecting datacenters may not be congestion free, there is a compelling need for a new system framework that is optimized for wide-area data analytics. In this paper, we design and implement a new proactive data aggregation framework based on Apache Spark, with a focus on optimizing the network traffic incurred in shuffle stages of data analytic jobs. The objective of this framework is to strategically and proactively aggregate the output data of mapper tasks to a subset of worker datacenters, as a replacement to Spark's original passive fetch mechanism across datacenters. It improves the performance of wide-area analytic jobs by avoiding repetitive data transfers, which improves the utilization of inter-datacenter links. Our extensive experimental results using standard benchmarks across six Amazon EC2 regions have shown that our proposed framework is able to reduce job completion times by up to 73%, as compared to the existing baseline implementation in Spark.
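The aggregation decision can be illustrated with a small sketch: pick a few aggregator datacenters so that proactively pushing map outputs to them minimizes inter-datacenter traffic. The greedy selection and the bytes-times-unit-cost model below are assumptions for illustration, not the paper's Spark-based mechanism; the region names and costs are hypothetical.

```python
def choose_aggregators(map_output_bytes, link_cost, num_aggregators=2):
    """map_output_bytes: {dc: bytes of map output produced at dc}
       link_cost: {(src, dst): per-byte cost between distinct datacenters}"""
    def unit_cost(src, dst):
        return 0.0 if src == dst else link_cost[(src, dst)]

    def total_cost(aggregators):
        # Each datacenter ships its map output to its cheapest chosen aggregator.
        return sum(min(unit_cost(src, dst) for dst in aggregators) * size
                   for src, size in map_output_bytes.items())

    chosen = []
    while len(chosen) < num_aggregators:
        candidates = [dc for dc in map_output_bytes if dc not in chosen]
        # Greedy step: add the datacenter that lowers total transfer cost the most.
        best = min(candidates, key=lambda dc: total_cost(chosen + [dc]))
        chosen.append(best)
    return chosen, total_cost(chosen)

sizes = {"us-east": 40e9, "eu-west": 25e9, "ap-south": 10e9}
costs = {(a, b): 1.0 for a in sizes for b in sizes if a != b}
costs[("us-east", "eu-west")] = costs[("eu-west", "us-east")] = 0.5
print(choose_aggregators(sizes, costs))   # hypothetical regions and costs
```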
{"title":"Optimizing Shuffle in Wide-Area Data Analytics","authors":"Shuhao Liu, Hao Wang, Baochun Li","doi":"10.1109/ICDCS.2017.131","DOIUrl":"https://doi.org/10.1109/ICDCS.2017.131","url":null,"abstract":"As increasingly large volumes of raw data are generated at geographically distributed datacenters, they need to be efficiently processed by data analytic jobs spanning multiple datacenters across wide-area networks. Designed for a single datacenter, existing data processing frameworks, such as Apache Spark, are not able to deliver satisfactory performance when these wide-area analytic jobs are executed. As wide-area networks interconnecting datacenters may not be congestion free, there is a compelling need for a new system framework that is optimized for wide-area data analytics. In this paper, we design and implement a new proactive data aggregation framework based on Apache Spark, with a focus on optimizing the network traffic incurred in shuffle stages of data analytic jobs. The objective of this framework is to strategically and proactively aggregate the output data of mapper tasks to a subset of worker datacenters, as a replacement to Spark's original passive fetch mechanism across datacenters. It improves the performance of wide-area analytic jobs by avoiding repetitive data transfers, which improves the utilization of inter-datacenter links. Our extensive experimental results using standard benchmarks across six Amazon EC2 regions have shown that our proposed framework is able to reduce job completion times by up to 73%, as compared to the existing baseline implementation in Spark.","PeriodicalId":127689,"journal":{"name":"2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS)","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131771379","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 17
Real-Time Power Cycling in Video on Demand Data Centres Using Online Bayesian Prediction
Vicent Sanz Marco, Z. Wang, Barry Porter
Energy usage in data centres continues to be a major and growing concern as an increasing number of everyday services depend on these facilities. Research in this area has examined topics including power smoothing using batteries and deep learning to control cooling systems, in addition to optimisation techniques for the software running inside data centres. We present a novel real-time power-cycling architecture, supported by a media distribution approach and online prediction model, to automatically determine when servers are needed based on demand. We demonstrate with experimental evaluation that this approach can save up to 31% of server energy in a cluster. Our evaluation is conducted on typical rack mount servers in a data centre testbed and uses a recent real-world workload trace from the BBC iPlayer, an extremely popular video on demand service in the UK.
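A minimal sketch of demand-driven power cycling with an online Bayesian predictor. The Gamma-Poisson conjugate model, the two-sigma safety margin, and the per-server capacity are stand-in assumptions; the paper's own predictor and the BBC iPlayer workload are not reproduced here.

```python
import math

class PowerCycler:
    """Keeps a Gamma posterior over the request arrival rate and powers on
    enough servers to cover an upper estimate of the predicted demand."""

    def __init__(self, alpha=1.0, beta=1.0, requests_per_server=100):
        self.alpha, self.beta = alpha, beta              # Gamma prior over the arrival rate
        self.requests_per_server = requests_per_server

    def observe(self, requests_in_interval):
        # Conjugate update: the posterior stays Gamma(alpha + count, beta + 1).
        self.alpha += requests_in_interval
        self.beta += 1.0

    def servers_needed(self, safety_sigmas=2.0):
        mean = self.alpha / self.beta                    # posterior mean rate
        std = math.sqrt(self.alpha) / self.beta          # posterior standard deviation
        predicted_peak = mean + safety_sigmas * std      # hedge against underestimation
        return max(1, math.ceil(predicted_peak / self.requests_per_server))

cycler = PowerCycler()
for count in [120, 180, 240, 400, 380]:                  # hypothetical requests per interval
    cycler.observe(count)
    print(count, "->", cycler.servers_needed(), "servers powered on")
```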
{"title":"Real-Time Power Cycling in Video on Demand Data Centres Using Online Bayesian Prediction","authors":"Vicent Sanz Marco, Z. Wang, Barry Porter","doi":"10.1109/ICDCS.2017.167","DOIUrl":"https://doi.org/10.1109/ICDCS.2017.167","url":null,"abstract":"Energy usage in data centres continues to be a major and growing concern as an increasing number of everyday services depend on these facilities. Research in this area has examined topics including power smoothing using batteries and deep learning to control cooling systems, in addition to optimisation techniques for the software running inside data centres. We present a novel real-time power-cycling architecture, supported by a media distribution approach and online prediction model, to automatically determine when servers are needed based on demand. We demonstrate with experimental evaluation that this approach can save up to 31% of server energy in a cluster. Our evaluation is conducted on typical rack mount servers in a data centre testbed and uses a recent real-world workload trace from the BBC iPlayer, an extremely popular video on demand service in the UK.","PeriodicalId":127689,"journal":{"name":"2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124325583","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Edge Computing and IoT Based Research for Building Safe Smart Cities Resistant to Disasters
T. Higashino, H. Yamaguchi, Akihito Hiromori, A. Uchiyama, K. Yasumoto
Recently, a number of research efforts concerning smart and connected communities have been carried out. 4G/5G technology will soon become pervasive, and cellular base stations will be densely deployed in urban spaces. They may offer intelligent services for autonomous driving, urban environment improvement, disaster mitigation, support for elderly and disabled people, and so on. Such infrastructure might also function as edge servers for disaster-support bases. In this paper, we enumerate several research issues to be developed in the ICDCS community in the next decade in order to build safe, smart cities resistant to disasters. In particular, we focus on (A) up-to-date urban crowd mobility prediction and (B) resilient disaster information gathering mechanisms based on the edge computing paradigm. We survey recent related works and projects, and introduce our ongoing research work and insights for disaster mitigation.
{"title":"Edge Computing and IoT Based Research for Building Safe Smart Cities Resistant to Disasters","authors":"T. Higashino, H. Yamaguchi, Akihito Hiromori, A. Uchiyama, K. Yasumoto","doi":"10.1109/ICDCS.2017.160","DOIUrl":"https://doi.org/10.1109/ICDCS.2017.160","url":null,"abstract":"Recently, several researches concerning with smart and connected communities have been studied. Soon the 4G / 5G technology becomes popular, and cellular base stations will be located densely in the urban space. They may offer intelligent services for autonomous driving, urban environment improvement, disaster mitigation, elderly/disabled people support and so on. Such infrastructure might function as edge servers for disaster support base. In this paper, we enumerate several research issues to be developed in the ICDCS community in the next decade in order for building safe, smart cities resistant to disasters. In particular, we focus on (A) up-to-date urban crowd mobility prediction and (B) resilient disaster information gathering mechanisms based on the edge computing paradigm. We investigate recent related works and projects, and introduce our on-going research work and insight for disaster mitigation.","PeriodicalId":127689,"journal":{"name":"2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128528936","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 27
Pairwise Ranking Aggregation by Non-interactive Crowdsourcing with Budget Constraints
Changjiang Cai, Haipei Sun, Boxiang Dong, Bo Zhang, Ting Wang, Wendy Hui Wang
Crowdsourced ranking algorithms ask the crowd to compare objects and infer the full ranking from the crowdsourced pairwise comparison results. In this paper, we consider the setting in which the task requester has a limited budget that can afford only a small number of pairwise comparisons. Complicating the problem further, the crowd may return noisy comparison answers. We propose an approach that obtains a good-quality full ranking from a small number of pairwise preferences in two steps, namely task assignment and result inference. In the task assignment step, we generate pairwise comparison tasks that produce a full ranking with high probability. In the result inference step, based on the transitive property of pairwise comparisons and truth discovery, we design an efficient heuristic algorithm to find the best full ranking from the potentially conflicting pairwise preferences. The experimental results demonstrate the effectiveness and efficiency of our approach.
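A toy version of the result-inference step helps fix ideas: aggregate noisy, possibly conflicting pairwise votes into per-object win rates and read a full ranking off the scores. This Borda-style stand-in only illustrates the inputs and outputs; it is not the paper's transitivity- and truth-discovery-based heuristic.

```python
from collections import defaultdict

def aggregate_ranking(votes):
    """votes: iterable of (winner, loser) pairs collected from the crowd."""
    wins = defaultdict(float)
    appearances = defaultdict(int)
    for winner, loser in votes:
        wins[winner] += 1.0
        appearances[winner] += 1
        appearances[loser] += 1
    # Score each object by its empirical win rate over all comparisons it appears in.
    score = {obj: wins[obj] / appearances[obj] for obj in appearances}
    return sorted(score, key=score.get, reverse=True)

votes = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "b"), ("a", "d"), ("d", "c")]
print(aggregate_ranking(votes))   # ['a', 'd', 'b', 'c'] for these noisy votes
```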
{"title":"Pairwise Ranking Aggregation by Non-interactive Crowdsourcing with Budget Constraints","authors":"Changjiang Cai, Haipei Sun, Boxiang Dong, Bo Zhang, Ting Wang, Wendy Hui Wang","doi":"10.1109/ICDCS.2017.102","DOIUrl":"https://doi.org/10.1109/ICDCS.2017.102","url":null,"abstract":"Crowdsourced ranking algorithms ask the crowd to compare the objects and infer the full ranking based on the crowdsourced pairwise comparison results. In this paper, we consider the setting in which the task requester is equipped with a limited budget that can afford only a small number of pairwise comparisons. To make the problem more complicated, the crowd may return noisy comparison answers. We propose an approach to obtain a good-quality full ranking from a small number of pairwise preferences in two steps, namely task assignment and result inference. In the task assignment step, we generate pairwise comparison tasks that produce a full ranking with high probability. In the result inference step, based on the transitive property of pairwise comparisons and truth discovery, we design an efficient heuristic algorithm to find the best full ranking from the potentially conflictive pairwise preferences. The experiment results demonstrate the effectiveness and efficiency of our approach.","PeriodicalId":127689,"journal":{"name":"2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS)","volume":"148 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116336220","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Performance Analysis of Cloud Computing Centers Serving Parallelizable Rendering Jobs Using M/M/c/r Queuing Systems
Xiulin Li, Li Pan, Jiwei Huang, Shijun Liu, Yuliang Shi, Li-zhen Cui, C. Pu
Performance analysis is crucial to the successful development of the cloud computing paradigm. It is especially important for a cloud computing center serving parallelizable application jobs, because determining a proper degree of parallelism can reduce the mean service response time and thus clearly improve the performance of cloud computing. In this paper, taking a cloud-based rendering service platform as an example application, we propose an approximate analytical model for cloud computing centers serving parallelizable jobs using M/M/c/r queuing systems, by modeling the rendering service platform as a multi-station multi-server system. We solve the proposed analytical model to obtain a complete probability distribution of the response time, the blocking probability, and other important performance metrics for given cloud system settings. This model can therefore guide cloud operators in determining proper settings, such as the number of servers, the buffer size, and the degree of parallelism, to achieve specific performance levels. Through extensive simulations based on both synthetic data and real-world workload traces, we show that our proposed analytical model can provide approximate performance prediction results for cloud computing centers serving parallelizable jobs, even when job arrivals follow different distributions.
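The per-station metrics behind such a model can be computed directly from the standard M/M/c/r formulas. In the sketch below, r is read as the total system capacity in jobs (one common interpretation of the notation, assumed here), and the mean response time follows from Little's law applied to the admitted arrival rate; the paper's multi-station model composes stations of this kind per rendering stage. The parameter values in the usage line are hypothetical.

```python
from math import factorial

def mmcr_metrics(lam, mu, c, r):
    """Blocking probability and mean response time of an M/M/c/r queue:
    Poisson arrivals at rate lam, c servers of rate mu each, at most r jobs in the system."""
    a = lam / mu                                          # offered load in Erlangs
    weights = [a**n / factorial(n) if n <= c
               else a**n / (factorial(c) * c**(n - c))
               for n in range(r + 1)]
    p0 = 1.0 / sum(weights)
    p = [w * p0 for w in weights]                         # steady-state probabilities
    blocking = p[r]                                       # an arrival finds the system full
    mean_in_system = sum(n * pn for n, pn in enumerate(p))
    lam_eff = lam * (1.0 - blocking)                      # admitted arrival rate
    mean_response = mean_in_system / lam_eff              # Little's law
    return blocking, mean_response

print(mmcr_metrics(lam=8.0, mu=1.0, c=10, r=20))          # hypothetical station parameters
```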
{"title":"Performance Analysis of Cloud Computing Centers Serving Parallelizable Rendering Jobs Using M/M/c/r Queuing Systems","authors":"Xiulin Li, Li Pan, Jiwei Huang, Shijun Liu, Yuliang Shi, Li-zhen Cui, C. Pu","doi":"10.1109/ICDCS.2017.132","DOIUrl":"https://doi.org/10.1109/ICDCS.2017.132","url":null,"abstract":"Performance analysis is crucial to the successful development of cloud computing paradigm. And it is especially important for a cloud computing center serving parallelizable application jobs, for determining a proper degree of parallelism could reduce the mean service response time and thus improve the performance of cloud computing obviously. In this paper, taking the cloud based rendering service platform as an example application, we propose an approximate analytical model for cloud computing centers serving parallelizable jobs using M/M/c/r queuing systems, by modeling the rendering service platform as a multi-station multi-server system. We solve the proposed analytical model to obtain a complete probability distribution of response time, blocking probability and other important performance metrics for given cloud system settings. Thus this model can guide cloud operators to determine a proper setting, such as the number of servers, the buffer size and the degree of parallelism, for achieving specific performance levels. Through extensive simulations based on both synthetic data and real-world workload traces, we show that our proposed analytical model can provide approximate performance prediction results for cloud computing centers serving parallelizable jobs, even those job arrivals follow different distributions.","PeriodicalId":127689,"journal":{"name":"2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125222283","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Approximation and Online Algorithms for NFV-Enabled Multicasting in SDNs
Zichuan Xu, W. Liang, Meitian Huang, M. Jia, Song Guo, A. Galis
Multicasting is a fundamental network functionality for many applications, including online conferencing, event monitoring, video streaming, and system monitoring in data centers. To ensure that multicasting is reliable, secure, and scalable, a service chain consisting of network functions (e.g., firewalls, Intrusion Detection Systems (IDSs), and transcoders) is usually associated with each multicast request. Such a multicast request is referred to as an NFV-enabled multicast request. In this paper we study NFV-enabled multicasting in a Software-Defined Network (SDN) with the aim of minimizing the implementation cost of each NFV-enabled multicast request or maximizing the network throughput for a sequence of NFV-enabled requests, subject to network resource capacity constraints. We first formulate novel NFV-enabled multicasting and online NFV-enabled multicasting problems. We then devise the very first approximation algorithm, with an approximation ratio of 2K, for the NFV-enabled multicasting problem when the number of servers implementing the network functions of each request is no more than a constant K (≥ 1). We also study dynamic admission of NFV-enabled multicast requests without knowledge of future request arrivals, with the objective of maximizing the network throughput, for which we propose an online algorithm with a competitive ratio of O(log n) when K = 1, where n is the number of nodes in the network. We finally evaluate the performance of the proposed algorithms through experimental simulations. Experimental results demonstrate that the proposed algorithms outperform existing heuristics.
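For intuition on the K = 1 case, the sketch below places the whole service chain on a single server node and routes the multicast through it, approximating the tree cost by a union of shortest paths. This simple heuristic only illustrates the problem structure; it is not the paper's 2K-approximation algorithm or its O(log n)-competitive online algorithm, and the adjacency-list graph format is an assumption.

```python
import heapq

def dijkstra(graph, src):
    """graph: {node: [(neighbor, edge_weight), ...]}, assumed connected."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue
        for v, w in graph[u]:
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

def place_service_chain(graph, source, receivers, candidate_servers):
    # Route source -> server -> every receiver; pick the server with the
    # cheapest combined shortest-path cost as a stand-in for the tree cost.
    from_source = dijkstra(graph, source)
    best = None
    for server in candidate_servers:
        from_server = dijkstra(graph, server)
        cost = from_source[server] + sum(from_server[t] for t in receivers)
        if best is None or cost < best[1]:
            best = (server, cost)
    return best   # (chosen server node, approximate routing cost)
```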
{"title":"Approximation and Online Algorithms for NFV-Enabled Multicasting in SDNs","authors":"Zichuan Xu, W. Liang, Meitian Huang, M. Jia, Song Guo, A. Galis","doi":"10.1109/ICDCS.2017.43","DOIUrl":"https://doi.org/10.1109/ICDCS.2017.43","url":null,"abstract":"Multicasting is a fundamental functionality of networks for many applications including online conferencing, event monitoring, video streaming, and system monitoring in data centers. To ensure multicasting reliable, secure and scalable, a service chain consisting of network functions (e.g., firewalls, Intrusion Detection Systems (IDSs), and transcoders) usually is associated with each multicast request. Such a multicast request is referred to as an NFV-enabled multicast request. In this paper we study NFV-enabled multicasting in a Software-Defined Network (SDN) with the aims to minimize the implementation cost of each NFV-enabled multicast request or maximize the network throughput for a sequence of NFV-enabled requests, subject to network resource capacity constraints. We first formulate novel NFV-enabled multicasting and online NFV-enabled multicasting problems. We then devise the very first approximation algorithm with an approximation ratio of 2K for the NFV-enabled multicasting problem if the number of servers for implementing the network functions of each request is no more than a constant K (1). We also study dynamic admissions of NFV-enabled multicast requests without the knowledge of future request arrivals with the objective to maximize the network throughput, for which we propose an online algorithm with a competitive ratio of O(log n) when K = 1, where n is the number of nodes in the network. We finally evaluate the performance of the proposed algorithms through experimental simulations. Experimental results demonstrate that the proposed algorithms outperform other existing heuristics.","PeriodicalId":127689,"journal":{"name":"2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS)","volume":"457 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123406297","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 51
Online Resource Allocation for Arbitrary User Mobility in Distributed Edge Clouds
L. Wang, Lei Jiao, Jun Yu Li, M. Mühlhäuser
As clouds move to the network edge to facilitate mobile applications, edge cloud providers are facing new challenges in resource allocation. Because users may move and resource prices may vary arbitrarily, resources in edge clouds must be allocated and adapted continuously in order to accommodate such dynamics. In this paper, we first formulate this problem with a comprehensive model that captures the key challenges, then introduce a gap-preserving transformation of the problem, and propose a novel online algorithm that optimally solves a series of subproblems with a carefully designed logarithmic objective, finally producing feasible solutions for edge cloud resource allocation over time. We further prove via rigorous analysis that our online algorithm provides a parameterized competitive ratio, without requiring any a priori knowledge of either resource prices or user mobility. Through extensive experiments with both real-world and synthetic data, we further confirm the effectiveness of the proposed algorithm. We show that it achieves near-optimal results with an empirical competitive ratio of about 1.1, reduces the total cost by up to 4x compared to static approaches, and outperforms online greedy one-shot optimization by up to 70%.
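A generic sketch in the spirit of such online algorithms: each edge site carries a price that grows exponentially with its utilization, and an arriving request is placed at the cheapest feasible site or rejected. The exponential pricing rule, the cost model, and the parameter values are assumptions for illustration; this is not the paper's algorithm or its competitive-ratio analysis.

```python
class OnlineEdgeAllocator:
    def __init__(self, capacities, price_base=16.0):
        self.capacity = dict(capacities)            # site -> resource capacity
        self.used = {site: 0.0 for site in capacities}
        self.base = price_base

    def price(self, site):
        utilization = self.used[site] / self.capacity[site]
        return self.base ** utilization - 1.0       # zero when idle, steep when nearly full

    def place(self, demand, access_cost):
        """access_cost: site -> network cost of serving this user from that site."""
        feasible = [s for s in self.capacity if self.used[s] + demand <= self.capacity[s]]
        if not feasible:
            return None                              # reject: no site can host the request
        best = min(feasible, key=lambda s: access_cost[s] + self.price(s) * demand)
        self.used[best] += demand
        return best

edge = OnlineEdgeAllocator({"site-a": 100.0, "site-b": 60.0})   # hypothetical sites
print(edge.place(demand=10.0, access_cost={"site-a": 2.0, "site-b": 1.0}))
```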
{"title":"Online Resource Allocation for Arbitrary User Mobility in Distributed Edge Clouds","authors":"L. Wang, Lei Jiao, Jun Yu Li, M. Mühlhäuser","doi":"10.1109/ICDCS.2017.30","DOIUrl":"https://doi.org/10.1109/ICDCS.2017.30","url":null,"abstract":"As clouds move to the network edge to facilitate mobile applications, edge cloud providers are facing new challenges on resource allocation. As users may move and resource prices may vary arbitrarily, %and service delays are heterogeneous, resources in edge clouds must be allocated and adapted continuously in order to accommodate such dynamics. In this paper, we first formulate this problem with a comprehensive model that captures the key challenges, then introduce a gap-preserving transformation of the problem, and propose a novel online algorithm that optimally solves a series of subproblems with a carefully designed logarithmic objective, finally producing feasible solutions for edge cloud resource allocation over time. We further prove via rigorous analysis that our online algorithm can provide a parameterized competitive ratio, without requiring any a priori knowledge on either the resource price or the user mobility. Through extensive experiments with both real-world and synthetic data, we further confirm the effectiveness of the proposed algorithm. We show that the proposed algorithm achieves near-optimal results with an empirical competitive ratio of about 1.1, reduces the total cost by up to 4x compared to static approaches, and outperforms the online greedy one-shot optimizations by up to 70%.","PeriodicalId":127689,"journal":{"name":"2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130294422","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 92