
Latest Publications: 2017 18th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT)

NMFDIV: A Nonnegative Matrix Factorization Approach for Search Result Diversification on Attributed Networks
Zaiqiao Meng, Hong Shen
Search result diversification is an effective way to tackle query ambiguity and enhance result novelty. In the context of large information networks, diversifying search results is also critical for the further design of applications such as link prediction and citation recommendation. In previous work, this problem has mainly been tackled through implicit query intent. To further enhance performance, we propose an explicit search result diversification method that explicitly encodes query intent and represents nodes as representation vectors via a novel nonnegative matrix factorization approach; the diversity of the result nodes then accounts for both query relevance and novelty with respect to these vectors. To learn representation vectors for networks, we derive multiplicative update rules to train the nonnegative matrix factorization model. Finally, we perform a comprehensive evaluation of our proposal against various baselines. Experimental results show the effectiveness of the proposed solution and verify that attributes do help improve diversification performance.
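The abstract's multiplicative update rules refer to the classic Lee-Seung scheme for minimizing the reconstruction error of V ≈ WH; the paper derives analogous rules for its own query- and attribute-aware objective, which the plain-Python sketch below does not reproduce — it only illustrates the generic update pattern.

```python
import random

def matmul(A, B):
    # Naive list-of-lists matrix product.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def nmf(V, k, iters=200, eps=1e-9):
    """Generic Lee-Seung multiplicative updates for V ~ W H (all entries >= 0)."""
    random.seed(0)
    n, m = len(V), len(V[0])
    W = [[random.random() for _ in range(k)] for _ in range(n)]
    H = [[random.random() for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H): keeps H nonnegative, never increases error.
        WtV = matmul(transpose(W), V)
        WtWH = matmul(matmul(transpose(W), W), H)
        H = [[H[i][j] * WtV[i][j] / (WtWH[i][j] + eps) for j in range(m)] for i in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        VHt = matmul(V, transpose(H))
        WHHt = matmul(W, matmul(H, transpose(H)))
        W = [[W[i][j] * VHt[i][j] / (WHHt[i][j] + eps) for j in range(k)] for i in range(n)]
    return W, H

def frob_err(V, W, H):
    """Squared Frobenius reconstruction error ||V - WH||^2."""
    WH = matmul(W, H)
    return sum((V[i][j] - WH[i][j]) ** 2 for i in range(len(V)) for j in range(len(V[0])))
```

Because the updates are multiplicative, nonnegativity of W and H is preserved automatically, and the objective is non-increasing across iterations.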
DOI: 10.1109/PDCAT.2017.00023 | Published: 2017-12-01
Citations: 0
A Privacy-Preserving Cloud-Based Data Management System with Efficient Revocation Scheme
S. Chang, Ja-Ling Wu
Many data management systems, for various reasons, delegate their heavy computational workloads to public cloud service providers. It is well known that once we entrust our tasks to a cloud server, we may face several threats, such as infringement of the privacy of users' attribute information; therefore, an appropriate privacy-preserving mechanism is a must for constructing a secure cloud-based data management system (SCBDMS). Designing a reliable SCBDMS with server-enforced revocation ability is a very challenging task, even when the server works in the honest-but-curious mode. Existing data management systems seldom provide a privacy-preserving revocation service, especially when it is outsourced to a third party. In this work, with the aid of oblivious transfer and the newly proposed stateless lazy re-encryption (SLREN) mechanism, an SCBDMS with secure, reliable and efficient server-enforced attribute revocation ability is built. Compared with related works, our experimental results show that in the newly constructed SCBDMS, the storage requirement of the cloud server and the communication overhead between the cloud server and system users are largely reduced, owing to the late involvement of SLREN.
DOI: 10.1109/PDCAT.2017.00011 | Published: 2017-12-01
Citations: 3
Parallel Implementation of Dynamic Programming Problems Using Wavefront and Rank Convergence with Full Resource Utilization
Vivek Sourabh, Parth Pahariya, Isha Agarwal, Ankit Gautam, C. R. Chowdary
In this paper, we propose a novel approach that computes a particular class of dynamic programming problems in parallel with full processor utilization. This class includes algorithms such as Longest Common Subsequence and Needleman-Wunsch. In dynamic programming, a larger problem is divided into smaller subproblems which are then solved, and the results are used to compute the final result. Each subproblem can be considered a stage. If the computations made in one stage are independent of those made in other stages, these stages can be calculated in parallel. The idling of processors bottlenecks the performance of existing parallel algorithms. In this paper, we use rank convergence to compute each stage while ensuring full processor utilization. This increases the efficiency and speedup of the parallel algorithm.
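The wavefront pattern the title refers to exploits the fact that, in the LCS table, every cell on one anti-diagonal depends only on cells of earlier diagonals, so a whole diagonal can be filled concurrently. The sketch below shows that generic pattern only; the paper's rank-convergence refinement is not reproduced.

```python
from concurrent.futures import ThreadPoolExecutor

def lcs_length_wavefront(a, b, workers=4):
    """Length of the longest common subsequence, filled diagonal by diagonal."""
    n, m = len(a), len(b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]

    def fill(i, j):
        # Standard LCS recurrence; reads only cells from earlier diagonals.
        if a[i - 1] == b[j - 1]:
            dp[i][j] = dp[i - 1][j - 1] + 1
        else:
            dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])

    with ThreadPoolExecutor(max_workers=workers) as pool:
        for d in range(2, n + m + 1):  # anti-diagonal index d = i + j
            cells = [(i, d - i) for i in range(max(1, d - m), min(n, d - 1) + 1)]
            list(pool.map(lambda ij: fill(*ij), cells))  # cells are independent
    return dp[n][m]
```

A diagonal barrier (here, the implicit completion of `pool.map`) is all the synchronization the pattern needs, which is why idle time concentrates at the short diagonals near the table's corners.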
DOI: 10.1109/PDCAT.2017.00033 | Published: 2017-12-01
Citations: 0
Strike the Balance between System Utilization and Data Locality under Deadline Constraint for MapReduce Clusters
Yeh-Cheng Chen, J. Chou
The MapReduce paradigm has become a popular platform for massive data processing and Big Data applications. Although MapReduce was initially designed for high throughput and batch processing, it has also been used for many other types of applications and workloads thanks to its scalable and reliable system architecture. One of the emerging requirements for enterprise data-processing computing is completion-time guarantees. However, only a few research works address MapReduce jobs with deadline constraints. Therefore, in this paper, we aim to prevent jobs from missing their deadlines while maximizing the resource utilization and data locality of a MapReduce cluster. Our approach introduces a two-phase job scheduling mechanism that combines a job admission control policy with a priority-based scheduling algorithm. We use a series of simulations over diverse workloads to evaluate our system. The results show that our approach can guarantee job completion time in a heavily loaded system and achieve data locality comparable to the delay scheduling algorithm in a lightly loaded system. Furthermore, our approach can maximize system throughput by preventing system resources from being wasted on jobs that miss their deadlines.
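A two-phase mechanism of this kind can be pictured as an admission test followed by deadline-ordered dispatch. The sketch below is hypothetical: the `Job` fields, the uniform-capacity model, and the earliest-deadline-first ordering are illustrative stand-ins, not the paper's actual admission policy or priority function.

```python
import heapq

class Job:
    def __init__(self, name, work, deadline):
        # work: estimated task-time units; deadline: absolute time limit.
        self.name, self.work, self.deadline = name, work, deadline

def admit(admitted, candidate, capacity, now=0.0):
    """Phase one (admission control): simulate deadline-ordered execution of
    the admitted jobs plus the candidate at the given slot capacity, and
    accept only if every job would still finish by its deadline."""
    t = now
    for j in sorted(admitted + [candidate], key=lambda j: j.deadline):
        t += j.work / capacity  # time to drain j's tasks
        if t > j.deadline:
            return False
    return True

def schedule(jobs):
    """Phase two (priority scheduling): dispatch by earliest deadline first."""
    heap = [(j.deadline, j.name) for j in jobs]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]
```

Rejecting a job up front, rather than letting it run and miss its deadline, is what keeps the slots free for jobs that can still meet theirs.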
DOI: 10.1109/PDCAT.2017.00061 | Published: 2017-12-01
Citations: 0
Handling Churn in Similarity Based Clustering Overlays Using Weighted Benefit
I. Bukhari, A. Harwood, S. Karunasekera
Similarity based clustering (SBC) overlays are decentralized networks of nodes on the Internet edge, where each node maintains some number of direct connections to the other nodes that are most "similar" to it. The challenge is: how do the nodes in the overlay converge to and maintain the most similar neighbors, given that the network is decentralized, is subject to churn, and that similarity varies over time? Protocols that simultaneously provide fast convergence and low bandwidth consumption are the objective of this research. We present a protocol, which we call the Weighted Benefit Scheme (WBS), that improves upon the existing state of the art in this area: it has a convergence rate equivalent to the Optimum Benefit protocol while handling churn competitively with the Vicinity protocol. We use a real-world dataset from Yahoo WebScope comprising 15,400 users with 354,000 ratings of about 1,000 songs, and our experiments are performed on the simulation test-bed PeerNet.
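The neighbor-maintenance step common to SBC overlays can be sketched as a gossip round: merge the current view with candidates learned from a peer and keep the k most similar entries. This is the generic Vicinity-style pattern; WBS's weighted-benefit ranking is not reproduced, and cosine similarity over rating profiles is an assumed stand-in metric.

```python
import math

def cosine(a, b):
    """Similarity of two sparse rating profiles (dicts: item -> rating)."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def select_view(profile, view, candidates, k):
    """One gossip round: merge the node's view with gossiped candidates
    and keep the k nodes whose profiles are most similar to ours."""
    merged = {**view, **candidates}  # node id -> profile
    ranked = sorted(merged, key=lambda n: cosine(profile, merged[n]), reverse=True)
    return {n: merged[n] for n in ranked[:k]}
```

Under churn, stale entries simply lose the ranking to fresher, more similar candidates over successive rounds, which is why the selection rule alone already gives some resilience.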
DOI: 10.1109/PDCAT.2017.00069 | Published: 2017-12-01
Citations: 2
Parallel Implementation of Local Similarity Search for Unstructured Text Using Prefix Filtering
Manu Agrawal, Kartik Manchanda, Ribhav Soni, A. Lal, C. R. Chowdary
Identifying partially duplicated text segments among documents is an important research problem with applications in plagiarism detection and near-duplicate web page detection. We investigate the problem of local similarity search for finding partially replicated text, focusing on its parallel implementation. Our aim is to find text windows that are approximately similar in two documents, using a filter-verification framework. We present various parallel approaches to the problem, of which input data partitioning combined with the reduction of individual index maps was found to be most suitable. We analyze the effect of varying the similarity threshold and the number of processes on speedup, and also perform a cost analysis. Experimental results show that the proposed method achieves up to 13x speedup on a 24-core processor.
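The prefix-filtering idea in the title rests on a standard observation: under a global token ordering, two sets with Jaccard similarity >= t must share at least one token in their prefixes of length |x| - ceil(t·|x|) + 1, so an inverted index over prefixes prunes most candidate pairs before exact verification. The sketch below shows the generic sequential filter-verification loop, not the paper's windowed local-similarity variant or its parallelization.

```python
import math
from collections import defaultdict

def jaccard(r, s):
    r, s = set(r), set(s)
    return len(r & s) / len(r | s)

def prefix(tokens, t, order):
    """Prefix of the canonically ordered token set that any t-similar set must overlap."""
    toks = sorted(set(tokens), key=order.get)
    k = len(toks) - math.ceil(t * len(toks)) + 1
    return toks[:max(k, 0)]

def similarity_join(records, t):
    """All pairs (i, j), i < j, with Jaccard(records[i], records[j]) >= t."""
    # Global order: rarest tokens first, so postings lists stay short.
    freq = defaultdict(int)
    for rec in records:
        for tok in set(rec):
            freq[tok] += 1
    order = {tok: (freq[tok], tok) for tok in freq}

    index = defaultdict(list)  # token -> ids of records whose prefix holds it
    results = []
    for rid, rec in enumerate(records):
        cands = set()
        for tok in prefix(rec, t, order):
            cands.update(index[tok])   # filter phase: shared prefix token
            index[tok].append(rid)
        for cid in cands:              # verification phase: exact similarity
            if jaccard(records[cid], rec) >= t:
                results.append((cid, rid))
    return results
```

Partitioning `records` across processes and merging the per-partition index maps, as the abstract describes, parallelizes exactly this loop.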
DOI: 10.1109/PDCAT.2017.00025 | Published: 2017-12-01
Citations: 1
SMiPE: Estimating the Progress of Recurring Iterative Distributed Dataflows
Jannis Koch, L. Thamsen, Florian Schmidt, O. Kao
Distributed dataflow systems such as Apache Spark allow the execution of iterative programs at large scale on clusters. In production use, programs are often recurring and have strict latency requirements. Yet choosing appropriate resource allocations is difficult, as runtimes depend on hard-to-predict factors, including failures, cluster utilization and dataset characteristics. Offline runtime prediction helps to estimate resource requirements, but cannot take into account inherent variance due to, for example, changing cluster states. We present SMiPE, a system that estimates the progress of iterative dataflows by matching a running job to previous executions based on similarity, capturing properties such as convergence, hardware utilization and runtime. Owing to its black-box approach, SMiPE is not limited to a specific framework and is able to adapt to changing cluster states reflected in the current job's statistics. SMiPE automatically adapts its similarity matching to algorithm-specific profiles by training parameters on the job history. We evaluated SMiPE with three iterative Spark jobs and nine datasets. The results show that SMiPE is effective in choosing useful historic runs and predicts runtimes with a mean relative error of 9.1% to 13.1%.
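The matching idea can be illustrated as a similarity-weighted average over historical runs: score each past execution against the running job's partial statistics, then blend their known runtimes by those scores. Everything below is hypothetical: the feature names, the similarity function, and the weighting are illustrative stand-ins for SMiPE's actual matching and per-algorithm trained parameters.

```python
def similarity(a, b):
    """Crude similarity of two stat dicts: product of per-feature
    (1 - relative distance) terms, in (0, 1]."""
    score = 1.0
    for key in a:
        hi = max(abs(a[key]), abs(b[key]), 1e-9)
        score *= 1.0 - abs(a[key] - b[key]) / hi
    return score

def estimate_runtime(current, history):
    """current: partial stats of the running job;
    history: list of (stats, total_runtime) from previous executions."""
    weights = [(similarity(current, stats), rt) for stats, rt in history]
    total = sum(w for w, _ in weights)
    return sum(w * rt for w, rt in weights) / total

# Hypothetical history of two recurring runs of the same job.
history = [
    ({"input_gb": 10, "iter_time_s": 2.0}, 120.0),
    ({"input_gb": 20, "iter_time_s": 4.1}, 245.0),
]
```

A run that closely matches the current statistics dominates the weighted average, which is the sense in which "choosing useful historic runs" drives the prediction.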
DOI: 10.1109/PDCAT.2017.00034 | Published: 2017-12-01
Citations: 6
A Survey of User Preferences Oriented Service Selection and Deployment in Multi-Cloud Environment
Letian Yang, Li Liu, Qi Fan
Service selection based on user preferences and service deployment are challenging due to the diversity of user demands and preferences in multi-cloud environments. Few works have clearly reviewed the existing approaches to user-preference-oriented service selection and deployment in the multi-cloud environment. In this paper, we propose and motivate taxonomies for user-preference-oriented service selection and deployment. We present a detailed survey of the state of the art in terms of the description and analysis of user preferences, optimization objectives and constraints. Finally, we analyze existing works and discuss future work in this area of user-preference-based multi-cloud service selection and deployment.
DOI: 10.1109/PDCAT.2017.00065 | Published: 2017-12-01
Citations: 3
The Computing of Optimized Clustering Threshold Values Based on Quasi-Classes Space for the Merchandise Recommendation
Mingshan Xie, Yanfang Deng, Yong Bai, Mengxing Huang, Wenbo Jiang, Zhuhua Hu
Merchandise recommendation is an important part of electronic commerce. In view of the difficulty of obtaining users' private information and of modeling user interest, this paper bases commodity recommendation on the relationships between goods. We use fuzzy clustering learning to construct a quasi-classes space. Through the intersection of a quasi-class with the collection of goods a user is ordering, we can learn the customer's appetite for merchandise and then recommend goods. In constructing the quasi-classes space, the value of the threshold Λ must be appropriate, because Λ determines the size of the quasi-classes, and quasi-classes that are too large or too small will hurt the recommendations. The influence of the threshold Λ on commodity recommendation is discussed through a numerical example, and we finally find the best value of Λ.
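The role of the threshold can be illustrated with a standard lambda-cut on a fuzzy similarity relation: items whose pairwise similarity reaches Λ are grouped together, so raising Λ yields smaller, tighter quasi-classes. This is a generic sketch under that assumption; the similarity values and the union-find grouping are illustrative, not the paper's construction.

```python
def quasi_classes(items, sim, lam):
    """Group items whose pairwise similarity reaches the threshold lam.
    sim maps ordered pairs (i, j) with i < j to a fuzzy similarity in [0, 1]."""
    parent = {x: x for x in items}

    def find(x):
        # Union-find with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in items:
        for j in items:
            if i < j and sim[(i, j)] >= lam:
                parent[find(i)] = find(j)  # merge the two quasi-classes

    groups = {}
    for x in items:
        groups.setdefault(find(x), set()).add(x)
    return sorted(groups.values(), key=min)
```

Sweeping `lam` over candidate values and scoring the resulting class sizes against recommendation quality is the kind of tuning the paper's numerical example performs.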
DOI: 10.1109/PDCAT.2017.00043 | Published: 2017-12-01
Citations: 0
Computation Capability Deduction Architecture for MapReduce on Cloud Computing
Tzu-Chi Huang, Kuo-Chih Chu, Guo-Hao Huang, Yan-Chen Shen, C. Shieh
MapReduce is gradually becoming the de facto programming standard for applications on cloud computing. However, MapReduce needs a cloud administrator to manually configure parameters of the run-time system, such as the slot numbers for Map and Reduce tasks, in order to get the best performance. Because manual configuration risks performance degradation, MapReduce should utilize the Computation Capability Deduction Architecture (CCDA) proposed in this paper to avoid that risk. MapReduce can use CCDA to help the run-time system distribute appropriate numbers of tasks over the computers in a cloud at run time, without any manual configuration by a cloud administrator. According to the experimental observations in this paper, MapReduce gains significant performance improvement with the help of CCDA in data-intensive applications such as Inverted Index and Word Count, which are typically required to process big data on cloud computing.
{"title":"Computation Capability Deduction Architecture for MapReduce on Cloud Computing","authors":"Tzu-Chi Huang, Kuo-Chih Chu, Guo-Hao Huang, Yan-Chen Shen, C. Shieh","doi":"10.1109/PDCAT.2017.00067","DOIUrl":"https://doi.org/10.1109/PDCAT.2017.00067","url":null,"abstract":"MapReduce gradually becomes the de facto programming standard of applications on cloud computing. However, MapReduce needs a cloud administrator to manually configure parameters of the run-time system such as slot numbers for Map and Reduce tasks in order to get the best performance. Because the manual configuration has a risk of performance degradation, MapReduce should utilize the Computation Capability Deduction Architecture (CCDA) proposed in this paper to avoid the risk. MapReduce can use CCDA to help the run-time system to distribute appropriate numbers of tasks over computers in a cloud at run time without any manual configuration made by a cloud administrator. According to experiment observations in this paper, MapReduce can get great performance improvement with the help of CCDA in data-intensive applications such as Inverted Index and Word Count that are usually required to process big data on cloud computing.","PeriodicalId":119197,"journal":{"name":"2017 18th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT)","volume":"73 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121709673","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
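The idea of replacing a manually configured, uniform slot count with counts deduced from each node's measured computation capability can be sketched as below. This is a minimal illustration under assumptions — the paper does not publish CCDA's algorithm here, so the capability scores, node names, and the proportional largest-remainder allocation are all invented for the example.

```python
# Toy sketch of capability-proportional task-slot allocation (not CCDA itself).
# Instead of one fixed slot number per node set by an administrator, each node
# receives slots in proportion to a measured capability score, e.g. throughput
# from a short benchmark run.

def allocate_slots(capability, total_slots):
    """Split total_slots across nodes proportionally to capability scores.

    Uses largest-remainder rounding so the result always sums to
    total_slots while staying as close as possible to the exact shares.
    """
    total_cap = sum(capability.values())
    exact = {n: total_slots * c / total_cap for n, c in capability.items()}
    slots = {n: int(e) for n, e in exact.items()}          # floor each share
    leftover = total_slots - sum(slots.values())
    # hand the remaining slots to the nodes with the largest fractional parts
    for n in sorted(exact, key=lambda n: exact[n] - slots[n], reverse=True)[:leftover]:
        slots[n] += 1
    return slots

# Made-up benchmark scores: node-a is twice as fast as node-b, etc.
capability = {"node-a": 4.0, "node-b": 2.0, "node-c": 1.0}
print(allocate_slots(capability, 14))  # -> {'node-a': 8, 'node-b': 4, 'node-c': 2}
```

The same deduction could be rerun periodically so that slot counts track changing load, which is the kind of manual retuning the abstract says CCDA eliminates.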
Journal: 2017 18th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT)