2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS): Latest Articles

Physically based parallel ray tracer for the Metropolis light transport algorithm on the Tianhe-2 supercomputer

2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS)

Pub Date : 2014-12-01 DOI: 10.1109/PADSW.2014.7097840

Changmao Wu, Yunquan Zhang, Congli Yang, Yutong Lu

Developing an efficient and highly scalable ray tracer for the Metropolis light transport algorithm is becoming increasingly important as demand for photorealistic images grows. Although the Metropolis light transport algorithm has produced some of the most realistic images to date, rendering an image usually takes a great deal of time. Building such a ray tracer is hard, due in large part to irregular memory access patterns, the imbalanced workload of light-carrying paths, and the complicated mathematical model and physical processes involved. In this paper, we present a highly scalable, physically based parallel ray tracer for the Metropolis light transport algorithm. First, we introduce the ideas of snapshot and sub-snapshot, then propose a novel assignment partitioning algorithm for compute nodes and CPU cores, since demand-driven assignment partitioning algorithms do not work here. Second, we propose a physically based parallel ray tracing framework for the Metropolis light transport algorithm, built on a master-worker architecture. Finally, we discuss the granularity of the assignment partitioning and some optimization strategies for improving overall performance, and describe a hybrid scheduling strategy that combines static and dynamic scheduling. Experiments show that our physically based ray tracer achieves nearly linear speedup on 26,400 CPU cores of the Tianhe-2 supercomputer. The results indicate that our ray tracer is efficient and highly scalable.

Citations: 2

An improved simulated annealing heuristic for static partitioning of task graphs onto heterogeneous architectures

2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS)

Pub Date : 2014-12-01 DOI: 10.1109/PADSW.2014.7097796

Aravind Vasudevan, Avinash Malik, David Gregg

We present a simulated annealing based partitioning technique for mapping task graphs onto heterogeneous processing architectures. Partitioning tasks onto homogeneous architectures to minimize the makespan of a task graph is a known NP-hard problem. Heterogeneity greatly complicates this partitioning problem, making heuristic solutions essential. A number of heuristic approaches have been proposed, some using simulated annealing. We propose a simulated annealing method with a novel NEXT STATE function that explores different regions of the global search space when the annealing temperature is high and makes the search more local as the temperature drops. The novelty of our approach is twofold: (1) we go a step further than the existing literature by considering heterogeneity at the levels of task parallelism, data parallelism and communication; (2) we present a novel algorithm that uses simulated annealing to find better partitions in the presence of heterogeneous architectures, data parallel execution units, and significant data communication costs. A statistical analysis of the performance of the proposed method shows that our approach clearly outperforms the existing simulated annealing method.
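
The temperature-dependent neighbour idea can be illustrated with a minimal Python sketch. This is not the paper's NEXT STATE function: the `cost` matrix, the `next_state` rule (reassign a share of tasks proportional to the current temperature), and the cooling parameters are all assumptions made for illustration, and task dependencies and communication costs are ignored.

```python
import math
import random

def makespan(assign, cost):
    """Finish time of the busiest processor; cost[t][p] is task t's run time on processor p."""
    load = [0.0] * len(cost[0])
    for task, proc in enumerate(assign):
        load[proc] += cost[task][proc]
    return max(load)

def next_state(assign, n_procs, temp, t0, rng):
    """Hypothetical neighbour function: reassign more tasks while the temperature is high."""
    n_moves = max(1, round(len(assign) * temp / t0))
    new = list(assign)
    for task in rng.sample(range(len(assign)), n_moves):
        new[task] = rng.randrange(n_procs)
    return new

def anneal(cost, t0=10.0, alpha=0.95, steps=2000, seed=0):
    rng = random.Random(seed)
    n_procs = len(cost[0])
    current = [0] * len(cost)          # start with every task on processor 0
    best = list(current)
    temp = t0
    for _ in range(steps):
        cand = next_state(current, n_procs, temp, t0, rng)
        delta = makespan(cand, cost) - makespan(current, cost)
        # Metropolis acceptance: always take improvements, sometimes take regressions.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current = cand
            if makespan(current, cost) < makespan(best, cost):
                best = list(current)
        temp = max(temp * alpha, 1e-3)  # geometric cooling with a small floor
    return best
```

Calling `anneal(cost)` with a tasks-by-processors cost matrix returns a task-to-processor assignment; early iterations reshuffle many tasks at once, while late iterations move a single task at a time.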

Citations: 4

DASH: A duplication-aware flash cache architecture in virtualization environment

2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS)

Pub Date : 2014-12-01 DOI: 10.1109/PADSW.2014.7097893

Xian Chen, Wenzhi Chen, Shuiqiao Yang, Zhongyong Lu, Zonghui Wang

With the rapid development of multi-core and multi-threading technologies, the performance gap between the CPU and the storage system widens year by year, making the storage system the bottleneck of overall system performance. To alleviate this, flash memory has been used as a caching device for HDDs. Meanwhile, cloud computing is becoming increasingly popular and mature in industry. As its key building block, virtualization technology allows several virtual machines (VMs) to run simultaneously on a single physical machine, and most of them usually run the same or similar operating systems and applications. In this scenario, the flash cache is occupied by many duplicate data blocks. However, existing flash cache architectures and replacement policies do not take this observation into account, which greatly limits the efficient use of the flash cache. In this paper, we propose a new duplication-aware flash cache architecture (DASH). In this architecture, the flash cache is organized to hold only one copy of duplicate data blocks, which notably expands the effective cache capacity so that more I/O requests hit in the cache. Moreover, the architecture reduces the amount of data written to the flash cache, and thus significantly prolongs the life span of the flash device. Experiments based on realistic applications show that, in some situations, our cache architecture can improve the cache hit ratio by 5 times, reduce the average I/O latency by 63% and eliminate 81% of flash cache writes.
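
The idea of caching a single copy of duplicate blocks can be sketched as a content-addressed cache. The following Python toy is an assumption-laden illustration, not the DASH design: the class name `DedupCache`, the SHA-256 fingerprint, and the LRU replacement policy are chosen only to make the mechanism concrete.

```python
import hashlib
from collections import OrderedDict

class DedupCache:
    """Toy content-addressed cache: identical blocks from any VM share one slot."""

    def __init__(self, capacity):
        self.capacity = capacity            # max number of unique blocks kept
        self.blocks = OrderedDict()         # fingerprint -> data, in LRU order
        self.index = {}                     # (vm_id, lba) -> fingerprint
        self.flash_writes = 0               # unique blocks actually written

    def put(self, vm_id, lba, data):
        fp = hashlib.sha256(data).hexdigest()
        self.index[(vm_id, lba)] = fp
        if fp in self.blocks:
            self.blocks.move_to_end(fp)     # duplicate: refresh recency, no write
            return
        if len(self.blocks) >= self.capacity:
            self.blocks.popitem(last=False) # evict the least recently used block
        self.blocks[fp] = data
        self.flash_writes += 1

    def get(self, vm_id, lba):
        fp = self.index.get((vm_id, lba))
        if fp is None or fp not in self.blocks:
            return None                     # miss: caller would go to the HDD
        self.blocks.move_to_end(fp)
        return self.blocks[fp]
```

When two VMs cache the same block content, the second `put` only adds an index entry, so both the occupied capacity and the number of flash writes stay at one.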

Citations: 1

EasiMG: A method of maximizing lifetime for group request in web-based sensor network

2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS)

Pub Date : 2014-12-01 DOI: 10.1109/PADSW.2014.7097800

Chenying Hou, Dong Li, Li Cui

Lightweight RESTful protocols, such as CoAP and SeaHttp, have been proposed for web-based sensor networks (WBSNs), which provide web services on resource-constrained devices. Because sensing data are spatially correlated in a sensor network, it is generally efficient to request data from a group of devices located in a nearby area. Group requesting is therefore a typical way to provide web services for resource-constrained devices in a WBSN. A critical problem, however, is how to assign nodes to a group of requests optimally so as to maximize network lifetime. In this paper, we address this problem in the scenario where nodes have different initial energy and can process in-network group requests with the branch and combine methods supported by SeaHttp. We prove that this problem is NP-complete and transform it into an edge-weighted semi-matching problem in a bipartite graph using the fat tree construction algorithm. Finally, we propose an approximation algorithm to solve the problem. Simulation results show that our approach prolongs the lifetime of the network by 29.11% on average, and that it is more competitive than traditional methods in high-concurrency scenarios.

Citations: 0

WorkQ: A many-core producer/consumer execution model applied to PGAS computations

2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS)

Pub Date : 2014-12-01 DOI: 10.1109/PADSW.2014.7097863

David Ozog, A. Malony, J. Hammond, P. Balaji

Partitioned global address space (PGAS) applications, such as the Tensor Contraction Engine (TCE) in NWChem, often apply a one-process-per-core mapping in which each process iterates through the following work-processing cycle: (1) determine a work-item dynamically, (2) get data via one-sided operations on remote blocks, (3) perform computation on the data locally, (4) put (or accumulate) resultant data into an appropriate remote location, and (5) repeat the cycle. However, this simple flow of execution does not effectively hide communication latency costs despite the opportunities for making asynchronous progress. Utilizing nonblocking communication calls is not sufficient unless care is taken to efficiently manage a responsive queue of outstanding communication requests. This paper presents a new runtime model and its library implementation for managing tunable “work queues” in PGAS applications. Our runtime execution model, called WorkQ, assigns some number of on-node “producer” processes to primarily do communication (steps 1, 2, 4, and 5) and the other “consumer” processes to do computation (step 3); but processes can switch roles dynamically for the sake of performance. Load balance, synchronization, and overlap of communication and computation are facilitated by a tunable nodewise FIFO message queue protocol. Our WorkQ library implementation enables an MPI+X hybrid programming model where the X comprises SysV message queues and the user's choice of SysV, POSIX, and MPI shared memory. We develop a simplified software mini-application that mimics the performance behavior of the TCE at arbitrary scale, and we show that the WorkQ engine outperforms the original model by about a factor of 2. We also show performance improvement in the TCE coupled cluster module of NWChem.
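
The split between communication-oriented producers and computation-oriented consumers can be sketched with a bounded FIFO queue. The Python sketch below is only an analogy under stated assumptions: it uses threads and `queue.Queue` in place of the MPI processes and SysV message queues named in the abstract, the function `run_workq` and its parameters are hypothetical, and dynamic role switching is not modelled.

```python
import queue
import threading

def run_workq(work_items, fetch, compute, n_consumers=2, depth=4):
    """Producer fetches data (steps 1-2); consumers compute (step 3); results are collected (step 4)."""
    q = queue.Queue(maxsize=depth)      # bounded FIFO, a stand-in for the node-wise queue
    results = []
    lock = threading.Lock()

    def producer():
        for item in work_items:
            q.put((item, fetch(item)))  # blocks when the queue is full
        for _ in range(n_consumers):
            q.put(None)                 # one stop marker per consumer

    def consumer():
        while True:
            entry = q.get()
            if entry is None:
                break
            item, data = entry
            out = compute(data)
            with lock:
                results.append((item, out))

    threads = [threading.Thread(target=producer)]
    threads += [threading.Thread(target=consumer) for _ in range(n_consumers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return dict(results)
```

The `depth` argument plays the role of a tunable queue length: a deeper queue lets the producer run further ahead of the consumers, at the cost of holding more fetched data in memory.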

Citations: 0

Fault-Tolerant bi-directional communications in web-based applications

2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS)

Pub Date : 2014-12-01 DOI: 10.1109/PADSW.2014.7097891

N. Ivaki, Filipe Araújo

The Hypertext Transfer Protocol (HTTP) and the Transmission Control Protocol (TCP) are the most popular protocols used in the development of web-based applications. Despite their popularity, these protocols impose two limitations on applications and systems that require reliable interactive real-time communications: 1) HTTP forces applications to work in a request-response paradigm, even if a reply is not necessary, and does not allow the server to send anything to a client without the client explicitly requesting it; 2) TCP provides no recovery options for network outages, thus forcing developers to write their own error-prone, complex, and ad hoc solutions. In this paper we introduce a solution that offers both bi-directional and reliable communication to web-based applications, even in the presence of connection failures. To make this possible, we combine the idea behind WebSockets with a Session-Based Fault-Tolerant design pattern.
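
One common way to survive a dropped connection is to number messages and replay the unacknowledged ones after reconnecting. The Python simulation below illustrates that general technique only; it is not the paper's design pattern, and the classes `ReliableSender`, `Receiver`, and `Link` are hypothetical names with no network code behind them.

```python
class ReliableSender:
    """Keeps every unacknowledged message so it can be replayed after a reconnect."""

    def __init__(self):
        self.next_seq = 0
        self.unacked = {}                 # seq -> payload

    def send(self, payload, link):
        seq = self.next_seq
        self.next_seq += 1
        self.unacked[seq] = payload
        link.deliver(seq, payload)        # may be silently lost if the link is down

    def on_reconnect(self, last_received, link):
        # Drop what the peer confirms, then replay the rest in order.
        for seq in [s for s in self.unacked if s <= last_received]:
            del self.unacked[seq]
        for seq in sorted(self.unacked):
            link.deliver(seq, self.unacked[seq])

class Receiver:
    def __init__(self):
        self.last_seq = -1
        self.messages = []

    def accept(self, seq, payload):
        if seq == self.last_seq + 1:      # ignore duplicates and out-of-order frames
            self.last_seq = seq
            self.messages.append(payload)

class Link:
    """Simulated connection that drops everything while `up` is False."""

    def __init__(self, receiver):
        self.receiver = receiver
        self.up = True

    def deliver(self, seq, payload):
        if self.up:
            self.receiver.accept(seq, payload)
```

Because either endpoint can own a `ReliableSender`, the same scheme applies in both directions: after an outage each side reports the last sequence number it received, and the peer replays everything after it.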

Citations: 1

Hand-to-Hand instant message communication: Revisiting Morse code

2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS)

Pub Date : 2014-12-01 DOI: 10.1109/PADSW.2014.7097823

N. Thepvilojanapong, H. Saito, K. Murase, Tsubasa Ito, Ryo Kanaoka, T. Leppänen, J. Riekki, Y. Tobe

In this paper, we propose a novel vibration-based communication system for Bluetooth-equipped smartphones. Smartphones are commonly operated through the display; however, users sometimes want to exchange pieces of information with nearby peers without shifting their focus from their current task. Vibration also enables communication in cases without visual or sound contact. For this purpose, we developed a novel application called Hand-to-Hand on Bluetooth Communication (H2BCom). We use the common gestures of tapping and touching the smartphone display to send a Morse coded message over a Bluetooth channel. On the receiving side, the Morse coded message is presented as vibration of the smartphone. We present the design, implementation and evaluation of H2BCom on Android phones. In our user study, we found that using Morse code was difficult for beginners, but the users' skill improved after taking a Morse code tutorial.
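
The mapping from text to Morse code and then to a vibration pattern can be shown in a few lines of Python. This sketch is an assumption, not H2BCom's implementation: it covers letters A to Z only, uses the conventional 1/3/7 unit timing, and the function names and the 100 ms default unit are chosen for illustration.

```python
MORSE = {
    "A": ".-", "B": "-...", "C": "-.-.", "D": "-..", "E": ".", "F": "..-.",
    "G": "--.", "H": "....", "I": "..", "J": ".---", "K": "-.-", "L": ".-..",
    "M": "--", "N": "-.", "O": "---", "P": ".--.", "Q": "--.-", "R": ".-.",
    "S": "...", "T": "-", "U": "..-", "V": "...-", "W": ".--", "X": "-..-",
    "Y": "-.--", "Z": "--..",
}

def encode(text):
    """Letters are separated by spaces, words by ' / '."""
    words = text.upper().split()
    return " / ".join(" ".join(MORSE[ch] for ch in word) for word in words)

def to_vibration(code, unit_ms=100):
    """Turn a Morse string into (state, duration_ms) pulses using 1/3/7 unit timing."""
    pulses = []
    for word_i, word in enumerate(code.split(" / ")):
        if word_i:
            pulses.append(("off", 7 * unit_ms))      # gap between words
        for letter_i, letter in enumerate(word.split(" ")):
            if letter_i:
                pulses.append(("off", 3 * unit_ms))  # gap between letters
            for sym_i, sym in enumerate(letter):
                if sym_i:
                    pulses.append(("off", unit_ms))  # gap inside a letter
                pulses.append(("on", unit_ms if sym == "." else 3 * unit_ms))
    return pulses
```

For example, `encode("SOS")` yields `... --- ...`, and `to_vibration(".-")` yields a short pulse, a short pause, and a long pulse.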

Citations: 2

An efficient implementation of PBKDF2 with RIPEMD-160 on multiple FPGAs

2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS)

Pub Date : 2014-12-01 DOI: 10.1109/PADSW.2014.7097841

Ayman Abbas, Rian Voss, Lars Wienbrandt, M. Schimmler

A weakness of many security systems is the strength of the chosen password or key derivation function. We show how FPGA technology can be used to effectively attack cryptographic applications with a password dictionary. On a Xilinx Spartan6-LX150 FPGA, we have implemented two independent PBKDF2 cores, each using four pipelined HMAC cores that calculate a RIPEMD-160 hash to derive encryption keys, together with one resource-optimized AES-256 XTS core for direct decryption. Our design targets TRUECRYPT containers, but may be applied to similar encryption tools with little adaptation. To save resources and maximize speed, we have further optimized the RIPEMD-160 hash function for this purpose. Executed on the multi-FPGA system RIVYERA S6-LX150, which contains 128 S6-LX150 FPGAs, our design reaches a peak performance of about 245,000 passwords per second.
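
The software equivalent of a dictionary search against a PBKDF2-derived key can be sketched with Python's standard library. Several things here are assumptions: SHA-256 stands in for RIPEMD-160 because `hashlib` only exposes `ripemd160` when the underlying OpenSSL build provides it, the iteration count and key length are placeholders, and the sketch compares derived keys directly instead of decrypting a container header with AES-XTS as the abstract describes.

```python
import hashlib
import hmac

def derive_key(password, salt, iterations=1000, dklen=32, digest="sha256"):
    """PBKDF2-HMAC key derivation; digest and iteration count are placeholder choices."""
    return hashlib.pbkdf2_hmac(digest, password.encode("utf-8"), salt, iterations, dklen)

def dictionary_attack(target_key, salt, candidates, **kdf_args):
    """Return the first candidate whose derived key matches, or None."""
    for word in candidates:
        if hmac.compare_digest(derive_key(word, salt, **kdf_args), target_key):
            return word
    return None
```

Each candidate costs one full PBKDF2 run, which is why the iteration count dominates the attack rate and why the derivation is the natural part to move into hardware pipelines.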

Citations: 12

Simplifying index file structure to improve I/O performance of parallel indexing

2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS)

Pub Date : 2014-12-01 DOI: 10.1109/PADSW.2014.7097856

Hsuan-Te Chiu, J. Chou, V. Vishwanath, S. Byna, Kesheng Wu

Complex indexing techniques are needed to reduce the time of analyzing massive scientific datasets, but generating these indexing data structures can be very time consuming. In this work, we propose a set of strategies to simplify the index file structure and to improve I/O performance during index construction in FastQuery, a parallel indexing and querying system for scientific data. FastQuery has been used to analyze data from various scientific applications, including a trillion-particle plasma simulation. To accelerate query processing, FastQuery uses FastBit to build indexes and then stores them in the file system through parallel scientific data format libraries, such as HDF5. Although these data format libraries are designed to support more complex multi-dimensional arrays, we observed that it still takes considerable work to map the indexing data structures into arrays, especially on parallel machines. To address this problem, we attempt to minimize the I/O time by storing indexes in our self-defined binary data format. By fully controlling the data structure, we can minimize the I/O synchronization overhead and explore more efficient I/O strategies for storing indexes. Our experiments indexing a trillion-particle dataset on 20,000 cores of a supercomputer show that the proposed binary I/O driver can reach 85% of the peak I/O bandwidth of the system, and achieves a speedup of up to 4X in total execution time compared to the previous FastQuery implementation with the HDF5 I/O driver.

Citations: 4

Distributed sensor device resource-object connection based on service delivery platform

2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS)

Pub Date : 2014-12-01 DOI: 10.1109/PADSW.2014.7097900

Changwoo Yoon, K. Song

A system for providing a distributed device resource-object-connection service based on a service delivery platform (SDP) is described. The system includes an SDP and a proxy. The SDP is configured to define distributed service functions as enablers, generate a convergence service by combining the enablers, and provide the generated convergence service. The proxy is configured to connect a distributed device to the SDP so that the SDP can use the distributed device as a resource, and to define and use the distributed device as an enabler. The system is capable of defining distributed sensors, as well as distributed service functions, as enablers, thereby allowing the distributed sensors to be used in the same way as service-function enablers.


Citations: 0
