Parallel discrete wavelet transform using the Open Computing Language: a performance and portability study
Bharatkumar Sharma, N. Vydyanathan
Pub Date: 2010-04-19 | DOI: 10.1109/IPDPSW.2010.5470830
The discrete wavelet transform (DWT) is a powerful signal processing technique used in the JPEG 2000 image compression standard. The multi-resolution sub-band encoding provided by the DWT allows for higher compression ratios, avoids blocking artifacts and enables progressive transmission of images. However, these advantages come at the expense of additional computational complexity. Achieving real-time or interactive compression/decompression speeds therefore requires a fast implementation of the DWT that leverages emerging parallel hardware. In this paper, we develop an optimized parallel implementation of the lifting-based DWT algorithm using the recently proposed Open Computing Language (OpenCL). OpenCL is a standard for cross-platform parallel programming of heterogeneous systems comprising multi-core CPUs, GPUs and other accelerators. We explore the potential of OpenCL in accelerating the DWT computation and analyze the programmability, portability and performance aspects of this language. Our experimental analysis is done using NVIDIA's and AMD's drivers that support OpenCL.
GPU-accelerated multi-scoring functions protein loop structure sampling
Yaohang Li, Weihang Zhu
Pub Date: 2010-04-19 | DOI: 10.1109/IPDPSW.2010.5470901
Accurate protein loop structure models are important for understanding the functions of many proteins. One of the main problems in correctly modeling protein loop structures is sampling the large loop backbone conformation space, particularly when the loop is long. In this paper, we present a GPU-accelerated loop backbone structure modeling approach that samples multiple scoring functions based on pair-wise atom distance, torsion angles of triplet residues, or a soft-sphere van der Waals potential. The sampling program, implemented on a heterogeneous CPU-GPU platform, achieves a speedup of ∼40 when sampling long loops, which enables the sampling process to use large population sizes. The GPU-accelerated multi-scoring-function loop structure sampling allows fast generation of decoy sets composed of structurally diversified backbone decoys representing various compromises among the scoring functions. On the 53 long-loop benchmark targets we tested, our computational results show that for more than 90% of the targets the generated decoy sets include decoys within 1.5 Å RMSD (root mean square deviation) of the native structure, while for 77% of the targets decoys within 1.0 Å RMSD are reached.
A multi-threaded approach for data-flow analysis
Marcus Edvinsson, Welf Löwe
Pub Date: 2010-04-19 | DOI: 10.1109/IPDPSW.2010.5470818
Program analysis supporting software development is often part of edit-compile cycles, and precise program analysis is time consuming. With the availability of parallel processing power on desktop computers, parallelization is a way to speed up program analysis. This requires a parallel data-flow analysis with sufficient work for each processing unit. The present paper suggests such an approach for object-oriented programs, analyzing the target methods of polymorphic calls in parallel. With carefully selected thresholds guaranteeing sufficient work for the parallel threads and only little redundancy between them, this approach achieves a maximum speed-up of 5 (average 1.78) on 8 cores for the benchmark programs.
A survey on bee colony algorithms
S. Bitam, M. Batouche, E. Talbi
Pub Date: 2010-04-19 | DOI: 10.1109/IPDPSW.2010.5470701
This paper presents a survey of current research activities inspired by bee life. This work is intended to provide a broad and comprehensive view of the various principles and applications of these bio-inspired systems. We propose to classify them into two major models: the first is based on the foraging behavior of bees in their daily life, and the second is inspired by the marriage principle. Different original studies are described and classified along with their applications, comparisons against other approaches, and results. We conclude with a review of their derived algorithms and related research efforts.
Support of cross calls between a microprocessor and FPGA in CPU-FPGA coupling architecture
Giang Nguyen Thi Huong, S. Kim
Pub Date: 2010-04-19 | DOI: 10.1109/IPDPSW.2010.5470741
The coupling architecture containing an FPGA device and a microprocessor has been widely used to accelerate microprocessor execution. Consequently, there has been intensive research in the high-level synthesis community on synthesizing high-level programming languages (HLLs) such as C and C++ into HW, in order to make the work of reconfiguring the FPGA easier. However, the semantic difference in calling methods between HDLs and HLLs makes their interface implementation very difficult. This paper presents a novel communication framework between a microprocessor and an FPGA, which allows the full implementation of cross calls between SW and HW, and even recursive calls in HW, without any limitation. We show that the overhead of our proposed calling mechanism is very small. With our communication framework, hardware components inside the FPGA are no longer isolated accelerators; they can work as master components in a system configuration, like any other.
User level DB: a debugging API for user-level thread libraries
K. Pouget, Marc Pérache, Patrick Carribault, H. Jourdren
Pub Date: 2010-04-19 | DOI: 10.1109/IPDPSW.2010.5470815
With the advent of the multicore era, parallel programming is becoming ubiquitous. Multithreading is a common approach to benefit from these architectures. Hybrid M:N libraries like MultiProcessor Communication (MPC) or MARCEL reach high performance by expressing fine-grain parallelism, mapping M user-level threads onto N kernel-level threads. However, such implementations impair the debugger's ability to distinguish one thread from another, because only kernel threads can be handled. SUN MICROSYSTEMS' THREAD_DB API is an interface between the debugger and the thread library that allows the debugger to inquire about thread semantics. In this paper, we introduce the USER LEVEL DB (ULDB) library, an implementation of the THREAD_DB interface abstracting the common features of user-level thread libraries. ULDB gathers the generic algorithms required to debug threads and provides the thread library with a small and focused interface. We describe the usage of our library with widely used debuggers (GDB, DBX) and its integration into a user-level thread library (GNU PTH) and two high-performance hybrid libraries (MPC, MARCEL).
High-level synthesis techniques for in-circuit assertion-based verification
J. Curreri, G. Stitt, A. George
Pub Date: 2010-04-19 | DOI: 10.1109/IPDPSW.2010.5470747
Field-Programmable Gate Arrays (FPGAs) are increasingly employed in both high-performance computing and embedded systems due to performance and power advantages compared to microprocessors. However, widespread usage of FPGAs has been limited by increased design complexity. High-level synthesis has reduced this complexity but often relies on inaccurate software simulation or lengthy register-transfer-level simulations for verification and debugging, which is unattractive to software developers. In this paper, we present high-level synthesis techniques that allow application designers to efficiently synthesize ANSI-C assertions into FPGA circuits, enabling real-time verification and debugging of circuits generated from high-level languages while executing in the actual FPGA environment. Although not appropriate for all systems (e.g., safety-critical systems), the proposed techniques enable software developers to rapidly verify and debug FPGA applications, while reducing frequency by less than 3% and increasing FPGA resource utilization by less than 0.13% for several application case studies on an Altera Stratix-II EP2S180 using Impulse-C. The presented techniques reduced area overhead by as much as 3x and improved assertion performance by as much as 100% compared to unoptimized in-circuit assertions.
Mobile-friendly Peer-to-Peer client routing using out-of-band signaling
Wei Wu, J. Womack, Xinhua Ling
Pub Date: 2010-04-19 | DOI: 10.1109/IPDPSW.2010.5470936
It is expected that Peer-to-Peer (P2P) services will co-exist with client-server based services such as IMS. Mobile users may subscribe to traditional wireless cellular services while participating in P2P overlay networks. In this paper, a method is proposed to reduce the signaling overhead in a mobile P2P system. With the help of the underlying infrastructure, a mobile device in the P2P overlay can be located using out-of-band non-P2P signaling. This reduces the P2P signaling for location updates while a mobile device changes its point of attachment in the P2P overlay. As the signaling cost depends on both the client's mobility and traffic models, an analytical model has been developed to determine the optimal threshold for the registration update. Analytical results show that the proposed method can save up to 70% of the signaling cost when the Call-to-Mobility Ratio (CMR) is low. On the other hand, it is better to fall back to the base client routing method when the CMR is high, i.e., to perform the registration update whenever the client changes its point of attachment in the P2P overlay.
BlobSeer: Efficient data management for data-intensive applications distributed at large-scale
Bogdan Nicolae, Gabriel Antoniu, L. Bougé
Pub Date: 2010-04-19 | DOI: 10.1109/IPDPSW.2010.5470802
As the rate, scale and variety of data increase, the need for flexible applications that can crunch huge amounts of heterogeneous data quickly and cost-effectively becomes of utmost importance. Such applications are data-intensive: in a typical scenario, they continuously acquire massive datasets (e.g. by crawling the Web or analyzing access logs) while performing computations over these changing datasets (e.g. building up-to-date search indexes). In order to achieve scalability and performance, data acquisition and computation need to be distributed at large scale over infrastructures comprising hundreds or thousands of machines. As these applications focus on data rather than on computation, a heavy burden is put on the storage service employed to handle data management, because it must efficiently deal with massively parallel data accesses. To achieve this, a series of issues needs to be addressed properly: scalable aggregation of storage space from the participating nodes with minimal overhead, the ability to store huge data objects, efficient fine-grain access to data subsets, high throughput even under heavy access concurrency, versioning, as well as fault tolerance and a high quality of service for access throughput. This paper introduces BlobSeer, an efficient distributed data management service that addresses the issues presented above. In BlobSeer, long sequences of bytes representing unstructured data are called blobs (Binary Large OBjects).
Scheduling complex streaming applications on the Cell processor
M. Gallet, M. Jacquelin, L. Marchal
Pub Date: 2010-04-19 | DOI: 10.1109/IPDPSW.2010.5470684
In this paper, we consider the problem of scheduling streaming applications described by complex task graphs on a heterogeneous multicore processor, the STI Cell BE processor. We first present a theoretical model of the Cell processor. Then, we use this model to express the problem of maximizing the throughput of a streaming application on this processor. Although the problem is proven NP-complete, we present an optimal solution based on mixed linear programming. This allows us to compute the optimal mapping for a number of applications, ranging from a real audio encoder to complex random task graphs. These mappings are then tested on two platforms embedding Cell processors, and compared to simple heuristic solutions. We show that we are able to achieve a good speed-up, whereas the heuristic solutions generally fail to deal with the strong memory and communication constraints.