
2015 IEEE International Parallel and Distributed Processing Symposium Workshop: Latest Publications

Towards Context-Aware DNA Sequence Compression for Efficient Data Exchange
Pub Date : 2015-05-25 DOI: 10.1109/IPDPSW.2015.89
Wajeeta Lohana, J. Shamsi, T. Syed, Farrukh Hasan
DNA sequencing has emerged as one of the principal research directions in systems biology, not only because of its usefulness in predicting the provenance of disease but also because of its profound impact on other fields such as biotechnology, biological systematics and forensic medicine. High-throughput DNA sequencing experiments are notorious for generating DNA sequences in huge quantities, and this poses a challenge for the computation, storage and exchange of sequence data. Computing on the Cloud helps mitigate the first two challenges: it provides on-demand machines, which save cost and give the flexibility to balance the load, both computation- and storage-wise. The data-exchange problem can be mitigated to an extent through data compression. This work proposes a context-aware framework that selects the compression algorithm that minimizes time-to-completion and utilizes resources efficiently, based on experiments over different Cloud and algorithm combinations and configurations. The results obtained from this framework and experimental setup show that DNAX outperforms the rest of the algorithms in every context, but if the file size is below 50 KB then CTW or GenCompress are viable choices. The Gzip algorithm, which is used in the NCBI repository to store the sequences, has the worst compression ratio and time.
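As a rough illustration of the selection logic, the context rule reported above reduces to a size check. The sketch below assumes hypothetical wrapper names for the DNAX, CTW and GenCompress compressors and uses the 50 KB threshold from the abstract; it is not the authors' framework.

```python
# Minimal sketch of a context-aware compressor selection rule for DNA
# sequence files, based on the threshold reported in the abstract.
# The returned names are placeholders; real bindings to DNAX, CTW or
# GenCompress would be needed in practice.
import os

SMALL_FILE_THRESHOLD = 50 * 1024  # 50 KB, as reported in the abstract

def choose_compressor(path, prefer_low_memory=False):
    """Pick a compression algorithm for a FASTA/sequence file."""
    size = os.path.getsize(path)
    if size < SMALL_FILE_THRESHOLD:
        # For small files CTW or GenCompress are competitive choices.
        return "CTW" if prefer_low_memory else "GenCompress"
    # For larger files DNAX gave the best ratio/time trade-off.
    return "DNAX"
```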
Citations: 1
Empowering Fast Incremental Computation over Large Scale Dynamic Graphs
Pub Date : 2015-05-25 DOI: 10.1109/IPDPSW.2015.136
Charith Wickramaarachchi, C. Chelmis, V. Prasanna
Unprecedented growth of online social networks, communication networks and the Internet of Things has given birth to large-volume, fast-changing datasets. Data generated by such systems have an inherent graph structure. Updates at staggering frequencies (e.g. edges created by message exchanges in online social media) impose a fundamental requirement for real-time processing of unruly yet highly interconnected data. As a result, large-scale dynamic graph processing has become a new research frontier in computer science. In this paper, we present a new vertex-centric hierarchical bulk synchronous parallel model for distributed processing of dynamic graphs. Our model allows users to easily compose static graph algorithms, much as in the widely used vertex-centric model. It also enables incremental processing of dynamic graphs by automatically executing user-composed static graph algorithms in an incremental manner. We map the widely used single-source shortest path and connected components algorithms to this model and empirically analyze the performance on real-world large-scale graphs. Experimental results show that our model improves the performance of both static and dynamic graph computation over the vertex-centric model by reducing the global synchronization overhead.
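For readers unfamiliar with the programming style, the sketch below is a toy, sequential rendition of vertex-centric incremental single-source shortest paths: only vertices affected by an update are activated, which is the behavior a BSP-style runtime automates in a distributed setting. The data structures and loop structure are illustrative, not the authors' API.

```python
# Toy vertex-centric SSSP in the bulk synchronous parallel style.
# dist holds the tentative distance per vertex; in an incremental run,
# only the vertices touched by a graph update are put into `active`,
# so unaffected parts of the graph are never revisited.
INF = float("inf")

def incremental_sssp(adj, dist, active):
    """adj: {u: [(v, w), ...]}, dist: {u: distance}, active: set of seed vertices."""
    while active:                            # one iteration == one superstep
        messages = {}
        for u in active:                     # "compute" phase: send distance offers
            for v, w in adj.get(u, []):
                cand = dist[u] + w
                if cand < messages.get(v, INF):
                    messages[v] = cand
        active = set()
        for v, cand in messages.items():     # "apply" phase: accept improvements
            if cand < dist.get(v, INF):
                dist[v] = cand
                active.add(v)                # re-activate only improved vertices
    return dist

# Example: after inserting edge (1, 3, 1), re-run from vertex 1 only.
adj = {0: [(1, 2), (2, 5)], 1: [(2, 1), (3, 1)], 2: [(3, 4)]}
dist = {0: 0, 1: 2, 2: 3, 3: 7}              # distances before the update
print(incremental_sssp(adj, dist, active={1}))   # {0: 0, 1: 2, 2: 3, 3: 3}
```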
Citations: 7
Parallel Methods for Optimizing High Order Constellations on GPUs
Pub Date : 2015-05-25 DOI: 10.1109/IPDPSW.2015.48
Paolo Spallaccini, F. Kayhan, Stefano Chinnici, G. Montorsi
The increasing demand for fast mobile data has driven transmission systems to use high-order signal constellations. Conventional modulation schemes such as QAM and APSK are sub-optimal; large gains may be obtained by properly optimizing the constellation signal set under given channel constraints. The constellation optimization problem is computationally intensive, and the known methods rapidly become unfeasible as the constellation order increases. Very few attempts to optimize constellations in excess of 64 signals have been reported. In this paper, we apply a simulated annealing (SA) algorithm to maximize the Mutual Information (MI) and the Pragmatic Mutual Information (PMI) under the given channel constraints. We first propose a GPU-accelerated method for calculating the MI and PMI of a constellation. For AWGN channels the method achieves one order of magnitude of speedup over a CPU realization. We also propose a parallelization of the Gauss-Hermite quadrature to compute the Average Mutual Information (AMI) and the Pragmatic Average Mutual Information (PAMI) on GPUs. Considering the more complex problem of constellation optimization over phase-noise channels, we obtain two orders of magnitude of speedup over CPUs. In order to reach such performance, novel parallel algorithms have been devised. Using our method, constellations with thousands of signals can be optimized.
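The optimization loop itself is a standard simulated annealing scheme. Below is a compact, self-contained sketch in which a cheap surrogate objective (minimum pairwise distance at unit average energy) stands in for the GPU-evaluated mutual information described in the paper; the parameters and structure are illustrative only.

```python
# Compact simulated annealing sketch for constellation optimization.
# The expensive objective in the paper is the (pragmatic) mutual
# information evaluated on the GPU; here a cheap surrogate (minimum
# pairwise distance at unit average energy) stands in for it.
import cmath, math, random

def normalize(points):
    energy = sum(abs(p) ** 2 for p in points) / len(points)
    return [p / math.sqrt(energy) for p in points]

def objective(points):
    return min(abs(a - b) for i, a in enumerate(points) for b in points[i + 1:])

def anneal(m=16, steps=20000, t0=0.5, alpha=0.9995, sigma=0.05):
    cur = normalize([cmath.rect(random.random(), 2 * math.pi * random.random())
                     for _ in range(m)])
    cur_val = objective(cur)
    best, best_val, temp = cur, cur_val, t0
    for _ in range(steps):
        cand = list(cur)
        k = random.randrange(m)                       # perturb one signal point
        cand[k] += complex(random.gauss(0, sigma), random.gauss(0, sigma))
        cand = normalize(cand)
        val = objective(cand)
        if val > cur_val or random.random() < math.exp((val - cur_val) / temp):
            cur, cur_val = cand, val                  # accept uphill, or downhill by chance
            if val > best_val:
                best, best_val = cand, val
        temp *= alpha                                 # cool down
    return best, best_val

points, dmin = anneal()
print(f"16-point constellation, min distance {dmin:.3f}")
```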
Citations: 0
An Automated High-Level Design Framework for Partially Reconfigurable FPGAs
Pub Date : 2015-05-25 DOI: 10.1109/IPDPSW.2015.99
Rohit Kumar, A. Gordon-Ross
Modern field-programmable gate arrays (FPGAs) allow runtime partial reconfiguration (PR) of the FPGA, enabling PR benefits such as runtime adaptability and extensibility, and reducing the application's area requirement. However, PR application development requires non-traditional expertise and lengthy design-time effort. Since high-level synthesis (HLS) languages afford fast application development, these languages are becoming increasingly popular for FPGA application development. However, widely used HLS languages, such as C variants, do not contain PR-specific constructs; thus exploiting PR benefits using an HLS language is a challenging task. To alleviate this challenge, we present an automated high-level design framework -- PaRAT (partial reconfiguration amenability test). PaRAT parses, analyzes, and partitions an application's HLS code to generate the application's PR architectures, which contain the application's runtime-modifiable modules and thus allow the application's runtime reconfiguration. Case study analysis demonstrates PaRAT's ability to quickly and automatically generate PR architectures from an application's HLS code.
Citations: 5
Improved Internode Communication for Tile QR Decomposition for Multicore Cluster Systems
Pub Date : 2015-05-25 DOI: 10.1109/IPDPSW.2015.145
Tomohiro Suzuki
Tile algorithms for matrix decomposition can generate many fine-grained tasks. Their suitability for processing on multicore architectures has therefore attracted much attention from the high-performance computing (HPC) community. Our implementation of tile QR decomposition for a cluster system has dynamic scheduling, OpenMP work-sharing, and other useful features. In this article, we discuss the problems in internode communication that were present in our previous implementation. The improved implementation shows both strong and weak scalability.
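The fine-grained tasks come from the standard tile QR kernels (GEQRT, ORMQR, TSQRT, TSMQR). The sketch below merely enumerates that task DAG for an nt x nt grid of tiles, which a runtime could then schedule dynamically while overlapping internode communication; it illustrates the task structure, not the authors' implementation.

```python
# Sketch of the task DAG for tile QR on an nt x nt grid of tiles,
# using the usual kernel names (GEQRT, ORMQR, TSQRT, TSMQR). A dynamic
# scheduler would dispatch these tasks as their dependencies on tile
# coordinates (the tuples below) become satisfied.
def tile_qr_tasks(nt):
    tasks = []
    for k in range(nt):
        tasks.append(("GEQRT", (k, k)))                       # factor diagonal tile
        for n in range(k + 1, nt):
            tasks.append(("ORMQR", (k, k), (k, n)))           # update row k
        for m in range(k + 1, nt):
            tasks.append(("TSQRT", (k, k), (m, k)))           # eliminate tile below diagonal
            for n in range(k + 1, nt):
                tasks.append(("TSMQR", (m, k), (k, n), (m, n)))  # trailing submatrix update
    return tasks

tasks = tile_qr_tasks(4)
print(len(tasks), "fine-grained tasks for a 4x4 tile matrix")  # 30 tasks
```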
Citations: 1
Partial Region and Bitstream Cost Models for Hardware Multitasking on Partially Reconfigurable FPGAs
Pub Date : 2015-05-25 DOI: 10.1109/IPDPSW.2015.148
Aurelio Morales-Villanueva, A. Gordon-Ross
Partial reconfiguration (PR) on field-programmable gate arrays (FPGAs) enables multiple PR modules (PRMs) to time-multiplex partially reconfigurable regions (PRRs), which affords reduced reconfiguration time, area overhead, etc., as compared to non-PR systems. However, to effectively leverage PR, system designers must determine appropriate PRR sizes/organizations during the early stages of PR system design, since PRRs that are inappropriate for the PRM requirements can negate PR benefits, potentially resulting in system performance worse than a functionally-equivalent non-PR design. To aid PR system design, we present two portable, high-level cost models based on the synthesis report results generated by Xilinx tools. These cost models estimate the PRR size/organization, given the PRR's associated PRMs, so as to maximize the PRRs' resource utilization, and estimate the PRMs' associated partial bitstream sizes based on the PRR sizes/organizations. Experiments evaluate our cost models' accuracy for different PRMs and required resources. These models enhance designer productivity since they preclude the lengthy PR design flow that is typically required to attain such analysis.
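As a rough illustration of the first cost model, a PRR must provide at least the per-resource maximum over its associated PRMs; the sketch below computes such an estimate and the resulting per-PRM utilization. The resource counts are invented placeholders standing in for numbers a designer would pull from synthesis reports, not figures from the paper.

```python
# Rough sketch of PRR sizing: the region must cover the per-resource
# maximum over all PRMs that will be loaded into it. The PRM resource
# counts below are invented placeholders.
def size_prr(prms):
    """prms: {prm_name: {resource: count}} -> required PRR size per resource."""
    resources = {r for counts in prms.values() for r in counts}
    return {r: max(counts.get(r, 0) for counts in prms.values())
            for r in resources}

def utilization(prr, prm):
    """Fraction of each PRR resource type a given PRM would occupy."""
    return {r: prm.get(r, 0) / prr[r] for r in prr if prr[r]}

prms = {"fir_filter": {"LUT": 1200, "FF": 900, "BRAM": 4},
        "fft_256":    {"LUT": 2100, "FF": 1500, "BRAM": 6}}
prr = size_prr(prms)
print(prr)                                  # e.g. {'LUT': 2100, 'FF': 1500, 'BRAM': 6} (key order may vary)
print(utilization(prr, prms["fir_filter"]))
```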
Citations: 5
HiCOMB Introduction and Committees
Pub Date : 2015-05-25 DOI: 10.1109/IPDPSW.2015.160
S. Rajasekaran, S. Aluru, David A. Bader
HiCOMB Introduction and Committees
Citations: 0
EduPar Keynote
Pub Date : 2015-05-25 DOI: 10.1109/IPDPSW.2015.177
Geoffrey Fox
We describe the Indiana University Data Science program, which has an approved Masters, Certificate and PhD minor. We note the wide variety of students, from hard-core developers of new systems to analysts using data-intensive decision systems. We describe experience teaching two courses aimed at software (http://bigdataopensourceprojects.soic.indiana.edu/) and applications/algorithms (https://bigdatacoursespring2015.appspot.com/preview), respectively. All these courses deliver lectures online and support non-residential students completely online, with residential sections operating in "flipped classroom" mode. We describe experience with two broadly available technologies, Google Course Builder and Microsoft Office Mix; both are interesting but incomplete platforms. We describe the use of social media (forums) and the support of online computing laboratory sessions. We note various mistakes we made and discuss the way forward.
Citations: 0
Relocation-Aware Floorplanning for Partially-Reconfigurable FPGA-Based Systems
Pub Date : 2015-05-25 DOI: 10.1109/IPDPSW.2015.52
Marco Rabozzi, Riccardo Cattaneo, Tobias Becker, W. Luk, M. Santambrogio
Within this paper we present a floorplanner for partially-reconfigurable FPGAs that allows the designer to consider bitstream relocation constraints during the design of the system. The presented approach is an extension of our previous work on floorplanning based on a Mixed-Integer Linear Programming (MILP) formulation, thus allowing the designer to optimize a set of different metrics within a user-defined objective function while considering preferences related directly to relocation capabilities. Experimental results show that the presented approach is able to reserve multiple free areas for a reconfigurable region with a small impact on the solution cost in terms of wire length and size of the configuration data.
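One way to picture the relocation requirement is that every region reserved for a relocatable module must expose an identical resource footprint (the same left-to-right pattern of columns), otherwise a bitstream generated for one region cannot be moved to another. The sketch below is a small feasibility check in that spirit, separate from the MILP formulation used in the paper; the fabric layout is an invented example, not a real device map.

```python
# Tiny feasibility check for bitstream relocation: all candidate regions
# reserved for the same relocatable module must expose an identical
# column footprint. The fabric layout below is invented.
def footprint(fabric, x0, width):
    """Column types covered by a region starting at column x0."""
    return tuple(fabric[x0:x0 + width])

def relocation_compatible(fabric, regions, width):
    prints = {footprint(fabric, x0, width) for x0 in regions}
    return len(prints) == 1              # identical footprints => relocatable

fabric = ["CLB", "CLB", "BRAM", "CLB", "DSP", "CLB", "CLB", "BRAM", "CLB", "DSP"]
print(relocation_compatible(fabric, regions=[0, 5], width=5))   # True
print(relocation_compatible(fabric, regions=[0, 1], width=5))   # False
```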
Citations: 3
Folding Methods for Event Timelines in Performance Analysis
Pub Date : 2015-05-25 DOI: 10.1109/IPDPSW.2015.47
Matthias Weber, Ronald Geisler, H. Brunst, W. Nagel
The complexity of today's high-performance computing systems, and of their parallel software, requires performance analysis tools to fully understand application performance behavior. The visualization of event streams has proven to be a powerful approach for detecting various types of performance problems. However, visualization of large numbers of process streams quickly hits the limits of the available screen resolution. To alleviate this problem we propose folding strategies for event timelines that consider common questions arising during performance analysis. We demonstrate the effectiveness of our solution on code inefficiencies in two real-world applications, PIConGPU and COSMO-SPECS. Our methods facilitate visual scalability while providing powerful overviews of performance data. Furthermore, our folding strategies improve GPU stream visualization and allow easy evaluation of GPU device utilization.
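A simple instance of the folding idea: when there are more process timelines than can be displayed, adjacent timelines are grouped and each group is summarized per time bin, for example by its dominant state. The sketch below uses invented state samples; a real tool would derive them from trace events, and the grouping and summary rules here are illustrative rather than the strategies proposed in the paper.

```python
# Minimal sketch of folding many per-process timelines into a few summary
# rows: processes are grouped, and each group reports the dominant state
# per time bin. The state samples are invented.
from collections import Counter

def fold_timelines(timelines, groups):
    """timelines: list of equal-length state sequences; groups: rows to keep."""
    per_group = max(1, len(timelines) // groups)
    folded = []
    for g in range(0, len(timelines), per_group):
        chunk = timelines[g:g + per_group]
        row = [Counter(states).most_common(1)[0][0]    # dominant state per time bin
               for states in zip(*chunk)]
        folded.append(row)
    return folded

timelines = [["compute", "compute", "mpi_wait", "compute"],
             ["compute", "mpi_wait", "mpi_wait", "compute"],
             ["compute", "compute", "compute",  "io"],
             ["compute", "compute", "mpi_wait", "io"]]
for row in fold_timelines(timelines, groups=2):
    print(row)
```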
Citations: 4