
Latest publications: 2009 IEEE International Symposium on Parallel and Distributed Processing with Applications

A Task-Pool Parallel I/O Paradigm for an I/O Intensive Application
Jianjiang Li, Lin Yan, Zhe Gao, D. Hei
For applications such as 3D seismic migration, improving I/O performance within a cluster computing system is critically important. Such seismic data processing applications are I/O intensive: a large 3D data volume cannot be held entirely in memory, so the input data files must be divided into many fine-grained chunks. Intermediate results are written out at various stages during execution, and final results are written out by the master process. This paper describes a novel way of optimizing the parallel I/O data access strategy and load balancing for this program model. The optimization, based on an application-defined API, reduces the number of I/O operations and the amount of communication compared with the original model. This is done by forming groups of threads with "group roots" that read input data (selected by an index retrieved from the master process) and then send it to their group members; in the original model, each process/thread reads the whole input data and outputs its own results. In addition, loads are balanced through on-line dynamic scheduling of the access requests that process the migration data. Finally, in actual performance tests, the improvement over the original model is often more than 60%.
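The grouped-read pattern described above can be simulated as follows. This is an illustrative Python sketch, not the paper's API: the in-memory "file", queue layout, and group sizes are all assumptions. One "group root" per group pulls a chunk index from the master's task pool, performs the single read, and fans the chunk out to its members, so each chunk is read from storage only once instead of once per thread.

```python
import threading
import queue

def run_task_pool(data_chunks, n_groups, group_size):
    """Simulate the group-root I/O pattern: one root thread per group
    reads a chunk and fans it out to its members, instead of every
    thread reading the data itself."""
    index_pool = queue.Queue()          # master's task pool of chunk indices
    for i in range(len(data_chunks)):
        index_pool.put(i)

    reads = []          # chunk indices actually "read" from storage (roots only)
    processed = []      # (group, member, chunk) triples handled by members
    lock = threading.Lock()

    def start_group(gid):
        member_q = [queue.Queue() for _ in range(group_size)]

        def root():
            while True:
                try:
                    idx = index_pool.get_nowait()
                except queue.Empty:
                    for q in member_q:
                        q.put(None)     # poison pill: no more work
                    return
                chunk = data_chunks[idx]    # the single I/O for this chunk
                with lock:
                    reads.append(idx)
                for q in member_q:
                    q.put(chunk)        # fan out to group members

        def member(mid, q):
            while True:
                chunk = q.get()
                if chunk is None:
                    return
                with lock:
                    processed.append((gid, mid, chunk))

        threads = [threading.Thread(target=root)]
        threads += [threading.Thread(target=member, args=(m, member_q[m]))
                    for m in range(group_size)]
        for t in threads:
            t.start()
        return threads

    all_threads = []
    for g in range(n_groups):
        all_threads += start_group(g)
    for t in all_threads:
        t.join()
    return reads, processed
```

With 4 chunks, 2 groups, and 3 members per group, each chunk is read exactly once but processed by all 3 members of whichever group claimed it.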
Citations: 1
A Parallel Gibbs Sampling Algorithm for Motif Finding on GPU
Linbin Yu, Yun Xu
A motif is an over-represented pattern in biological sequences, and motif finding is an important problem in bioinformatics. Because of the high computational complexity of motif finding, ever more computing capability is required as available biological data, such as gene transcription data, grows rapidly. Among the many motif finding algorithms, Gibbs sampling is an effective method for finding long motifs. In this paper we present an improved Gibbs sampling method on graphics processing units (GPUs) to accelerate motif finding. Experimental data show that, compared with traditional CPU programs, our GPU program provides an effective and low-cost solution to the motif finding problem, especially for long motifs.
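As a point of reference for the method being accelerated, here is a minimal CPU-side Gibbs sampler for motif finding. It is an illustrative sketch, not the paper's GPU implementation; the pseudocount scheme and parameters are assumptions. Each iteration holds one sequence out, builds a position weight matrix from the others' current motif positions, and resamples the held-out start position from the profile likelihood.

```python
import random

def gibbs_motif(seqs, w, iters=200, seed=0):
    """Minimal Gibbs sampler: returns one length-w motif instance per sequence."""
    rng = random.Random(seed)
    pos = [rng.randrange(len(s) - w + 1) for s in seqs]  # random start positions
    alphabet = "ACGT"

    def profile(exclude):
        # Position weight matrix with +1 pseudocounts, built from every
        # sequence except the held-out one.
        counts = [{a: 1 for a in alphabet} for _ in range(w)]
        for i, s in enumerate(seqs):
            if i == exclude:
                continue
            for j in range(w):
                counts[j][s[pos[i] + j]] += 1
        total = len(seqs) - 1 + len(alphabet)
        return [{a: c[a] / total for a in alphabet} for c in counts]

    for _ in range(iters):
        i = rng.randrange(len(seqs))        # hold one sequence out
        prof = profile(i)
        s = seqs[i]
        weights = []
        for start in range(len(s) - w + 1):
            p = 1.0
            for j in range(w):
                p *= prof[j][s[start + j]]
            weights.append(p)
        # Resample the held-out start position from the profile likelihood.
        pos[i] = rng.choices(range(len(weights)), weights=weights)[0]

    return [s[p:p + w] for s, p in zip(seqs, pos)]
```

The hot loop is the per-position likelihood scan, which is exactly the part that parallelizes well across GPU threads.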
Citations: 20
Performance Analysis of ClearSpeed's CSX600 Interconnects
Yuri Nishikawa, M. Koibuchi, Masato Yoshimi, Akihiro Shitara, K. Miura, H. Amano
ClearSpeed's CSX600, which consists of 96 Processing Elements (PEs), employs a one-dimensional array topology for simple SIMD processing. To clearly show the performance factors and practical issues of NoCs in an existing modern many-core SIMD system, this paper measures and analyzes the CSX600's NoCs, called Swazzle and ClearConnect. The evaluation and analysis show that sending and receiving overheads are the major factors limiting effective network bandwidth. We found that (1) the number of PEs used, (2) the size of the transferred data, and (3) the data alignment of the shared memory are the three main levers for making the best use of bandwidth. In addition, we estimated the best- and worst-case latencies of data transfers in parallel applications.
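The finding that per-message overheads dominate effective bandwidth can be captured with a simple overhead-plus-serialization cost model. The parameter values below are illustrative, not measured CSX600 numbers:

```python
def transfer_time(size_bytes, bw_bytes_per_s, o_send_s, o_recv_s):
    """Time for one transfer: fixed send/receive overheads plus
    serialization of the payload over the link."""
    return o_send_s + o_recv_s + size_bytes / bw_bytes_per_s

def effective_bandwidth(size_bytes, bw_bytes_per_s, o_send_s, o_recv_s):
    """Payload size divided by total time; approaches the raw link
    bandwidth only when the payload dwarfs the fixed overheads."""
    return size_bytes / transfer_time(size_bytes, bw_bytes_per_s,
                                      o_send_s, o_recv_s)
```

With a 1 GB/s link and 1 µs overhead on each side, a 1 MiB transfer achieves nearly the raw bandwidth, while a 64-byte transfer achieves only a few percent of it, which is the behavior the paper attributes to the sending and receiving overheads.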
Citations: 4
A Synchronization-Based Alternative to Directory Protocol
He Huang, Lei Liu, Nan Yuan, Wei Lin, Fenglong Song, Junchao Zhang, Dongrui Fan
Efficient support for cache coherence is extremely important in the design and implementation of many-core processors. In this paper, we propose a synchronization-based coherence (SBC) protocol to efficiently support cache coherence for shared-memory many-core architectures. The unique feature of our scheme is that it uses no directory at all. Inspired by the scope consistency memory model, our protocol maintains coherence at synchronization points. Within a critical section, processor cores record write-sets (which lines have been written in the critical section) with a Bloom filter. When a core releases the lock, the write-set is transferred to a synchronization manager. When another core acquires the same lock, it gets the write-set from the synchronization manager and invalidates stale data in its local cache. Experimental results show that SBC outperforms a directory protocol by an average of 5% in execution time across a suite of scientific applications. At the same time, SBC is more cost-effective than a directory-based protocol, which requires large amounts of hardware resources and a huge design verification effort.
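The write-set mechanism can be sketched roughly as follows. The class names and filter sizes are illustrative assumptions, not the paper's hardware design. A core records written line addresses in a Bloom filter inside the critical section; on release the filter moves to the synchronization manager, and the next acquirer invalidates any cached line the filter may contain. False positives cause only extra invalidations, never stale reads, so the approximation is safe.

```python
import hashlib

class BloomFilter:
    def __init__(self, m=256, k=3):
        self.m, self.k, self.bits = m, k, 0

    def _hashes(self, item):
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:4], "big") % self.m

    def add(self, item):
        for h in self._hashes(item):
            self.bits |= 1 << h

    def may_contain(self, item):
        return all(self.bits >> h & 1 for h in self._hashes(item))

class Core:
    def __init__(self):
        self.cache = {}                  # line address -> value
        self.write_set = BloomFilter()

    def write(self, addr, value):
        self.cache[addr] = value
        self.write_set.add(addr)         # record the write in the filter

class SyncManager:
    def __init__(self):
        self.write_sets = {}             # lock id -> last releaser's write-set

    def release(self, lock, core):
        self.write_sets[lock] = core.write_set
        core.write_set = BloomFilter()   # start a fresh set for the next section

    def acquire(self, lock, core):
        ws = self.write_sets.get(lock)
        if ws:
            # Invalidate possibly-stale lines; a false positive just
            # forces a harmless re-fetch from memory.
            for addr in [a for a in core.cache if ws.may_contain(a)]:
                del core.cache[addr]
```

A core that acquires the lock after another core's release thus sees the released lines invalidated without any directory tracking who cached what.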
Citations: 1
Data Management: The Spirit to Pursuit Peak Performance on Many-Core Processor
Yongbin Zhou, Junchao Zhang, Shuai Zhang, Nan Yuan, Dongrui Fan
To date, most many-core prototypes employ tiled topologies connected through on-chip networks. The throughput and latency of these on-chip networks often become the bottleneck to achieving peak performance, especially for communication-intensive applications. Most studies focus only on the on-chip networks themselves, such as routing algorithms or router micro-architecture, to improve these metrics. The salient aspect of our approach is that we provide a data management framework to implement highly efficient on-chip traffic across the whole many-core system. The major contributions of this paper are: (1) a novel tiled many-core architecture that supports software-controlled on-chip data storage and movement management; and (2) the identification of asynchronous bulk data transfer as an effective mechanism for tolerating the latency of 2-D mesh on-chip networks. Finally, we evaluate a 1-D FFT algorithm on the framework; it achieves 47.6 Gflops at 24.8% computational efficiency.
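The latency-hiding idea behind contribution (2), computing on one buffer while the next bulk transfer is in flight, can be mimicked in software with a bounded prefetch queue. This is a loose illustrative analogue in Python threads, not the paper's hardware DMA mechanism:

```python
import threading
import queue

def process_with_double_buffer(chunks, compute):
    """Overlap fetching (a background thread standing in for an
    asynchronous bulk transfer) with computation on already-fetched data."""
    fetched = queue.Queue(maxsize=2)   # at most two buffers in flight

    def fetcher():
        for c in chunks:
            fetched.put(c)             # stands in for an async DMA/bulk transfer
        fetched.put(None)              # end-of-stream marker

    t = threading.Thread(target=fetcher)
    t.start()
    results = []
    while True:
        c = fetched.get()              # pick up the prefetched buffer
        if c is None:
            break
        results.append(compute(c))     # compute while the next fetch proceeds
    t.join()
    return results
```

The `maxsize=2` bound is what makes this double buffering: the fetcher stays at most one buffer ahead of the consumer, so compute and transfer overlap without unbounded memory use.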
Citations: 1
P-Cache: Providing Prioritized Caching Service for Storage System
Xiaoxuan Meng, Chengxiang Si, Wenwu Na, Lu Xu
We present P-Cache, which provides a prioritized caching service for storage servers that serve multiple concurrently accessing applications with diverse access patterns and unequal importance. Given the replacement algorithm and the applications' access patterns, the end performance of each individual application in a shared cache is effectively determined by its allocated cache resources. P-Cache therefore adopts a dynamic partitioning approach to explicitly divide cache resources among applications, and uses a global cache allocation policy to make adaptive allocations that guarantee the preset relative caching priorities among competing applications. We have implemented P-Cache in Linux kernel 2.6.18 as a pseudo device driver and measured its performance using synthetic benchmarks and real-life workloads. The experimental results show that the prioritized caching service provided by P-Cache can not only support application priorities but also improve overall storage system performance. Its runtime overhead is also smaller than that of the Linux page cache.
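A priority-proportional split of cache blocks, the kind of preset relative priority P-Cache guarantees, might look like the sketch below. The policy and function names are illustrative assumptions; P-Cache additionally adapts the split at runtime:

```python
def allocate_cache(total_blocks, priorities):
    """Divide a shared cache among applications in proportion to their
    preset priorities (static version of the idea; no runtime adaptation)."""
    total_p = sum(priorities.values())
    alloc = {app: total_blocks * p // total_p for app, p in priorities.items()}
    # Integer division may leave blocks unassigned; hand the leftovers
    # to the highest-priority applications first.
    leftover = total_blocks - sum(alloc.values())
    for app in sorted(priorities, key=priorities.get, reverse=True):
        if leftover == 0:
            break
        alloc[app] += 1
        leftover -= 1
    return alloc
```

For example, a 3:1 priority split of 100 blocks gives the high-priority application 75 blocks and the low-priority one 25, regardless of how aggressively the latter streams data through the cache.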
Citations: 4
An Active Trusted Model for Virtual Machine Systems
Wentao Qu, Minglu Li, Chuliang Weng
Virtualization has become a new area of research in recent years, and virtualization technology can simplify the management of computing resources. Together with the development of networks and network computing, it finds ever more application scenarios; cloud computing likewise builds on virtualization technology. As the technology develops, it faces security problems such as rootkit attacks and malicious tampering: malicious programs can be planted in the system and booted at any time in a virtualized system. There has been little theoretical research on booting a trusted virtualized system. We propose an active trusted model that provides a theoretical basis not only for analyzing the state of a virtualized system, but also for designing trusted virtual machine applications. TBoot is a project for booting a trusted virtual machine; we use our model to show that TBoot can, in theory, boot a trusted virtual machine.
Citations: 3
Balancing Parallel Applications on Multi-core Processors Based on Cache Partitioning
Guang Suo, Xuejun Yang
Load balancing is an important problem for parallel applications. Recently, many supercomputers have been built on multi-core processors that usually share the last-level cache. On the one hand, accesses from different cores conflict with each other; on the other hand, different cores have different workloads, resulting in load imbalance. In this paper, we present a novel technique for balancing parallel applications on multi-core processors based on cache partitioning, which can allocate different parts of the shared cache to different cores exclusively. Our intuition is to partition the shared cache among cores based on their workloads: a heavily loaded core gets more shared cache than a lightly loaded core, so the heavily loaded core runs faster. We give two algorithms in this paper, an initial cache partitioning algorithm (ICP) and a dynamic cache partitioning algorithm (DCP). ICP determines the best partition when the application starts, while DCP adjusts the initial partition as the load balance changes. Our experimental results show that running time is reduced by 7% on average when our cache-partitioning-based load balancing mechanism is used.
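The two algorithms can be sketched as follows. This is an illustrative model of the idea, not the paper's exact rules: the proportional-share split and the one-way-per-step adjustment are assumptions. ICP splits cache ways in proportion to initial workloads; a DCP step then moves one way from the fastest core to the slowest as measured runtimes diverge.

```python
def initial_partition(total_ways, loads):
    """ICP-style split: each core's share of cache ways is proportional
    to its workload, with every core keeping at least one way."""
    total = sum(loads)
    part = [max(1, total_ways * l // total) for l in loads]
    # Fix rounding so that exactly total_ways ways are assigned.
    i = 0
    while sum(part) < total_ways:
        part[i % len(part)] += 1
        i += 1
    while sum(part) > total_ways:
        part[part.index(max(part))] -= 1
    return part

def rebalance(part, runtimes):
    """DCP-style step: shift one way from the fastest core (lowest
    runtime) to the slowest core (highest runtime)."""
    slow = runtimes.index(max(runtimes))
    fast = runtimes.index(min(runtimes))
    if slow != fast and part[fast] > 1:
        part = part[:]
        part[fast] -= 1
        part[slow] += 1
    return part
```

Repeated `rebalance` calls drain cache from cores that finish early and feed it to the laggards, which is the feedback loop that narrows the runtime gap.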
Citations: 8
Fault-Tolerant Routing Schemes for Wormhole Mesh
Xinming Duan, Dakun Zhang, Xuemei Sun
Fault tolerance is an important issue in the design of interconnection networks. In this paper, a new fault-tolerant routing algorithm is presented and applied to mesh networks employing wormhole switching. Because of its weak routing restrictions, the presented algorithm is highly adaptive, remaining connected and deadlock-free despite the various fault regions in the mesh. Because it uses a minimal number of virtual channels, the algorithm employs as few buffers as possible and is suitable for low-cost fault-tolerant interconnection networks. Since it chooses paths around fault regions according to local fault information, the algorithm makes routing decisions quickly and is practical for interconnection networks. Moreover, simulations of the proposed algorithm show that it exhibits graceful performance degradation.
Citations: 7
Dynamic Forensics Based on Intrusion Tolerance
Lin Chen, Zhitang Li, C. Gao, Lan Liu
With the development of intrusion technologies, dynamic forensics is becoming more and more important. Dynamic forensics using an IDS or a honeypot rests on a common hypothesis: that the system is still in a reliable working state and the collected evidence is believable even when the system has suffered an intrusion. In fact, the system has already transitioned into an insecure and unreliable state, so it is uncertain whether the intrusion detectors and investigators can run normally and whether the obtained evidence is credible. Although intrusion tolerance has been applied in many areas of security for years, few studies have addressed network forensics. The work presented in this paper integrates intrusion tolerance into dynamic forensics to keep the system under control, ensure the reliability of evidence, and gather more useful evidence for investigation. A mechanism of dynamic forensics based on intrusion tolerance is proposed. This paper introduces the architecture of the model, which uses an IDS as the tolerance and forensics trigger and a honeypot as a shadow server; a finite state machine model is described to specify the mechanism, and two cases are then analyzed to illustrate it.
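The finite state machine the paper describes might be sketched like this. The state and event names below are illustrative assumptions, not the paper's exact model: IDS alerts move the system out of normal service, confirmed intrusions redirect traffic to the honeypot while evidence is collected, and the system then recovers.

```python
class ForensicsFSM:
    """Toy finite-state model of intrusion-tolerant dynamic forensics:
    an IDS alert triggers tolerance/forensic states, then recovery."""
    TRANSITIONS = {
        ("normal", "ids_alert"): "suspicious",
        ("suspicious", "confirmed_intrusion"): "forensic",   # redirect to honeypot
        ("suspicious", "false_alarm"): "normal",
        ("forensic", "evidence_collected"): "recovering",
        ("recovering", "restored"): "normal",
    }

    def __init__(self):
        self.state = "normal"
        self.log = []                    # audit trail of every transition

    def on(self, event):
        nxt = self.TRANSITIONS.get((self.state, event))
        if nxt is None:
            raise ValueError(f"no transition for {event!r} in state {self.state!r}")
        self.log.append((self.state, event, nxt))
        self.state = nxt
        return nxt
```

The explicit transition table makes the mechanism auditable: any event that has no legal transition from the current state is rejected rather than silently changing the system's trust status.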
随着入侵技术的发展,动态取证变得越来越重要。使用IDS或蜜罐的动态取证都是基于一个共同的假设,即即使系统遭受入侵,系统仍然处于可靠的工作状态,并且收集的证据是可信的。事实上,系统已经进入了不安全、不可靠的状态,入侵探测器和侦查人员能否正常运行,获取的证据是否可信,都是不确定的。近年来,入侵容忍技术在许多安全领域得到了广泛的应用,但针对网络取证的研究却很少。本文的工作是基于将入侵容忍融入动态取证的思想,使系统处于可控状态,保证证据的可靠性,旨在为调查收集更多有用的证据。提出了一种基于入侵取证的动态取证机制。本文介绍了以入侵检测作为容错触发器,蜜罐作为影子服务器的模型体系结构,描述了有限状态机模型来说明其实现机制,并通过分析两个案例来说明其实现机制。
{"title":"Dynamic Forensics Based on Intrusion Tolerance","authors":"Lin Chen, Zhitang Li, C. Gao, Lan Liu","doi":"10.1109/ISPA.2009.66","DOIUrl":"https://doi.org/10.1109/ISPA.2009.66","url":null,"abstract":"With the development of intrusion technologies, dynamic forensics is becoming more and more important. Dynamic forensics approaches using an IDS or a honeypot all rest on a common hypothesis: that the system is still working reliably and the collected evidence is believable even after the system has suffered an intrusion. In fact, the system has already entered an insecure and unreliable state, so it is uncertain whether the intrusion detectors and investigators can run normally and whether the obtained evidence is credible. Although intrusion tolerance has been applied in many areas of security for years, little of that research has addressed network forensics. The work presented in this paper integrates intrusion tolerance into dynamic forensics to keep the system under control, ensure the reliability of evidence, and gather more useful evidence for investigation. A mechanism of dynamic forensics based on intrusion tolerance is proposed. This paper introduces the architecture of the model, which uses the IDS as the tolerance and forensics trigger and the honeypot as a shadow server; a finite state machine model is described to specify the mechanism, and two cases are then analyzed to illustrate it.","PeriodicalId":346815,"journal":{"name":"2009 IEEE International Symposium on Parallel and Distributed Processing with Applications","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-08-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130380111","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
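The abstract describes a finite state machine in which the IDS triggers a transition out of normal service and a honeypot acting as a shadow server collects evidence before the system recovers. A hedged Python sketch of that idea (the state and event names here are our own illustrative choices, not the paper's exact specification):

```python
# Hedged sketch of an intrusion-tolerant forensics state machine, loosely
# following the abstract above: IDS alert -> tolerant mode -> honeypot
# ("shadow server") forensics -> recovery. Names are hypothetical.

NORMAL, TOLERANT, FORENSIC, RECOVERED = "normal", "tolerant", "forensic", "recovered"

# Transition table: (current state, event) -> next state.
TRANSITIONS = {
    (NORMAL,   "ids_alert"):         TOLERANT,   # IDS triggers tolerance mode
    (TOLERANT, "redirect_honeypot"): FORENSIC,   # shadow server gathers evidence
    (TOLERANT, "false_alarm"):       NORMAL,
    (FORENSIC, "evidence_sealed"):   RECOVERED,  # evidence preserved, restore service
}

class ForensicFSM:
    def __init__(self):
        self.state = NORMAL
        self.evidence_log = []

    def on_event(self, event, detail=""):
        """Apply `event`; events not valid in the current state are ignored."""
        nxt = TRANSITIONS.get((self.state, event))
        if nxt is None:
            return self.state
        if nxt == FORENSIC:
            # Evidence is recorded while the attacker interacts with the honeypot,
            # keeping the production system's evidence chain under control.
            self.evidence_log.append(detail)
        self.state = nxt
        return self.state
```

Usage follows the abstract's flow: `fsm.on_event("ids_alert")` moves the system into tolerant mode, `fsm.on_event("redirect_honeypot", "session capture")` starts evidence collection on the shadow server, and `fsm.on_event("evidence_sealed")` recovers.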