
Proceedings International Conference on Parallel Processing: Latest Publications

A system for monitoring and management of computational grids
Pub Date : 2013-08-07 DOI: 10.1109/ICPP.2002.1040859
Warren Smith
As organizations begin to deploy large computational grids, it has become apparent that systems are needed for observing and controlling the resources, services, and applications that make up such grids. Administrators must observe resources and services to ensure that they are operating correctly, and must control them to ensure that their operation meets the needs of users. Users are also interested in the operation of resources and services so that they can choose the most appropriate ones to use. We describe a prototype system for monitoring and managing computational grids, along with the general software framework for control and observation in distributed environments on which it is based.
Citations: 31
Distributed game-tree search using transposition table driven work scheduling
Pub Date : 2002-08-18 DOI: 10.1109/ICPP.2002.1040888
Akihiro Kishimoto, J. Schaeffer
The α-β algorithm for two-player game-tree search has a notorious reputation as a challenging algorithm to parallelize with reasonable performance. MTD(f), a newer α-β variant, has become the sequential algorithm of choice for practitioners. Unfortunately, MTD(f) inherits most of the parallelization obstacles of α-β, as well as creating new performance hurdles. Transposition-table-driven scheduling (TDS) is a new parallel search algorithm that has proven effective in the single-agent (one-player) domain. This paper presents TDSAB, the first application of TDS parallelism to two-player search (the MTD(f) algorithm). Results show that TDSAB gives speedups comparable to those achieved by conventional parallel α-β algorithms. However, since this is a parallelization of a superior sequential algorithm, the results are in fact better. This paper shows that the TDS idea can be extended to more challenging search domains.
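The sequential building block here, MTD(f), converges on the minimax value with a sequence of zero-window α-β probes. A minimal sketch, assuming a toy game tree encoded as nested lists with integer leaves; the names `alphabeta` and `mtdf` and the tree encoding are our illustration (a real engine would add the transposition table that TDSAB distributes):

```python
def alphabeta(node, alpha, beta):
    """Fail-soft negamax alpha-beta on a nested-list tree with int leaves."""
    if isinstance(node, int):              # leaf: static evaluation
        return node
    best = -float("inf")
    for child in node:
        best = max(best, -alphabeta(child, -beta, -alpha))
        alpha = max(alpha, best)
        if alpha >= beta:                  # cutoff
            break
    return best

def mtdf(root, f=0):
    """Converge on the minimax value via zero-window searches around f."""
    g, lower, upper = f, -float("inf"), float("inf")
    while lower < upper:
        beta = g + 1 if g == lower else g
        g = alphabeta(root, beta - 1, beta)   # null-window probe
        if g < beta:
            upper = g                      # probe failed low: upper bound
        else:
            lower = g                      # probe failed high: lower bound
    return g
```

Each probe either raises the lower bound or lowers the upper bound until they meet, which is why MTD(f) benefits so strongly from a shared transposition table.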
Citations: 21
A selection technique for replicated multicast video servers
Pub Date : 2002-08-18 DOI: 10.1109/ICPP.2002.1040913
Akihito Hiromori, H. Yamaguchi, K. Yasumoto, T. Higashino, K. Taniguchi
In this paper, we propose a selection technique for replicated multicast video servers. We assume that each replicated video server transmits the same video source as multicast streams of different quality levels. Using an IGMP facility such as m-trace, each receiver monitors packet-count information for those streams on routers and periodically selects the stream that is expected to provide a low loss rate and to suit the receiver's currently available bandwidth. Moreover, packet-count information is collected in a scalable and efficient manner by sharing the collected information across receivers. Our experimental results using a network simulator show that our method achieves much higher quality satisfaction for receivers with a reasonable amount of tracing traffic.
Citations: 3
On-line permutation routing on WDM all-optical networks
Pub Date : 2002-08-18 DOI: 10.1109/ICPP.2002.1040898
Q. Gu
For a sequence (s₁, t₁), ..., (sᵢ, tᵢ), ... of routing requests, with (sᵢ, tᵢ) arriving at time step i on a wavelength-division multiplexing (WDM) all-optical network, the on-line routing problem is to set up a path sᵢ → tᵢ and assign a wavelength to the path in step i such that the paths set up so far with the same wavelength are edge-disjoint. Two measures are important for on-line routing algorithms: the number of wavelengths used and the response time. The sequence (s₁, t₁), ..., (sᵢ, tᵢ), ... is called a permutation if each node in the network appears in the sequence at most once as a source and at most once as a destination. Let Hₙ be the n-dimensional WDM all-optical hypercube. We develop two on-line routing algorithms on Hₙ. Our first algorithm is deterministic and realizes any permutation with at most ⌈3(n-1)/2⌉ + 1 wavelengths and response time O(2ⁿ). The second algorithm is randomized and realizes any permutation with at most (3/2 + δ)(n-1) wavelengths, where δ can be any value satisfying δ ≥ 2/(n-1). The average response time of the algorithm is O(n(1 + δ)/δ). Both algorithms use at most O(n) wavelengths for the permutation on Hₙ. This improves the previous bound of O(n²).
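The on-line constraint (paths sharing a wavelength must remain edge-disjoint) can be illustrated by a generic first-fit wavelength assignment. This sketch is our illustration of the problem setup only; the paper's hypercube algorithms choose paths far more carefully in order to bound the wavelength count:

```python
def first_fit(paths_by_wavelength, new_path):
    """Assign new_path (a list of edges) the lowest wavelength on which it is
    edge-disjoint from all paths already routed; open a new wavelength if none
    fits. paths_by_wavelength is a list of edge sets, one per wavelength."""
    new_edges = set(new_path)
    for w, used_edges in enumerate(paths_by_wavelength):
        if used_edges.isdisjoint(new_edges):   # edge-disjoint on wavelength w
            used_edges |= new_edges
            return w
    paths_by_wavelength.append(new_edges)      # no wavelength fits: open one
    return len(paths_by_wavelength) - 1
```

Because requests arrive one at a time, an assignment can never be revoked, which is exactly why on-line algorithms need more wavelengths than an off-line optimum.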
Citations: 1
The tracefile testbed - a community repository for identifying and retrieving HPC performance data
Pub Date : 2002-08-18 DOI: 10.1109/ICPP.2002.1040872
K. Ferschweiler, S. Harrah, D. Keon, M. Calzarossa, D. Tessera, C. Pancake
High-performance computing (HPC) programmers use tracefiles, which record program behavior in great detail, as the basis for many performance analysis activities. The lack of generally accessible tracefiles has forced programmers to develop their own testbeds in order to study the basic performance characteristics of the platforms they use. Since tracefiles serve as input to performance analysis and performance prediction tools, tool developers have also been hindered by the lack of a testbed for verifying and fine-tuning tool functionality. We created a community repository that meets the needs of both application and tool developers. In this paper, we describe how the tracefile testbed was designed to facilitate flexible searching and retrieval of tracefiles based on a variety of characteristics. Its Web-based interface provides a convenient mechanism for browsing, downloading, and uploading collections of tracefiles and tracefile segments, as well as for viewing statistical summaries of performance characteristics.
Citations: 1
Worst case analysis of a greedy multicast algorithm in k-ary n-cubes
Pub Date : 2002-08-18 DOI: 10.1109/ICPP.2002.1040908
S. Fujita
In this paper, we consider the problem of multicasting a message in k-ary n-cubes under the store-and-forward model. The objective is to minimize the size of the resultant multicast tree while keeping the distance to each destination over the tree the same as the distance in the original graph. We first propose an algorithm that grows a multicast tree in a greedy manner, in the sense that at each intermediate vertex of the tree, the outgoing edges are selected in non-increasing order of the number of destinations that can use the edge on a shortest path to the destination. We then evaluate the goodness of the algorithm in terms of the worst-case ratio of the size of the generated tree to the size of an optimal tree. It is proved that for any k ≥ 5 and n ≥ 6, the performance ratio of the greedy algorithm is c × kn − o(n) for some constant 1/1.2 ≤ c ≤ 1/2.
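The greedy edge ordering described above can be sketched as follows, assuming nodes are n-tuples of digits mod k with wraparound links. The function names and recursive structure are our illustration, not the paper's algorithm; the sketch only shows the "rank outgoing edges by how many destinations can still use them on a shortest path" idea:

```python
def useful_moves(src, dst, k):
    """Unit moves at src that lie on some shortest path from src to dst."""
    moves = []
    for dim, (s, t) in enumerate(zip(src, dst)):
        fwd = (t - s) % k          # steps needed going +1 in this dimension
        bwd = (s - t) % k          # steps needed going -1 in this dimension
        if fwd and fwd <= bwd:
            moves.append((dim, +1))
        if bwd and bwd <= fwd:
            moves.append((dim, -1))
    return moves

def greedy_multicast(src, dests, k):
    """Grow a multicast tree; every destination stays at shortest distance."""
    score = {}                     # outgoing edge -> destinations it serves
    for dst in dests:
        for mv in useful_moves(src, dst, k):
            score.setdefault(mv, []).append(dst)
    edges, served = [], set()
    # take edges in non-increasing order of usefulness (the greedy rule)
    for mv, group in sorted(score.items(), key=lambda e: -len(e[1])):
        group = [d for d in group if d not in served]
        if not group:
            continue
        dim, step = mv
        nxt = list(src)
        nxt[dim] = (nxt[dim] + step) % k
        nxt = tuple(nxt)
        edges.append((src, nxt))
        served.update(group)
        edges.extend(greedy_multicast(nxt, [d for d in group if d != nxt], k))
    return edges
```

Since each destination is only ever handed to an edge that lies on one of its shortest paths, the recursion preserves shortest distances by construction.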
Citations: 0
Design and evaluation of scalable switching fabrics for high-performance routers
Pub Date : 2002-08-18 DOI: 10.1109/ICPP.2002.1040871
N. Tzeng, Ravi C. Batchu
This work considers switching fabrics with distributed packet routing to achieve high scalability and low cost. The considered switching fabrics are based on a multistage structure with different re-circulation designs, where adjacent stages are interconnected in the indirect n-cube connection style. According to extensive simulation, they all compare favorably with an earlier multistage-based counterpart in terms of the performance measures of interest and hardware complexity. When queues are incorporated in the output ports of switching elements (SEs), the total number of stages required in our proposed fabrics to reach a given performance level can be reduced substantially. The performance of fabrics with output queues is evaluated under different "speedups" of the queues, where the speedup is the ratio of the operating clock rate at the SE core to that over external links. Our simulation reveals that, for a fabric size of 256, a small speedup of 2 is adequate for buffered switching fabrics comprising 4×8 SEs to deliver better performance than their unbuffered counterparts with 50% more stages of SEs. The buffered switching fabrics under consideration are scalable and of low cost, making them ideally suited to constructing high-performance routers with large numbers of line cards.
Citations: 0
Software caching using dynamic binary rewriting for embedded devices
Pub Date : 2002-08-18 DOI: 10.1109/ICPP.2002.1040920
Chad Huneycutt, J. Fryman, K. Mackenzie
A software cache implements instruction and data caching entirely in software. Dynamic binary rewriting offers a means to specialize the software cache miss checks at cache miss time. We describe a software cache system implemented using dynamic binary rewriting and observe that the combination is particularly appropriate for the scenario of a simple embedded system connected to a more powerful server over a network; as two examples, consider a network of sensors with local processing, or cell phones connected to cell towers. We describe two software cache systems for instruction caching using dynamic binary rewriting and present results for the performance of instruction caching in these systems. We measure a time overhead of 19% compared to no caching. We also show that we can guarantee a 100% hit rate for code that fits in the cache. For comparison, we estimate that a comparable hardware cache would have a space overhead of 12-18% for its tag array and would offer no hit-rate guarantee.
Citations: 26
Introducing SCSI-to-IP cache for storage area networks
Pub Date : 2002-08-18 DOI: 10.1109/ICPP.2002.1040875
Xubin He, Qing Yang, Ming Zhang
Data storage plays an essential role in today's fast-growing data-intensive network services. iSCSI is one of the most recent standards allowing SCSI protocols to be carried over IP networks. However, the disparities between SCSI and IP prevent fast and efficient deployment of SANs (storage area networks) over IP. This paper introduces STICS (SCSI-To-IP cache storage), a novel storage architecture that couples reliable, high-speed data caching with low-overhead conversion between the SCSI and IP protocols. Through an efficient caching algorithm and the localization of certain unnecessary protocol overheads, STICS significantly improves performance over the current iSCSI system. Furthermore, STICS can be used as a basic plug-and-play building block for data storage over IP. We have implemented a software STICS prototype on the Linux operating system. Numerical results using the popular PostMark benchmark and an EMC trace show dramatic performance gains over the current iSCSI implementation.
Citations: 17
Analysis of memory hierarchy performance of block data layout
Pub Date : 2002-08-18 DOI: 10.1109/ICPP.2002.1040857
Neungsoo Park, Bo Hong, V. Prasanna
Recently, several experimental studies have been conducted on block data layout as a data transformation technique used in conjunction with tiling to improve cache performance. We provide a theoretical analysis of the TLB and cache performance of block data layout. For standard matrix access patterns, we derive an asymptotic lower bound on the number of TLB misses for any data layout and show that block data layout achieves this bound. We show that block data layout reduces TLB misses by a factor of O(B) compared with conventional data layouts, where B is the block size of the layout. This reduction contributes to the improvement in memory hierarchy performance. Using our TLB and cache analysis, we also discuss the impact of block size on overall memory hierarchy performance. These results are validated through simulations and experiments on state-of-the-art platforms.
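The address arithmetic behind block data layout is the standard tiled mapping: an N×N matrix is stored as contiguous B×B tiles, so the elements a tiled kernel touches sit on far fewer pages than under row-major order. A minimal sketch, assuming N divisible by B (the helper names are ours, not the paper's):

```python
def row_major_offset(i, j, n):
    """Conventional layout: each of a tile's B rows lands on a distinct
    stretch of memory, touching up to B pages."""
    return i * n + j

def block_offset(i, j, n, b):
    """Block data layout: linear offset of element (i, j) when the matrix is
    stored as contiguous b*b tiles, tiles ordered row-major."""
    bi, bj = i // b, j // b            # which tile
    oi, oj = i % b, j % b              # position inside the tile
    tiles_per_row = n // b
    return (bi * tiles_per_row + bj) * b * b + oi * b + oj
```

Because a whole tile occupies one contiguous range of b² elements, a tiled computation touches on the order of b²/P pages of size P per tile instead of b, which is the O(B) TLB-miss improvement the analysis quantifies.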
最近,已经进行了一些实验研究,将块数据布局作为数据转换技术与平铺技术结合使用,以提高缓存性能。对块数据布局的TLB和缓存性能进行了理论分析。对于标准矩阵访问模式,我们给出了任意数据布局的TLB缺失数的渐近下界,并证明了块数据布局达到了这个下界。我们表明,与传统数据布局相比,块数据布局将TLB失误率提高了O(B)倍,其中B是块数据布局的块大小。这种减少有助于提高内存层次结构性能。使用我们的TLB和缓存分析,我们还讨论了块大小对整体内存层次结构性能的影响。这些结果通过仿真和实验在最先进的平台上得到验证。
{"title":"Analysis of memory hierarchy performance of block data layout","authors":"Neungsoo Park, Bo Hong, V. Prasanna","doi":"10.1109/ICPP.2002.1040857","DOIUrl":"https://doi.org/10.1109/ICPP.2002.1040857","url":null,"abstract":"Recently, several experimental studies have been conducted on block data layout as a data transformation technique used in conjunction with tiling to improve cache performance. We provide a theoretical analysis for the TLB and cache performance of block data layout. For standard matrix access patterns, we derive an asymptotic lower bound on the number of TLB misses for any data layout and show that block data layout achieves this bound. We show that block data layout improves TLB misses by a factor of O(B) compared with conventional data layouts, where B is the block size of block data layout. This reduction contributes to the improvement in memory hierarchy performance. Using our TLB and cache analysis, we also discuss the impact of block size on the overall memory hierarchy performance. These results are validated through simulations and experiments on state-of-the-art platforms.","PeriodicalId":393916,"journal":{"name":"Proceedings International Conference on Parallel Processing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2002-08-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131956599","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
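The entry above analyzes block data layout, where an N×N matrix is stored as contiguous B×B blocks so that a tile of work touches few pages. A minimal sketch of the index mapping, and of why a column traversal touches fewer pages than under row-major layout, follows; the helper names and the parameters `N`, `B`, `PAGE` are illustrative choices, not values from the paper.

```python
# Sketch of the block data layout index mapping: an N x N matrix is
# stored as contiguous B x B blocks, row-major within each block.
# Parameters below are toy values chosen for illustration.

def block_offset(i, j, n, b):
    """Linear offset of element (i, j) under block data layout
    (assumes b divides n)."""
    blocks_per_row = n // b
    br, bc = i // b, j // b          # which block
    oi, oj = i % b, j % b            # position inside the block
    return (br * blocks_per_row + bc) * b * b + oi * b + oj

def row_major_offset(i, j, n):
    """Linear offset under conventional row-major layout."""
    return i * n + j

def pages_touched(offsets, page):
    """Distinct pages hit by a sequence of element offsets."""
    return len({off // page for off in offsets})

if __name__ == "__main__":
    N, B, PAGE = 8, 4, 16            # PAGE = elements per (tiny) page
    col = [(i, 0) for i in range(N)]     # traverse one column
    rm = pages_touched([row_major_offset(i, j, N) for i, j in col], PAGE)
    bl = pages_touched([block_offset(i, j, N, B) for i, j in col], PAGE)
    print(rm, bl)                    # → 4 2: block layout touches fewer pages
```

Under row-major layout, each step down a column jumps a full row, so successive elements land on different pages; under block layout, B consecutive column elements stay inside one contiguous B×B block. This locality is the source of the O(B)-factor TLB-miss reduction the abstract derives.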
Citations: 36
Journal
Proceedings International Conference on Parallel Processing