
2010 First International Conference on Networking and Computing: Latest Papers

A Cluster Based Collaborative Cache Approach for MANETs
Pub Date : 2010-11-17 DOI: 10.1109/IC-NC.2010.41
Marcos F. Caetano, J. Bordim
The main contribution of this work is a distributed, collaborative cache solution in which decisions to cache objects are made collaboratively. In our solution, objects are classified as private or shared. Private objects are managed as in an individual cache system, whereas shared objects are managed collectively and stored in a shared area. Simulation results show that our solution can significantly reduce the server load by increasing cache diversity and the probability of cache hits. It also provides significant savings in battery power and reduces congestion on the routes towards the sink node.
Citations: 11
Balance and Proximity-Aware Skip Graph Construction
Pub Date : 2010-11-17 DOI: 10.1109/IC-NC.2010.59
Fuminori Makikawa, Tatsuhiro Tsuchiya, T. Kikuno
A skip graph is a valuable overlay network for searching for keys in a peer-to-peer application. One problem with the construction algorithm for skip graphs is that it does not consider the proximity of adjacent peers. As a result, a skip graph often contains links with considerably long communication times. Another problem is that, owing to the random nature of the algorithm, a skip graph often exhibits structural imbalance. In this paper, we propose a topology reconstruction algorithm to solve these problems. This algorithm, executed iteratively by each node, evaluates both proximity and topological balance and reshapes the overlay topology if necessary. Simulation results show that the skip graph constructed by our approach achieves a shorter search delay than the original skip graph.
Citations: 12
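The two problems named in the skip graph abstract above follow from how a standard skip graph is built: every node draws a random membership vector, and at level i the nodes that share the same first i bits form one sorted list. The following is a minimal sketch of that standard construction only, not of the reconstruction algorithm the paper proposes. The function names, the example keys, and the fixed number of levels are assumptions made for illustration.

```python
import random
from collections import defaultdict


def build_skip_graph(keys, levels, rng):
    """Build a standard skip graph (hypothetical helper, not the paper's algorithm).

    Returns (membership, graph), where graph[level][prefix] is the sorted
    list of keys whose membership vector starts with `prefix`.
    """
    # Random membership vectors: this is the source of both the proximity
    # blindness and the structural imbalance described in the abstract.
    membership = {k: tuple(rng.randrange(2) for _ in range(levels)) for k in keys}
    graph = {}
    for level in range(levels + 1):
        lists = defaultdict(list)
        for k in sorted(keys):
            lists[membership[k][:level]].append(k)
        graph[level] = dict(lists)
    return membership, graph


def neighbors(graph, membership, key, level):
    """Return the (left, right) neighbors of `key` in its list at `level`."""
    lst = graph[level][membership[key][:level]]
    i = lst.index(key)
    left = lst[i - 1] if i > 0 else None
    right = lst[i + 1] if i + 1 < len(lst) else None
    return left, right


keys = [5, 13, 21, 34, 8, 55, 3, 1]
membership, graph = build_skip_graph(keys, levels=3, rng=random.Random(0))
for level in sorted(graph):
    # List sizes at the same level can differ widely: structural imbalance.
    print(level, sorted(len(lst) for lst in graph[level].values()))
```

Because neighbors at higher levels are decided by random bits alone, two adjacent nodes may be far apart in the underlying network, which is the situation a proximity-aware reconstruction would try to repair.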
A Rewriting Algorithm to Generate AROM-free Fully Synchronous Circuits
Pub Date : 2010-11-17 DOI: 10.1109/IC-NC.2010.54
Md. Nazrul Islam Mondal, K. Nakano, Yasuaki Ito
A Field Programmable Gate Array (FPGA) can be used to instantly embed a circuit designed by a user, and FPGAs can be used to implement hardware algorithms. Most FPGAs have Configurable Logic Blocks (CLBs) to implement combinational and sequential circuits, and block RAMs to implement Random Access Memories (RAMs) and Read Only Memories (ROMs). Designing a circuit that minimizes the number of clock cycles is easy if asynchronous read operations are available. However, most RAMs and ROMs in modern FPGAs support only synchronous read operations, not asynchronous ones. This is one of the main difficulties users face when implementing hardware algorithms with RAMs and ROMs. The main contribution of this paper is an effective method to resolve this problem. We assume that a circuit using asynchronous ROMs designed by a user is given. Our goal is to convert this circuit into an equivalent circuit with synchronous ROMs. We first clarify the condition under which a given circuit with asynchronous ROMs can be converted into a circuit without asynchronous ROMs. We then show an algorithm that generates a circuit with synchronous ROMs whenever the original circuit with asynchronous ROMs satisfies this condition. Using our conversion algorithm, users can assume that FPGAs support asynchronous ROMs when they design their circuits. Finally, we show that, even if the circuit does not satisfy this condition, we can generate an almost equivalent circuit with synchronous ROMs by modifying it.
Citations: 2
Mesh-of-Tori: A Novel Interconnection Network for Frontal Plane Cellular Processors
Pub Date : 2010-11-17 DOI: 10.1109/IC-NC.2010.30
A. Ravankar, S. Sedukhin
In this paper we propose a novel “Mesh-of-Tori” cellular interconnection network for scalable and massively parallel array processors with frontal plane I/O. The unit of this topology (called an “m-Cell”) is the smallest double (for the 2D case) or triple (for the 3D case) folded torus, which forms the basic ‘tile’. The Cells can be repeated and “fused” to form macro-Cells, or “divided” to form smaller Cells, without destroying the homogeneity of the entire structure, giving a highly scalable and cellular “Mesh-of-Tori” topology. The key features of the proposed interconnection network are (1) excellent up and down scalability due to regularity and modularity, (2) no end-around connections, and (3) the capability to map 2D streaming data from frontal plane I/O (a stacked layer of sensors) to the processing elements. We also provide solutions for stream data manipulation through frontal plane I/O on the proposed cellular topology.
Citations: 12
Control Independence Using Dual Renaming
Pub Date : 2010-11-17 DOI: 10.1109/IC-NC.2010.16
Lin Meng, S. Oyanagi
Modern superscalar processors squash all wrong-path instructions when a branch is mispredicted. In deeper pipelines, the branch misprediction penalty increases seriously owing to the large number of squashed instructions. Exploiting control independence has been proposed to reduce this penalty. The control independence method reuses control-independent instructions (CI instructions) without squashing them when a branch is mispredicted. Reusing CI instructions on a branch misprediction is not easy, because the data dependencies between squashed instructions and CI instructions change. Conventional CI architectures either require a complex re-renaming mechanism or have limited applicability. This paper proposes a new mechanism named Dual Renaming for reusing CI instructions. It assigns two tags to each source register of a CI instruction and resolves data dependencies with a simple mechanism when a branch misprediction is detected. Simulation results show that the Dual Renaming mechanism increases IPC by up to 29.52%.
Citations: 1
AES Encryption Implementation on CUDA GPU and Its Analysis
Pub Date : 2010-11-17 DOI: 10.1109/IC-NC.2010.49
Keisuke Iwai, T. Kurokawa, Naoki Nishikawa
GPUs offer good performance for their low price and are well suited to applications with a high level of parallelism. Support for integer and logical instructions on the latest generation of GPUs makes it easier to implement cipher algorithms, which rely on those instructions. However, decisions such as the granularity of parallel processing and where to allocate data in memory place a heavy burden on programmers. For this reason, this paper presents the results of several experiments that study the relation between the memory allocation style of AES parameters and the granularity of the parallelism exploited from the AES encoding process, using CUDA on an NVIDIA GeForce GTX285. The experiments revealed that a granularity of 16 bytes per thread gave the highest performance, achieving approximately 35 Gbps throughput. Moreover, an implementation that overlaps processing with data transfer achieved 22.5 Gbps throughput including data transfer time. The experiments also made clear that choosing the granularity and memory allocation is important for effective AES encryption on a GPU.
Citations: 54
Smart Core System for Dependable Many-Core Processor with Multifunction Routers
Pub Date : 2010-11-17 DOI: 10.1109/IC-NC.2010.53
Shinya Takamaeda-Yamazaki, Shimpei Sato, T. Miyoshi, Kenji Kise
Dependability of many-core processors is a very important topic. To improve dependability, we propose the Smart Core system, a smart many-core system with redundant cores and multifunction routers. The multifunction router has three functions: copying packets, changing the destinations of packets, and rendezvousing and comparing two packets from different nodes. Using these additional functions, the Smart Core system realizes redundant execution on multiple cores and detects execution errors at the packet level. We implemented a many-core processor with the Smart Core system on a software simulator. The evaluation result shows that the performance overhead of packet rendezvous is small, at most 4.1%. In addition, we verified on a hardware prototyping system that a dependable many-core processor with the Smart Core system detects execution errors.
Citations: 3
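The three router functions listed in the Smart Core abstract above (copy, change destination, rendezvous and compare) can be illustrated with a small software model. This is only a sketch under assumptions: the `Packet` fields, the function names, and the choice to signal a mismatch by returning `None` are hypothetical and are not taken from the paper, which describes a hardware router.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class Packet:
    """Hypothetical packet format: source node, destination node, payload."""
    src: int
    dst: int
    payload: bytes


def copy_packet(packet: Packet, replica_dst: int) -> List[Packet]:
    """Function 1: duplicate a packet so a redundant core gets the same input."""
    return [packet, replace(packet, dst=replica_dst)]


def redirect(packet: Packet, new_dst: int) -> Packet:
    """Function 2: rewrite the destination of a packet."""
    return replace(packet, dst=new_dst)


def rendezvous_and_compare(a: Packet, b: Packet) -> Optional[Packet]:
    """Function 3: pair two packets from different nodes and compare them.

    Returns the packet to forward when the payloads agree, or None to
    signal a detected execution error (an assumed convention).
    """
    if a.src == b.src:
        raise ValueError("rendezvous expects packets from different nodes")
    return a if a.payload == b.payload else None


# Redundant execution: the same request goes to core 1 and to core 2.
request = Packet(src=0, dst=1, payload=b"load r1")
original, replica = copy_packet(request, replica_dst=2)

# Both cores answer node 3; the router compares the two results.
result_a = Packet(src=1, dst=3, payload=b"42")
result_b = Packet(src=2, dst=3, payload=b"42")
faulty_b = Packet(src=2, dst=3, payload=b"43")

print(rendezvous_and_compare(result_a, result_b))  # forwarded packet
print(rendezvous_and_compare(result_a, faulty_b))  # None: mismatch detected
```

Comparing at the packet level means a faulty core is caught when its output leaves the core, without the cores themselves needing any checking logic.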
A Speed-Up Technique for an Auto-Memoization Processor by Collectively Reusing Continuous Iterations
Pub Date : 2010-11-17 DOI: 10.1109/IC-NC.2010.46
Tomoki Ikegaya, Tomoaki Tsumura, H. Matsuo, Y. Nakashima
We have proposed an auto-memoization processor based on computation reuse, and merged it with speculative multithreading based on value prediction into parallel early computation. In the previous model, parallel early computation detects each loop iteration as a reusable block. This paper proposes a new parallel early computation model, which integrates multiple continuous iterations into one reusable block automatically and dynamically, without modifying executable binaries. We also propose a model for automatically detecting how many iterations should be integrated into one reusable block. Our model reduces the overhead of computation reuse and further exploits reuse tables. The results of experiments with the SPEC CPU95 FP benchmark suite show that the new model improves the maximum speedup from 40.5% to 57.6%, and the average speedup from 15.0% to 26.0%.
Citations: 5
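The idea in the auto-memoization abstract above of treating several consecutive loop iterations as one reusable block can be mimicked in software. The sketch below is a loose analogy only: the paper describes a hardware mechanism that works on unmodified binaries, whereas here the reuse table is a Python dictionary and the block size `chunk` is fixed by the caller rather than detected automatically. All names are hypothetical.

```python
def run_loop(body, state, inputs, chunk, table, stats):
    """Run `body` over `inputs`, memoizing `chunk` consecutive iterations as one block.

    `table` plays the role of a reuse table: it maps (incoming state,
    block inputs) to the state after the block.
    """
    for start in range(0, len(inputs), chunk):
        block = tuple(inputs[start:start + chunk])
        key = (state, block)
        stats["lookups"] += 1
        if key in table:
            # Reuse: skip every iteration in the block with a single lookup.
            state = table[key]
            stats["hits"] += 1
            continue
        for x in block:
            state = body(state, x)
            stats["executed"] += 1
        table[key] = state
    return state


def body(acc, x):
    """Example loop body (an arbitrary rolling hash, chosen for illustration)."""
    return (acc * 31 + x) % 1009


def measure(chunk, inputs):
    """Run the same loop twice so that the second run can reuse the first."""
    table = {}
    stats = {"lookups": 0, "hits": 0, "executed": 0}
    first = run_loop(body, 0, inputs, chunk, table, stats)
    second = run_loop(body, 0, inputs, chunk, table, stats)
    return first, second, stats


inputs = [3, 1, 4, 1, 5, 9, 2, 6]
per_iteration = measure(1, inputs)  # one reusable block per iteration
collective = measure(4, inputs)     # four iterations per reusable block
print(per_iteration[2])
print(collective[2])
```

Both settings execute the loop body the same number of times, but the larger block needs fewer table lookups for the same amount of reused work, which is the overhead reduction the abstract refers to.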
An Effective Risk Avoidance Scheme for the EigenTrust Reputation Management System
Pub Date : 2010-11-17 DOI: 10.1109/IC-NC.2010.28
Takuya Nishikawa, S. Fujita
Peer-to-Peer (P2P) systems have attracted considerable attention in recent years as a key technology for realizing scalable, dependable network services. However, because of their high anonymity, P2P systems have several flaws, such as weakness against malicious attacks by anonymous peers. In this paper, we propose a method to evaluate the trustworthiness of each peer by explicitly taking into account the reliability of mutual evaluations. The proposed method is an improvement of the EigenTrust algorithm proposed by Kamvar et al., which calculates a global trust vector consistent with the observed local trust vectors under the weighted sum in a linear space. The performance of the proposed method is evaluated by simulation. The simulation results indicate that the proposed method identifies a large subset of reliable peers with a sufficiently small number of message transmissions compared with previous schemes, including the original EigenTrust.
Citations: 4
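For readers unfamiliar with the baseline mentioned in the EigenTrust abstract above, the basic algorithm of Kamvar et al. normalizes each peer's local trust scores and then repeatedly applies t ← Cᵀt until the global trust vector stabilizes. The sketch below shows only that basic, centralized iteration, not the improved scheme of this paper. The example scores, the function names, and the uniform fallback for a peer with no positive opinions are assumptions made for illustration.

```python
def normalize_local_trust(scores):
    """Clip negative scores to zero and scale each row to sum to 1."""
    n = len(scores)
    rows = []
    for row in scores:
        clipped = [max(s, 0.0) for s in row]
        total = sum(clipped)
        # Assumption: a peer with no positive opinions trusts everyone equally.
        rows.append([c / total for c in clipped] if total > 0 else [1.0 / n] * n)
    return rows


def global_trust(scores, iterations=200, tol=1e-12):
    """Power iteration t <- C^T t, starting from the uniform vector."""
    c = normalize_local_trust(scores)
    n = len(c)
    t = [1.0 / n] * n
    for _ in range(iterations):
        # Peer j's global trust is the sum of what others say about j,
        # weighted by how much those others are trusted themselves.
        nxt = [sum(c[i][j] * t[i] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, t)) < tol:
            return nxt
        t = nxt
    return t


# local[i][j]: how satisfied peer i is with peer j (hypothetical values).
local = [
    [0.0, 4.0, 1.0],
    [3.0, 0.0, 1.0],
    [2.0, 2.0, 0.0],
]
trust = global_trust(local)
print([round(x, 3) for x in trust])
```

In this example peer 2 receives the lowest scores from both other peers, so it ends up with the smallest entry in the global trust vector.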
Algebra of Synchronization with Application to Deadlock and Semaphores
Pub Date : 2010-11-17 DOI: 10.1109/IC-NC.2010.43
E. Gomez, K. Schubert
Modern multiprocessor architectures have exacerbated the problems of coordinating access to shared data, in particular with regard to the possibility of deadlock. For example, semaphores, one of the most basic synchronization primitives, present difficulties. Dijkstra defined semaphores to solve the problem of mutual exclusion. Practical implementations of the concept have, however, produced semaphores that are prone to deadlock, even though the original definition is theoretically free of it. This is not simply due to bad programming; rather, we have lacked a theory that allows us to understand the problem. We introduce a formal definition and a new general theory of synchronization. We illustrate its applicability by deriving basic deadlock properties, both to show where the problem with semaphores lies and to guide us in finding some simple modifications to semaphores that greatly ameliorate the problem. We suggest some future directions for deadlock resolution that also avoid resource starvation.
Citations: 1