2018 Sixth International Symposium on Computing and Networking Workshops (CANDARW): Latest Articles

Experimental Performance Analysis on Autonomous Distributed Collaborative Messaging Protocol
Pub Date : 2018-11-01 DOI: 10.1109/CANDARW.2018.00020
Hiroyoshi Ichikawa, A. Kobayashi
Selfish nodes can cause autonomous distributed network systems, such as ad hoc or peer-to-peer networks, to collapse. In our previous work, we proposed an autonomous distributed collaborative messaging system model that uses blockchain technology. An intermediate node earns tokens for sending messages, much like postage stamps, when it relays a message that neither originates from nor is addressed to that node. We showed that communication through multiple relaying nodes can be carried out and that selfish nodes cannot commit fraud. In this paper, we evaluate the proposed method to assess its effectiveness.
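The relay incentive described in this abstract can be illustrated with a small ledger sketch. It is only an illustration under stated assumptions: the function name `settle_relay`, the rule that the sender pays one token per relay, and the plain dictionary standing in for the blockchain ledger are all hypothetical and are not taken from the paper.

```python
def settle_relay(path, balances, postage=1):
    """Credit each intermediate node on `path` with `postage` tokens.

    Assumption (not from the paper): the sender pays the postage for
    every relay, and a plain dict stands in for the blockchain ledger.
    """
    sender, *relays, _receiver = path  # endpoints earn nothing
    cost = postage * len(relays)
    if balances[sender] < cost:
        raise ValueError("sender cannot afford the postage")
    balances[sender] -= cost
    for node in relays:
        # A relay is neither the origin nor the destination, so it is paid.
        balances[node] += postage
    return balances


ledger = {"A": 3, "B": 0, "C": 0, "D": 0}
settle_relay(["A", "B", "C", "D"], ledger)
print(ledger)
```

In this toy model a node that never relays never earns tokens, which is one way to read the claim that selfish behavior does not pay.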
Citations: 0
Reduction of Instruction Increase Overhead by STRAIGHT Compiler
Pub Date : 2018-11-01 DOI: 10.1109/CANDARW.2018.00026
Toru Koizumi, Satoshi Nakae, A. Fukuda, H. Irie, S. Sakai
Removing false dependencies is effective for efficient out-of-order (OoO) execution, which improves single-thread performance. Hardware register renaming removes these dependencies, but its complexity makes it one of the bottlenecks of the processor. The STRAIGHT architecture is one approach that lets the compiler remove these dependencies instead. Because each source operand is specified as the distance between the producer instruction and the consumer instruction, and registers are never overwritten, no false dependency occurs. In exchange, the compiler must generate code that satisfies the constraint that operands are specified as constant distances independent of the execution path. Although basic algorithms for satisfying this constraint are already known, the machine code they generate cannot achieve high performance, because it must execute many inter-register transfer instructions added by the compiler to satisfy the constraint. This paper presents an efficient algorithm that uses data flow analysis to identify the values that cause the number of executed instructions to increase, and improves performance by spilling them to the stack. We developed a compiler that implements the proposed method using LLVM and evaluated it using CoreMark as a benchmark. The number of executed instructions was reduced by approximately 31%, and execution performance improved by up to 32%.
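To make distance-based operands concrete, the following toy interpreter executes instructions whose sources name "the result produced k instructions ago" rather than a register. The opcode names (`li`, `add`, `mul`, `mv`) and the tuple encoding are hypothetical and chosen only for illustration; they are not the actual STRAIGHT instruction set.

```python
def run_straight(program):
    """Run a toy program whose operands are distances to earlier results.

    results[i] holds the value produced by the i-th executed instruction.
    A distance of 1 means "the instruction executed immediately before".
    Nothing is ever overwritten, so false dependencies cannot arise.
    """
    results = []
    for op, *args in program:
        if op == "li":        # load immediate
            value = args[0]
        elif op == "add":
            value = results[-args[0]] + results[-args[1]]
        elif op == "mul":
            value = results[-args[0]] * results[-args[1]]
        elif op == "mv":      # inter-register transfer: re-produce an old value
            value = results[-args[0]]
        else:
            raise ValueError(f"unknown opcode: {op}")
        results.append(value)
    return results


program = [
    ("li", 2),
    ("li", 3),
    ("add", 2, 1),   # 2 + 3
    ("mul", 1, 3),   # (2 + 3) * 2
]
print(run_straight(program))  # [2, 3, 5, 10]
```

The `mv` opcode hints at the overhead the abstract discusses: when two control-flow paths reach the same consumer, the compiler must insert such transfers so that the producer sits at the same distance on every path.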
Citations: 1
Hierarchical Distributed-Memory Multi-Leader MPI-Allreduce for Deep Learning Workloads
Pub Date : 2018-11-01 DOI: 10.1109/CANDARW.2018.00048
Truong Thao Nguyen, M. Wahib, Ryousei Takano
Driven by the increasing complexity and size of deep learning models, training on large-scale GPU-accelerated clusters is becoming commonplace. One of the main challenges for distributed training is the collective communication overhead for very large messages, from several to hundreds of MB. In this paper, we exploit two hierarchical distributed-memory multi-leader allreduce algorithms optimized for GPU-accelerated clusters (named lr_lr and lr_rab). In these algorithms, a node performs the inter-node data transfer in parallel using GPUs that are designated as node leaders. Each leader keeps and exchanges only a partial result of the locally reduced values rather than the whole result. This significantly reduces the time needed to inject data into the inter-node network. We evaluate these algorithms on the discrete-event simulator SimGrid. We show that lr_lr and lr_rab cut the execution time of an Allreduce microbenchmark that uses the logical ring algorithm (lr) by up to 45% and 51%, respectively. In addition, savings in the power consumption of network devices of up to 23% and 32% are projected.
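The data movement of a multi-leader hierarchical allreduce can be sketched in three phases. This is a sequential model under stated assumptions: every GPU on a node acts as the leader for one chunk of the vector, and the inter-node phase is written as a plain sum rather than as the ring or Rabenseifner exchange that the names lr_lr and lr_rab suggest. The function name and data layout are hypothetical.

```python
def multi_leader_allreduce(data):
    """Sum vectors across all GPUs; data[n][g] is the vector on GPU g of node n.

    Returns a structure of the same shape in which every GPU holds the
    global element-wise sum.
    """
    num_nodes = len(data)
    gpus = len(data[0])
    length = len(data[0][0])
    assert length % gpus == 0, "vector must split evenly across leaders"
    chunk = length // gpus

    # Phase 1: intra-node reduce-scatter. Leader g keeps only chunk g
    # of the node-local sum, not the whole reduced vector.
    partial = [
        [
            [sum(data[n][src][i] for src in range(gpus))
             for i in range(g * chunk, (g + 1) * chunk)]
            for g in range(gpus)
        ]
        for n in range(num_nodes)
    ]

    # Phase 2: inter-node allreduce among leaders with the same index.
    # Each leader injects only length / gpus elements into the network.
    reduced = [
        [sum(partial[n][g][i] for n in range(num_nodes)) for i in range(chunk)]
        for g in range(gpus)
    ]

    # Phase 3: intra-node allgather so every GPU holds the full result.
    full = [x for g in range(gpus) for x in reduced[g]]
    return [[list(full) for _ in range(gpus)] for _ in range(num_nodes)]


data = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]   # 2 nodes x 2 GPUs
print(multi_leader_allreduce(data)[0][0])     # [16, 20]
```

Phase 2 is where the multi-leader design reduces injection time: the per-node traffic is spread over several leaders that can send concurrently.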
Citations: 11
Branch and Bound Algorithm for Parallel Many-Core Architecture
Pub Date : 2018-11-01 DOI: 10.1109/CANDARW.2018.00058
Kazuki Hazama, H. Ebara
In recent years, computing environments that use multiple processors, such as multi-core and many-core devices, have attracted attention because of the limits on per-processor performance improvement. In this paper, we propose a new algorithm for combinatorial optimization problems that uses a parallel search method called Lazy SMP to make efficient use of many-core processors. Lazy SMP is a method based on iterative deepening depth-first search that is used for board search in chess and shogi software. In this method, search results are saved in a table shared by all processes, and each process reuses the results of the others to shorten the search time. In the proposed method, Lazy SMP is applied to branch and bound. Specifically, every thread runs an iteratively deepening branch and bound search and saves part of the results for some nodes in a shared hash table. Subsequent searches then refer to the hash table instead of searching those nodes again. Our aim is to make efficient use of many-core processors. We conduct computer experiments with the traveling salesman problem as the benchmark to verify the performance of the proposed method.
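The role of the shared table can be illustrated with a single-threaded branch and bound for the traveling salesman problem. This sketch is not Lazy SMP: it omits threads and iterative deepening, and the choice of what to store in the table (the lowest cost with which a state was reached) is an assumption made for illustration. It assumes non-negative distances.

```python
import math


def tsp_branch_and_bound(dist):
    """Return the length of the shortest tour that starts and ends at city 0."""
    n = len(dist)
    best = math.inf
    # (current city, visited bitmask) -> lowest cost seen for that state.
    # In a parallel version this is the table the threads would share.
    table = {}

    def search(city, visited, cost):
        nonlocal best
        if cost >= best:                       # bound: cannot beat the incumbent
            return
        key = (city, visited)
        if table.get(key, math.inf) <= cost:   # state already reached as cheaply
            return
        table[key] = cost
        if visited == (1 << n) - 1:            # all cities visited: close the tour
            best = min(best, cost + dist[city][0])
            return
        for nxt in range(n):                   # branch on the next unvisited city
            if not visited & (1 << nxt):
                search(nxt, visited | (1 << nxt), cost + dist[city][nxt])

    search(0, 1, 0)
    return best


dist = [
    [0, 10, 15, 20],
    [10, 0, 35, 25],
    [15, 35, 0, 30],
    [20, 25, 30, 0],
]
print(tsp_branch_and_bound(dist))  # 80
```

The table lookup replaces a repeated subtree search with one dictionary access, which is the effect the abstract attributes to the shared hash table.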
Citations: 0
Proposal and Evaluation of Secure Device Pairing Method with Camera and Accelerometer
Pub Date : 2018-11-01 DOI: 10.1109/CANDARW.2018.00064
M. Nagatomo, K. Aburada, N. Okazaki, Mirang Park
With the advancement of short-range wireless technology, the number of mobile devices is increasing. Devices therefore often exchange information over wireless links, but this communication is vulnerable to attacks such as eavesdropping and man-in-the-middle attacks. It is thus necessary to pair devices securely before wireless communication begins. In this paper, we assume that devices are paired securely within a limited space, such as a room, using wireless communication. Secure pairing based on received signal strength (RSS) is simple, but RSS changes easily with environmental factors. Recently, pairing methods that use a camera and an accelerometer have been studied; however, these methods do not directly detect the inclination of the device. We propose a secure device pairing method that uses a camera and an accelerometer. In the proposed method, the camera reads a marker shown on the display of the device. Because the marker's inclination corresponds to the device's inclination, we expect the accuracy to be higher than that of existing methods.
Citations: 1
Accelerating Numerical Simulations of Supernovae with GPUs
Pub Date : 2018-11-01 DOI: 10.1109/CANDARW.2018.00056
H. Matsufuru, K. Sumiyoshi
To understand the mechanism of supernova explosions, large-scale numerical simulations are essential because the dynamics are complex, described by coupled equations of neutrino radiation transport and the hydrodynamics of dense matter. In this work, we employ GPUs to accelerate such simulations. When an implicit scheme is adopted for the evolution equation, the iterative linear solver for the coefficient matrix is the most time-consuming part, and it has been shown that it can be offloaded to GPUs efficiently. Several secondary bottlenecks still cost substantial time in the simulations, such as the computation of the collision term of the Boltzmann equation for neutrinos and the parameter tuning of the matrices in the iterative solver. This paper focuses on these parts and offloads them to GPUs using CUDA for the spherically symmetric case. As a result, the time evolution is accelerated sufficiently at the desired model sizes to enable a systematic survey of stellar models with better grid resolution than has been used so far.
Citations: 1
A Secure LiDAR with AES-Based Side-Channel Fingerprinting
Pub Date : 2018-11-01 DOI: 10.1109/CANDARW.2018.00092
Ryuga Matsumura, T. Sugawara, K. Sakiyama
Sensor spoofing attacks are an emerging threat to laser-based ranging. In this paper, we propose a countermeasure that superimposes an authentication fingerprint onto the light wave itself. In the proposed method, the amplification of the laser output is directly modulated by power side-channel information leaked from a cryptographic device. The feasibility of the concept is verified through experiments.
Citations: 8
A Case Study on Memory Architecture Exploration for Manycores on an FPGA
Pub Date : 2018-11-01 DOI: 10.1109/CANDARW.2018.00032
Seiya Shirakuni, Ittetsu Taniguchi, H. Tomiyama
Owing to advances in semiconductor technology, recent FPGA devices can implement a number of CPU cores, and manycore architectures on FPGAs are attracting increasing attention in the design of high-performance embedded systems. In embedded system design with FPGA-based manycore architectures, it is important to optimize not only the number and topology of cores but also the memory architecture for each application in order to achieve high performance under limited FPGA resources. This paper presents a case study on memory architecture exploration for manycores on an FPGA. We design and implement three types of manycore architecture, together with an OpenCL-based software framework. The performance of the three architectures is evaluated by actual measurement using various application programs.
Citations: 0
Accelerating Facial Detection for Improvement of Person Identification Accuracy in Entering and Exiting Management System
Pub Date : 2018-11-01 DOI: 10.1109/CANDARW.2018.00046
Hiroto Kizuna, Hiroyuki Sato
Recently, demand for entry and exit information on individual persons has grown with the development of deep learning. To acquire entry and exit records automatically, a facial image of a person entering or exiting must be captured within about one second, so faster face detection is required. In addition, the installation space for camera equipment is limited. We developed a high-speed facial area estimation algorithm that reduces the facial search area using high-speed image processing and is accelerated with GPGPU. Running on the GPU of a Jetson TX2, facial area estimation takes about 14 ms, a 60-fold speedup over the conventional method. This result shows that practical facial area estimation is possible even on an inexpensive and compact processor.
Citations: 2
Energy Balancing by Wireless Energy Transfer in Sensor Networks
Pub Date : 2018-11-01 DOI: 10.1109/CANDARW.2018.00069
H. Michizu, Y. Sudo, H. Kakugawa, T. Masuzawa
Wireless energy transfer is a technology for transmitting electricity without wires, and it is promising for charging the batteries of mobile devices. In battery-powered sensor networks, it is important to balance the energy stored in the nodes' batteries in order to maximize network lifetime. In this paper, we propose two distributed protocols that balance the battery energy of the nodes. The proposed algorithms are based on the population protocol model, a computational model for networked nodes with very limited resources. The algorithms have two goals: minimizing the energy lost in wireless transmission and minimizing the time needed to reach balance. The proposed algorithms are evaluated by computer simulation.
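The trade-off between transfer loss and balancing time can be explored with a small simulation in the style of a population protocol, where randomly chosen pairs of nodes interact. This is a sketch and not either of the paper's two protocols: the pairwise equalizing rule, the fixed loss fraction, and the function name `balance_energy` are assumptions made for illustration.

```python
import random


def balance_energy(energy, loss=0.1, interactions=1000, seed=0):
    """Simulate random pairwise energy transfers with a lossy wireless link.

    In each interaction the richer node sends just enough energy that
    both nodes end up equal after a fraction `loss` is lost in transit.
    """
    rng = random.Random(seed)
    energy = list(energy)
    for _ in range(interactions):
        a, b = rng.sample(range(len(energy)), 2)
        if energy[a] < energy[b]:
            a, b = b, a                         # make a the richer node
        # Solve e_a - x = e_b + (1 - loss) * x for the amount x to send.
        sent = (energy[a] - energy[b]) / (2 - loss)
        energy[a] -= sent
        energy[b] += (1 - loss) * sent
    return energy


result = balance_energy([100.0, 0.0, 0.0, 0.0], loss=0.1, interactions=200, seed=1)
print(round(sum(result), 2), round(max(result) - min(result), 6))
```

Every transfer wastes `loss * sent` units, so a rule that equalizes aggressively balances quickly but loses more total energy. That tension matches the two goals stated in the abstract.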
Citations: 4