
2018 IEEE 12th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC): Latest Papers

Bluetooth Low Energy Based Indoor Positioning on iOS Platform
S. Duong, Anh Vu Trinh, T. Dinh
In the age of the Internet of Things (IoT), indoor positioning systems (IPS) are considered one of the most popular research topics and have been studied widely around the world because of the many applications they enable. However, IPS is also a challenging topic with a number of stringent requirements, such as cost, energy efficiency, availability and accuracy. The development of Bluetooth Low Energy (BLE) iBeacon has opened up great opportunities for researchers to address those challenges. In this paper, we present our iBeacon-based positioning system, which we built as an application running on the iOS platform. We also present fingerprinting, the main positioning technique used in our system, in which we configure the fingerprints to improve accuracy. On top of this, a machine learning algorithm, k-Nearest Neighbor (kNN), is applied to estimate the most probable user location. In addition, we use a Kalman filter to enhance the stability of the iBeacon signal. Our system achieves an accuracy rate of 60% to 71.4% with an error of up to 1.6 m, which is acceptable for IPS.
DOI: https://doi.org/10.1109/MCSoC2018.2018.00021 (published 2018-09-01)
Citations: 5
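The positioning pipeline summarized above (Kalman-smoothed RSSI followed by kNN matching against a fingerprint radio map) can be sketched as follows. This is a generic illustration, not the authors' implementation. The scalar random-walk Kalman model, the noise variances, the Euclidean distance, and all radio-map and sample values are assumptions, and the paper's own fingerprint configuration is not reproduced.

```python
import math

def kalman_smooth(samples, process_var=0.01, meas_var=4.0):
    """Scalar Kalman filter for RSSI under a random-walk (near-constant) model.
    process_var and meas_var are hypothetical tuning values."""
    estimate, error_var = float(samples[0]), 1.0
    smoothed = []
    for z in samples:
        error_var += process_var                 # predict: uncertainty grows
        gain = error_var / (error_var + meas_var)
        estimate += gain * (z - estimate)        # update toward the measurement
        error_var *= 1.0 - gain
        smoothed.append(estimate)
    return smoothed

def knn_locate(radio_map, observed, k=3):
    """Average the positions of the k fingerprints whose RSSI vectors are
    closest (Euclidean distance) to the observed RSSI vector."""
    nearest = sorted(radio_map, key=lambda fp: math.dist(fp["rssi"], observed))[:k]
    x = sum(fp["pos"][0] for fp in nearest) / len(nearest)
    y = sum(fp["pos"][1] for fp in nearest) / len(nearest)
    return x, y

# Hypothetical radio map: mean RSSI (dBm) from three beacons at four points (m).
radio_map = [
    {"pos": (0.0, 0.0), "rssi": (-60, -75, -80)},
    {"pos": (2.0, 0.0), "rssi": (-68, -65, -79)},
    {"pos": (0.0, 2.0), "rssi": (-67, -78, -66)},
    {"pos": (2.0, 2.0), "rssi": (-74, -68, -64)},
]
# Hypothetical noisy live samples per beacon; keep the latest filtered value.
live = {
    "beacon1": [-58, -63, -61, -60, -62],
    "beacon2": [-76, -72, -75, -73, -74],
    "beacon3": [-81, -78, -80, -79, -79],
}
observed = tuple(kalman_smooth(s)[-1] for s in live.values())
print(knn_locate(radio_map, observed, k=2))
```

Averaging the k nearest reference positions is one common choice. Weighting neighbors by inverse distance is a frequent alternative.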
Design Features of Analog-to-Digital Solutions for the Tracking Detector Readout Electronics
A. Kostrov, V. Stempitsky, A. Borovik, V. Tchekhovsky
An 8-channel mixed-signal application-specific test IC was implemented in a TSMC 0.18 µm CMOS MS/RF 1.8/3.3 V process. A single IC channel comprises a charge-sensitive preamplifier/shaper with a semi-Gaussian response, a shaping amplifier with ion-tail cancellation circuitry, a differential-output baseline restorer, and a 10-bit 10 MSPS ADC. The structural scheme and specification of the IC are presented, and the algorithm and features of the chip's operation are described. Results of simulating the IC test channel in the Cadence software package are presented.
DOI: https://doi.org/10.1109/MCSoC2018.2018.00020 (published 2018-09-01)
Citations: 1
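A semi-Gaussian shaper is commonly modeled as a CR-(RC)^n network. Its step response is proportional to (t/tau)^n * exp(-t/tau), which peaks at t = n*tau. The sketch below samples such a pulse at 10 MSPS (one sample every 100 ns) and quantizes it with an ideal 10-bit ADC. The abstract does not specify the order n, the time constant tau, the pulse amplitude, or the ADC transfer function, so the values used here are assumptions.

```python
import math

def semi_gaussian(t, tau, n):
    """CR-(RC)^n shaper response to a unit step, normalized to a peak of 1.

    The unnormalized shape (t/tau)^n * exp(-t/tau) peaks at t = n*tau with
    value (n/e)^n, so dividing by that value gives a unit peak.
    """
    if t < 0:
        return 0.0
    x = t / tau
    return (x ** n) * math.exp(-x) / ((n / math.e) ** n)

def ideal_adc(v, v_ref=1.0, bits=10):
    """Ideal unipolar ADC: floor(v / LSB), clamped to [0, 2**bits - 1]."""
    full_scale = (1 << bits) - 1
    lsb = v_ref / (1 << bits)
    return max(0, min(full_scale, int(v // lsb)))

# Hypothetical parameters: tau = 100 ns, n = 4 (peaking time 400 ns),
# pulse amplitude 0.9 * v_ref, sampled at 10 MSPS (every 100 ns).
SAMPLE_PERIOD_NS = 100
codes = [ideal_adc(0.9 * semi_gaussian(k * SAMPLE_PERIOD_NS, 100, 4))
         for k in range(10)]
print(codes)
```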
On-Line Cost-Aware Workflow Allocation in Heterogeneous Computing Environments
Incheon Paik, Yuji Ishizuka, Quang-Minh Do, Wuhui Chen
With the emergence of on-line big data stream computation, the explosive growth of mobile devices, the development of broadband cellular networks, and the widespread use of WiFi in recent years, the VM allocation problem has gradually shifted from batch processing to real-time processing. As streaming workflow allocation grows very large in scale, it has become far more difficult. First, in this paper, we model a new network based on mobile cloud computing and mobile edge computing schemes for the real-time streaming workflow allocation problem. Our proposed network, called the Heterogeneous Node Network (HNN), consists of three types of computing node: a conventional data center (DC), edge servers (ES) consisting of mobile devices, and a cloudlet (CL) located between the ES and the DC. In HNN, the DC is the conventional placement destination for virtual machines (VMs) and has more computing resources than the other nodes. The CL is a newer computing resource whose performance is lower than that of the DC, but data transmission between CL and ES is faster than between DC and ES. The ES is a cluster of mobile devices with the fewest computing resources, and its advantage is reducing the volume of raw data passed on to the crucial processes of the streaming workflow. Second, we propose a heuristic streaming workflow allocation algorithm that adapts flexibly to changes in the real-time availability of the streaming workflow and the HNN environment in order to minimize cost. Our algorithm is a hybrid of a bin-packing algorithm and a shortest-path algorithm, based respectively on the VM placement problem and the shortest-path problem in a graph network. Finally, we compare our algorithm with the results of linear programming (LP). In the performance evaluation, the experimental results show that our approach yields a solution close to the optimal solution generated by LP, with reduced execution time.
DOI: https://doi.org/10.1109/MCSoC2018.2018.00042 (published 2018-09-01)
Citations: 1
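The abstract names two building blocks: bin packing for VM placement and shortest paths for data transfer. The sketch below illustrates them with a first-fit-decreasing packer and Dijkstra's algorithm. It is a minimal example under assumed node capacities, task demands, and link costs for a single ES, CL, and DC. It is not the authors' hybrid heuristic, which additionally adapts to real-time availability.

```python
import heapq

def first_fit_decreasing(demands, capacities):
    """Place tasks, largest demand first, on the first node (in the given
    preference order) that still has enough spare capacity."""
    remaining = dict(capacities)
    placement = {}
    for task, need in sorted(demands.items(), key=lambda kv: kv[1], reverse=True):
        for node, free in remaining.items():
            if free >= need:
                remaining[node] = free - need
                placement[task] = node
                break
        else:
            raise ValueError(f"no node can host {task!r}")
    return placement

def dijkstra(graph, src):
    """Shortest-path cost from src to every reachable node; graph[u] = {v: cost}."""
    dist = {src: 0}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# Hypothetical setting: preference order ES -> CL -> DC, capacities in vCPUs.
capacities = {"ES": 3, "CL": 6, "DC": 16}
demands = {"filter": 2, "aggregate": 4, "analyze": 8}
# Hypothetical per-unit transfer costs between tiers (undirected links).
links = {"ES": {"CL": 1, "DC": 5}, "CL": {"ES": 1, "DC": 3}, "DC": {"ES": 5, "CL": 3}}

placement = first_fit_decreasing(demands, capacities)
dist_from_es = dijkstra(links, "ES")
# Hypothetical cost: ship each task's raw input from the edge to its host.
transfer_cost = sum(dist_from_es[node] for node in placement.values())
print(placement, transfer_cost)
```

In this toy setting, routing ES to DC through the cloudlet (cost 4) beats the direct link (cost 5). That is the kind of detour a shortest-path step can discover.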
An FPGA Scalable Parallel Viterbi Decoder
Y. Ben-Asher, V. Tartakovsky, Katrina Portman, Orr Zilberman, Avishi Hadar
Viterbi decoders are an essential component in many embedded systems, used for decoding streams of N data symbols received over noisy channels. Decoding is a sequential process: the decoder builds a trellis for the N received symbols and then traverses the trellis backward, computing the path that implies the minimal number of corrections to the bits of the N received symbols. Several techniques have been developed to increase the parallelism of Viterbi decoders. They show that building the trellis can be parallelized, whereas selecting the minimal path has proved harder to parallelize. In this work, we show that both building the trellis and computing the minimal path can be parallelized as a sequence of matrix multiplications. This yields a parallel implementation with a linear speedup, running in time of order N/P + P, where P is any desired degree of parallelism in the circuit. We implemented a Verilog generator that, for any set of parameters, generates an optimized sequential decoder and an optimized parallel decoder. We were thus able to verify that the parallel version obtains linear speedups.
DOI: https://doi.org/10.1109/MCSoC2018.2018.00014 (published 2018-09-01)
Citations: 1
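The Viterbi path-metric update m'[j] = min_i (m[i] + B[i][j]) is a vector-matrix product in the (min, +) semiring, and that product is associative. This is one way to see why the trellis can be processed as matrix multiplications. The sketch below uses hypothetical branch metrics for a 2-state trellis. It reduces P chunks of branch-metric matrices independently and then combines the P partial products, which mirrors the N/P + P cost shape quoted in the abstract. It computes only the final path metrics. Traceback of the minimal path is not shown, and the paper's actual matrix formulation may differ.

```python
INF = float("inf")

def minplus(A, B):
    """(min, +) matrix product: C[i][j] = min over k of A[i][k] + B[k][j]."""
    return [[min(A[i][k] + B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def viterbi_sequential(start, steps):
    """Classic forward pass: update the path-metric vector one symbol at a time."""
    metric = start
    for B in steps:
        metric = [min(metric[i] + B[i][j] for i in range(len(metric)))
                  for j in range(len(B[0]))]
    return metric

def viterbi_chunked(start, steps, chunks):
    """Same metrics via associativity: multiply each chunk of branch-metric
    matrices independently (these loops could run in parallel), then combine
    the per-chunk partial products."""
    size = -(-len(steps) // chunks)  # ceiling division
    partials = []
    for c in range(0, len(steps), size):
        prod = steps[c]
        for B in steps[c + 1:c + size]:
            prod = minplus(prod, B)
        partials.append(prod)
    total = partials[0]
    for prod in partials[1:]:
        total = minplus(total, prod)
    return minplus([start], total)[0]

# Hypothetical 2-state trellis: STEPS[t][i][j] is the branch metric (e.g. a
# Hamming distance) for moving from state i to state j at symbol t.
START = [0, INF]  # decoding starts in state 0
STEPS = [
    [[0, 2], [1, 1]], [[1, 1], [2, 0]], [[2, 0], [0, 2]],
    [[0, 2], [1, 1]], [[1, 1], [0, 2]], [[2, 0], [1, 1]],
]
print(viterbi_sequential(START, STEPS))   # [3, 2]
print(viterbi_chunked(START, STEPS, 3))   # [3, 2]
```

The parallel form has a cost. A matrix-matrix (min, +) product over S states takes O(S^3) operations, versus O(S^2) for the sequential vector update, so it spends more total work to shorten the critical path.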
On-Chip Lifetime Prediction for Dependable Many-Processor SoCs Based on Data Fusion
Ghazanfar Ali, J. Pathrose, H. Kerkhoff
The technology and complexity of many-processor Systems-on-Chip are developing at a very rapid pace, as is their introduction into safety-critical applications such as the transport sector. The inherent decrease in dependability of these complex nanosystems must be compensated by countermeasures. One promising approach is the use of IJTAG-compatible embedded instruments in and around cores to monitor the "health" of target processors. These instruments are anticipated to be used primarily for reducing the cost of final testing. In case of degradation during the lifetime, however, they can be reused, and countermeasures such as run-time remapping can be carried out. In this paper, the on-line data of two types of embedded instruments, a slack-delay monitor and an IDDX monitor, are used for prognostics. Their (correlated) data are fused, which enables a more accurate lifetime prediction than a single-monitor approach. However, the computational requirements of the embedded dependability manager increase, as it must handle embedded instrument data fusion and/or multi-parameter lifetime prediction.
DOI: https://doi.org/10.1109/MCSoC2018.2018.00019 (published 2018-09-01)
Citations: 2
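One simple way to picture the data fusion: each monitor produces its own remaining-useful-life (RUL) estimate by extrapolating its trend to a failure threshold, and the estimates are then combined by inverse-variance weighting. This is a hedged sketch, not the authors' method. The linear degradation model, thresholds, readings, and error variances are all hypothetical. Inverse-variance weighting also assumes independent errors, whereas the abstract notes that the two monitors' data are correlated.

```python
import math

def linear_rul(times, readings, threshold):
    """Least-squares fit readings ~ a + b*t, then return the time remaining
    (after the last sample) until the fitted line reaches `threshold`."""
    n = len(times)
    mean_t = sum(times) / n
    mean_r = sum(readings) / n
    b = (sum((t - mean_t) * (r - mean_r) for t, r in zip(times, readings))
         / sum((t - mean_t) ** 2 for t in times))
    a = mean_r - b * mean_t
    if b <= 0:
        return math.inf  # no degradation trend toward the threshold
    return (threshold - a) / b - times[-1]

def fuse(estimates):
    """Inverse-variance weighted fusion of (mean, variance) pairs,
    assuming the individual estimation errors are independent."""
    weights = [1.0 / var for _, var in estimates]
    mean = sum(w * m for w, (m, _) in zip(weights, estimates)) / sum(weights)
    return mean, 1.0 / sum(weights)

# Hypothetical monitor histories; both indicators grow as the chip ages.
hours = [0, 100, 200, 300]
slack_loss_ps = [10.0, 12.0, 14.5, 16.0]   # slack consumed vs. a fresh chip
iddx_ua = [50.0, 53.0, 55.0, 59.0]         # supply-current reading
rul_slack = linear_rul(hours, slack_loss_ps, threshold=40.0)
rul_iddx = linear_rul(hours, iddx_ua, threshold=120.0)
# Assumed error variances (hours^2) of each single-monitor RUL estimate.
fused_rul, fused_var = fuse([(rul_slack, 400.0), (rul_iddx, 900.0)])
print(round(rul_slack), round(rul_iddx), round(fused_rul), round(fused_var))
```

Under this independence assumption, the fused variance is always smaller than either input variance, which captures the intuition behind the accuracy gain over a single monitor. Correlated errors would call for a covariance-aware combination instead.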
Adaptive Long-Term Reference Selection for Efficient Scalable Surveillance Video Coding
Thi Hue Le Dao, Pham Van Giap, Hoang Van Xiem
DOI: https://doi.org/10.1109/mcsoc2018.2018.00023 (published 2018-09-01)
Citations: 4
A Low-Power ASIC Implementation of Multi-Core OpenSPARC T1 Processor on 90nm CMOS Process
Phuc-Vinh Nguyen, T. Tran, Phuoc-Loc Diep, Duc-Hung Le
In this paper, a hierarchical low-power design flow is proposed. Several low-power techniques for digital ASIC design are implemented within this flow: clock gating at the RTL synthesis stage, and multi-threshold voltage and power switching at the back-end stage for power optimization. The low-power flow and techniques are applied to the open-source RTL of the OpenSPARC T1 processor core. First, synthesis and place-and-route are run on the core without applying any low-power optimization technique from the front-end to the back-end stage. Second, the core is implemented using the low-power design techniques. This work is carried out on an open 90 nm CMOS process with EDA tools.
DOI: https://doi.org/10.1109/MCSoC2018.2018.00027 (published 2018-09-01)
Citations: 1
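Each technique in this flow targets a different term of the standard first-order CMOS power model below. Clock gating stops idle registers and their clock-tree branches from toggling, which lowers the effective activity factor alpha. Multi-threshold-voltage cell assignment uses high-Vt cells on non-critical paths to reduce the leakage current. Power switching disconnects idle blocks from the supply, so their leakage largely disappears while they are off.

```latex
P_{\mathrm{total}} \approx
  \underbrace{\alpha \, C_L \, V_{DD}^{2} \, f_{\mathrm{clk}}}_{\text{dynamic (switching)}}
  + \underbrace{V_{DD} \, I_{\mathrm{leak}}}_{\text{static (leakage)}}
```

For example, if clock gating halves the average activity factor of a block, its dynamic term halves while its leakage term is unchanged. This is why the back-end techniques are needed to address static power.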
In-NoC Circuits for Low-Latency Cache Coherence in Distributed Shared-Memory Architectures
Leonard Masing, A. Srivatsa, Fabian Kreß, Nidhi Anantharajaiah, A. Herkersdorf, J. Becker
Scalable communication and low-latency memory accesses are the deciding factors for future manycore performance. An efficient hardware infrastructure is required, since raw performance must be balanced against area and power constraints. In distributed shared-memory (DSM) architectures, caches help reduce costly remote accesses but must be kept coherent. To enable scalable coherence in manycore systems, the recently proposed region-based cache coherence defines configurable regions, i.e. cache-coherent subsections of a manycore architecture. In this paper, we propose a technique for supporting the region-based cache coherence mechanism using so-called in-NoC circuits (INCs) in a hybrid network-on-chip. These circuits are established automatically, based on traffic monitoring and traffic analysis, to connect nodes (i.e. routers) in the network and provide a shortcut for packets, reducing their latency. In contrast to traditional end-to-end circuits, INCs can be used by packets originating from different sources and targeting different destinations. Depending on the coherence region, our evaluations on several benchmarks show an average latency reduction of up to 45% in a 4x4 mesh, and the reduction grows further with mesh size. FPGA synthesis of a router from a scientific DSM architecture extended with the presented features shows additional costs of up to 31% more LUTs and 20% more flip-flops.
DOI: https://doi.org/10.1109/MCSoC2018.2018.00033 (published 2018-09-01)
Citations: 3
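The key property described here is that one in-NoC circuit can serve packets from different sources and to different destinations. A toy zero-load latency model can illustrate it. Everything in the sketch is hypothetical: XY routing on a 2D mesh, a 3-cycle router pipeline, a 1-cycle bypass, and a simple per-router usage threshold standing in for the paper's traffic monitoring and analysis.

```python
from collections import Counter

def xy_route(src, dst):
    """Tiles visited under dimension-ordered (X first, then Y) routing."""
    (x, y), (dx, dy) = src, dst
    path = [(x, y)]
    while x != dx:
        x += 1 if dx > x else -1
        path.append((x, y))
    while y != dy:
        y += 1 if dy > y else -1
        path.append((x, y))
    return path

def through_hops(path):
    """(router, came_from, going_to) for each intermediate router on a path."""
    return [(path[i], path[i - 1], path[i + 1]) for i in range(1, len(path) - 1)]

class InNocCircuits:
    """Hypothetical monitor: once a router's in->out connection has carried
    `threshold` packets, a bypass is set up that any later packet may reuse,
    whatever its source or destination."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.usage = Counter()
        self.bypass = set()

    def observe(self, src, dst):
        for hop in through_hops(xy_route(src, dst)):
            self.usage[hop] += 1
            if self.usage[hop] >= self.threshold:
                self.bypass.add(hop)

    def latency(self, src, dst, router_cycles=3, link_cycles=1, bypass_cycles=1):
        """Zero-load latency in cycles: full router pipeline at the source and
        destination, and at intermediate routers unless bypassed."""
        path = xy_route(src, dst)
        if len(path) == 1:
            return router_cycles
        cycles = 2 * router_cycles + (len(path) - 1) * link_cycles
        for hop in through_hops(path):
            cycles += bypass_cycles if hop in self.bypass else router_cycles
        return cycles

noc = InNocCircuits(threshold=2)
print(noc.latency((0, 0), (3, 0)))   # 15: no bypasses yet
noc.observe((0, 0), (3, 0))
noc.observe((0, 0), (3, 0))
print(noc.latency((0, 0), (3, 0)))   # 11: both intermediate routers bypassed
print(noc.latency((1, 0), (3, 0)))   # 9: a different source reuses the bypass at (2, 0)
```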
Evaluation of Performance and Fault Containment in AUTOSAR Micro-ECUs on a Multi-Core Processor
Moises Urbina, R. Obermaisser
The AUTOSAR standard does not provide an approach for mapping its ECU software architecture to a message-based multicore system. In this work, we present an analysis of the performance and fault containment of a novel TIme-triggered MEssage-based multi-core architecture for AUTOSAR (TIMEA). The TIMEA platform is intended to bring the advantages of network-on-chip architectures to AUTOSAR software, yielding multiple benefits for fail-operational real-time systems, such as temporal predictability and fault isolation. We introduce a fault hypothesis consisting of multiple fault assumptions, define the fault containment regions, and describe the algorithms for integrating a multicore monitoring service into the AUTOSAR Basic Software. A set of experiments was carried out to evaluate the performance of the system using an anti-lock braking use case in a simulation scenario with failure occurrences. The results demonstrate that the TIMEA platform remains operational in the presence of failures.
DOI: https://doi.org/10.1109/MCSoC2018.2018.00040 (published 2018-09-01)
Citations: 0
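A time-triggered design provides temporal fault containment. A guardian that admits a message only inside its sender's pre-planned slot illustrates the idea: a faulty core that babbles cannot consume other cores' bandwidth. This is a generic time-triggered sketch. The schedule and core names are hypothetical and only loosely based on the anti-lock braking use case. It does not reproduce TIMEA's actual interfaces or its integration with the AUTOSAR Basic Software.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Slot:
    start: int    # offset within the schedule cycle (ticks)
    length: int   # slot duration (ticks)
    sender: str   # the only core allowed to transmit in this slot

class TemporalGuardian:
    """Forward a message only if its sender owns the slot containing the
    send time; messages outside the sender's slot are dropped."""

    def __init__(self, cycle, slots):
        self.cycle = cycle
        self.slots = slots

    def owner(self, t):
        offset = t % self.cycle
        for slot in self.slots:
            if slot.start <= offset < slot.start + slot.length:
                return slot.sender
        return None  # idle gap in the schedule

    def admit(self, sender, t):
        return self.owner(t) == sender

# Hypothetical 10-tick schedule for an anti-lock-braking style application.
guardian = TemporalGuardian(10, [
    Slot(0, 3, "abs_ctrl"),
    Slot(3, 3, "wheel_sensor"),
    Slot(6, 2, "diag"),
])
# A faulty "diag" core that tries to send on every tick only gets through
# during its own two-tick slot, so the other cores' slots stay intact.
babbled = sum(guardian.admit("diag", t) for t in range(10))
print(babbled)  # 2
```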
Adaptive Body Bias Control Scheme for Ultra Low-Power Network-on-Chip Systems
Akram Ben Ahmed, Hayate Okuhara, Hiroki Matsutani, M. Koibuchi, H. Amano
Over the past decade, power consumption has been one of the main design challenges in Networks-on-Chip (NoCs), as it significantly affects the performance of a given Chip Multiprocessor (CMP). Body bias control is one solution that provides an efficient trade-off between leakage power and performance. However, employing such a method is not straightforward, since several factors must be taken into consideration, especially when it is implemented adaptively on-chip. In this paper, we propose a new router design and an on-chip body bias control mechanism to adaptively control the body bias voltage supply in ultra-low-power NoC systems. With the help of a lightweight monitoring circuit, the proposed router predicts the traffic load at each input port and adjusts its pipeline depth accordingly in a fine-grained fashion. To satisfy timing constraints, the router adaptively supplies each of its input ports with the appropriate body bias voltages, either to boost performance or to reduce leakage power in the standby state. Evaluation results using SOTB 65 nm Fully Depleted Silicon On Insulator (FD-SOI) technology show that the proposed router reduces both dynamic and static energy. Compared with two fixed-pipeline baseline routers (3-stage and 2-stage), the total energy reduction reaches up to 67% and 59%, respectively. At the same time, reasonable performance is maintained with less than 6% area overhead.
{"title":"Adaptive Body Bias Control Scheme for Ultra Low-Power Network-on-Chip Systems","authors":"Akram Ben Ahmed, Hayate Okuhara, Hiroki Matsutani, M. Koibuchi, H. Amano","doi":"10.1109/MCSoC2018.2018.00034","DOIUrl":"https://doi.org/10.1109/MCSoC2018.2018.00034","url":null,"abstract":"Over the past decade, the power consumption has been one of the main design challenges in Network-on-Chips (NoCs) as it significantly defines the performance of a given Chip-Multiprocessor (CMP). Body bias control is one of the solutions that provide an efficient trade-off between leakage power and performance. However, employing such a method is not straightforward since several factors should be taken into consideration, especially when adaptively implemented on-chip. In this paper, we propose a new router design and on-chip body bias control mechanism to adaptively control the body bias voltages supply in ultra low-power NoC systems. With the help of a light-weight monitoring circuit, the proposed router predicts the traffic load at each input-port and accordingly adjusts its pipeline depth in a fine-grained fashion. To satisfy the timing constraints, the router adaptively supplies each one of its input-ports with the appropriate body bias voltages to either boost the performance or to reduce the leakage power at the standby state. The evaluation results, using the SOTB 65nm Fully Depleted Silicon On Insulator (FD-SOI) technology, shows the ability of the proposed router in reducing both dynamic and static energies. When compared to two fixed-pipeline baseline routers (3-stages and 2-stages), the total energy reduction could reach up to 67% and 59%, respectively. At the same time, a reasonable performance tendency can be obtained with less than 6% area overhead.","PeriodicalId":413836,"journal":{"name":"2018 IEEE 12th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128879139","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
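The abstract describes per-input-port load prediction that drives both pipeline depth and body bias. The Python sketch below shows one possible control loop of this kind. It assumes an exponentially weighted moving average (EWMA) as the load predictor and a hypothetical three-mode table: reverse body bias in standby, zero bias with a 3-stage pipeline at low load, and forward body bias with a 2-stage pipeline at high load. The predictor, the thresholds, the mode mapping and all names (`InputPortController`, `end_of_epoch`, `PortMode`) are illustrative assumptions. The abstract does not give the paper's monitoring circuit or voltage levels.

```python
from dataclasses import dataclass
from enum import Enum


class Bias(Enum):
    FBB = "forward"  # forward body bias: lower Vth, faster, higher leakage
    ZBB = "zero"     # no body bias applied
    RBB = "reverse"  # reverse body bias: higher Vth, slower, lower leakage


@dataclass(frozen=True)
class PortMode:
    pipeline_stages: int
    bias: Bias


# Hypothetical mode table (an assumption, not the paper's configuration).
STANDBY = PortMode(pipeline_stages=3, bias=Bias.RBB)    # idle port: cut leakage
LOW_LOAD = PortMode(pipeline_stages=3, bias=Bias.ZBB)   # deeper pipeline, nominal bias
HIGH_LOAD = PortMode(pipeline_stages=2, bias=Bias.FBB)  # merged stages need faster devices


class InputPortController:
    """Per-input-port controller driven by an EWMA load predictor (assumed)."""

    def __init__(self, alpha: float = 0.5, idle_th: float = 0.05, high_th: float = 0.4):
        self.alpha = alpha      # EWMA weight given to the newest observation
        self.idle_th = idle_th  # predicted load below this: standby
        self.high_th = high_th  # predicted load at or above this: high-load mode
        self.predicted = 0.0    # predicted load in flits per cycle
        self.mode = STANDBY

    def end_of_epoch(self, flits: int, epoch_cycles: int) -> PortMode:
        """Update the prediction from this epoch's flit count and pick the next mode."""
        if epoch_cycles <= 0:
            raise ValueError("epoch_cycles must be positive")
        observed = flits / epoch_cycles
        self.predicted = self.alpha * observed + (1 - self.alpha) * self.predicted
        if self.predicted < self.idle_th:
            self.mode = STANDBY
        elif self.predicted < self.high_th:
            self.mode = LOW_LOAD
        else:
            self.mode = HIGH_LOAD
        return self.mode


if __name__ == "__main__":
    ctrl = InputPortController()
    for flits in (0, 60, 60, 0):  # flits observed in successive 100-cycle epochs
        mode = ctrl.end_of_epoch(flits, 100)
        print(f"{flits:3d} flits -> {mode.pipeline_stages}-stage, {mode.bias.value} bias")
```

In this sketch, stages are merged only under forward bias. That choice reflects the timing argument in the abstract: a shallower pipeline puts more logic into each stage, so the transistors must switch faster to meet the same clock period.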
Journal: 2018 IEEE 12th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)