Pub Date: 2017-12-27 | DOI: 10.1109/TMSCS.2017.2787109
Yu Bai;Deliang Fan;Mingjie Lin
We propose an innovative stochastic-based computing architecture to implement a low-power and robust artificial neural network (S-ANN) with both magnetic tunneling junction (MTJ) and Domain Wall (DW) devices. Our mixed-model HSPICE simulation results have shown that, for a well-known pattern recognition task, a 34-neuron S-ANN implementation achieves more than 1.5 orders of magnitude lower energy consumption and 2.5 orders of magnitude less hidden-layer chip area when compared with its deterministic-based ANN counterparts implemented with digital and analog CMOS circuits. We believe that our S-ANN architecture achieves such a remarkable performance gain by leveraging two key ideas. First, because all neural signals are encoded as random bit streams, the standard weighted-sum synapses can be accomplished by a stochastic bit writing and reading procedure. Second, we designed and implemented a novel multiple-phase pumping circuit structure to effectively realize the soft-limiting neural transfer function that is essential to improving the overall ANN capability and reducing its network complexity.
{"title":"Stochastic-Based Synapse and Soft-Limiting Neuron with Spintronic Devices for Low Power and Robust Artificial Neural Networks","authors":"Yu Bai;Deliang Fan;Mingjie Lin","doi":"10.1109/TMSCS.2017.2787109","DOIUrl":"https://doi.org/10.1109/TMSCS.2017.2787109","url":null,"abstract":"We propose an innovative stochastic-based computing architecture to implement low-power and robust artificial neural network (S-ANN) with both magnetic tunneling junction (MTJ) and Domain Wall (DW) devices. Our mixed-model HSPICE simulation results have shown that, for a well-known pattern recognition task, a 34-neuron S-ANN implementation achieves more than 1.5 orders of magnitude lower energy consumption and 2.5 orders of magnitude less hidden layer chip area, when compared with its deterministicbased ANN counterparts which are implemented with digital and analog CMOS circuits. We believe that our S-ANN architecture achieves such a remarkable performance gain by leveraging two key ideas. First, because all neural signals are encoded as random bit streams, the standard weighted-sum synapses can be accomplished by stochastic bit writing and reading procedure. Second, we designed and implemented a novel multiple-phase pumping circuit structure to effectively realize the soft-limiting neural transfer function that is essential to improve the overall ANN capability and reduce its network complexity.","PeriodicalId":100643,"journal":{"name":"IEEE Transactions on Multi-Scale Computing Systems","volume":"4 3","pages":"463-476"},"PeriodicalIF":0.0,"publicationDate":"2017-12-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TMSCS.2017.2787109","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68026464","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2017-11-22 | DOI: 10.1109/TMSCS.2017.2748122
Andrew D. Brown;John E. Chad;Raihaan Kamarudin;Kier J. Dugan;Stephen B. Furber
SpiNNaker (Spiking Neural Network Architecture) is a specialized computing engine intended for real-time simulation of neural systems. It consists of a mesh of 240x240 nodes, each containing 18 ARM9 processors: over a million cores communicating via a bespoke network. Ultimately, the machine will support the simulation of up to a billion neurons in real time, allowing simulation experiments to be taken to hitherto unattainable scales. The architecture achieves this by ignoring three of the axioms of computer design: the communication fabric is non-deterministic; there is no global core synchronisation; and the system state, held in distributed memory, is not coherent. Time models itself: there is no notion of computed simulation time, because wallclock time is simulation time. Whilst these design decisions run counter to conventional wisdom, they bring the engine's behavior closer to its intended simulation target: neural systems. We describe how SpiNNaker simulates large neural ensembles, provide performance figures, and outline some failure mechanisms. SpiNNaker simulation time scales 1:1 with wallclock time at least up to nine million synaptic connections on a 768-core subsystem (about 1/1400th of the full system), accurately producing logically predicted results.
{"title":"SpiNNaker: Event-Based Simulation—Quantitative Behavior","authors":"Andrew D. Brown;John E. Chad;Raihaan Kamarudin;Kier J. Dugan;Stephen B. Furber","doi":"10.1109/TMSCS.2017.2748122","DOIUrl":"https://doi.org/10.1109/TMSCS.2017.2748122","url":null,"abstract":"SpiNNaker (Spiking Neural Network Architecture) is a specialized computing engine, intended for real-time simulation of neural systems. It consists of a mesh of 240x240 nodes, each containing 18 ARM9 processors: over a million cores, communicating via a bespoke network. Ultimately, the machine will support the simulation of up to a billion neurons in real time, allowing simulation experiments to be taken to hitherto unattainable scales. The architecture achieves this by ignoring three of the axioms of computer design: the communication fabric is non-deterministic; there is no global core synchronisation, and the system state-held in distributed memory-is not coherent. Time models itself: there is no notion of computed simulation time-wallclock time is simulation time. Whilst these design decisions are orthogonal to conventional wisdom, they bring the engine behavior closer to its intended simulation target-neural systems. We describe how SpiNNaker simulates large neural ensembles; we provide performance figures and outline some failure mechanisms. SpiNNaker simulation time scales 1:1 with wallclock time at least up to nine million synaptic connections on a 768 core subsystem (~1400th of the full system) to accurately produce logically predicted results.","PeriodicalId":100643,"journal":{"name":"IEEE Transactions on Multi-Scale Computing Systems","volume":"4 3","pages":"450-462"},"PeriodicalIF":0.0,"publicationDate":"2017-11-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TMSCS.2017.2748122","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"67861115","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2017-11-17 | DOI: 10.1109/TMSCS.2017.2773488
S. Karen Khatamifard;Ismail Akturk;Ulya R. Karpuzcu
Each synchronization point represents a point of serialization and can thereby easily hurt parallel scalability. As demonstrated by recent studies, approximating, i.e., relaxing, synchronization by eliminating a subset of synchronization points spatio-temporally can help improve parallel scalability, as long as the approximation-incurred violations of basic execution semantics remain predictable and controllable. Even if the divergence from fully-synchronized execution results in lower computation accuracy rather than catastrophic program termination, for approximation to be viable, the accuracy loss must be bounded. In this paper, we assess the viability of approximate synchronization using Speculative Lock Elision (SLE), which has been adopted by hardware transactional memory implementations from industry, as a baseline for comparison. Specifically, we investigate the efficacy of exploiting semantic and temporal characteristics of critical sections in preventing excessive loss in computation accuracy, and devise a lightweight, proof-of-concept Approximate Speculative Lock Elision (ASLE) implementation, which exploits existing hardware support for SLE.
{"title":"On Approximate Speculative Lock Elision","authors":"S. Karen Khatamifard;Ismail Akturk;Ulya R. Karpuzcu","doi":"10.1109/TMSCS.2017.2773488","DOIUrl":"https://doi.org/10.1109/TMSCS.2017.2773488","url":null,"abstract":"Each synchronization point represents a point of serialization, and thereby can easily hurt parallel scalability. As demonstrated by recent studies, approximating, i.e., relaxing synchronization by eliminating a subset of synchronization points spatio-temporally can help improve parallel scalability, as long as approximation incurred violations of basic execution semantics remain predictable and controllable. Even if the divergence from fully-synchronized execution renders lower computation accuracy ratherthan catastrophic program termination, for approximation to be viable, the accuracy loss must be bounded. In this paper, we assess the viability of approximate synchronization using Speculative Lock Elision (SLE), which was adopted by hardware transactional memory implementations from industry, as a baseline for comparison. Specifically, we investigate the efficacy of exploiting semantic and temporal characteristics of critical sections in preventing excessive loss in computation accuracy, and devise a light-weight, proof-of-concept Approximate Speculative Lock Elision (ASLE) implementation, which exploits existing hardware support for SLE.","PeriodicalId":100643,"journal":{"name":"IEEE Transactions on Multi-Scale Computing Systems","volume":"4 2","pages":"141-151"},"PeriodicalIF":0.0,"publicationDate":"2017-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TMSCS.2017.2773488","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68025087","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2017-11-16 | DOI: 10.1109/TMSCS.2017.2774294
Małgorzata Michalska;Simone Casale-Brunet;Endri Bezati;Marco Mattavelli
The implementation and optimization of dynamic dataflow programs on multi/many-core platforms require solving a very difficult problem: how to partition and schedule the processing elements and dimension their interconnecting buffers according to given optimization functions in terms of throughput, memory usage, and energy consumption. This problem is NP-hard even for two cores. Thus, finding a close-to-optimal solution consists of exploring the design space with appropriate heuristics that identify the design points maximizing or minimizing the desired (multiple) objective functions subject to a set of constraints. In general, exploring the design space efficiently is a challenging task due to the massive number of admissible design points. Efficient estimation methodologies are necessary to support an effective search of the design space by reducing to a minimum the cost and the number of measurements on the physical platform. This paper presents a new methodology that provides high-precision estimations of the performance of dynamic dataflow programs on multi/many-core platforms for any set of design configurations. The estimations rely on post-processing the execution trace obtained from a single profiled execution of the program. Furthermore, the paper describes the estimation methodology, the implementation tools, and the type of information that is obtained from many/multi-core dataflow executions and used to drive the optimization heuristics. The results confirm the high level of accuracy achieved on different types of platforms and the effectiveness of the illustrated design space exploration methodology.
{"title":"High-Precision Performance Estimation for the Design Space Exploration of Dynamic Dataflow Programs","authors":"Małgorzata Michalska;Simone Casale-Brunet;Endri Bezati;Marco Mattavelli","doi":"10.1109/TMSCS.2017.2774294","DOIUrl":"https://doi.org/10.1109/TMSCS.2017.2774294","url":null,"abstract":"The implementation and optimization of dynamic dataflow programs on multi/many-core platforms require solving a very difficult problem: how to partition and schedule the processing elements and dimension their interconnecting buffers according to given optimization functions in terms of throughput, memory usage, and energy consumption. This problem is NP-hard even for two cores. Thus, finding a close-to-optimal solution consists of exploring the design space by appropriate heuristics identifying those design points that maximize or minimize the desired (multiple) objective functions subject to a set of constraints. In general, exploring the design space efficiently is a challenging task due to the massive number of admissible design points. Efficient estimation methodologies are necessary to support an effective search of the design space by reducing to a minimum the cost and the number of measurements on the physical platform. This paper presents a new methodology that provides high-precision estimations of dynamic dataflow programs performances on multi/many-core platforms for any set of design configurations. The estimations rely on the execution trace post-processing obtained by a single profiled execution of the program. Furthermore, the paper describes the estimation methodology, implementation tools, and the type of information that is obtained from many/multi-core dataflow executions and used to drive the optimization heuristics. The results confirm a high level of accuracy achieved on different types of platforms and the effectiveness of the illustrated design space exploration methodology.","PeriodicalId":100643,"journal":{"name":"IEEE Transactions on Multi-Scale Computing Systems","volume":"4 2","pages":"127-140"},"PeriodicalIF":0.0,"publicationDate":"2017-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TMSCS.2017.2774294","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68025090","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2017-11-14 | DOI: 10.1109/TMSCS.2017.2773523
Christopher H. Bennett;Jean-Etienne Lorival;Francois Marc;Théo Cabaret;Bruno Jousselme;Vincent Derycke;Jacques-Olivier Klein;Cristell Maneux
Organic memristors are promising molecular electronic devices for neuro-inspired on-chip learning applications. In this paper, we present a numerically efficient compact model suitable for $Fe(bpy)_3^{2+}$