
Latest publications — 2022 IEEE 15th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)

A systematic review of machine learning techniques in online learning platforms
Cyril Elorm Kodjo Agbewali-Koku, Md.Atiqur Rahman, Mohamed Hamada, Mohammad Ameer Ali, Lutfun Nahar Oysharja, Md. Tazmim Hossain
The mode of education has changed over the past few years from the conventional method of in-person classes to the use of online platforms that facilitate teaching and learning. These platforms, popularly known as online learning systems, have gradually become an integral part of education. They have been designed using various artificial intelligence (AI) frameworks and techniques to enhance their functionality and personalize them for their users. Machine learning is one of the major fields of AI used in most of these platforms. Popular machine learning techniques such as deep learning, natural language processing, and reinforcement learning are being actively applied and studied to improve them further. This study focuses on a content analysis of different studies, aiming to disclose the machine learning techniques that have been applied in the online learning sector and to explore the research trends and challenges of integrating machine learning into online learning. The study covers papers published from 2015 to 2021, classifying them according to the research questions.
DOI: 10.1109/MCSoC57363.2022.00046
Citations: 1
Making Software Based on Human-Driven Design Case Study: SQL for non-experts
Hida Masataka, Y. Watanobe
Barriers exist between the general public and professionals or computers, and they can also be seen in educational settings. The purpose of this study is to break those barriers and reduce the learning cost, misunderstandings, and conflicts among people, computers, and knowledge during study. This study attempts to find a way to provide information to the general public through software. In this paper, “general people” stands for students, people learning new skills and professional knowledge, or people with no knowledge of the field. As a case study, this work creates a user-friendly SQL system for non-experts. Approaches are presented mainly through the UI design of the software and the usage of professional terms and words, applying the concept of Human-Driven Design (HDD). The paper examines the balance of interaction between humans and technology when using software. Based on HDD, it also considers the sustainability of the database field and creates an opportunity to examine why the wall between experts and non-experts appears.
DOI: 10.1109/MCSoC57363.2022.00049
Citations: 0
Distance Aware Compression for Low Latency High Bandwidth Interconnection Network
Yuqing Zhou, Naoya Niwa, H. Amano
Networks-on-Chip (NoCs) are an essential component of recent multi-core systems. When the number of wires available on a chip is limited, the network can become congested, and the increased latency can degrade parallel application performance. Selective data compression has been proposed to mitigate such congestion by compressing and decompressing packets based on packet length and traffic conditions. However, since that algorithm does not consider the locations of nodes, compression and decompression are performed even when a packet travels between neighboring nodes. This paper proposes a distance-aware (DA) compression mechanism that decides whether a packet should be compressed based on the distance to its destination. Packets destined for nodes farther than a threshold are compressed with run-length lossless compression at the sender's network interface and decompressed at the receiver's network interface. Cycle-level network simulation results show that the selective compression method achieves up to 45% bandwidth improvement while latency increases by a factor of 1.26.
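The selection policy and run-length coding described in the abstract can be sketched as follows; the threshold value, byte-level packet representation, and (count, value) pair encoding are illustrative assumptions, not the authors' hardware implementation.

```python
def rle_compress(payload: bytes) -> bytes:
    """Run-length encode a payload as (count, value) byte pairs."""
    out = bytearray()
    i = 0
    while i < len(payload):
        run = 1
        while i + run < len(payload) and payload[i + run] == payload[i] and run < 255:
            run += 1
        out += bytes((run, payload[i]))
        i += run
    return bytes(out)

def rle_decompress(data: bytes) -> bytes:
    out = bytearray()
    for count, value in zip(data[::2], data[1::2]):
        out += bytes([value]) * count
    return bytes(out)

def send_packet(payload: bytes, hop_distance: int, threshold: int = 3):
    """Compress only when the destination is farther than the threshold,
    so neighbor-to-neighbor traffic skips the (de)compression latency."""
    if hop_distance > threshold:
        return ("compressed", rle_compress(payload))
    return ("raw", payload)
```

A long packet crossing the chip pays the compression latency once but occupies fewer flits per hop, while a one-hop transfer is forwarded untouched.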
DOI: 10.1109/MCSoC57363.2022.00063
Citations: 0
A Spintronics-Based Nonvolatile FPGA and Its Application to Edge-AI Accelerator
D. Suzuki, T. Hanyu
A nonvolatile (NV) field-programmable gate array (FPGA) is a highly attractive hardware platform for internet-of-things (IoT) devices in terms of reconfigurability and ultra-low standby power consumption. Moreover, the use of NV logic-in-memory (LIM) circuitry makes it possible to improve both area efficiency and energy efficiency. This paper presents topics related to NV-FPGAs and NV-LIM circuitry and their application to an edge-AI accelerator, and demonstrates their effectiveness.
DOI: 10.1109/MCSoC57363.2022.00018
Citations: 0
Hardware characterization of Integer-Net based seizure detection models on FPGA
R. SoujanyaS., M. Rao
Deploying deep neural network (DNN) inference on platforms such as field-programmable gate arrays (FPGAs) for acceleration can be challenging because of limited resource availability and the large number of floating-point matrix operations involved. Therefore, this work studies a hardware-efficient algorithm called Integer-Net, based on approximate floating-point operations, for edge DNN inference deployment. The algorithm uses integerized floating-point arithmetic with a scalar correction for the matrix operations. Electroencephalogram (EEG) signal based automatic high-speed epileptic seizure detection using an Integer-Net convolutional neural network (CNN) was implemented in hardware on a Zynq-7000 SoC to characterize performance efficiency and hardware resource utilization against the full-precision model. The implementation benefited from accelerated outputs, made optimal use of on-board resources, and, with the help of a configurable integer bit-width, kept accuracy close to that of the original model. This is the first time Integer-Net-based designs and their novel hybrid versions have been employed and investigated for hardware acceleration of a CNN for seizure detection on an FPGA. The optimized hybrid integerized CNN model achieved a latency acceleration of 5.65× with an on-chip memory usage reduction factor of 5.99×.
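The general idea of integerized floating-point arithmetic with a scalar correction can be sketched as follows: quantize each operand matrix to integers with a per-matrix scale, multiply-accumulate in integer arithmetic, and apply the combined scale once per output. The symmetric quantization scheme and 8-bit width below are illustrative assumptions, not the paper's exact Integer-Net formulation.

```python
import numpy as np

def quantize(x: np.ndarray, bits: int = 8):
    """Map floats to signed integers; the returned scale is the scalar
    correction applied after the integer matrix product."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(x)) / qmax
    return np.round(x / scale).astype(np.int32), scale

def int_matmul(a: np.ndarray, b: np.ndarray, bits: int = 8) -> np.ndarray:
    """Approximate a @ b using an integer MAC array plus one float
    rescaling per output element (the scalar correction)."""
    qa, sa = quantize(a, bits)
    qb, sb = quantize(b, bits)
    return (qa @ qb) * (sa * sb)
```

The expensive inner loop runs entirely in integer arithmetic; only the final rescale needs a floating-point multiply, which is what makes the scheme attractive on resource-limited FPGA fabric.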
DOI: 10.1109/MCSoC57363.2022.00043
Citations: 2
Packed SIMD Vectorization of the DRAGON2-CB
Riadh Ben Abdelhamid, Y. Yamaguchi
For over half a century, computer architects have explored micro-architecture, instruction set architecture, and system architecture to extract significant performance gains from a computing chip. In micro-architecture, multi-processing and multi-threading arose from the fusion of highly parallel processing with advances in semiconductor manufacturing technology. This caused a paradigm shift in computing chips and led to the many-core processor age, exemplified by NVIDIA GPUs, Movidius Myriad, PEZY ZettaScaler, and the Eyeriss project based on a reconfigurable accelerator. Here, packed SIMD (Single Instruction, Multiple Data) vectorization attracts attention, especially from machine learning (ML) applications: it achieves more energy-efficient computing by reducing computing precision, which is sufficient for ML applications to obtain results with low-accuracy calculations. In other words, accuracy-flexible computing must allow splitting one N-bit ALU (Arithmetic Logic Unit) or one N-bit FPU (Floating-Point Unit) into multiple M-bit units. For example, a double-precision (64-bit operand width) FPU can be split into two single-precision (32-bit operand width) FPUs or four half-precision (16-bit operand width) FPUs. Consequently, instead of executing one original operation, packed SIMD vectorization enables executing two or four reduced-precision operations simultaneously. This article proposes a packed SIMD vectorization approach for the Dynamically Reprogrammable Architecture of Gather-scatter Overlay Nodes-Compact Buffering (DRAGON2-CB) many-core overlay architecture. In particular, it presents a thorough comparative study of packed SIMD using dual single-precision and quad half-precision FPU-only many-core overlays against the non-vectorized double-precision version.
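The splitting described above can be mimicked in software: the sketch below treats each 64-bit word as two packed float32 lanes and operates on both lanes with one array operation. This is an illustration of the packed-SIMD concept only, not the DRAGON2-CB datapath.

```python
import numpy as np

def pack2(lo: float, hi: float) -> np.ndarray:
    """Pack two single-precision values into one 64-bit word."""
    return np.array([lo, hi], dtype=np.float32).view(np.uint64)

def packed_f32_add(reg_a: np.ndarray, reg_b: np.ndarray) -> np.ndarray:
    """Reinterpret each 64-bit word as two float32 lanes and add
    lane-wise, mimicking one 64-bit FPU datapath serving two
    single-precision additions per cycle."""
    lanes_a = reg_a.view(np.float32)
    lanes_b = reg_b.view(np.float32)
    return (lanes_a + lanes_b).view(np.uint64)

# One "packed" add yields both lane results at once.
r = packed_f32_add(pack2(1.0, 2.0), pack2(10.0, 20.0))
```

Viewing `r` back as float32 recovers the two lane sums; a quad half-precision variant would view the same 64-bit word as four float16 lanes instead.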
DOI: 10.1109/MCSoC57363.2022.00023
Citations: 0
A Run-time Tapered Floating-Point Adder/Subtractor Supporting Vectorization
Ashish Reddy Bommana, Srinivas Boppu
In this era of widespread embedded computing, energy efficiency has become the new performance criterion; as a result, accelerator-rich multi-processor systems-on-chip are widely used in embedded computing hardware. Owing to abundant and inexpensive computational capacity, computationally intensive machine learning applications have gained considerable traction and are currently used across a wide range of application domains. Furthermore, there is an increasing trend toward developing hardware accelerators for machine learning on embedded edge devices, where performance and energy efficiency are critical. Although floating-point operations are frequently used for accuracy in these hardware accelerators, reduced-width floating-point formats are also used to lower hardware complexity, and thus power consumption, while preserving accuracy. Mixed-precision DNNs, vectorization techniques, and any-precision DNN concepts have also proven to boost performance, energy efficiency, and memory bandwidth. In this paper, we propose the design of a vectorized floating-point adder/subtractor that can handle arbitrary-length floating-point formats with varying exponent and mantissa widths. The central idea is to bring flexibility to each layer in a DNN model for arithmetic operations; depending on the computational requirements of each layer, the exponent width and floating-point format are chosen dynamically. Compared to existing designs in the literature, the proposed design is 1.69× more area-efficient and 1.61× more power-efficient, and it supports true vectorization with no restrictions on exponent and mantissa widths.
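What "varying exponent and mantissa widths" means can be illustrated with a software encoder/decoder for a sign/exponent/mantissa bit field whose widths are runtime parameters. This simplified sketch handles normal numbers only and is not the proposed adder/subtractor hardware.

```python
import math

def encode(value: float, exp_bits: int, man_bits: int) -> int:
    """Encode a nonzero float into a configurable-width bit field
    (sign | biased exponent | mantissa), normals only."""
    sign = 1 if value < 0 else 0
    bias = (1 << (exp_bits - 1)) - 1
    m, e = math.frexp(abs(value))               # abs(value) = m * 2**e, 0.5 <= m < 1
    exp = e - 1 + bias                          # re-bias for a 1.f mantissa
    frac = round((m * 2 - 1) * (1 << man_bits)) # fraction bits after the hidden 1
    if frac == 1 << man_bits:                   # rounding overflowed the mantissa
        frac, exp = 0, exp + 1
    return (sign << (exp_bits + man_bits)) | (exp << man_bits) | frac

def decode(word: int, exp_bits: int, man_bits: int) -> float:
    bias = (1 << (exp_bits - 1)) - 1
    sign = -1.0 if word >> (exp_bits + man_bits) else 1.0
    exp = (word >> man_bits) & ((1 << exp_bits) - 1)
    frac = word & ((1 << man_bits) - 1)
    return sign * (1 + frac / (1 << man_bits)) * 2.0 ** (exp - bias)
```

With `exp_bits=5, man_bits=10` this behaves like half precision; widening the exponent at the cost of the mantissa trades resolution for range, which is the per-layer flexibility the abstract describes.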
DOI: 10.1109/MCSoC57363.2022.00056
Citations: 0
Neuronal population biomarkers of temporal difference learning in human impulsive choices
R. Cowan, T. Davis, B. Kundu, J. Rolston, Elliot H. Smith
Impulsive choice is a facet of impulsivity that may lead one to choose smaller, more immediate rewards over larger, delayed rewards. It is a kind of maladaptive decision making that is a fundamental element of relapse in substance use disorder. Despite this essential role, there is currently little understanding of the neural basis of impulsive choices. A better understanding of the neural correlates of impulsivity could improve the diagnosis and treatment of psychiatric disorders in which impulsive choice plays a role. In this work, we examined impulsive choice behavior in humans undergoing intracranial seizure monitoring by fitting temporal difference learning models to behavior and to broadband high-frequency (70–150 Hz) local field potentials. We found neural and behavioral differences between more and less impulsive choosers, informing the neural underpinnings of impulsive choices and describing a biomarker for reward expectation and surprise in the human brain.
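A temporal difference model of the kind fitted here updates a value estimate with a reward prediction error, and that error term is what "surprise" refers to. The sketch below is a generic TD(0) step with illustrative learning-rate and discount parameters, not the authors' fitted model.

```python
def td_update(value: float, reward: float, next_value: float,
              alpha: float = 0.1, gamma: float = 0.9) -> tuple[float, float]:
    """One temporal-difference step. delta is the reward prediction
    error ("surprise"); the updated value tracks reward expectation."""
    delta = reward + gamma * next_value - value
    return value + alpha * delta, delta
```

Repeated updates with a steady reward drive the value toward that reward, and the prediction error decays toward zero, which is why the two quantities can serve as distinct regressors against neural activity.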
DOI: 10.1109/MCSoC57363.2022.00054
Citations: 0
Design and Analysis of A Dual-Band Bistatic Backscatter Circuit for Passive RFID Tags
N. A. Quadir, M. Hamdi, M. A. Awan, Bo Wang, A. Bermak
Passive radio-frequency identification (RFID) tags, when placed remotely or in harsh environments, benefit most when the communication distance between tag and reader is greatly improved. The bistatic backscattering technique addresses this problem by separating the carrier and backscattered signals in frequency, which helps mitigate interference. It also decouples the reader from carrier generation by using a separate radio-frequency (RF) emitter, and further improves signal strength by reducing round-trip path loss. This paper presents a dual-band on-chip bistatic backscattering circuit design for passive RFID tags in a 180 nm CMOS process dissipating 35 µW of power. Post-layout simulation results show a communication distance of 170 m between tag and reader at 868 MHz and 60 m at 2.4 GHz when the tag is kept 5 m from the RF emitter.
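The round-trip path-loss advantage of the bistatic topology can be illustrated with the standard free-space path-loss (Friis) formula, using the 5 m emitter-to-tag and 170 m tag-to-reader distances at 868 MHz quoted in the abstract; the comparison is a back-of-the-envelope sketch, not an analysis from the paper.

```python
import math

def fspl_db(distance_m: float, freq_hz: float) -> float:
    """Free-space path loss in dB: 20*log10(4*pi*d*f/c)."""
    c = 3e8
    return 20 * math.log10(4 * math.pi * distance_m * freq_hz / c)

# Monostatic: the carrier travels reader -> tag -> reader, so two long
# legs of loss add. Bistatic: a nearby RF emitter shortens the first leg.
monostatic_db = 2 * fspl_db(170, 868e6)
bistatic_db = fspl_db(5, 868e6) + fspl_db(170, 868e6)
advantage_db = monostatic_db - bistatic_db
```

The bistatic link budget is tens of dB better here, which is the mechanism behind the extended communication range.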
{"title":"Design and Analysis of A Dual-Band Bistatic Backscatter Circuit for Passive RFID Tags","authors":"N. A. Quadir, M. Hamdi, M. A. Awan, Bo Wang, A. Bermak","doi":"10.1109/MCSoC57363.2022.00055","DOIUrl":"https://doi.org/10.1109/MCSoC57363.2022.00055","url":null,"abstract":"Passive radio-frequency identification (RFID) tags, when placed remotely or in harsh environments, will benefit the most if the communication distance between the tag and reader is vastly improved. The bistatic backscattering technique provides a solution to this problem by separating the carrier and backscattered signal in frequency, which helps mitigate interference. It also decouples the reader from carrier generation by having a separate radio-frequency (RF) emitter and further improves the signal strength by reducing round trip path loss. A dual-band on-chip bistatic backscattering circuit design for passive RFID tags is presented in this paper using a 180 nm CMOS process dissipating 35 $mu mathrm{W}$ of power. Post layout simulation results provide a communicable distance of 170 m between the tag and reader at 868 MHz and 60 m at 2.4 GHz when the tag is kept 5 m away from the RF emitter.","PeriodicalId":150801,"journal":{"name":"2022 IEEE 15th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132654760","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
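To get a rough feel for why reducing round-trip path loss matters in a bistatic arrangement, the one-way free-space (Friis) path loss at the distances reported above can be estimated. This is a back-of-the-envelope illustration, not the paper's link-budget analysis: it ignores antenna gains, backscatter modulation loss, and fading.

```python
import math

def fspl_db(distance_m, freq_hz):
    """Friis free-space path loss in dB: 20*log10(4*pi*d*f/c)."""
    c = 3.0e8  # speed of light, m/s
    return 20.0 * math.log10(4.0 * math.pi * distance_m * freq_hz / c)

# One-way losses at the communication distances reported in the abstract.
loss_868 = fspl_db(170, 868e6)  # tag-to-reader link at 868 MHz over 170 m
loss_24 = fspl_db(60, 2.4e9)    # tag-to-reader link at 2.4 GHz over 60 m

# The short carrier path from a nearby RF emitter (5 m) contributes far
# less loss, which is why bistatic operation improves range over a
# monostatic reader that pays the full round-trip loss on one link.
loss_carrier = fspl_db(5, 868e6)
```

With the emitter only 5 m away, the carrier arrives at the tag roughly 30 dB stronger than it would over the full tag-to-reader distance, leaving more margin for the weak backscattered signal.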
Autotuning Power Consumption and Computation Accuracy using ppOpen-AT
Shouhei Yamanashi, H. Yashiro, T. Katagiri, Toru Nagai, S. Ohshima
Mixed-precision computation mainly focuses on shortening execution time at the expense of accuracy. To achieve speedups for numerical calculation with mixed precision, software performance must be tuned not only for execution speed but also for computation accuracy and power consumption, which increases the overall cost of tuning. Autotuning (AT) is one candidate technology for reducing this tuning cost. In this study, we propose an AT method that obtains speedups while taking computation accuracy and power consumption into account. The proposed method uses an AT language that converts the original code to mixed precision by combining double- and single-precision arithmetic. Performance evaluation was carried out on the Fujitsu PRIMEHPC FX1000, a “Fugaku”-type supercomputer installed at the Information Technology Center, Nagoya University.
{"title":"Autotuning Power Consumption and Computation Accuracy using ppOpen-AT","authors":"Shouhei Yamanashi, H. Yashiro, T. Katagiri, Toru Nagai, S. Ohshima","doi":"10.1109/MCSoC57363.2022.00041","DOIUrl":"https://doi.org/10.1109/MCSoC57363.2022.00041","url":null,"abstract":"Mixed-precision computation mainly focuses on shortening the execution time, at the expense of accuracy. To achieve speedups for numerical calculation using mixed-precision computation, it is necessary to tune software performance with respect to not only execution speed but also computation accuracy and power consumption. This increases the overall cost of tuning. Autotuning (AT) is one of the candidates among several technologies available for reducing the cost associated with tuning the software performance. In this study, we propose a method for AT to obtain speedups with respect to computation accuracy and power consumption. The proposed AT method uses an AT language that changes computation accuracy of the original code to mixed-precision by combining double and single precisions. Performance evaluation was carried out by using the Fujitsu PRIMEHPC FX1000, which is a “Fugaku” type supercomputer installed at the Information Technology Center, Nagoya University. 
The proposed method achieved a 1.5x reduction in execution time and energy consumption while retaining reasonable accuracy degradation from the original code of a global cloud resolving model.","PeriodicalId":150801,"journal":{"name":"2022 IEEE 15th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133743264","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
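The core idea behind accuracy-aware mixed-precision autotuning — select the cheapest precision that still satisfies an accuracy bound against a high-precision reference — can be sketched apart from the AT language itself. The code below is a hypothetical Python illustration, not the ppOpen-AT directive syntax; single-precision arithmetic is simulated by rounding each operation through `struct`.

```python
import struct

def f32(x):
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def dot_double(a, b):
    s = 0.0
    for x, y in zip(a, b):
        s += x * y
    return s

def dot_single(a, b):
    # Simulate single-precision accumulation by rounding every operation.
    s = f32(0.0)
    for x, y in zip(a, b):
        s = f32(s + f32(f32(x) * f32(y)))
    return s

def autotune_precision(a, b, rel_tol):
    """Return the cheapest precision whose result stays within rel_tol of
    the double-precision reference -- the accuracy constraint an
    accuracy-aware autotuner enforces when switching a kernel to
    mixed precision (illustrative sketch, not the ppOpen-AT API)."""
    ref = dot_double(a, b)
    single = dot_single(a, b)
    if abs(single - ref) <= rel_tol * abs(ref):
        return "single", single
    return "double", ref

a = [1.0 / (i + 1) for i in range(1000)]
b = [1.0] * 1000
precision, value = autotune_precision(a, b, rel_tol=1e-3)
# For this well-conditioned sum, single precision meets the 1e-3 bound,
# so the tuner selects the cheaper arithmetic.
```

In the paper's setting, the same selection is made per code region by the AT language at build time, with execution time and power measured on the target machine rather than inferred.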
Journal: 2022 IEEE 15th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)