
IEEE Transactions on Multi-Scale Computing Systems: Latest Articles

Body Bias Control for Renewable Energy Source with a High Inner Resistance
Pub Date : 2018-04-17 DOI: 10.1109/TMSCS.2018.2827980
Keita Azegami;Hayate Okuhara;Hideharu Amano
Sensor nodes used in the Internet of Things (IoT) are required to work for an extremely long time without battery replacement. Natural renewable energy, such as a solar battery, is a promising candidate for such nodes. Here, a power model for operating a Silicon on Insulator (SOI) device with a solar battery that has a large inner resistance is proposed and applied to a micro-controller, V850E-star, and an accelerator, CMA-SOTB2. Unlike the ideal case, the maximum operational frequency was achieved with reverse biasing, which suppresses the leakage current that decreases the supply voltage. Under room light, where the inner resistance is large, a strong reverse bias is effective, while a relatively weak reverse bias is advantageous under bright light. The proposed model appears to be useful for estimating the appropriate body bias voltage for both V850E-star and CMA-SOTB2. For V850E-star, the estimated operational frequencies differed from those of the real chip, while they matched relatively well for CMA-SOTB2 under low illuminance.
IEEE Transactions on Multi-Scale Computing Systems, vol. 4, no. 4, pp. 605-612.
Citations: 1
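The abstract of "Body Bias Control for Renewable Energy Source with a High Inner Resistance" says that leakage current lowers the supply voltage when the source has a large inner resistance. A minimal sketch of that effect follows, assuming the solar battery can be treated as an ideal voltage source in series with a resistor. The function name and every number are hypothetical illustrations, not values or equations from the paper, and the sketch ignores that dynamic current itself depends on voltage and frequency.

```python
def supply_voltage(v_open: float, r_inner: float, i_dynamic: float, i_leak: float) -> float:
    """Terminal voltage of a source modeled as v_open in series with r_inner.

    All currents drawn by the chip flow through the inner resistance,
    so leakage current directly reduces the voltage the chip sees.
    """
    return v_open - r_inner * (i_dynamic + i_leak)


# Hypothetical operating point: dim room light, hence a large inner resistance.
V_OPEN = 0.8        # open-circuit voltage in volts (assumed)
R_INNER = 5_000.0   # inner resistance in ohms (assumed)
I_DYNAMIC = 20e-6   # switching current in amperes (assumed, held constant here)

# Reverse body bias is modeled only as a smaller leakage current.
zero_bias = supply_voltage(V_OPEN, R_INNER, I_DYNAMIC, i_leak=40e-6)
reverse_bias = supply_voltage(V_OPEN, R_INNER, I_DYNAMIC, i_leak=4e-6)

print(f"zero bias: {zero_bias:.2f} V, reverse bias: {reverse_bias:.2f} V")
```

In this toy setting the reverse-biased case keeps more supply voltage available. Whether that translates into a higher operating frequency depends on how much the bias also slows the transistors, which is the trade-off the paper's model is meant to estimate.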
An Adjacent-Line-Merging Writeback Scheme for STT-RAM-Based Last-Level Caches
Pub Date : 2018-04-17 DOI: 10.1109/TMSCS.2018.2827955
Masayuki Sato;Yoshiki Shoji;Zentaro Sakai;Ryusuke Egawa;Hiroaki Kobayashi
Spin-Transfer Torque RAM (STT-RAM) has attracted attention as a key element for the Last-Level Cache (LLC) of a future microprocessor. Since STT-RAM is non-volatile and has a higher density than SRAM, it can contribute to building a cache memory with a larger capacity and lower static energy. However, since STT-RAM changes its magnetization state when storing data, the energy cost of a write access to an STT-RAM LLC is higher than that of an SRAM LLC. As a result, the total energy consumption of the STT-RAM LLC may increase for write-intensive applications. To solve this problem, this paper proposes an Adjacent-Line-Merging Writeback Scheme. Since a larger cache line in an STT-RAM cache can reduce the write energy cost per byte, the upper-level cache merges two adjacent small lines into one large line, and then writes the merged line back to the STT-RAM LLC. Moreover, the larger line size for the LLC leads to a reduction in the static energy cost. The evaluation results show that the proposed scheme can reduce the energy consumption of the STT-RAM LLC by up to 26 percent, and by 9.3 percent on average.
IEEE Transactions on Multi-Scale Computing Systems, vol. 4, no. 4, pp. 593-604.
Citations: 1
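The merging step described in the abstract of "An Adjacent-Line-Merging Writeback Scheme for STT-RAM-Based Last-Level Caches" can be sketched roughly as below. This is only an illustration under assumptions: the abstract does not say when or how the upper-level cache decides to merge, so the rule used here (merge when both halves of an aligned double-size block are dirty), the function name `merge_writebacks`, and the 64-byte small-line size are all hypothetical.

```python
def merge_writebacks(dirty_lines: set[int], small_line: int = 64) -> list[tuple[int, int]]:
    """Group dirty small-line addresses into (base_address, size_in_bytes) writebacks.

    Assumes every address is aligned to small_line. Two small lines are merged
    when both halves of the same aligned 2 * small_line block are dirty;
    otherwise the line is written back on its own.
    """
    large_line = 2 * small_line
    writebacks = []
    handled = set()
    for addr in sorted(dirty_lines):
        if addr in handled:
            continue
        base = addr - (addr % large_line)           # start of the aligned large block
        buddy = base + small_line if addr == base else base
        if buddy in dirty_lines:
            writebacks.append((base, large_line))   # one merged large-line write
            handled.update({addr, buddy})
        else:
            writebacks.append((addr, small_line))   # unmerged small-line write
            handled.add(addr)
    return writebacks


# Lines 0 and 64 share one 128-byte block and are merged; 192 and 256 are not neighbors.
result = merge_writebacks({0, 64, 192, 256})
print(result)
```

Here four dirty small lines become three writebacks, one of which is a single large-line write. The energy benefit claimed in the abstract comes from the lower per-byte write cost of the larger line, which this sketch does not model.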
A 120 fps High Frame Rate Real-time HEVC Video Encoder with Parallel Configuration Scalable to 4K
Pub Date : 2018-04-10 DOI: 10.1109/TMSCS.2018.2825320
Yuya Omori;Takayuki Onishi;Hiroe Iwasaki;Atsushi Shimizu
This paper describes a new 120 fps (frames per second) real-time HEVC (High Efficiency Video Coding) encoder for HFR (high frame rate) video encoding and transmission. HFR provides a more immersive viewing experience by solving the problems created by fast-moving scenes. Temporally scalable encoding with backward compatibility for legacy non-HFR systems is suitable for the rapid spread of HFR content delivery, avoiding the need to distribute multiple bitstreams of the same video with different frame rates. Such temporal scalability requires flexible encoder control functionalities to support newly customized reference picture structures and dual-stream bitrate control. In this paper, modification of the customizable software architecture of encoder LSIs makes it possible to achieve 120 fps temporally scalable HEVC encoding for existing 60 fps-based systems. The encoder also achieves $4\mathrm{K}/120\;\mathrm{fps}$ video encoding in real time through the synchronized operation of multiple $2\mathrm{K}/120\;\mathrm{fps}$ encoders working in parallel. Our evaluations show that the bitrate increase from 60 fps to 120 fps under the same objective image quality is below 57.2 percent for all video sequences, with an average of 53.8 percent. Both values are lower than those of the HM (HEVC reference software). The proposed encoder systems will open the door to next-generation high frame rate UHDTV (ultra high definition television) services.
IEEE Transactions on Multi-Scale Computing Systems, vol. 4, no. 4, pp. 491-499.
Citations: 3
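The temporal scalability mentioned in the HEVC encoder abstract above can be illustrated with a small sketch. It assumes the simplest possible layering, in which even-numbered frames form a 60 fps base layer and odd-numbered frames form the enhancement layer; the encoder's actual reference picture structure is not given in the abstract, and the function names and bitrates below are hypothetical.

```python
def temporal_layer(frame_index: int) -> int:
    """Assumed layering: even frames are layer 0 (base), odd frames are layer 1."""
    return frame_index % 2


def extract_base_layer(frame_indices: list[int]) -> list[int]:
    """What a legacy 60 fps decoder would keep from a 120 fps scalable stream.

    This only works if base-layer frames never reference enhancement-layer
    frames, which is the constraint a temporally scalable encoder must honor.
    """
    return [i for i in frame_indices if temporal_layer(i) == 0]


def bitrate_increase_percent(rate_60fps: float, rate_120fps: float) -> float:
    """Relative bitrate growth when going from 60 fps to 120 fps."""
    return 100.0 * (rate_120fps - rate_60fps) / rate_60fps


base = extract_base_layer(list(range(8)))
print(base)

# Hypothetical bitrates in Mbit/s, chosen only to exercise the formula.
print(bitrate_increase_percent(10.0, 15.0))
```

A single bitstream then serves both kinds of receivers: a 120 fps decoder uses every frame, while a 60 fps decoder discards layer 1.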
Module Placement under Completion-Time Uncertainty in Micro-Electrode-Dot-Array Digital Microfluidic Biochips
Pub Date : 2018-04-04 DOI: 10.1109/TMSCS.2018.2822799
Wen-Chun Chung;Pei-Yi Cheng;Zipeng Li;Tsung-Yi Ho
Digital microfluidic biochips (DMFBs) are an emerging technology that is replacing traditional laboratory procedures. With the integrated functions necessary for biochemical experiments, DMFBs are able to automate experiments. Recently, DMFBs based on a new architecture called micro-electrode-dot-array (MEDA) have been demonstrated. Compared with conventional DMFBs, in which sensors are placed at specific locations, MEDA-based biochips integrate a sensor with each microelectrode. Thanks to this advantage of MEDA-based biochips, real-time reaction-outcome detection is attainable. However, to the best of our knowledge, synthesis algorithms proposed in the literature for MEDA-based biochips do not fully utilize real-time detection, since completion-time uncertainties have not yet been considered. During the execution of a biochemical experiment, operations may finish earlier or later than expected due to variability and randomness in biochemical reactions. Such uncertainties also matter when allocating modules for each fluidic operation and placing them on a biochip, since a biochip with a fixed area restricts the number and the size of these modules. Thus, in this paper, we propose the first operation-variation-aware placement algorithm, which fully utilizes real-time detection by taking completion-time uncertainties into account. Simulation results demonstrate that the proposed approach reduces time-to-result and minimizes the chip size while not exceeding the completion time compared to the benchmarks.
IEEE Transactions on Multi-Scale Computing Systems, vol. 4, no. 4, pp. 811-821.
Citations: 11
A Hierarchical Inference Model for Internet-of-Things
Pub Date : 2018-03-30 DOI: 10.1109/TMSCS.2018.2821154
Hongxu Yin;Zeyu Wang;Niraj K. Jha
Internet-of-Things (IoT) has connected billions of devices to the Internet. These devices are already collecting zettabytes ($10^{21}$) of data. However, the current IoT framework suffers from limited sensor energy, communication bandwidth, and server storage. These limitations impede the ability to send all the sensor data to the server all the time. Compact smart sensors provide a way to address this challenge. As opposed to the conventional sense-and-transmit sensors, emerging smart sensors can collect data, extract features, derive local inferences, and transmit only inference outcomes and possibly some raw data associated with rare events instead of all the raw data. This can dramatically cut down on the amount of sensor data transmitted, and hence its communication energy and network traffic. However, edge or server inference models trained with conventional machine learning approaches do not account for the fact that the smart sensors in the system have already performed a local inference. These approaches need all the sensor data and hence only cater to the traditional sense-and-transmit paradigm. This undoes the energy benefits brought about by smart sensors. In this paper, we propose a hierarchical inference model for IoT applications based on hierarchical learning and local inferences. Our model is able to take advantage of inference already performed on smart sensors, while at the same time accommodating conventional sense-and-transmit sensors in the IoT system. It also generalizes sensor-level inference to inference at other edge nodes by exploiting the intrinsically sensor/edge-grouped IoT data structure. We train classifiers hierarchically, aligned with the sensor-edge-server IoT paradigm. We verify our approach with seven IoT applications, demonstrating that the model is accurate, efficient, and generally applicable. We derive four edge-level inference models and four server-level inference models for these applications. 
For the four edge-level inference models, we reduce the number of bits transmitted from the sensor by $3.2\times$ to $42.7\times$ while at the same time also improving the classification accuracy by 0.3 to 6.7 percent. For the four server-level inference models, we reduce the number of edge-to-server bits transmitted by $17\times$ to $60\times$, with classification accuracy change in the $-0.4$ to $+0.1$ percent range.
IEEE Transactions on Multi-Scale Computing Systems, vol. 4, no. 3, pp. 260-271.
Citations: 21
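The contrast between sense-and-transmit sensors and smart sensors in the abstract of "A Hierarchical Inference Model for Internet-of-Things" can be made concrete with a rough sketch. Everything here is an assumption for illustration: the threshold classifier, the majority vote at the edge (a stand-in for the paper's hierarchically trained classifiers, which the abstract does not detail), and all window sizes and bit widths.

```python
def sensor_inference(samples: list[float], threshold: float) -> int:
    """Local inference on a smart sensor: 1 if the window mean exceeds a threshold."""
    return 1 if sum(samples) / len(samples) > threshold else 0


def edge_inference(sensor_labels: list[int]) -> int:
    """Edge-level inference that consumes sensor labels instead of raw data.

    A plain majority vote is used here only as a placeholder.
    """
    return 1 if 2 * sum(sensor_labels) > len(sensor_labels) else 0


def transmitted_bits(num_windows: int, rare_windows: int, samples_per_window: int = 100,
                     bits_per_sample: int = 10, label_bits: int = 1) -> tuple[int, int]:
    """Bits sent by a sense-and-transmit sensor versus a smart sensor.

    The smart sensor sends one label per window, plus the raw samples of the
    windows flagged as rare events.
    """
    window_bits = samples_per_window * bits_per_sample
    raw = num_windows * window_bits
    smart = num_windows * label_bits + rare_windows * window_bits
    return raw, smart


labels = [sensor_inference(w, threshold=0.5) for w in ([0.9, 0.8], [0.1, 0.2], [0.7, 0.6])]
decision = edge_inference(labels)
raw_bits, smart_bits = transmitted_bits(num_windows=1000, rare_windows=50)
print(labels, decision, raw_bits / smart_bits)
```

With these made-up parameters the smart sensor sends about one twentieth of the raw bits. The real reduction depends on how often raw data must accompany an inference outcome.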
Design, Evaluation and Application of Approximate High-Radix Dividers
Pub Date : 2018-03-22 DOI: 10.1109/TMSCS.2018.2817608
Linbin Chen;Jie Han;Weiqiang Liu;Paolo Montuschi;Fabrizio Lombardi
Approximate high-radix dividers (HR-AXDs) are proposed and investigated in this paper. High-radix division is reviewed and inexact computing is introduced at different levels. Design parameters such as the number of bits (N) and the radix (r) are considered in the analysis; the replacement of exact cells with inexact cells in a binary signed-digit adder is introduced by utilizing different replacement schemes. Cell truncation and error compensation are also proposed to further extend inexact computation. Circuit-level performance and the error characteristics of the inexact high-radix dividers are analyzed for the proposed designs. The combined assessment of the normal error distance, power dissipation, and delay is investigated, and applications of approximate high-radix dividers are treated in detail. The simulation results show that the proposed approximate dividers offer extensive savings in terms of power dissipation, circuit complexity, and delay, while incurring only a small degradation in accuracy, thus making them potentially suitable for applications and domains such as low-power/mobile computing.
IEEE Transactions on Multi-Scale Computing Systems, vol. 4, no. 3, pp. 299-312.
Citations: 23
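To make the terms in the approximate-divider abstract above more tangible, here is a software sketch of high-radix division and of one way to trade accuracy for less work. It is not the paper's design: the exact divider below is a plain restoring digit recurrence rather than a signed-digit adder array, and the approximation truncates operand bits instead of replacing adder cells. Function names and parameters are hypothetical.

```python
def high_radix_divide(dividend: int, divisor: int, n_bits: int = 16,
                      radix_bits: int = 2) -> tuple[int, int]:
    """Exact unsigned division that retires radix_bits quotient bits per step.

    radix_bits=2 gives radix 4, so a 16-bit dividend needs 8 iterations
    instead of the 16 needed by a radix-2 divider.
    """
    if divisor <= 0 or n_bits % radix_bits or not 0 <= dividend < (1 << n_bits):
        raise ValueError("invalid operands or parameters")
    radix = 1 << radix_bits
    quotient, remainder = 0, 0
    for shift in range(n_bits - radix_bits, -1, -radix_bits):
        digit_in = (dividend >> shift) & (radix - 1)   # next dividend digit, MSB first
        remainder = remainder * radix + digit_in
        q_digit = remainder // divisor                 # quotient digit in [0, radix - 1]
        remainder -= q_digit * divisor
        quotient = quotient * radix + q_digit
    return quotient, remainder


def approximate_divide(dividend: int, divisor: int, truncated_bits: int = 4) -> int:
    """Stand-in approximation: drop the low bits of both operands before dividing."""
    small_divisor = divisor >> truncated_bits
    if small_divisor == 0:                             # divisor too small to truncate
        return high_radix_divide(dividend, divisor)[0]
    return high_radix_divide(dividend >> truncated_bits, small_divisor)[0]


exact_q, exact_r = high_radix_divide(50000, 300)
approx_q = approximate_divide(50000, 300)
error_distance = abs(approx_q - exact_q)
print(exact_q, exact_r, approx_q, error_distance)
```

The error distance computed at the end is the kind of quantity an accuracy assessment would aggregate over many operand pairs and weigh against power and delay.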
Guest Editorial: Special Issue on Accelerated Computing
Pub Date : 2018-03-20 DOI: 10.1109/TMSCS.2018.2807058
Aviral Shrivastava;Fadi J. Kurdahi
The papers in this special section focus on accelerated computing, which refers to a computing model wherein some or all of the computation of an application is carried out on specialized hardware (known as an accelerator) in tandem with the traditional CPU. Accelerators are highly specialized hardware components that can execute a specific functionality at higher performance and lower power, and often with even higher reliability, than is possible on a traditional CPU. The demand for ever more computation and ever-higher power efficiency of computation (Watts per Mflops), plus the brakes on Dennard scaling, have brought the paradigm of Accelerated Computing to the front and center of computer architecture and computing system design.
IEEE Transactions on Multi-Scale Computing Systems, vol. 4, no. 1, pp. 1-2.
Citations: 0
2017 Index IEEE Transactions on Multi-Scale Computing Systems Vol. 3
Pub Date : 2018-03-20 DOI: 10.1109/TMSCS.2017.2788365
Presents the 2017 subject/author index for this publication.
IEEE Transactions on Multi-Scale Computing Systems, vol. 4, no. 1, pp. 1-5.
Citations: 0
2018 Reviewers List
Pub Date : 2018-03-20 DOI: 10.1109/TMSCS.2018.2810358
Presents a listing of the reviewers who contributed to this publication in 2017.
IEEE Transactions on Multi-Scale Computing Systems, vol. 4, no. 1, pp. 95-96.
Citations: 0
Exploring a SOT-MRAM Based In-Memory Computing for Data Processing
Pub Date: 2018-03-17 DOI: 10.1109/TMSCS.2018.2836967
Zhezhi He;Yang Zhang;Shaahin Angizi;Boqing Gong;Deliang Fan
In this paper, we propose a Spin-Orbit Torque Magnetic Random-Access Memory (SOT-MRAM) array design that can simultaneously work as non-volatile memory and implement a reconfigurable in-memory logic operation without add-on logic circuits. The computed output can be simply read out like a typical MRAM bit-cell through the modified peripheral circuit. Such intrinsic in-memory computation can be used to process data locally and transfer the “cooked” data to the primary processing unit (i.e., CPU or GPU) for complex computations with high precision requirement. It greatly reduces the power-hungry and long-distance data communication, and further leads to extreme parallel computation within memory. In this work, we further propose an in-memory edge extraction algorithm as a case study to demonstrate the efficiency of the in-memory pre-processing methodology. The simulation results show that our edge extraction method reduces data communication as much as 8x for grayscale image, thus greatly reducing system energy consumption. Meanwhile, the F-measure result shows only $\sim$10 percent degradation compared to conventional edge detection operator, such as Prewitt, Sobel, and Roberts. Moreover, the edges extracted from the memory show comparable good quality with Canny edges in the context of edge-based motion detection and cross-modality object recognition.
IEEE Transactions on Multi-Scale Computing Systems, vol. 4, no. 4, pp. 676-685.
Citations: 23