The rapid proliferation of computing domains relying on Internet of Things (IoT) devices has created a pressing need for efficient and accurate deep-learning (DL) models that can run on low-power devices. However, traditional DL models tend to be too complex and computationally intensive for typical IoT end-nodes. To address this challenge, Neural Architecture Search (NAS) has emerged as a popular design automation technique for co-optimizing the accuracy and complexity of deep neural networks. Nevertheless, existing NAS techniques require many iterations to produce a network that adheres to specific hardware constraints, such as the maximum memory available on the hardware or the maximum latency allowed by the target application. In this work, we propose a novel approach to incorporate multiple constraints into so-called Differentiable NAS optimization methods, which allows the generation, in a single shot, of a model that respects user-defined constraints on both memory and latency, in a time comparable to a single standard training. The proposed approach is evaluated on five IoT-relevant benchmarks, including the MLPerf Tiny suite and Tiny ImageNet, demonstrating that, with a single search, it is possible to reduce memory and latency by 87.4% and 54.2%, respectively (as defined by our targets), while maintaining accuracy that is non-inferior to state-of-the-art hand-tuned deep neural networks for TinyML.
{"title":"Enhancing Neural Architecture Search With Multiple Hardware Constraints for Deep Learning Model Deployment on Tiny IoT Devices","authors":"Alessio Burrello;Matteo Risso;Beatrice Alessandra Motetti;Enrico Macii;Luca Benini;Daniele Jahier Pagliari","doi":"10.1109/TETC.2023.3322033","DOIUrl":"10.1109/TETC.2023.3322033","url":null,"abstract":"The rapid proliferation of computing domains relying on Internet of Things (IoT) devices has created a pressing need for efficient and accurate deep-learning (DL) models that can run on low-power devices. However, traditional DL models tend to be too complex and computationally intensive for typical IoT end-nodes. To address this challenge, Neural Architecture Search (NAS) has emerged as a popular design automation technique for co-optimizing the accuracy and complexity of deep neural networks. Nevertheless, existing NAS techniques require many iterations to produce a network that adheres to specific hardware constraints, such as the maximum memory available on the hardware or the maximum latency allowed by the target application. In this work, we propose a novel approach to incorporate multiple constraints into so-called Differentiable NAS optimization methods, which allows the generation, in a single shot, of a model that respects user-defined constraints on both memory and latency in a time comparable to a single standard training. The proposed approach is evaluated on five IoT-relevant benchmarks, including the MLPerf Tiny suite and Tiny ImageNet, demonstrating that, with a single search, it is possible to reduce memory and latency by 87.4% and 54.2%, respectively (as defined by our targets), while ensuring non-inferior accuracy on state-of-the-art hand-tuned deep neural networks for TinyML.","PeriodicalId":13156,"journal":{"name":"IEEE Transactions on Emerging Topics in Computing","volume":"12 3","pages":"780-794"},"PeriodicalIF":5.1,"publicationDate":"2023-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136207718","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-10-05, DOI: 10.1109/TETC.2023.3320758
Irina Arévalo; Jose L. Salmeron
Federated Learning is a machine learning approach that enables the training of a deep learning model among several participants holding sensitive data who wish to share their knowledge without compromising the privacy of their data. In this research, the authors employ a secure Federated Learning method with an additional layer of privacy and propose a method for addressing the non-IID challenge. Moreover, differential privacy is compared with chaotic-maps-based encryption as the privacy layer. The experimental approach assesses the performance of the federated deep learning model with differential privacy using both IID and non-IID data. In each experiment, the Federated Learning process improves the average performance metrics of the deep neural network, even in the case of non-IID data.
{"title":"A Chaotic Maps-Based Privacy-Preserving Distributed Deep Learning for Incomplete and Non-IID Datasets","authors":"Irina Arévalo;Jose L. Salmeron","doi":"10.1109/TETC.2023.3320758","DOIUrl":"10.1109/TETC.2023.3320758","url":null,"abstract":"Federated Learning is a machine learning approach that enables the training of a deep learning model among several participants with sensitive data that wish to share their own knowledge without compromising the privacy of their data. In this research, the authors employ a secured Federated Learning method with an additional layer of privacy and proposes a method for addressing the non-IID challenge. Moreover, differential privacy is compared with chaotic-based encryption as layer of privacy. The experimental approach assesses the performance of the federated deep learning model with differential privacy using both IID and non-IID data. In each experiment, the Federated Learning process improves the average performance metrics of the deep neural network, even in the case of non-IID data.","PeriodicalId":13156,"journal":{"name":"IEEE Transactions on Emerging Topics in Computing","volume":"12 1","pages":"357-367"},"PeriodicalIF":5.9,"publicationDate":"2023-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136002626","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-10-03, DOI: 10.1109/TETC.2023.3319659
Bijay Raj Paudel; Spyros Tragoudas
This article shows that Memristive Crossbar Array (MCA)-based neuromorphic architectures provide a robust defense against adversarial attacks due to the stochastic behavior of memristors, and that adversarial robustness can be further improved by compression-based preprocessing steps that can be implemented on MCAs. It also evaluates the effect of inter-chip process variations on adversarial robustness using the proposed MCA implementation, studies the effect of on-chip training, and shows that adversarial attacks do not uniformly affect the classification accuracy of different chips. Experimental evidence on a variety of datasets and attack models supports the effectiveness of MCA-based neuromorphic architectures and MCA-implemented compression-based preprocessing in defending against adversarial attacks. Experiments also show that on-chip training yields high resilience to adversarial attacks in all chips.
{"title":"Memristive Crossbar Array-Based Adversarial Defense Using Compression","authors":"Bijay Raj Paudel;Spyros Tragoudas","doi":"10.1109/TETC.2023.3319659","DOIUrl":"10.1109/TETC.2023.3319659","url":null,"abstract":"This article shows that Memristive Crossbar Array (MCA)-based neuromorphic architectures provide a robust defense against adversarial attacks due to the stochastic behavior of memristors. Furthermore, it shows that adversarial robustness can be further improved by compression-based preprocessing steps that can be implemented on MCAs. It also evaluates the effect of inter-chip process variations on adversarial robustness using the proposed MCA implementation and studies the effect of on-chip training. It shows that adversarial attacks do not uniformly affect the classification accuracy of different chips. Experimental evidence using a variety of datasets and attack models supports the impact of MCA-based neuromorphic architectures and compression-based preprocessing implemented using MCA on defending against adversarial attacks. It is also experimentally shown that the on-chip training results in high resiliency to adversarial attacks in all chips.","PeriodicalId":13156,"journal":{"name":"IEEE Transactions on Emerging Topics in Computing","volume":"12 3","pages":"864-877"},"PeriodicalIF":5.1,"publicationDate":"2023-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135914085","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-09-29, DOI: 10.1109/TETC.2023.3315512
Chang Ruan; Jianxin Wang; Wanchun Jiang; Tao Zhang
Recently, many scheduling schemes have leveraged coflows to improve the communication performance of jobs in distributed application frameworks deployed in data center networks, such as MapReduce and Spark. Most of them require application modification to obtain coflow information such as the coflow ID. The latest work, CODA, suggests non-intrusively extracting coflow information via an identification method. However, that method depends on historical traffic information, which can substantially reduce identification accuracy when traffic varies. To tackle this problem, we present SOCI, which Schedules coflows via Online Coflow Identification. Observing that, in up-to-date distributed application frameworks, the flows of a coflow typically communicate with a master process at their start and end, SOCI uses this characteristic for online coflow identification. Since identification errors are inevitable, the coflow scheduler in SOCI adopts a Selectively Late Binding (SLB) mechanism, which associates misclassified flows with coflows according to an estimate of the impact of this association on the average Coflow Completion Time (CCT). Trace-driven simulations show that SOCI can reduce the CCT by up to $1.23\times$
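To illustrate the identification heuristic stated above (flows that report their start and end to the same master process belong to the same coflow), here is a minimal, purely hypothetical sketch; the data structures and names are assumptions, not SOCI's actual implementation.

```python
# Toy sketch of online coflow identification: flows observed registering with
# the same master endpoint are grouped into the same coflow.
from collections import defaultdict

class OnlineCoflowIdentifier:
    def __init__(self):
        self._next_id = 0
        self._master_to_coflow = {}
        self.coflows = defaultdict(set)   # coflow id -> set of flow ids

    def on_flow_start(self, flow_id, master_endpoint):
        """Called when a flow's start message to its master process is observed."""
        if master_endpoint not in self._master_to_coflow:
            self._master_to_coflow[master_endpoint] = self._next_id
            self._next_id += 1
        cid = self._master_to_coflow[master_endpoint]
        self.coflows[cid].add(flow_id)
        return cid

ident = OnlineCoflowIdentifier()
ident.on_flow_start("f1", master_endpoint="10.0.0.1:7077")
ident.on_flow_start("f2", master_endpoint="10.0.0.1:7077")  # same coflow as f1
ident.on_flow_start("f3", master_endpoint="10.0.0.2:7077")  # different coflow
print(dict(ident.coflows))
```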