The rapid proliferation of computing domains relying on Internet of Things (IoT) devices has created a pressing need for efficient and accurate deep-learning (DL) models that can run on low-power devices. However, traditional DL models tend to be too complex and computationally intensive for typical IoT end-nodes. To address this challenge, Neural Architecture Search (NAS) has emerged as a popular design automation technique for co-optimizing the accuracy and complexity of deep neural networks. Nevertheless, existing NAS techniques require many iterations to produce a network that adheres to specific hardware constraints, such as the maximum memory available on the hardware or the maximum latency allowed by the target application. In this work, we propose a novel approach to incorporate multiple constraints into so-called Differentiable NAS optimization methods, which allows the generation, in a single shot, of a model that respects user-defined constraints on both memory and latency, in a time comparable to a single standard training. The proposed approach is evaluated on five IoT-relevant benchmarks, including the MLPerf Tiny suite and Tiny ImageNet, demonstrating that, with a single search, it is possible to reduce memory and latency by 87.4% and 54.2%, respectively (as defined by our targets), while maintaining accuracy that is non-inferior to state-of-the-art hand-tuned deep neural networks for TinyML.
{"title":"Enhancing Neural Architecture Search With Multiple Hardware Constraints for Deep Learning Model Deployment on Tiny IoT Devices","authors":"Alessio Burrello;Matteo Risso;Beatrice Alessandra Motetti;Enrico Macii;Luca Benini;Daniele Jahier Pagliari","doi":"10.1109/TETC.2023.3322033","DOIUrl":"10.1109/TETC.2023.3322033","url":null,"abstract":"The rapid proliferation of computing domains relying on Internet of Things (IoT) devices has created a pressing need for efficient and accurate deep-learning (DL) models that can run on low-power devices. However, traditional DL models tend to be too complex and computationally intensive for typical IoT end-nodes. To address this challenge, Neural Architecture Search (NAS) has emerged as a popular design automation technique for co-optimizing the accuracy and complexity of deep neural networks. Nevertheless, existing NAS techniques require many iterations to produce a network that adheres to specific hardware constraints, such as the maximum memory available on the hardware or the maximum latency allowed by the target application. In this work, we propose a novel approach to incorporate multiple constraints into so-called Differentiable NAS optimization methods, which allows the generation, in a single shot, of a model that respects user-defined constraints on both memory and latency in a time comparable to a single standard training. The proposed approach is evaluated on five IoT-relevant benchmarks, including the MLPerf Tiny suite and Tiny ImageNet, demonstrating that, with a single search, it is possible to reduce memory and latency by 87.4% and 54.2%, respectively (as defined by our targets), while ensuring non-inferior accuracy on state-of-the-art hand-tuned deep neural networks for TinyML.","PeriodicalId":13156,"journal":{"name":"IEEE Transactions on Emerging Topics in Computing","volume":"12 3","pages":"780-794"},"PeriodicalIF":5.1,"publicationDate":"2023-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136207718","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-10-05, DOI: 10.1109/TETC.2023.3320758
Irina Arévalo; Jose L. Salmeron
Federated Learning is a machine learning approach that enables the training of a deep learning model among several participants holding sensitive data who wish to share their knowledge without compromising the privacy of their data. In this research, the authors employ a secure Federated Learning method with an additional layer of privacy and propose a method for addressing the non-IID challenge. Moreover, differential privacy is compared with chaotic-maps-based encryption as the privacy layer. The experimental approach assesses the performance of the federated deep learning model with differential privacy using both IID and non-IID data. In each experiment, the Federated Learning process improves the average performance metrics of the deep neural network, even in the case of non-IID data.
{"title":"A Chaotic Maps-Based Privacy-Preserving Distributed Deep Learning for Incomplete and Non-IID Datasets","authors":"Irina Arévalo;Jose L. Salmeron","doi":"10.1109/TETC.2023.3320758","DOIUrl":"10.1109/TETC.2023.3320758","url":null,"abstract":"Federated Learning is a machine learning approach that enables the training of a deep learning model among several participants with sensitive data that wish to share their own knowledge without compromising the privacy of their data. In this research, the authors employ a secured Federated Learning method with an additional layer of privacy and proposes a method for addressing the non-IID challenge. Moreover, differential privacy is compared with chaotic-based encryption as layer of privacy. The experimental approach assesses the performance of the federated deep learning model with differential privacy using both IID and non-IID data. In each experiment, the Federated Learning process improves the average performance metrics of the deep neural network, even in the case of non-IID data.","PeriodicalId":13156,"journal":{"name":"IEEE Transactions on Emerging Topics in Computing","volume":"12 1","pages":"357-367"},"PeriodicalIF":5.9,"publicationDate":"2023-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136002626","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-10-03, DOI: 10.1109/TETC.2023.3319659
Bijay Raj Paudel; Spyros Tragoudas
This article shows that Memristive Crossbar Array (MCA)-based neuromorphic architectures provide a robust defense against adversarial attacks due to the stochastic behavior of memristors, and that adversarial robustness can be further improved by compression-based preprocessing steps that can be implemented on MCAs. It also evaluates the effect of inter-chip process variations on adversarial robustness using the proposed MCA implementation, studies the effect of on-chip training, and shows that adversarial attacks do not uniformly affect the classification accuracy of different chips. Experimental evidence on a variety of datasets and attack models supports the effectiveness of MCA-based neuromorphic architectures and MCA-implemented compression-based preprocessing in defending against adversarial attacks. Experiments also show that on-chip training yields high resilience to adversarial attacks in all chips.
{"title":"Memristive Crossbar Array-Based Adversarial Defense Using Compression","authors":"Bijay Raj Paudel;Spyros Tragoudas","doi":"10.1109/TETC.2023.3319659","DOIUrl":"10.1109/TETC.2023.3319659","url":null,"abstract":"This article shows that Memristive Crossbar Array (MCA)-based neuromorphic architectures provide a robust defense against adversarial attacks due to the stochastic behavior of memristors. Furthermore, it shows that adversarial robustness can be further improved by compression-based preprocessing steps that can be implemented on MCAs. It also evaluates the effect of inter-chip process variations on adversarial robustness using the proposed MCA implementation and studies the effect of on-chip training. It shows that adversarial attacks do not uniformly affect the classification accuracy of different chips. Experimental evidence using a variety of datasets and attack models supports the impact of MCA-based neuromorphic architectures and compression-based preprocessing implemented using MCA on defending against adversarial attacks. It is also experimentally shown that the on-chip training results in high resiliency to adversarial attacks in all chips.","PeriodicalId":13156,"journal":{"name":"IEEE Transactions on Emerging Topics in Computing","volume":"12 3","pages":"864-877"},"PeriodicalIF":5.1,"publicationDate":"2023-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135914085","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-09-29, DOI: 10.1109/TETC.2023.3315512
Chang Ruan; Jianxin Wang; Wanchun Jiang; Tao Zhang
Recently, many scheduling schemes have leveraged coflows to improve the communication performance of jobs in distributed application frameworks deployed in data center networks, such as MapReduce and Spark. Most of them require application modification to obtain coflow information such as the coflow ID. The latest work, CODA, suggests non-intrusively extracting coflow information via an identification method. However, that method depends on historical traffic information, which can substantially reduce identification accuracy when traffic varies. To tackle this problem, we present SOCI, which Schedules coflows via Online Coflow Identification. Observing that, in up-to-date distributed application frameworks, the flows of a coflow typically communicate with a master process at their start and end, SOCI uses this characteristic for online coflow identification. Since identification errors are inevitable, the coflow scheduler in SOCI adopts a Selectively Late Binding (SLB) mechanism, which associates misclassified flows with coflows according to an estimate of the impact of this association on the average Coflow Completion Time (CCT). Trace-driven simulations show that SOCI can reduce the CCT by up to $1.23\times$
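To illustrate the identification heuristic stated above (flows that report their start and end to the same master process belong to the same coflow), here is a minimal, purely hypothetical sketch; the data structures and names are assumptions, not SOCI's actual implementation.

```python
# Toy sketch of online coflow identification: flows observed registering with
# the same master endpoint are grouped into the same coflow.
from collections import defaultdict

class OnlineCoflowIdentifier:
    def __init__(self):
        self._next_id = 0
        self._master_to_coflow = {}
        self.coflows = defaultdict(set)   # coflow id -> set of flow ids

    def on_flow_start(self, flow_id, master_endpoint):
        """Called when a flow's start message to its master process is observed."""
        if master_endpoint not in self._master_to_coflow:
            self._master_to_coflow[master_endpoint] = self._next_id
            self._next_id += 1
        cid = self._master_to_coflow[master_endpoint]
        self.coflows[cid].add(flow_id)
        return cid

ident = OnlineCoflowIdentifier()
ident.on_flow_start("f1", master_endpoint="10.0.0.1:7077")
ident.on_flow_start("f2", master_endpoint="10.0.0.1:7077")  # same coflow as f1
ident.on_flow_start("f3", master_endpoint="10.0.0.2:7077")  # different coflow
print(dict(ident.coflows))
```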