
IEEE Transactions on Emerging Topics in Computing — Latest Publications

Guest Editorial: Special Section on Applied Software Aging and Rejuvenation
IF 5.1, Region 2 (Computer Science), Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS. Pub Date: 2025-06-19. DOI: 10.1109/TETC.2025.3579813
Raffaele Romagnoli;Jianwen Xiang
Citations: 0
A Novel RFET-Based FPGA Architecture Based on Delay-Aware Packing Algorithm
IF 5.4, Region 2 (Computer Science), Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS. Pub Date: 2025-06-16. DOI: 10.1109/TETC.2025.3572712
Sheng Lu;Liuting Shang;Sungyong Jung;Qilian Liang;Chenyun Pan
Reconfigurable devices are attracting growing interest as both a potential alternative and a complement to traditional CMOS technology. This paper develops a novel field-programmable gate array (FPGA) architecture based on MClusters, which are built from fast, area-efficient 2-input look-up tables (LUTs) implemented with reconfigurable field-effect transistors (RFETs). To fully utilize the MClusters, we propose a SAT-based delay-aware packing algorithm for technology mapping. In addition, we integrate a partitioning algorithm that divides the circuit into several sub-circuits to further reduce the system's global routing resources and their associated switching energy. Finally, we develop an efficient technology/circuit/system co-design framework for optimizing the overall performance of FPGAs. Comprehensive benchmarking demonstrates that the optimal design yields reductions of up to 39% in area, 36% in wire length, and 40% in switching energy compared to traditional CMOS 6-input LUT FPGAs.
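The MCluster and RFET circuit details are not reproduced in this listing, but the basic building block the abstract names — a 2-input LUT — is easy to illustrate: it stores a 4-bit truth table, and "reconfiguring" the cell just means loading a different table. A minimal sketch (function names are illustrative, not from the paper):

```python
# Hedged sketch: a 2-input LUT as a 4-bit truth table indexed by the inputs.
def make_lut2(truth_table):
    """truth_table: 4 output bits, indexed by (a << 1) | b."""
    assert len(truth_table) == 4
    return lambda a, b: truth_table[(a << 1) | b]

xor2 = make_lut2([0, 1, 1, 0])   # configure the cell as XOR
and2 = make_lut2([0, 0, 0, 1])   # the same cell reconfigured as AND
```

The packing problem the paper solves is deciding which gates of a netlist map onto which such cells; that SAT formulation is not sketched here.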
Citations: 0
What, When, Where to Compute-in-Memory for Efficient Matrix Multiplication During Machine Learning Inference
IF 5.4, Region 2 (Computer Science), Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS. Pub Date: 2025-06-05. DOI: 10.1109/TETC.2025.3574508
Tanvi Sharma;Mustafa Ali;Indranil Chakraborty;Kaushik Roy
Matrix multiplication is the dominant computation during Machine Learning (ML) inference. To perform such multiplication operations efficiently, Compute-in-Memory (CiM) paradigms have emerged as a highly energy-efficient solution. However, integrating compute in memory poses key questions: 1) What type of CiM to use: given the multitude of CiM design characteristics, their suitability must be determined from an architecture perspective. 2) When to use CiM: ML inference includes workloads with a variety of memory and compute requirements, making it difficult to identify when CiM is more beneficial than standard processing cores. 3) Where to integrate CiM: each memory level has different bandwidth and capacity, creating different data-reuse opportunities for CiM integration. To answer these questions about on-chip CiM integration for accelerating ML workloads, we use an analytical architecture-evaluation methodology with a tailored mapping algorithm. The mapping algorithm aims to achieve the highest weight reuse and the fewest data movements for a given CiM prototype and workload. Our analysis considers the integration of CiM prototypes into the cache levels of a tensor-core-like architecture, and shows that CiM-integrated memory improves energy efficiency by up to $3.4\times$ and throughput by up to $15.6\times$ compared to an established baseline with INT-8 precision. We believe the proposed work provides insights into what type of CiM to use, and when and where to optimally integrate it into the cache hierarchy for efficient matrix multiplication.
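The "weight reuse" objective of the mapping algorithm can be illustrated without any CiM hardware: in a tiled matrix multiplication, one weight tile is loaded once and streamed against many input tiles instead of being re-fetched per output. A minimal sketch under that assumption (the paper's actual mapping algorithm is not public here):

```python
import numpy as np

# Hedged sketch of weight-stationary tiling: each w_tile is loaded once and
# reused across all input row-tiles, mimicking the reuse CiM mapping targets.
def tiled_matmul(x, w, tile=2):
    m, k = x.shape
    k2, n = w.shape
    assert k == k2 and m % tile == 0 and k % tile == 0 and n % tile == 0
    out = np.zeros((m, n))
    for j0 in range(0, n, tile):
        for k0 in range(0, k, tile):
            w_tile = w[k0:k0 + tile, j0:j0 + tile]      # loaded once ("stationary")
            for i0 in range(0, m, tile):                # many input tiles reuse it
                out[i0:i0 + tile, j0:j0 + tile] += x[i0:i0 + tile, k0:k0 + tile] @ w_tile
    return out
```

The result is identical to a plain `x @ w`; only the data-movement pattern changes, which is what the energy analysis in the paper quantifies.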
Citations: 0
The Cancelable Multimodal Template Protection Algorithm Based on Random Index
IF 5.4, Region 2 (Computer Science), Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS. Pub Date: 2025-06-04. DOI: 10.1109/TETC.2025.3574359
Huabin Wang;Mingzhao Wang;Xinxin Liu;Yingfan Cheng;Fei Liu;Jian Zhou;Liang Tao
Current multimodal template protection methods typically require encryption or transformation of the original biometric features. However, these operations carry certain risks, as attackers may reverse-engineer or decrypt the protected multimodal templates to retrieve partial or complete information about the original templates, leading to the leakage of the original biometric features. To address this issue, we propose a cancelable multimodal template protection method based on random indexing. First, hash functions are used to generate integer sequences as index values, which are then employed to create single-modal cancelable templates using random binary vectors. Second, the single-modal cancelable templates are used as indices for random binary sequences, which locate the corresponding template information and are filled into the fusion cancelable template at the respective positions, achieving template fusion. The resulting template is unrelated to the original biometric features. Finally, without directly storing the binary factor sequences, an XOR operation is performed on the extended biometric feature vectors and random binary sequences to generate the encoded key. Experimental results demonstrate that the proposed method significantly enhances performance on the FVC2002DB1 fingerprint, MMCBNU_6000 finger-vein, and NUPT_FPV databases, while also satisfying the standards for cancelable biometric feature design. We also analyze four privacy and security attacks against this scheme.
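Two primitives the abstract names can be sketched concretely: hash-derived indices that select bits from a random binary vector to form a cancelable (revocable) template, and an XOR of the binarized features with a random binary sequence to encode a key. This is a minimal illustration with made-up parameter names, not the paper's construction:

```python
import hashlib

# Hedged sketch. (1) A hash of the features plus a user-specific seed yields
# integer indices; indexing a binary feature vector with them gives a template
# that can be revoked by changing the seed. (2) XOR with a random bit sequence
# encodes a key that is recoverable only with the same sequence.
def cancelable_template(feature_bits, user_seed, length=16):
    digest = hashlib.sha256(bytes(feature_bits) + user_seed).digest()
    indices = [digest[i] % len(feature_bits) for i in range(length)]
    return [feature_bits[i] for i in indices]

def encode_key(feature_bits, random_bits):
    return [f ^ r for f, r in zip(feature_bits, random_bits)]
```

XOR is its own inverse, so applying `encode_key` twice with the same random sequence recovers the original bits, while the stored template reveals nothing about the seed-free features.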
Citations: 0
PipeDAP: An Efficient Communication Framework for Scheduling Decoupled All-Reduce Primitives in Distributed DNN Training
IF 5.4, Region 2 (Computer Science), Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS. Pub Date: 2025-06-02. DOI: 10.1109/TETC.2025.3573522
Yunqi Gao;Bing Hu;Mahdi Boloursaz Mashhadi;Wei Wang;Rahim Tafazolli;Mérouane Debbah
Communication scheduling effectively improves the scalability of distributed deep learning by overlapping computation and communication tasks during training. However, existing communication scheduling frameworks based on tensor partitioning suffer from two fundamental issues: (1) partitioning schemes at the data-volume level introduce extensive startup overheads, leading to higher energy consumption, and (2) partitioning schemes at the communication-primitive level do not provide optimal scheduling, resulting in longer training time. In this article, we propose an efficient communication mechanism, namely PipeDAP, which schedules decoupled all-reduce operations in a near-optimal order to minimize the time and energy consumption of training DNN models. We build the mathematical model for PipeDAP and derive the near-optimal scheduling order of the reduce-scatter and all-gather operations. Meanwhile, we leverage simultaneous communication of reduce-scatter and all-gather operations to further reduce the startup overheads. We implement the PipeDAP architecture on the PyTorch framework and apply it to distributed training of benchmark DNN models. Experimental results on two GPU clusters demonstrate that PipeDAP achieves up to a 1.82x speedup and saves up to 45.4% of energy consumption compared to state-of-the-art communication scheduling frameworks.
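The decoupling PipeDAP schedules rests on a standard identity: an all-reduce over W workers equals a reduce-scatter (each worker ends up owning one fully reduced shard) followed by an all-gather (shards are exchanged until every worker holds the full reduced tensor). A toy in-process simulation of that identity, not the paper's scheduler:

```python
import numpy as np

# Hedged sketch: all-reduce = reduce-scatter + all-gather, simulated on
# in-memory arrays standing in for per-worker gradient tensors.
def reduce_scatter(tensors):
    w = len(tensors)
    shards = [np.array_split(t, w) for t in tensors]
    # worker i ends up with shard i, reduced (summed) across all workers
    return [sum(shards[j][i] for j in range(w)) for i in range(w)]

def all_gather(shards):
    full = np.concatenate(shards)
    return [full.copy() for _ in shards]   # every worker gets the full result

def all_reduce(tensors):
    return all_gather(reduce_scatter(tensors))
```

Because the two halves are separate primitives, a scheduler like PipeDAP can interleave them with backward/forward computation independently, which is the source of the overlap it exploits.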
Citations: 0
DT-Net: Point Cloud Completion Network With Neighboring Adaptive Denoiser and Splitting-Based Upsampling Transformer
IF 5.4, Region 2 (Computer Science), Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS. Pub Date: 2025-06-02. DOI: 10.1109/TETC.2025.3573505
Aihua Mao;Qing Liu;Yuxuan Tang;Sheng Ye;Ran Yi;Minjing Yu;Yong-Jin Liu
Point cloud completion, which involves inferring missing regions of 3D objects from partial observations, remains a challenging problem in 3D vision and robotics. Existing learning-based frameworks typically leverage an encoder-decoder architecture to predict the complete point cloud from a global shape representation extracted from the incomplete input, or further introduce a refinement network to optimize the obtained complete point cloud in a coarse-to-fine manner; these approaches fail to capture fine-grained local geometric details and leave noisy points in thin or complex structures. In this article, we propose a novel coarse-to-fine point cloud completion framework called DT-Net, focusing on coarse point cloud denoising and multi-level upsampling. Specifically, we propose a Neighboring Adaptive Denoiser (NAD) to effectively denoise the coarse point cloud generated by an autoencoder and reduce noise around slender structures, making them clear and well represented. Moreover, a novel Splitting-based Upsampling Transformer (SUT), which effectively incorporates spatial and semantic relationships between local neighborhoods in the point cloud, is also proposed for multi-level upsampling. Extensive qualitative and quantitative experiments demonstrate that our method outperforms state-of-the-art methods on widely used benchmarks.
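The intuition behind neighborhood-based denoising can be shown with a classical (non-learned) stand-in for NAD: pulling each point toward the centroid of its k nearest neighbors draws stray points back onto thin structures. A sketch under that assumption, not the paper's learned denoiser:

```python
import numpy as np

# Hedged, classical sketch: k-NN centroid smoothing as a stand-in for a
# learned neighboring-adaptive denoiser.
def knn_denoise(points, k=4, step=0.5):
    denoised = points.copy()
    for i, p in enumerate(points):
        d = np.linalg.norm(points - p, axis=1)
        nn = np.argsort(d)[1:k + 1]              # k nearest, excluding the point itself
        centroid = points[nn].mean(axis=0)
        denoised[i] = p + step * (centroid - p)  # move partway toward the neighborhood
    return denoised
```

A learned denoiser like NAD would instead predict the displacement per point, adapting the effective neighborhood and step to local geometry.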
Citations: 0
Hybrid Quantum ResNet for Time Series Classification
IF 5.4, Region 2 (Computer Science), Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS. Pub Date: 2025-04-30. DOI: 10.1109/TETC.2025.3563944
Dae-Il Noh;Seon-Geun Jeong;Won-Joo Hwang
Residual networks (ResNet) are known to be effective for image classification. However, challenges such as computational time remain because of the significant number of parameters. Quantum computing using quantum entanglement and quantum parallelism is an emerging computing paradigm that addresses this issue. Although quantum advantage is still studied in many research fields, quantum machine learning is a research area that leverages the strengths of quantum computing and machine learning. In this study, we investigated the quantum speedup with respect to the number of parameters in each model for a time-series classification task. This paper proposes a novel hybrid quantum residual network (HQResNet) inspired by the classical ResNet for time-series classification. HQResNet introduces a classical layer before a quantum convolutional neural network (QCNN), where the QCNN is used as a residual block. These structures enable shortcut connections and are particularly effective in achieving classification tasks without a data re-uploading scheme. We used ultra-wide-band (UWB) channel impulse response data to demonstrate the performance of the proposed algorithm and compared the state-of-the-art benchmarks with HQResNet using evaluation metrics. The results show that HQResNet achieved high performance with a small number of trainable parameters.
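The residual ("shortcut") structure HQResNet borrows from ResNet is simple to state: a block outputs input plus a learned correction, `x + F(x)`. A purely classical sketch, with a toy nonlinear function standing in for the QCNN branch (which obviously cannot be reproduced here):

```python
import numpy as np

# Hedged sketch of a residual block: output = input + branch(input).
# The branch below is a toy stand-in for the QCNN residual branch.
def residual_block(x, branch):
    return x + branch(x)

toy_branch = lambda x: np.tanh(0.1 * x)   # placeholder for quantum circuit evaluation
```

The shortcut means the block only has to learn a correction to the identity, which is what makes deep stacks of such blocks trainable.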
Citations: 0
CAM4: In-Memory Viral Pathogen Genome Classification Using Similarity Search Dynamic Content-Addressable Memory
IF 5.4, Region 2 (Computer Science), Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS. Pub Date: 2025-04-28. DOI: 10.1109/TETC.2025.3563201
Zuher Jahshan;Itay Merlin;Esteban Garzón;Leonid Yavits
We present CAM4, a novel embedded dynamic storage-based similarity search content addressable memory. CAM4 is designated for in-memory computational genomics applications, particularly the identification and classification of pathogen DNA. CAM4 employs a novel gain cell design and one-hot encoding of DNA bases to address retention time variations, and mitigates potential data loss from pulldown leakage and soft errors in embedded DRAM. CAM4 features performance-overhead-free refresh and data upload, allowing simultaneous search and refresh without performance degradation. CAM4 offers approximate search versatility in scenarios with a variety of industrial sequencers with different error profiles. When classifying DNA reads with a 10% error rate, it achieves, on average, a 25% higher $F_{1}$ score compared to the MetaCache-GPU and Kraken2 DNA classification tools. Simulated at 1 GHz, CAM4 provides average speedups of $1{,}412\times$ and $1{,}040\times$ over MetaCache-GPU and Kraken2, respectively.
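The two representational ideas the abstract names — one-hot encoding of DNA bases and approximate best-match similarity search — can be illustrated in software, where the match score is just the number of agreeing positions (computed here as a dot product of one-hot vectors rather than in a CAM array):

```python
import numpy as np

# Hedged software sketch of one-hot base encoding plus best-match search;
# the real design performs this match massively in parallel in memory.
ONE_HOT = {'A': (1, 0, 0, 0), 'C': (0, 1, 0, 0),
           'G': (0, 0, 1, 0), 'T': (0, 0, 0, 1)}

def encode(seq):
    return np.array([ONE_HOT[b] for b in seq]).ravel()

def best_match(read, references):
    # dot product of one-hot vectors = count of matching positions,
    # so a read with errors still finds its closest reference
    scores = [int(np.dot(encode(read), encode(ref))) for ref in references]
    return int(np.argmax(scores)), scores
```

One-hot encoding makes mismatches score zero per position, which is why an erroneous read degrades gracefully instead of failing an exact-match lookup.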
Citations: 0
Design and Implementation of Cost-Effective End-to-End Authentication Protocol for PUF-Enabled IoT Devices
IF 5.4, Region 2 (Computer Science), Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS. Pub Date: 2025-04-28. DOI: 10.1109/TETC.2025.3563064
Sourav Roy;Mahabub Hasan Mahalat;Bibhash Sen
The ubiquitous presence of the Internet of Things (IoT) prospers in every aspect of human life. The low-powered sensors, actuators, and mobile devices in IoT transfer a high volume of security-sensitive data. Unmonitored IoT devices are highly susceptible to security vulnerabilities: their operating environment, with minimal or no safeguards, allows physical invasion. Conventional end-to-end authentication protocols are inadequate because of the limited resources and ambient working environment of IoT. In this direction, a lightweight and secure end-to-end authentication protocol is proposed for Physical Unclonable Function (PUF) embedded IoT devices by processing them in pairs. PUFs promise a unique hardware-based security solution for resource-constrained devices. The proposed protocol exploits the coherent conduct of public- and private-key-based cryptosystems with PUF, and integrates the concept of ECC with ECDH and a cryptographic hash function. Security of the proposed protocol is validated using authentication validation, BAN logic, the Scyther tool, and against different adversarial attacks. The performance evaluation and extensive comparative study of the proposed protocol highlight its lightweight feature. The practical feasibility of the proposed protocol is verified by an empirical evaluation using an Arbiter PUF implemented on a Xilinx Spartan-3E FPGA, with a Raspberry Pi as the IoT device.
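The PUF challenge-response pattern underlying such protocols can be sketched without the paper's ECC/ECDH machinery: a verifier that has enrolled the expected response for a challenge checks a hash commitment over that response and a fresh nonce, so the raw response never crosses the channel. This is a simplified illustration with a keyed hash standing in for real device-unique hardware, not the paper's protocol:

```python
import hashlib

# Hedged sketch of a hash-based PUF challenge-response exchange.
# simulated_puf stands in for device-unique analog behavior (e.g. an
# Arbiter PUF); a real PUF's response cannot be computed from stored bits.
def simulated_puf(device_secret, challenge):
    return hashlib.sha256(device_secret + challenge).digest()

def prove(device_secret, challenge, nonce):
    # The device commits to its response bound to the verifier's nonce.
    response = simulated_puf(device_secret, challenge)
    return hashlib.sha256(response + nonce).hexdigest()

def verify(stored_response, nonce, proof):
    # The verifier recomputes the commitment from the enrolled response.
    return hashlib.sha256(stored_response + nonce).hexdigest() == proof
```

The nonce prevents replay; the actual paper additionally derives session keys via ECDH so that both ends authenticate each other, which this sketch omits.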
IEEE Transactions on Emerging Topics in Computing, vol. 13, no. 3, pp. 1055–1067, published 2025-04-28.
Citations: 0
Balancing Graph Processing Workloads in Heterogeneous CPU-PIM Systems
IF 5.4 · Tier 2, Computer Science · Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS · Pub Date: 2025-04-28 · DOI: 10.1109/TETC.2025.3563249
Sheng Xu;Chun Li;Le Luo;Ming Zheng;Liang Yan;Xingqi Zou;Xiaoming Chen
Processing-in-Memory (PIM) offers a promising architecture to alleviate the memory-wall challenge in graph processing applications. The key idea of PIM is to incorporate logic within the memory, thereby exploiting near-data advantages. State-of-the-art PIM-based graph processing accelerators tend to offload as much work as possible to the memory in order to maximize near-data benefits, causing significant load imbalance in PIM systems. In this paper, we demonstrate that this strategy is misguided and that host processors still play a vital role in heterogeneous CPU-PIM systems. To this end, we propose CAPLBS, an online contention-aware Processing-in-Memory load-balance scheduler for graph processing applications in CPU-PIM systems. The core idea of CAPLBS is to steal workload candidates back to host processors, with minimal off-chip data-synchronization overhead, whenever host processors are idle. To model data contention among workloads and guide the stealing decision, we propose a measurement structure called the Locality Cohesive Subgraph, derived from the connectivity of the input graph and the memory-access patterns of the deployed graph applications. Experimental results show that CAPLBS achieves an average speedup of 4.8× and 1.3× (up to 9.1× and 1.9×) over a CPU-only baseline and the upper bound of locality-aware fine-grained in-memory atomics, respectively. Moreover, CAPLBS adds no hardware overhead and works well with existing CPU-PIM graph processing accelerators.
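The steal-back idea at the heart of CAPLBS — idle host processors reclaiming work from the PIM side — can be sketched as a generic work-stealing loop. This toy model is not the paper's scheduler: `run_schedule`, `steal_batch`, and the single-worker-per-side simulation are all assumptions for illustration, and the contention modeling via Locality Cohesive Subgraphs is omitted entirely.

```python
from collections import deque

def run_schedule(pim_tasks, host_tasks, steal_batch=2):
    """Toy CPU-PIM work-stealing loop.

    One host worker and one PIM worker each drain their own queue one task
    per step; when the host queue runs dry while PIM work remains, the host
    steals a small batch from the tail of the PIM queue, modeling a
    CAPLBS-style steal-back with a bounded synchronization cost per steal."""
    pim_q = deque(pim_tasks)
    host_q = deque(host_tasks)
    done_by = {}                             # task -> "host" or "pim"
    steals = 0                               # number of steal-back events
    while pim_q or host_q:
        if pim_q:                            # PIM worker processes one task
            done_by[pim_q.popleft()] = "pim"
        if host_q:                           # host worker processes one task
            done_by[host_q.popleft()] = "host"
        elif pim_q:                          # host idle: steal from PIM's tail
            for _ in range(min(steal_batch, len(pim_q))):
                host_q.append(pim_q.pop())
            steals += 1
    return done_by, steals
```

With ten tasks initially all mapped to the PIM side and an idle host, the host steals batches back and ends up processing a share of the work — the imbalance the paper targets.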
IEEE Transactions on Emerging Topics in Computing, vol. 13, no. 3, pp. 1068–1082, published 2025-04-28.
Citations: 0