Pub Date: 2025-03-04 | DOI: 10.1109/TETC.2025.3546119
Samitha Somathilaka;Sasitharan Balasubramaniam;Daniel P. Martins
Biocomputing envisions the development of computing paradigms using biological systems, ranging from micron-level components to collections of cells, including organoids. This paradigm shift exploits hidden natural computing properties to develop miniaturized wet-computing devices that can be deployed in harsh environments and to explore designs of novel energy-efficient systems. In parallel, we witness the emergence of AI hardware, including neuromorphic processors, with the aim of improving computational capacity. This study brings together the concepts of biocomputing and neuromorphic systems by focusing on bacterial gene regulatory networks and their transformation into Gene Regulatory Neural Networks (GRNNs). We explore the intrinsic properties of gene regulation, map them to a gene-perceptron function, and propose an application-specific sub-GRNN search algorithm that maps the network structure to match a computing problem. Focusing on the model organism Escherichia coli, the base-GRNN is initially extracted and validated for accuracy. Subsequently, a comprehensive feasibility analysis of the derived GRNN confirms its computational prowess in classification and regression tasks. Furthermore, we discuss the possibility of performing a well-known digit classification task as a use case. Our analysis and simulation experiments show promising results for offloading computation tasks to GRNNs in bacterial cells, advancing wet-neuromorphic computing using natural cells.
{"title":"Analyzing Wet-Neuromorphic Computing Using Bacterial Gene Regulatory Neural Networks","authors":"Samitha Somathilaka;Sasitharan Balasubramaniam;Daniel P. Martins","doi":"10.1109/TETC.2025.3546119","DOIUrl":"https://doi.org/10.1109/TETC.2025.3546119","url":null,"abstract":"Biocomputing envisions the development computing paradigms using biological systems, ranging from micron-level components to collections of cells, including organoids. This paradigm shift exploits hidden natural computing properties, to develop miniaturized wet-computing devices that can be deployed in harsh environments, and to explore designs of novel energy-efficient systems. In parallel, we witness the emergence of AI hardware, including neuromorphic processors with the aim of improving computational capacity. This study brings together the concept of biocomputing and neuromorphic systems by focusing on the bacterial gene regulatory networks and their transformation into Gene Regulatory Neural Networks (GRNNs). We explore the intrinsic properties of gene regulations, map this to a gene-perceptron function, and propose an application-specific sub-GRNN search algorithm that maps the network structure to match a computing problem. Focusing on the model organism Escherichia coli, the base-GRNN is initially extracted and validated for accuracy. Subsequently, a comprehensive feasibility analysis of the derived GRNN confirms its computational prowess in classification and regression tasks. Furthermore, we discuss the possibility of performing a well-known digit classification task as a use case. Our analysis and simulation experiments show promising results in the offloading of computation tasks to GRNN in bacterial cells, advancing wet-neuromorphic computing using natural cells.","PeriodicalId":13156,"journal":{"name":"IEEE Transactions on Emerging Topics in Computing","volume":"13 3","pages":"902-918"},"PeriodicalIF":5.4,"publicationDate":"2025-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145057434","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-03-04 | DOI: 10.1109/TETC.2025.3546128
Corey Lammie;Yuxuan Wang;Flavio Ponzina;Joshua Klein;Hadjer Benmeziane;Marina Zapater;Irem Boybat;Abu Sebastian;Giovanni Ansaloni;David Atienza
When arranged in a crossbar configuration, resistive memory devices can be used to execute Matrix-Vector Multiplications (MVMs), the most dominant operation of many Machine Learning (ML) algorithms, in constant time complexity. Nonetheless, when performing computations in the analog domain, novel challenges are introduced in terms of arithmetic precision and stochasticity, due to non-ideal circuit and device behaviour. Moreover, these non-idealities have a temporal dimension, resulting in degrading application accuracy over time. Facing these challenges, we propose a novel framework, named LionHeart, to obtain hybrid analog-digital mappings to execute Deep Learning (DL) inference workloads using heterogeneous accelerators. The accuracy-constrained mappings derived by LionHeart showcase, across different Convolutional Neural Networks (CNNs) and one transformer-based network, high accuracy and potential for speedup. The results of the full system simulations highlight run-time reductions and energy efficiency gains that exceed 6×, while meeting a user-defined accuracy threshold with respect to a fully digital floating-point implementation.
"LionHeart: A Layer-Based Mapping Framework for Heterogeneous Systems With Analog In-Memory Computing Tiles," IEEE Transactions on Emerging Topics in Computing, vol. 13, no. 4, pp. 1383-1395.
Pub Date: 2025-01-31 | DOI: 10.1109/TETC.2025.3534243
Han Yu;Hongming Cai;Shengtung Tsai;Mengyao Li;Pan Hu;Jiaoyan Chen;Bingqing Shen
Script event prediction is the task of predicting the subsequent event given a sequence of events that have already taken place. It benefits task planning and process scheduling for event-centric systems, including enterprise systems, IoT systems, etc. Sequence-based and graph-based learning models have been applied to this task. However, when learning data is limited, especially in enterprise environments involving multiple participants, the performance of such models falls short of expectations, as they heavily rely on large-scale training data. To take full advantage of the given data, in this article we propose a new type of knowledge graph (KG) that models not just events but also the entities participating in the events, and we design a collaborative event prediction model exploiting such KGs. Our model identifies semantically similar vertices as collaborators to resolve unknown events, applies gated graph neural networks to extract event-wise sequential features, and exploits a heterogeneous attention network to cope with entity-wise influence in event sequences. To verify the effectiveness of our approach, we designed multiple-choice narrative cloze tasks with inadequate knowledge. Our experimental evaluation on three datasets generated from well-known corpora shows that our method successfully defends against such incompleteness of data and outperforms state-of-the-art approaches for event prediction.
{"title":"Exploiting Entity Information for Robust Prediction Over Event Knowledge Graphs","authors":"Han Yu;Hongming Cai;Shengtung Tsai;Mengyao Li;Pan Hu;Jiaoyan Chen;Bingqing Shen","doi":"10.1109/TETC.2025.3534243","DOIUrl":"https://doi.org/10.1109/TETC.2025.3534243","url":null,"abstract":"Script event prediction is the task of predicting the subsequent event given a sequence of events that already took place. It benefits task planning and process scheduling for event-centric systems including enterprise systems, IoT systems, etc. Sequence-based and graph-based learning models have been applied to this task. However, when learning data is limited, especially in a multiple-participant-involved enterprise environment, the performance of such models falls short of expectations as they heavily rely on large-scale training data. To take full advantage of given data, in this article we propose a new type of knowledge graph (KG) that models not just events but also entities participating in the events, and we design a collaborative event prediction model exploiting such KGs. Our model identifies semantically similar vertices as collaborators to resolve unknown events, applies gated graph neural networks to extract event-wise sequential features, and exploits a heterogeneous attention network to cope with entity-wise influence in event sequences. To verify the effectiveness of our approach, we designed multiple-choice narrative cloze tasks with inadequate knowledge. Our experimental evaluation with three datasets generated from well-known corpora shows our method can successfully defend against such incompleteness of data and outperforms the state-of-the-art approaches for event prediction.","PeriodicalId":13156,"journal":{"name":"IEEE Transactions on Emerging Topics in Computing","volume":"13 3","pages":"890-901"},"PeriodicalIF":5.4,"publicationDate":"2025-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145057431","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-01-30 | DOI: 10.1109/TETC.2025.3533895
Kwondo Ma;Anurup Saha;Chandramouli Amarnath;Abhijit Chatterjee
Resistive Random-Access Memory (RRAM) crossbar array-based Deep Neural Networks (DNNs) are increasingly attractive for implementing ultra-low-power computing for AI. However, RRAM-based DNNs face inherent challenges from manufacturing process variability, which can compromise their performance (classification accuracy) and functional safety. One way to test these DNNs is to apply the exhaustive set of test images to each DNN to ascertain its performance; however, this is expensive and time-consuming. We propose signature-based predictive testing (SiPT), in which a small subset of test images is applied to each DNN and the classification accuracy of the DNN is predicted directly from observations of the intermediate- and final-layer outputs of the network. This saves test cost while allowing binning of RRAM-based DNNs by performance. To further improve the test efficiency of SiPT, we create an optimized compact set of test images, leveraging image filters and enhancements to synthesize images, and develop a cascaded test structure incorporating multiple sets of SiPT modules trained on compact test subsets of varying sizes. Through experimentation across diverse test cases, we demonstrate the viability of our SiPT framework under RRAM process variations, showing test efficiency improvements of up to 48× over testing with the exhaustive image dataset.
"SiPT: Signature-Based Predictive Testing of RRAM Crossbar Arrays for Deep Neural Networks," IEEE Transactions on Emerging Topics in Computing, vol. 13, no. 4, pp. 1465-1480.
Pub Date: 2025-01-24 | DOI: 10.1109/TETC.2025.3531051
Yunzhe Tian;Yike Li;Kang Chen;Zhenguo Zhang;Endong Tong;Jiqiang Liu;Fangyun Qin;Zheng Zheng;Wenjia Niu
Recent advances in Deep Learning (DL) have enhanced Aging-Related Bug (ARB) prediction for mitigating software aging. However, DL-based ARB prediction models face a dual challenge: overcoming overfitting to enhance generalization and managing the high labeling costs associated with extensive data requirements. To address the first issue, we utilize the sparse and binary nature of spiking communication in Spiking Neural Networks (SNNs), which inherently provides brain-inspired regularization to effectively alleviate overfitting. Therefore, we propose a Spiking Convolutional Neural Network (SCNN)-based ARB prediction model along with a training framework that handles the model’s spatial-temporal dynamics and non-differentiable nature. To reduce labeling costs, we introduce a Bio-inspired and Diversity-aware Active Learning framework (BiDAL), which prioritizes highly informative and diverse samples, enabling more efficient usage of the limited labeling budget. This framework incorporates bio-inspired uncertainty to enhance informativeness measurement along with using a diversity-aware selection strategy based on clustering to prevent redundant labeling. Experiments on three ARB datasets show that ARB-SCNN effectively reduces overfitting, improving generalization performance by 6.65% over other DL-based classifiers. Additionally, BiDAL boosts label efficiency for ARB-SCNN training, outperforming four state-of-the-art active learning methods by 4.77% within limited labeling budgets.
"Towards Label-Efficient Deep Learning-Based Aging-Related Bug Prediction With Spiking Convolutional Neural Networks," IEEE Transactions on Emerging Topics in Computing, vol. 13, no. 2, pp. 314-329.
Pub Date: 2025-01-23 | DOI: 10.1109/TETC.2025.3528336
Alex James;Chithra Reghuvaran;Leon Chua
The cellular neural network (CNN or CeNN) is valued for its suitability for real-time processing, parallel processing, robustness, flexibility, and energy efficiency. CeNNs have a large number of interconnected processing elements, which can be programmed to produce a wide range of patterns, including regular and irregular patterns, random patterns, and more. When implemented in memristive hardware, the pattern-generation ability and the inherent variability of memristive devices can be exploited to create Physical Unclonable Functions (PUFs). This work reports a method of using memristive CeNNs to perform image processing tasks alongside PUF image generation. The CeNN-PUF has dual-mode capability, combining data processing and encryption using PUF image watermarking. The proposed method provides unique device-specific image watermarks, following a two-stage process of (1) device-specific secret mask generation and (2) watermark embedding. The system is evaluated using multiple CeNN cloning templates, and the robustness of the method is validated against ML attacks. A detailed analysis is presented to evaluate the uniqueness, randomness, and reliability under different environmental changes. The proposed model is experimentally validated on a Xilinx Zynq-7010 FPGA and benchmarked against quantization noise.
{"title":"Processing In-Memory PUF Watermark Embedding With Cellular Memristor Network","authors":"Alex James;Chithra Reghuvaran;Leon Chua","doi":"10.1109/TETC.2025.3528336","DOIUrl":"https://doi.org/10.1109/TETC.2025.3528336","url":null,"abstract":"The cellular neural network (CNN or CeNN) is known to be useful because of its suitability in real-time processing, parallel processing, robustness, flexibility, and energy efficiency. CeNNs have a large number of interconnected processing elements, which can be programmed to produce a wide range of patterns, including regular and irregular patterns, random patterns, and more. When implemented in memristive hardware, the pattern generator ability and inherent variability of memristive devices can be explored to create Physical Unclonable Functions (PUFs). This work reports a method of using memristive CeNNs to perform image processing tasks along with PUF image generation. The CeNN-PUF has dual mode capability combining data processing and encryption using PUF image watermarking. The proposed method provides unique device-specific image watermarks, following a two-stage process of (1) device-specific secret mask generation and (2) watermark embedding. The system is evaluated using multiple CeNN cloning templates and the robustness of the method is validated against ML attacks. A detailed analysis is presented to evaluate the uniqueness, randomness and reliability against different environmental changes. The experimental validation of the proposed model is done on FPGA Xilinx Zynq-7010 processor and benchmarked the system against quantization noise.","PeriodicalId":13156,"journal":{"name":"IEEE Transactions on Emerging Topics in Computing","volume":"13 4","pages":"1453-1464"},"PeriodicalIF":5.4,"publicationDate":"2025-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10851809","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145674764","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-01-20 | DOI: 10.1109/TETC.2025.3528985
Shih-Chieh Chuang;Ching-Hu Lu
Real-time adaptability is often required to maintain system accuracy in scenarios involving domain shifts caused by constantly changing environments. While continual test-time adaptation has been proposed to handle such scenarios, existing methods rely on high-accuracy pseudo-labels. Moreover, contrastive learning methods for continual test-time adaptation consider the aggregation of features from the same class while neglecting the problem of aggregating similar features within the same class. Therefore, we propose “Weighted Contrastive Learning” and apply it to both pre-training and continual test-time adaptation. To address the issue of catastrophic forgetting caused by continual adaptation, previous studies have employed source-domain knowledge to stochastically recover the target-domain model. However, significant domain shifts may cause the source-domain knowledge to behave as noise, thus impacting the model's adaptability. Therefore, we propose “Domain-aware Pseudo-label Correction” to mitigate catastrophic forgetting and error accumulation without accessing the original source-domain data while minimizing the impact on model adaptability. The thorough evaluations in our experiments have demonstrated the effectiveness of our proposed approach.
{"title":"Continual Test-Time Adaptation With Weighted Contrastive Learning and Pseudo-Label Correction","authors":"Shih-Chieh Chuang;Ching-Hu Lu","doi":"10.1109/TETC.2025.3528985","DOIUrl":"https://doi.org/10.1109/TETC.2025.3528985","url":null,"abstract":"Real-time adaptability is often required to maintain system accuracy in scenarios involving domain shifts caused by constantly changing environments. While continual test-time adaptation has been proposed to handle such scenarios, existing methods rely on high-accuracy pseudo-labels. Moreover, contrastive learning methods for continuous test-time adaptation consider the aggregation of features from the same class while neglecting the problem of aggregating similar features within the same class. Therefore, we propose “Weighted Contrastive Learning” and apply it to both pre-training and continual test-time adaptation. To address the issue of catastrophic forgetting caused by continual adaptation, previous studies have employed source-domain knowledge to stochastically recover the target-domain model. However, significant domain shifts may cause the source-domain knowledge to behave as noise, thus impacting the model's adaptability. Therefore, we propose “Domain-aware Pseudo-label Correction” to mitigate catastrophic forgetting and error accumulation without accessing the original source-domain data while minimizing the impact on model adaptability. The thorough evaluations in our experiments have demonstrated the effectiveness of our proposed approach.","PeriodicalId":13156,"journal":{"name":"IEEE Transactions on Emerging Topics in Computing","volume":"13 3","pages":"866-877"},"PeriodicalIF":5.4,"publicationDate":"2025-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145057433","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-01-20 | DOI: 10.1109/TETC.2025.3529842
Andrew Dervay;Wenfeng Zhao
In-memory computing (IMC) emerges as one of the most promising computing technologies for data-intensive applications to ameliorate the “memory wall” bottleneck in von Neumann computer systems. Meanwhile, IMC also shows promising prospects for high-throughput and energy-efficient processing of cryptographic workloads. This paper presents Block-Cipher-In-Memory (BCIM), a constant-time, high-throughput, bit-serial in-memory cryptography scheme that supports versatile Substitution-Permutation and Feistel network based block ciphers, such as standard ciphers like the Advanced Encryption Standard (AES) and lightweight block ciphers like RECTANGLE and Simon. In addition, BCIM employs a processor-assisted key loading scheme and prudent memory management strategies to minimize the memory footprint needed for the cryptographic algorithms, improving the peak operating frequency and energy efficiency. Built upon these, BCIM can also support alternative block cipher modes of operation, such as counter mode, beyond the electronic codebook. Furthermore, the bit-serial operation of BCIM inherently ensures constant-time execution and exploits column-wise single instruction multiple data (SIMD) processing, thereby providing strong resistance to side-channel timing attacks and achieving high-throughput encryption and decryption via a massively-parallel, compact round-function implementation. Experimental results suggest that BCIM shows substantial performance and energy improvements over state-of-the-art bit-parallel IMC ciphers. Additionally, BCIM shows competitive performance and orders-of-magnitude energy advantages over bitsliced software implementations on MCU/CPU platforms.
{"title":"BCIM: Constant-Time and High-Throughput Block-Cipher-in-Memory With Massively-Parallel Bit-Serial Execution","authors":"Andrew Dervay;Wenfeng Zhao","doi":"10.1109/TETC.2025.3529842","DOIUrl":"https://doi.org/10.1109/TETC.2025.3529842","url":null,"abstract":"In-memory computing (IMC) emerges as one of the most promising computing technologies for data-intensive applications to ameliorate the “<italic>memory wall</i>” bottleneck in von Neumann computer systems. Meanwhile, IMC also shows promising prospects towards high-throughput and energy-efficient processing of cryptographic workloads. This paper presents Block-Cipher-In-Memory (BCIM), a constant-time, high-throughput, bit-serial in-memory cryptography scheme to support versatile Substitution-Permutation and Feistel network based block ciphers, such as standard ciphers like Advanced Encryption Standard (AES), and lightweight block ciphers like RECTANGLE and S<sc>imon</small>. In addition, BCIM employs a processor-assisted key loading scheme and prudent memory management strategies to minimize the memory footprint needed for cryptographic algorithms to improve the peak operating frequency and energy efficiency. Built upon these, BCIM can also support alternative block cipher modes of operation like counter mode beyond electronic-codebook. Furthermore, the bit-serial operation of BCIM inherently ensures constant-time execution and exploits column-wise single instruction multiple data (SIMD) processing, thereby providing strong resistance to side-channel timing attacks, and achieves high-throughput encryption and decryption via massively-parallel compact round function implementation. Experimental results suggest that BCIM shows substantial performance and energy improvements over state-of-the-art bit-parallel IMC ciphers. Additionally, BCIM show competitive performance and orders of magnitude energy advantages over the bitsliced software implementations on MCU/CPU platforms.","PeriodicalId":13156,"journal":{"name":"IEEE Transactions on Emerging Topics in Computing","volume":"13 4","pages":"1440-1452"},"PeriodicalIF":5.4,"publicationDate":"2025-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145674814","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-01-20 | DOI: 10.1109/TETC.2025.3528994
Georgios Boulougaris;Kostas Kolomvatsos
The research community is currently paying considerable attention to the intelligent, context-aware management of data at the intersection of the Internet of Things (IoT) and Edge Computing (EC). In this article, we propose a strategy for autonomous edge nodes to decide which data should be migrated to specific locations of the infrastructure in order to support the desired processing requests. Our intention is to arm nodes with the ability to learn the access patterns of offloaded data-driven tasks and predict which data should be migrated to the original ‘owners’ of those tasks. Naturally, these tasks are linked to the processing of data that are absent at the original hosting nodes, indicating the required data assets that need to be accessed directly. To identify these data intervals, we employ an ensemble scheme that combines a statistically oriented model and a machine learning scheme. Hence, we are able not only to detect the density of the requests but also to learn and infer the ‘strong’ data assets. The proposed approach is analyzed in detail by presenting the corresponding formulations, and it is evaluated and compared against baselines and models found in the respective literature.
{"title":"A Pervasive Edge Computing Model for Proactive Intelligent Data Migration","authors":"Georgios Boulougaris;Kostas Kolomvatsos","doi":"10.1109/TETC.2025.3528994","DOIUrl":"https://doi.org/10.1109/TETC.2025.3528994","url":null,"abstract":"Currently, there is a great attention of the research community for the intelligent management of data in a context-aware manner at the intersection of the Internet of Things (IoT) and Edge Computing (EC). In this article, we propose a strategy to be adopted by autonomous edge nodes related to their decision on what data should be migrated to specific locations of the infrastructure and support the desired requests for processing. Our intention is to arm nodes with the ability of learning the access patterns of offloaded data-driven tasks and predict which data should be migrated to the original ‘owners’ of tasks. Naturally, these tasks are linked to the processing of data that are absent at the original hosting nodes indicating the required data assets that need to be accessed directly. To identify these data intervals, we employ an ensemble scheme that combines a statistically oriented model and a machine learning scheme. Hence, we are able not only to detect the density of the requests but also to learn and infer the ‘strong’ data assets. The proposed approach is analyzed in detail by presenting the corresponding formulations being also evaluated and compared against baselines and models found in the respective literature.","PeriodicalId":13156,"journal":{"name":"IEEE Transactions on Emerging Topics in Computing","volume":"13 3","pages":"878-889"},"PeriodicalIF":5.4,"publicationDate":"2025-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145057421","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-01-20 | DOI: 10.1109/TETC.2025.3528972
Amir Sabbagh Molahosseini;JunKyu Lee;Hans Vandierendonck
This article presents the design and implementation of Software-Defined Floating-Point (SDF) number formats for high-speed implementation of the Belief Propagation (BP) algorithm. SDF formats are designed specifically to meet the numeric needs of the computation and are more compact representations of the data. They reduce memory footprint and memory bandwidth requirements without sacrificing accuracy, given that BP for loopy graphs inherently involves algorithmic errors. This article designs several SDF formats for sum-product BP applications through careful analysis of the computation. Our theoretical analysis leads to the design of 16-bit (half-precision) and 8-bit (mini-precision) widths. We moreover present a highly efficient software implementation of the proposed SDF formats, centered around conversion to hardware-supported single-precision arithmetic. Our solution demonstrates negligible conversion overhead on commercially available CPUs. For Ising grids with sizes from 100 × 100 to 500 × 500, the 16- and 8-bit SDF formats, along with our conversion module, produce accuracy equivalent to the double-precision floating-point format with 2.86× speedup on average on an Intel Xeon processor. In particular, increasing the grid size results in a higher speedup. For example, the proposed half-precision format with a 3-bit exponent and 13-bit mantissa achieved minimum and maximum speedups of 1.30× and 1.39× over single precision, and 2.55× and 3.40× over double precision, as the grid size increased from 100 × 100 to 500 × 500.
{"title":"Software-Defined Number Formats for High-Speed Belief Propagation","authors":"Amir Sabbagh Molahosseini;JunKyu Lee;Hans Vandierendonck","doi":"10.1109/TETC.2025.3528972","DOIUrl":"https://doi.org/10.1109/TETC.2025.3528972","url":null,"abstract":"This article presents the design and implementation of Software-Defined Floating-Point (SDF) number formats for high-speed implementation of the Belief Propagation (BP) algorithm. SDF formats are designed specifically to meet the numeric needs of the computation and are more compact representations of the data. They reduce memory footprint and memory bandwidth requirements without sacrificing accuracy, given that BP for loopy graphs inherently involves algorithmic errors. This article designs several SDF formats for sum-product BP applications by careful analysis of the computation. Our theoretical analysis leads to the design of 16-bit (half-precision) and 8-bit (mini-precision) widths. We moreover present highly efficient software implementation of the proposed SDF formats which is centered around conversion to hardware-supported single-precision arithmetic hardware. Our solution demonstrates negligible conversion overhead on commercially available CPUs. For Ising grids with sizes from 100 × 100 to 500 × 500, the 16- and 8-bit SDF formats along with our conversion module produce equivalent accuracy to double-precision floating-point format but with 2.86× speedups on average on an Intel Xeon processor. Particularly, increasing the grid size results in higher speed-up. For example, the proposed half-precision format with 3-bit exponent and 13-bit mantissa achieved the minimum and maximum speedups of 1.30× and 1.39× over single-precision, and 2.55× and 3.40× over double-precision, by increasing grid size from 100 × 100 to 500 × 500.","PeriodicalId":13156,"journal":{"name":"IEEE Transactions on Emerging Topics in Computing","volume":"13 3","pages":"853-865"},"PeriodicalIF":5.4,"publicationDate":"2025-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145051083","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}