Thalla Narasimha Swetha, Uppugunduru Anil Kumar, Syed Ershad Ahmed
The posit number system is a significant advance aimed at seamlessly replacing the current IEEE floating-point standard. With its wide dynamic range and gradually tapering precision, a smaller posit can closely match a larger floating-point number in representing decimal values. Multiplication is a fundamental arithmetic operation essential to a wide range of applications, particularly image processing, signal processing, neural networks (NNs), and machine learning. Given the considerable power consumption, area requirements, and latency associated with multiplication, optimization strategies in these areas merit exploration. This article provides a comprehensive review of both exact and inexact (approximate) posit multiplier designs, including a detailed comparative evaluation of their error rates and circuit characteristics intended to foster a deeper understanding of the distinctive features of the various designs. The study examines Booth-based and logarithmic posit multipliers, categorizing the Booth multipliers into exact and inexact types. The posit multipliers are implemented in Verilog HDL and synthesized using the Cadence RTL Compiler, while error characterization is conducted using the SoftPosit library in Python. Power, area, and delay are compared against the mean relative error distance (MRED). The comparative results indicate that the logarithmic posit multiplier is hardware-efficient but less accurate, whereas the Booth posit multiplier offers superior accuracy at the cost of higher power, area, and delay. Notably, the logarithmic multiplier, referred to as the posit logarithmic-approximate multiplier (PLAM), reduces power, area, and delay by at least 92%, 82%, and 78%, respectively, compared with all the Booth multipliers.
The approximation error of PLAM is analyzed using metrics such as MRED to assess its performance relative to exact posit multipliers. The posit logarithmic multiplier was validated on several NN architectures, including LeNet-5, VGG11, and ResNet-18. The results indicate that it achieves inference accuracy comparable to traditional floating-point multipliers while also enhancing hardware efficiency.
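Logarithmic posit multipliers such as PLAM replace the costly mantissa multiplication with an addition, using Mitchell's approximation (log2(1 + f) ≈ f for the fractional part f). The sketch below illustrates only that core idea on ordinary floats; it omits the posit decode/encode stages that the reviewed hardware designs implement, and is not the paper's circuit.

```python
import math

def mitchell_multiply(a: float, b: float) -> float:
    """Approximate a*b via Mitchell's logarithm approximation:
    log2(1+f) ~= f, so mantissa fractions are added instead of
    multiplied -- the expensive step a log-based multiplier avoids."""
    if a == 0 or b == 0:
        return 0.0
    sign = -1.0 if (a < 0) ^ (b < 0) else 1.0
    a, b = abs(a), abs(b)
    # Decompose each operand as 2^k * (1 + f), with 0 <= f < 1.
    ka, kb = math.floor(math.log2(a)), math.floor(math.log2(b))
    fa, fb = a / 2 ** ka - 1, b / 2 ** kb - 1
    f = fa + fb                      # log-domain "multiplication"
    if f >= 1:                       # carry propagates into the exponent
        return sign * 2 ** (ka + kb + 1) * f
    return sign * 2 ** (ka + kb) * (1 + f)
```

For example, `mitchell_multiply(3, 5)` returns 14 instead of 15 (about 6.7% error); the worst-case relative error of this scheme is bounded near 11.1%, which is why the hardware savings come with the accuracy loss the review quantifies via MRED.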
"Exploring Posit Multiplication: A Comprehensive Review of Booth and Logarithmic Mantissa Methods" — Thalla Narasimha Swetha, Uppugunduru Anil Kumar, Syed Ershad Ahmed. IET Computers and Digital Techniques, published 2026-01-06. DOI: 10.1049/cdt2/7515558. Open-access PDF: https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/cdt2/7515558
Facial emotion recognition suffers from poor robustness and low accuracy under complex lighting, posture changes, and occlusion. This study designs a high-performance convolutional neural network (CNN) model to improve the recognition accuracy and generalization of seven basic emotions in complex environments. The FER2013, CK+, and Japanese Female Facial Expression (JAFFE) datasets are used. Data are preprocessed through grayscale conversion, histogram equalization, and size normalization, then augmented with random rotation, horizontal flipping, and brightness perturbation to improve generalization. A 12-layer CNN is constructed, comprising four convolutional blocks, two fully connected layers, and an output layer, with dropout (0.5) to prevent overfitting. The model is trained with the Adam optimizer for 100 epochs using cross-entropy loss, with early stopping and hyperparameter tuning on the validation set. The highest accuracy reaches 99.2% on the FER2013 test set, and average accuracies of 97.3% and 88.3% are obtained in cross-dataset tests on CK+ and JAFFE, respectively. Key performance indicators show an average recall of 90.7%, precision of 90.4%, and F1-score of 90.5%; accuracy remains 85.2% in the standard mask-occlusion test scenario. Through end-to-end feature learning and data augmentation, the proposed CNN significantly improves the accuracy and robustness of emotion recognition under complex conditions, providing an effective technical solution for real-time emotion analysis systems.
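Of the preprocessing steps listed (grayscale conversion, histogram equalization, size normalization), histogram equalization is the one with nontrivial mechanics: pixel intensities are remapped through the normalized cumulative histogram so the output spreads over the full range. A minimal pure-Python sketch — illustrative, not the authors' implementation — is:

```python
def equalize_histogram(img, levels=256):
    """Histogram equalization for a grayscale image given as a
    list of rows of integer intensities in [0, levels)."""
    flat = [p for row in img for p in row]
    n = len(flat)
    # Build the histogram and its cumulative distribution (CDF).
    hist = [0] * levels
    for p in flat:
        hist[p] += 1
    cdf, total = [], 0
    for h in hist:
        total += h
        cdf.append(total)
    cdf_min = next(c for c in cdf if c > 0)
    # Remap each intensity through the normalized CDF.
    def remap(p):
        return round((cdf[p] - cdf_min) / max(n - cdf_min, 1) * (levels - 1))
    return [[remap(p) for p in row] for row in img]
```

For instance, the 2x2 image `[[0, 0], [1, 2]]` equalizes to `[[0, 0], [128, 255]]`: the clustered low intensities are stretched across the full 8-bit range, which is what makes features visible under the poor lighting conditions the study targets.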
"Facial Emotion Recognition Method Based on Convolutional Neural Network" — Mou Hongwei, Wang Xue, Huang Kai. IET Computers and Digital Techniques, published 2025-12-19. DOI: 10.1049/cdt2/1845378. Open-access PDF: https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/cdt2/1845378
With the rapid development of computer science and information technology, augmented reality (AR) has been widely applied in visual communication design. AR is valuable in this field because it blends the real world with the virtual one, which traditional 2D design methods cannot do. This paper aims to advance the application of AR technology in visual communication design and to evaluate its effects, advantages, and disadvantages through experiments. AR is used to superimpose virtual images, videos, and other elements onto real scenes to achieve rich visual communication effects, while an artificial intelligence (AI) algorithm optimizes the AR content to improve its visual quality and realism. Finally, experiments compare the effects and user experience of traditional graphic design against AR-based design. The results show that AR technology enhances information visualization, increases user participation by nearly 20% year-on-year, and improves memory retention by 4.37% over the baseline. AR can also create a distinctive experience unlike traditional design media. These findings indicate that AR can effectively improve the effectiveness and user experience of visual communication design and provide richer, more diverse tools and means for visual communication designers.
"The Application of Augmented Reality Technology in Visual Communication Design" — Xiao Hong Tian. IET Computers and Digital Techniques, published 2025-12-17. DOI: 10.1049/cdt2/4006505. Open-access PDF: https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/cdt2/4006505
Zhe Zheng, Wei Ma, Jinghua Liu, Jinghui Lu, Song Qiu, Rui Liu, Wenpeng Cui
With the widespread adoption of local affine (LA) motion models in video coding standards, this study explores methods for introducing a global registration model into an encoder that already includes an LA motion model, and the resulting performance changes. First, a coding scheme combining global and local registration is built by incorporating global registration computation and by optimizing the reference-frame selection and macroblock mode selection strategies. Second, experiments compare the performance impact of introducing a global warp motion model versus a global translational (GT) registration model. The results indicate that the global warp motion model introduces functional redundancy and mutual interference with the LA model, with higher computational complexity and limited overall benefit. In contrast, the GT registration model complements the LA model and enhances coding performance in translation-dominated scenes, while maintaining lower computational complexity and greater practicality.
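A GT registration model reduces to estimating one translation vector for the whole frame. The brute-force sketch below minimizes the sum of absolute differences (SAD) over a small search window; it is illustrative only — real encoders use far faster estimators (hierarchical search, phase correlation) rather than this exhaustive scan.

```python
def estimate_global_translation(ref, cur, search=4):
    """Estimate a global (dx, dy) aligning cur to ref by minimizing
    mean SAD over a +/-search window. Frames are lists of rows of
    pixel intensities; only the overlapping region is compared."""
    h, w = len(ref), len(ref[0])
    best = (0, 0, float("inf"))
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            sad, cnt = 0, 0
            for y in range(max(0, -dy), min(h, h - dy)):
                for x in range(max(0, -dx), min(w, w - dx)):
                    sad += abs(ref[y][x] - cur[y + dy][x + dx])
                    cnt += 1
            score = sad / cnt
            if score < best[2]:
                best = (dx, dy, score)
    return best[:2]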
"Research on Adding Global Registration Model in Video Coding With Local Affine Motion Model" — Zhe Zheng, Wei Ma, Jinghua Liu, Jinghui Lu, Song Qiu, Rui Liu, Wenpeng Cui. IET Computers and Digital Techniques, published 2025-12-12. DOI: 10.1049/cdt2/6692669. Open-access PDF: https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/cdt2/6692669
Sunawar khan, Tehseen Mazhar, Tariq Shahzad, Muhammad Amir Khan, Wasim Ahmad, Afsha Bibi, Habib Hamam
Generative adversarial networks (GANs), a subset of deep learning, have demonstrated breakthrough performance in domains such as computer vision (CV) and natural language processing (NLP), particularly in surveillance, autonomous driving, and automated programming assistance. Grounded in game theory, GANs use a generator–discriminator architecture to produce high-quality synthetic data. This study conducts a systematic literature review (SLR) to comprehensively assess the development, applications, limitations, and security-related advances of GANs. It examines foundational models and key architectural variants, critically evaluating their roles in NLP and CV, and explores the integration of GANs into security, highlighting applications in information security, cybersecurity, and artificial intelligence (AI)-driven defense mechanisms. The study also discusses prominent evaluation metrics, including the inception score (IS), Fréchet inception distance (FID), structural similarity index measure (SSIM), and peak signal-to-noise ratio (PSNR), used to assess GAN performance. Key strengths of GANs, including high-resolution data generation and support for domain adaptation, are emphasized as drivers of their continued evolution and adoption.
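Of the evaluation metrics named, PSNR is the simplest to state exactly: 10·log10(peak²/MSE) between a reference and a generated image. A minimal sketch over flat pixel lists (8-bit peak by default) is:

```python
import math

def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio between two same-sized images,
    given as flat lists of pixel intensities."""
    mse = sum((r - t) ** 2 for r, t in zip(ref, test)) / len(ref)
    if mse == 0:
        return float("inf")          # identical images
    return 10 * math.log10(peak ** 2 / mse)
```

A uniform off-by-one error on 8-bit pixels (MSE = 1) gives about 48.13 dB, a useful mental anchor when reading reported GAN scores.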
"A Systematic Literature Review on the Applications, Models, Limitations, and Future Directions of Generative Adversarial Networks" — Sunawar khan, Tehseen Mazhar, Tariq Shahzad, Muhammad Amir Khan, Wasim Ahmad, Afsha Bibi, Habib Hamam. IET Computers and Digital Techniques, published 2025-11-18. DOI: 10.1049/cdt2/5384331. Open-access PDF: https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/cdt2/5384331
Securing reusable hardware intellectual property (IP) cores used in system-on-chip (SoC) designs is crucial, because the global design supply chain introduces multiple points of security vulnerability. A major threat is an untrustworthy entity in the SoC design house attempting piracy or falsely claiming ownership of an IP design. Further, given the importance of handling transient faults in hardware IP designs, designing fault-detectable IPs has become standard practice in the community; however, fault-detectable IP designs are equally prone to hardware threats such as IP piracy and false ownership claims, so a robust countermeasure for them against such threats is essential. This paper presents a detective countermeasure based on a novel hardware watermarking methodology for transient fault-detectable IP designs. The methodology introduces a multivariate encoded high-level synthesis (HLS) scheduling-based multimodal security framework capable of embedding a robust, unique, and nonreplicable watermark in the HLS register allocation phase of a fault-detectable IP design. The proposed technique is more robust than prior watermarking approaches in terms of lower probability of coincidence (PC; down to ~10^−8), stronger tamper tolerance (TT; up to ~10^130), and lower watermark decoding probability, at 0% design cost overhead.
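The two robustness figures quoted — probability of coincidence (PC) and tamper tolerance (TT) — are typically computed from the number of embedding choices and the signature length. The sketch below uses common textbook forms from the HLS-watermarking literature; the paper's exact expressions may differ, so treat both formulas as illustrative assumptions.

```python
def coincidence_probability(resources: int, sig_digits: int) -> float:
    # Pc = (1 - 1/resources)^sig_digits: the chance that a design
    # NOT carrying the watermark satisfies every embedded constraint
    # by accident; it shrinks geometrically with signature length.
    return (1 - 1 / resources) ** sig_digits

def tamper_tolerance(encodings_per_digit: int, sig_digits: int) -> int:
    # TT = encodings^digits: the signature search space an attacker
    # must exhaust to forge or strip the watermark.
    return encodings_per_digit ** sig_digits
```

The multivariate encoding in the paper effectively enlarges `encodings_per_digit`, which is how TT values on the order of 10^130 become reachable for modest signature lengths.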
"Watermarking of Transient Fault-Detectable IP Designs Using Multivariate HLS Scheduling Based Multimodal Security" — Anirban Sengupta, Vishal Chourasia, Nabendu Bhui, Aditya Anshul. IET Computers and Digital Techniques, published 2025-11-12. DOI: 10.1049/cdt2/5926846. Open-access PDF: https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/cdt2/5926846
This study presents an innovative temperature-induced random noise correction method for complementary metal oxide semiconductor (CMOS) space cameras using an attention-enhanced long short-term memory (LSTM) model. The model, designed to address pixel drift and random noise in CMOS space cameras caused by temperature variations, incorporates a multilayer LSTM network with an attention mechanism. The study comprehensively examines how the noise characteristics of CMOS cameras vary across thermal conditions, with in-depth analyses of both dark-field and light-field scenarios. Through detailed pixel-level analysis, it quantifies the influence of temperature on pixel values and on critical performance parameters such as internal nonuniformity. Experimental results show that under dark-field conditions, the fitting variance between predicted and measured values ranges from 0.29585 to 5.798307. After correction under light-field conditions, the average image variance decreases to 0.29, the mean signal-to-noise ratio (SNR) increases to 80, and the mean photo response nonuniformity (PRNU) drops to 0.0161%. Relative to precorrection levels, these metrics show an average 83.57-fold reduction, 1.89-fold increase, and 84.98-fold reduction, respectively. These results confirm the effectiveness of the deep learning method in correcting temperature-induced noise and highlight its potential for practical engineering applications.
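The attention mechanism over LSTM outputs amounts to a softmax-weighted pooling of the hidden-state sequence, letting the model emphasize time steps (here, temperature readings) that matter most. A minimal pure-Python sketch of that pooling step — illustrative, not the authors' network — is:

```python
import math

def attention_pool(hidden_states, scores):
    """Attention pooling over a sequence of LSTM hidden states:
    softmax the alignment scores, then return the weighted sum.
    hidden_states: list of equal-length vectors; scores: one float each."""
    m = max(scores)
    exp = [math.exp(s - m) for s in scores]   # numerically stable softmax
    z = sum(exp)
    weights = [e / z for e in exp]
    dim = len(hidden_states[0])
    return [sum(w * h[i] for w, h in zip(weights, hidden_states))
            for i in range(dim)]
```

With equal scores this degenerates to mean pooling; a single dominant score makes the output track that one hidden state, which is the behavior the attention layer learns to exploit.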
"A Temperature Noise Correction Method for CMOS Spatial Camera Using LSTM With Attention Mechanism" — Long Cheng, Xueying Wang, Jing Xu. IET Computers and Digital Techniques, published 2025-06-06. DOI: 10.1049/cdt2/6670185. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cdt2/6670185
Zourong Long, Gen Tan, You Wu, Hong Yang, Chao Ding
Point cloud processing has become a significant research area in modern perception, and classification and segmentation are critical tasks in autonomous driving, environmental perception, and digital twins. Algorithms that extract features directly from raw point cloud data have simple architectures but are constrained by computational demands and limited efficiency, making deployment on resource-limited devices challenging. This article introduces GRSNet, an ultra-lightweight algorithm. Its principal innovation is a new sampling method, golden ratio sampling (GRS), which generates sampling-point indices directly from the golden ratio and then locates the corresponding points, efficiently extracting representative points from the point cloud for integration into deep networks. Building on GRS, the study combines ideas from GhostNet and self-attention mechanisms to develop a feature extraction module, dubbed the SA_Ghost Block, that forms the core of GRSNet. Comparative experiments against leading algorithms on established open-source point cloud datasets demonstrate that GRSNet achieves superior performance while using only 0.7 M parameters.
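The abstract does not give GRS's exact index formula, but golden-ratio sampling in general maps index k to floor(frac(k·φ)·N), spreading samples evenly over N points with no distance computations (unlike farthest-point sampling). The mapping in this sketch is therefore an assumption illustrating the low-discrepancy idea, not the paper's definition:

```python
import math

def golden_ratio_sample(num_points: int, num_samples: int):
    """Golden-ratio sampling sketch: index k maps to
    floor(frac(k * phi) * N), giving a low-discrepancy spread of
    sample indices over N points with O(1) work per sample."""
    phi = (1 + math.sqrt(5)) / 2
    idx, seen = [], set()
    k = 1
    while len(idx) < num_samples:
        i = int((k * phi % 1) * num_points)
        if i not in seen:            # keep indices unique
            seen.add(i)
            idx.append(i)
        k += 1
    return idx
```

Because each index costs one multiply and one floor, this is far cheaper than farthest-point sampling's O(N) distance pass per sample — the property that makes GRS attractive on resource-limited devices.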
"GRSNet: An Ultra-Lightweight Neural Network for 3D Point Cloud Classification and Segmentation" — Zourong Long, Gen Tan, You Wu, Hong Yang, Chao Ding. IET Computers and Digital Techniques, published 2025-05-12. DOI: 10.1049/cdt2/7934018. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cdt2/7934018
This study presents an intelligent moving target that replicates mob attacks and other realistic events in police training to match actual combat needs. The police intelligent moving target must run a target detection algorithm on its hardware platform, but the traditional you only look once (YOLO)v8 algorithm has a large framework, which slows recognition given the platform’s limited computing power. In this study, the GhostNet network architecture replaces YOLOv8’s backbone network for real-time target identification, improving recognition speed. The bounding box regression problem in target detection uses the scale invariant intersection over union (SIoU) loss function to increase prediction box overlap and identification accuracy. Finally, BiFormer uses dynamic sparse attention for more flexible computational allocation and content perception. Compared with the original approach, the method’s real-time detection speed is 4.81 frames per second (FPS) higher, mean average precision (mAP)@0.5 is 5.38% higher, mAP@0.5:0.95 is 4.19% higher, and the parameter volume is 5.81 M smaller. The approach developed in this work has several applications in real-time target identification and lightweight deployment.
{"title":"Application of Lightweight Target Detection Algorithm Based on YOLOv8 for Police Intelligent Moving Targets","authors":"Yanjie Zhang, Xiaojun Liu, Yuehan Shi, Zecong Ding, Xiaoming Zhang","doi":"10.1049/cdt2/9984821","DOIUrl":"10.1049/cdt2/9984821","url":null,"abstract":"<p>This study presents an intelligent moving target to replicate mob attacks and other realistic events in police training to match actual fighting needs. The police intelligent moving target must deploy target detection algorithms on the hardware platform, but the traditional you only look once (YOLO)v8 algorithm has a large framework, which will slow recognition due to the hardware platform’s lack of arithmetic power. In this study, GhostNet network architecture replaces YOLOv8<sup>′</sup>s backbone network for real-time target identification, improving recognition speed. The bounding box regression issue in target detection uses the scale invariant intersection over union (SIoU) loss function to increase prediction box overlapping and identification accuracy. Finally, BiFormer uses dynamic sparse attention for more flexible computational allocation and content perception. The method’s real-time detection speed is 4.81 frames per second (FPS) faster, [email protected] is 5.38% faster, mean average precision (mAP)@0.5:0.95 is 4.19% faster, and parameter volume is 5.81 M less than the original approach. 
The approach developed in this work has several applications in real-time target identification and lightweight deployment.</p>","PeriodicalId":50383,"journal":{"name":"IET Computers and Digital Techniques","volume":"2025 1","pages":""},"PeriodicalIF":0.8,"publicationDate":"2025-05-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cdt2/9984821","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143930499","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
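SIoU extends the standard intersection-over-union measure with angle, distance, and shape cost terms; the abstract does not spell those terms out, so the sketch below shows only the shared IoU core that every IoU-family loss is built on (box layout `(x1, y1, x2, y2)` is an assumption for illustration):

```python
# Illustrative sketch: plain IoU between two axis-aligned boxes given
# as (x1, y1, x2, y2). SIoU adds angle/distance/shape penalties on top
# of this quantity; only the common IoU core is shown here.
def iou(box_a, box_b):
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection rectangle (degenerates to zero area if no overlap).
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

An IoU-based regression loss is then typically `1 - iou(pred, target)` plus the extra penalty terms, which is what drives predicted boxes to overlap ground-truth boxes more tightly.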
Zilin Li, Jizeng Wei, Shuangsheng Li, Yaogong Yang
The branch predictor is widely used to enhance processor performance, but it also constitutes one of the major energy-consuming components in processors. We found that approximately 32% of instruction blocks in a decoupled frontend do not contain branch instructions, while 30.8% of instruction blocks contain only conditional branches. However, because the type of instructions within a block cannot be determined during prediction, branch prediction must be executed every cycle. In this work, we propose the next block type (NBT) and no branch sequence table (NST) for predicting instruction block types. These mechanisms occupy minimal space and are straightforward to implement. For a four-way out-of-order processor, the NBT and NST reduce the branch predictor’s energy consumption by 52.36% and the processor’s energy consumption by 4.1% without sacrificing the processor’s instructions per cycle (IPC) or branch prediction accuracy.
{"title":"Energy-Efficient Branch Predictor via Instruction Block Type Prediction in Decoupled Frontend","authors":"Zilin Li, Jizeng Wei, Shuangsheng Li, Yaogong Yang","doi":"10.1049/cdt2/3359419","DOIUrl":"10.1049/cdt2/3359419","url":null,"abstract":"<p>The branch predictor is widely used to enhance processor performance, but it also constitutes one of the major energy-consuming components in processors. We found that approximately 32% of instruction blocks in a decoupled frontend do not contain branch instructions, while 30.8% of instruction blocks contain only conditional branches. However, because the type of instructions within a block cannot be determined during prediction, branch prediction must be executed every cycle. In this work, we propose the next block type (NBT) and no branch sequence table (NST) for predicting instruction block types. These mechanisms occupy minimal space and are straightforward to implement. For a four-way out-of-order processor, the NBT and NST reduce the branch predictor’s energy consumption by 52.36% and processor’s energy consumption by 4.1% without sacrificing the processor’s instructions per cycle (IPC) and branch prediction accuracy.</p>","PeriodicalId":50383,"journal":{"name":"IET Computers and Digital Techniques","volume":"2025 1","pages":""},"PeriodicalIF":0.8,"publicationDate":"2025-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cdt2/3359419","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143889091","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}