Pub Date: 2024-08-15, DOI: 10.1007/s11554-024-01526-5
Xin Li, Changhai Ru, Haonan Sun
Real-time visual image prediction, crucial for directing robotic arm movements, represents a significant technique in artificial intelligence and robotics. The primary technical challenges involve the robot's inaccurate perception and understanding of the environment, coupled with imprecise control of movements. This study proposes ForGAN-MCTS, a generative adversarial network-based action sequence prediction algorithm, aimed at refining visually guided rearrangement planning for movable objects. First, the algorithm introduces a scalable and robust rearrangement-planning strategy built on Monte Carlo Tree Search. Second, to enable the robot to execute grasping maneuvers successfully, it provides a generative adversarial network-based real-time prediction method that uses a network trained solely on synthetic data to robustly estimate multi-object workspace states from a single uncalibrated RGB camera. The efficacy of the proposed algorithm is corroborated through extensive experiments conducted with a UR-5 robotic arm. The experimental results demonstrate that the algorithm surpasses existing methods in planning efficacy and processing speed. Additionally, the algorithm is robust to camera motion and effectively mitigates the effects of external perturbations.
{"title":"Adversarial generative learning and timed path optimization for real-time visual image prediction to guide robot arm movements","authors":"Xin Li, Changhai Ru, Haonan Sun","doi":"10.1007/s11554-024-01526-5","DOIUrl":"https://doi.org/10.1007/s11554-024-01526-5","url":null,"abstract":"<p>Real-time visual image prediction, crucial for directing robotic arm movements, represents a significant technique in artificial intelligence and robotics. The primary technical challenges involve the robot’s inaccurate perception and understanding of the environment, coupled with imprecise control of movements. This study proposes ForGAN-MCTS, a generative adversarial network-based action sequence prediction algorithm, aimed at refining visually guided rearrangement planning for movable objects. Initially, the algorithm unveils a scalable and robust strategy for rearrangement planning, capitalizing on the capabilities of a Monte Carlo Tree Search strategy. Secondly, to enable the robot’s successful execution of grasping maneuvers, the algorithm proposes a generative adversarial network-based real-time prediction method, employing a network trained solely on synthetic data for robust estimation of multi-object workspace states via a single uncalibrated RGB camera. The efficacy of the newly proposed algorithm is corroborated through extensive experiments conducted by using a UR-5 robotic arm. The experimental results demonstrate that the algorithm surpasses existing methods in terms of planning efficacy and processing speed. Additionally, the algorithm is robust to camera motion and can effectively mitigate the effects of external perturbations.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"7 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142222641","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-08-13, DOI: 10.1007/s11554-024-01531-8
Hyeonbeen Lee, Jangho Lee
Crowd counting, the task of estimating the total number of people in an image, is essential for intelligent surveillance. Integrating a well-trained crowd counting network into edge devices, such as intelligent CCTV systems, enables its application across various domains, including the prevention of crowd collapses and urban planning. For a model to be embedded in edge devices, it requires robust performance, a reduced parameter count, and fast response times. This study proposes a lightweight yet powerful model called TinyCount, which has only 60k parameters. The proposed TinyCount is a fully convolutional network consisting of a feature extract module (FEM) for robust and rapid feature extraction, a scale perception module (SPM) for perceiving scale variation, and an upsampling module (UM) that restores the feature map to the size of the original image. TinyCount demonstrated competitive performance across three representative crowd counting datasets, despite using approximately 3.33 to 271 times fewer parameters than other crowd counting approaches. The proposed model achieves relatively fast inference by building on the MobileNetV2 architecture with dilated and transposed convolutions. The use of the SE block, together with findings from existing studies, further proved the design's effectiveness. Finally, we evaluated the proposed TinyCount on multiple edge devices, including the Raspberry Pi 4, NVIDIA Jetson Nano, and NVIDIA Jetson AGX Xavier, to demonstrate its potential for practical applications.
{"title":"TinyCount: an efficient crowd counting network for intelligent surveillance","authors":"Hyeonbeen Lee, Jangho Lee","doi":"10.1007/s11554-024-01531-8","DOIUrl":"https://doi.org/10.1007/s11554-024-01531-8","url":null,"abstract":"<p>Crowd counting, the task of estimating the total number of people in an image, is essential for intelligent surveillance. Integrating a well-trained crowd counting network into edge devices, such as intelligent CCTV systems, enables its application across various domains, including the prevention of crowd collapses and urban planning. For a model to be embedded in edge devices, it requires robust performance, reduced parameter count, and faster response times. This study proposes a lightweight and powerful model called TinyCount, which has only 60<i>k</i> parameters. The proposed TinyCount is a fully convolutional network consisting of a feature extract module (FEM) for robust and rapid feature extraction, a scale perception module (SPM) for scale variation perception and an upsampling module (UM) that adjusts the feature map to the same size as the original image. TinyCount demonstrated competitive performance across three representative crowd counting datasets, despite utilizing approximately 3.33 to 271 times fewer parameters than other crowd counting approaches. The proposed model achieved relatively fast inference times by leveraging the MobileNetV2 architecture with dilated and transposed convolutions. The application of SEblock and findings from existing studies further proved its effectiveness. Finally, we evaluated the proposed TinyCount on multiple edge devices, including the Raspberry Pi 4, NVIDIA Jetson Nano, and NVIDIA Jetson AGX Xavier, to demonstrate its potential for practical applications.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"9 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142222643","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-08-12, DOI: 10.1007/s11554-024-01534-5
Wanchun Ren, Pengcheng Zhu, Shaofeng Cai, Yi Huang, Haoran Zhao, Youji Hama, Zhu Yan, Tao Zhou, Junde Pu, Hongwei Yang
As the mainstream chip packaging technology, plastic-encapsulated chips (PEC) suffer from process defects such as delamination and voids, which seriously impact chip reliability. Therefore, it is urgent to detect these defects promptly and accurately. However, current manual detection methods cannot meet application requirements, as they are both inaccurate and inefficient. This study utilized deep convolutional neural network (DCNN) techniques to analyze scanning acoustic microscope (SAM) images of PEC and identify their internal defects. First, SAM was used to collect and build datasets covering seven typical PEC defects. Then, to handle densely packed chips that occupy an extremely small proportion of each SAM image, a PECNet network was established to detect PEC, building on the traditional RetinaNet framework and combining a CoTNet50 backbone with a feature pyramid network. Furthermore, a PEDNet was designed to classify PEC defects based on the MobileNetV2 network, integrating cross-local connections and progressive classifiers. The experimental results demonstrated that PECNet reaches a chip recognition accuracy of 98.6% and processes a single image in only nine milliseconds. Meanwhile, PEDNet's average defect classification accuracy is 97.8%, and it recognizes a single image in only 0.0021 s. This method provides a precise and efficient technique for defect detection in PEC.
{"title":"Automatic detection of defects in electronic plastic packaging using deep convolutional neural networks","authors":"Wanchun Ren, Pengcheng Zhu, Shaofeng Cai, Yi Huang, Haoran Zhao, Youji Hama, Zhu Yan, Tao Zhou, Junde Pu, Hongwei Yang","doi":"10.1007/s11554-024-01534-5","DOIUrl":"https://doi.org/10.1007/s11554-024-01534-5","url":null,"abstract":"<p>As the mainstream chip packaging technology, plastic-encapsulated chips (PEC) suffer from process defects such as delamination and voids, which seriously impact the chip's reliability. Therefore, it is urgent to detect defects promptly and accurately. However, the current manual detection methods cannot meet the application's requirements, as they are both inaccurate and inefficient. This study utilized the deep convolutional neural network (DCNN) technique to analyze PEC's scanning acoustic microscope (SAM) images and identify their internal defects. First, the SAM technology was used to collect and set up datasets of seven typical PEC defects. Then, according to the characteristics of densely packed PEC and an incredibly tiny size ratio in SAM, a PECNet network was established to detect PEC based on the traditional RetinaNet network, combining the CoTNet50 backbone network and the feature pyramid network structure. Furthermore, a PEDNet was designed to classify PEC defects based on the MobileNetV2 network, integrating cross-local connections and progressive classifiers. The experimental results demonstrated that the PECNet network's chip recognition accuracy reaches 98.6%, and its speed of a single image requires only nine milliseconds. Meanwhile, the PEDNet network's average defect classification accuracy is 97.8%, and the recognition speed of a single image is only 0.0021 s. This method provides a precise and efficient technique for defect detection in PEC.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"79 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-08-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141942194","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-08-10, DOI: 10.1007/s11554-024-01533-6
Li Liu, Kaiye Huang, Yuang Bai, Qifan Zhang, Yujian Li
Aiming at the issue that existing aerial-work safety belt wearing detection models cannot run in real time on edge devices, this paper proposes a lightweight aerial-work safety belt detection model with higher accuracy. First, the model is made lightweight by introducing Ghost convolution and model pruning. Second, for complex scenarios involving occlusion, color confusion, and the like, the model's performance is optimized by introducing a new up-sampling operator, an attention mechanism, and a feature fusion network. Lastly, the model is trained using knowledge distillation to compensate for the accuracy loss resulting from the lightweight design, thereby maintaining higher accuracy. Experimental results on the Guangdong Power Grid Intelligence Challenge safety belt wearing dataset show that, in the comparison experiments, the improved model has only 8.7% of the parameters of the mainstream object detector YOU ONLY LOOK ONCE v5s (YOLOv5s), differs in mean Average Precision (mAP.50) by only 3.7%, and is 100.4% faster. Meanwhile, the ablation experiments show that the improved model's parameter count is reduced by 66.9% compared with the original model, while mAP.50 decreases by only 1.9%. The aerial-work safety belt detection model proposed in this paper combines a lightweight design, the SimAM attention mechanism, a Bidirectional Feature Pyramid Network for feature fusion, the Carafe operator, and a knowledge distillation training strategy, enabling the model to remain lightweight and real-time while achieving high detection accuracy.
{"title":"Real-time detection model of electrical work safety belt based on lightweight improved YOLOv5","authors":"Li Liu, Kaiye Huang, Yuang Bai, Qifan Zhang, Yujian Li","doi":"10.1007/s11554-024-01533-6","DOIUrl":"https://doi.org/10.1007/s11554-024-01533-6","url":null,"abstract":"<p>Aiming at the issue that the existing aerial work safety belt wearing detection model cannot meet the real-time operation on edge devices, this paper proposes a lightweight aerial work safety belt detection model with higher accuracy. First, the model is made lightweight by introducing Ghost convolution and model pruning. Second, for complex scenarios involving occlusion, color confusion, etc., the model’s performance is optimized by introducing the new up-sampling operator, the attention mechanism, and the feature fusion network. Lastly, the model is trained using knowledge distillation to compensate for accuracy loss resulting from the lightweight design, thereby maintain a higher accuracy. Experimental results based on the Guangdong Power Grid Intelligence Challenge safety belt wearable dataset show that, in the comparison experiments, the improved model, compared with the mainstream object detection algorithm YOU ONLY LOOK ONCE v5s (YOLOv5s), has only 8.7% of the parameters of the former with only 3.7% difference in the mean Average Precision (mAP.50) metrics and the speed is improved by 100.4%. Meanwhile, the ablation experiments show that the improved model’s parameter count is reduced by 66.9% compared with the original model, while mAP.50 decreases by only 1.9%. The overhead safety belt detection model proposed in this paper combines the model’s lightweight design, SimAM attention mechanism, Bidirectional Feature Pyramid Network feature fusion network, Carafe operator, and knowledge distillation training strategy, enabling the model to maintain lightweight and real-time performance while achieving high detection accuracy.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"7 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141942213","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-08-09, DOI: 10.1007/s11554-024-01528-3
Xin Zhao, Lianping Yang, Wencong Huang, Qi Wang, Xin Wang, Yantao Lou
Human pose estimation using RGB cameras often encounters performance degradation in challenging scenarios such as motion blur or suboptimal lighting. In comparison, event cameras, endowed with a wide dynamic range, microsecond-scale temporal resolution, minimal latency, and low power consumption, demonstrate remarkable adaptability in extreme visual environments. Nevertheless, current research exploiting event cameras for pose estimation has not yet fully harnessed the potential of event-driven data, and improving model efficiency remains an ongoing pursuit. This work focuses on devising an efficient, compact pose estimation algorithm, with special attention to optimizing the fusion of multi-view event streams for improved pose prediction accuracy. We propose EV-TIFNet, a compact dual-view interactive network, which incorporates event frames along with our custom-designed Global Spatio-Temporal Feature Maps (GTF Maps). To enhance the network's ability to understand motion characteristics and localize keypoints, we have tailored a dedicated Auxiliary Information Extraction Module (AIE Module) for the GTF Maps. Experimental results demonstrate that our model, with a compact parameter count of 0.55 million, achieves notable advancements on the DHP19 dataset, reducing the 3D MPJPE to 61.45 mm. Building upon the sparsity of event data, the integration of sparse convolution operators replaces a significant portion of traditional convolutional layers, reducing computational demand by 28.3% to a total of 8.71 GFLOPs. These design choices highlight the model's suitability and efficiency in scenarios where computational resources are limited.
Title: EV-TIFNet: lightweight binocular fusion network assisted by event camera time information for 3D human pose estimation
Pub Date: 2024-08-08, DOI: 10.1007/s11554-024-01524-7
Bingquan Wang, Fangling Yang
With the rapid development of artificial intelligence and Big Data, artificial intelligence-generated image content (AIGIC) is being applied ever more widely across fields. However, the image data used by AIGIC is diverse, often contains sensitive personal information, and is characterized by heterogeneity and privacy concerns. This leads to long implementation times for image data privacy protection and a high risk of unauthorized third-party access, resulting in serious privacy breaches and security risks. To address this issue, this paper combines Hierarchical Federated Learning (HFL) with homomorphic encryption to tackle the encryption and transmission challenges in the AIGIC image processing pipeline. Building on this foundation, a novel HFL group collaborative training strategy is designed to further streamline the privacy protection process for AIGIC image data, effectively masking the heterogeneity of raw image data and balancing the allocation of computational resources. Additionally, a pruning-based model compression algorithm is introduced to relieve the data transmission pressure in the image encryption process. Optimizing the modulo operations of the homomorphic encryption scheme significantly reduces the computational burden, enabling real-time improvements to image data privacy protection along multiple dimensions, including computational and transmission resources. To verify the effectiveness of the proposed mechanism, extensive simulations of the lightweight privacy protection process for AIGIC image data were performed, and the time complexity of the mechanism was analyzed comparatively. Experimental results indicate substantial advantages of the proposed algorithm over traditional real-time privacy protection algorithms for AIGIC.
{"title":"Lightweight and privacy-preserving hierarchical federated learning mechanism for artificial intelligence-generated image content","authors":"Bingquan Wang, Fangling Yang","doi":"10.1007/s11554-024-01524-7","DOIUrl":"https://doi.org/10.1007/s11554-024-01524-7","url":null,"abstract":"<p>With the rapid development of artificial intelligence and Big Data, the application of artificial intelligence-generated image content (AIGIC) is becoming increasingly widespread in various fields. However, the image data utilized by AIGIC is diverse and often contains sensitive personal information, characterized by heterogeneity and privacy concerns. This leads to prolonged implementation times for image data privacy protection, and a high risk of unauthorized third-party access, resulting in serious privacy breaches and security risks. To address this issue, this paper combines Hierarchical Federated Learning (HFL) with Homomorphic Encryption to first address the encryption and transmission challenges in the image processing pipeline of AIGIC. Building upon this foundation, a novel HFL group collaborative training strategy is designed to further streamline the privacy protection process of AIGIC image data, effectively masking the heterogeneity of raw image data and achieving balanced allocation of computational resources. Additionally, a model compression algorithm based on pruning is introduced to alleviate the data transmission pressure in the image encryption process. Optimization of the homomorphic encryption modulo operations significantly reduces the computational burden, enabling real-time enhancement of image data privacy protection from multiple dimensions including computational and transmission resources. To verify the effectiveness of the proposed mechanism, extensive simulation verification of the lightweight privacy protection process for AIGIC image data was performed, and a comparative analysis of the time complexity of the mechanism was conducted. Experimental results indicate substantial advantages of the proposed algorithm over traditional real-time privacy protection algorithms in AIGIC.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"4 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141942368","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-08-08, DOI: 10.1007/s11554-024-01529-2
Shuren Zhou, Shengzhen Long
Wheat is one of the most significant crops in China, as its yield directly affects the country's food security. Owing to their dense, overlapping, and relatively fuzzy distribution, wheat spikes are prone to being missed in practical detection. Existing object detection models suffer from large model size, high computational complexity, and long computation times. Consequently, this study proposes a lightweight real-time wheat spike detection model called YOLO-LF. First, a lightweight backbone network is improved to reduce the model size and the number of parameters, thereby improving runtime speed. Second, the neck is redesigned around the wheat spike dataset to enhance the network's feature extraction capability for wheat spikes while keeping it lightweight. Finally, a lightweight detection head is designed to significantly reduce the model's FLOPs and push the lightweight design further. Experimental results on the test set indicate that our model is 1.7 MB in size, has 0.76 M parameters, and requires 2.9 GFLOPs, which represent reductions of 73, 74, and 64% compared to YOLOv8n, respectively. Our model demonstrates a latency of 8.6 ms and 115 FPS on a Titan X, whereas YOLOv8n has a latency of 10.2 ms and 97 FPS on the same hardware. Our model is thus lighter and faster, while mAP@0.5 decreases by only 0.9%, outperforming YOLOv8 and other mainstream detection networks in overall performance. Consequently, our model can be deployed on mobile devices to provide effective assistance in the real-time detection of wheat spikes.
{"title":"YOLO-LF: a lightweight multi-scale feature fusion algorithm for wheat spike detection","authors":"Shuren Zhou, Shengzhen Long","doi":"10.1007/s11554-024-01529-2","DOIUrl":"https://doi.org/10.1007/s11554-024-01529-2","url":null,"abstract":"<p>Wheat is one of the most significant crops in China, as its yield directly affects the country’s food security. Due to its dense, overlapping, and relatively fuzzy distribution, wheat spikes are prone to being missed in practical detection. Existing object detection models suffer from large model size, high computational complexity, and long computation times. Consequently, this study proposes a lightweight real-time wheat spike detection model called YOLO-LF. Initially, a lightweight backbone network is improved to reduce the model size and lower the number of parameters, thereby improving the runtime speed. Second, the structure of the neck is redesigned in the context of the wheat spike dataset to enhance the feature extraction capability of the network for wheat spikes and to achieve lightweightness. Finally, a lightweight detection head was designed to significantly reduce the FLOPs of the model and achieve further lightweighting. Experimental results on the test set indicate that the size of our model is 1.7 MB, the number of parameters is 0.76 M, and the FLOPs are 2.9, which represent reductions of 73, 74, and 64% compared to YOLOv8n, respectively. Our model demonstrates a latency of 8.6 ms and an FPS of 115 on Titan X, whereas YOLOv8n has a latency of 10.2 ms and an FPS of 97 on the same hardware. In contrast, our model is more lightweight and faster to detect, while the mAP@0.5 only decreases by 0.9%, outperforming YOLOv8 and other mainstream detection networks in overall performance. Consequently, our model can be deployed on mobile devices to provide effective assistance in the real-time detection of wheat spikes.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"15 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141942215","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-08-05, DOI: 10.1007/s11554-024-01521-w
Zhishuai Zheng, Zhedong Ge, Zhikang Tian, Xiaoxia Yang, Yucheng Zhou
Current research on image classification has combined convolutional neural networks (CNNs) and transformers to introduce inductive biases into the model, enhancing its ability to handle long-range dependencies. However, these integrated models have limitations. Standard CNNs are static, so their convolutions cannot adjust dynamically to the input image, which limits feature expression; this static nature also impedes the seamless integration of features dynamically generated by self-attention with static features generated by convolution when the two are combined. Furthermore, each model stage contains abundant information that single-scale convolution cannot fully exploit, which ultimately hurts classification performance. To tackle these challenges, we propose WoodGLNet, a real-time multi-scale pyramid network that aggregates global and local information in an input-dependent manner and facilitates feature interaction through convolutions at three scales. WoodGLNet employs efficient multi-scale global spatial decay attention modules and input-dependent multi-scale dynamic convolutions at different stages, strengthening the network's inductive biases and expanding its effective receptive field. On the CIFAR100 and CIFAR10 image classification tasks, WoodGLNet-T achieves Top-1 accuracies of 76.34% and 92.35%, respectively, outperforming EfficientNet-B3 by 1.03 and 0.86 percentage points. WoodGLNet-S and WoodGLNet-B attain Top-1 accuracies of 77.56% and 93.66%, and 80.12% and 94.27%, respectively. The experimental samples were sourced from the Shandong Province Construction Structural Material Specimen Museum, whose wood-testing work demands high real-time performance. To assess WoodGLNet's real-time detection capabilities, 20 species of precious wood from the museum were identified in real time using the network. The results indicate that WoodGLNet achieves a classification accuracy of up to 99.60%, with a recognition time of 0.013 s per image. These findings demonstrate the network's strong real-time classification and generalization abilities.
Title: WoodGLNet: a multi-scale network integrating global and local information for real-time classification of wood images
Pub Date: 2024-08-05, DOI: 10.1007/s11554-024-01530-9
Chenghai Yu, Xiangwei Chen
Railway turnouts are critical components of the rail track system, and their defects can lead to severe safety incidents and significant property damage. The irregular distribution and varying sizes of railway-turnout defects, combined with changing environmental lighting and complex backgrounds, pose challenges for traditional detection methods, which often suffer from low accuracy and poor real-time performance. To improve the detection of railway-turnout defects, this study proposes a high-precision recognition model, Faster-Hilo-BiFPN-DETR (FHB-DETR), based on the RT-DETR architecture. First, we designed the Faster CGLU module based on the Faster Block, which optimizes the aggregation of local and global feature information through partial convolution and gating mechanisms; this reduces both computational load and parameter count while enhancing feature extraction. Second, we replaced multi-head self-attention with Hilo attention, further reducing the parameter count and computational load and improving real-time performance. For feature fusion, we used BiFPN instead of CCFF to better capture subtle defect features and optimized the contribution of each feature map through a weighting mechanism. Experimental results show that, compared to RT-DETR, FHB-DETR improves mAP50 by 3.5%, reduces the parameter count by 25%, and decreases computational complexity by 6%, while maintaining a high frame rate that meets real-time requirements.
{"title":"Railway rutting defects detection based on improved RT-DETR","authors":"Chenghai Yu, Xiangwei Chen","doi":"10.1007/s11554-024-01530-9","DOIUrl":"https://doi.org/10.1007/s11554-024-01530-9","url":null,"abstract":"<p>Railway turnouts are critical components of the rail track system, and their defects can lead to severe safety incidents and significant property damage. The irregular distribution and varying sizes of railway-turnout defects, combined with changing environmental lighting and complex backgrounds, pose challenges for traditional detection methods, often resulting in low accuracy and poor real-time performance. To address the issue of improving the detection performance of railway-turnout defects, this study proposes a high-precision recognition model, Faster-Hilo-BiFPN-DETR (FHB-DETR), based on the RT-DETR architecture. First, we designed the Faster CGLU module based on Faster Block, which optimizes the aggregation of local and global feature information through partial convolution and gating mechanisms. This approach reduces both computational load and parameter count while enhancing feature extraction capabilities. Second, we replaced the multi-head self-attention mechanism with Hilo attention, reducing parameter count and computational load, and improving real-time performance. In terms of feature fusion, we utilized BiFPN instead of CCFF to better capture subtle defect features and optimized the weighting of feature maps through a weighted mechanism. Experimental results show that compared to RT-DETR, FHB-DETR improved mAP50 by 3.5%, reduced parameter count by 25%, and decreased computational complexity by 6%, while maintaining a high frame rate, meeting real-time performance requirements.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"83 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141942269","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-08-05, DOI: 10.1007/s11554-024-01525-6
Mehmet Erkin Yücel, Serkan Topaloğlu, Cem Ünsalan
The retail sector presents several open and challenging problems that could benefit from advanced pattern recognition and computer vision techniques. One such critical challenge is planogram compliance control. In this study, we propose a complete embedded system to tackle this issue. Our system consists of four key components: image acquisition and transfer via a stand-alone embedded camera module; object detection via computer vision and deep learning methods running on single-board computers; a planogram compliance control method, also running on single-board computers; and an energy harvesting and power management block accompanying the embedded camera modules. The image acquisition and transfer block is implemented on the ESP-EYE camera module. The object detection block is based on YOLOv5 as the deep learning method together with local feature extraction. We implement these methods on the Raspberry Pi 4, NVIDIA Jetson Orin Nano, and NVIDIA Jetson AGX Orin single-board computers. The planogram compliance control block performs sequence alignment with a modified Needleman–Wunsch algorithm and runs alongside the object detection block on the same single-board computers. The energy harvesting and power management block consists of solar and RF energy-harvesting modules with a suitable battery pack. We tested the proposed embedded planogram compliance control system on two different datasets to provide valuable insights into its strengths and weaknesses. The results show that the proposed method achieves F1 scores of 0.997 and 1.0 in the object detection and planogram compliance control blocks, respectively. Furthermore, we calculated that the complete embedded system can operate stand-alone for up to 2 years on battery, a duration that can be further extended by integrating the proposed solar and RF energy-harvesting options.
{"title":"Embedded planogram compliance control system","authors":"Mehmet Erkin Yücel, Serkan Topaloğlu, Cem Ünsalan","doi":"10.1007/s11554-024-01525-6","DOIUrl":"https://doi.org/10.1007/s11554-024-01525-6","url":null,"abstract":"<p>The retail sector presents several open and challenging problems that could benefit from advanced pattern recognition and computer vision techniques. One such critical challenge is planogram compliance control. In this study, we propose a complete embedded system to tackle this issue. Our system consists of four key components as image acquisition and transfer via stand-alone embedded camera module, object detection via computer vision and deep learning methods working on single-board computers, planogram compliance control method again working on single-board computers, and energy harvesting and power management block to accompany the embedded camera modules. The image acquisition and transfer block is implemented on the ESP-EYE camera module. The object detection block is based on YOLOv5 as the deep learning method and local feature extraction. We implement these methods on Raspberry Pi 4, NVIDIA Jetson Orin Nano, and NVIDIA Jetson AGX Orin as single-board computers. The planogram compliance control block utilizes sequence alignment through a modified Needleman–Wunsch algorithm. This block is also working along with the object detection block on the same single-board computers. The energy harvesting and power management block consists of solar and RF energy-harvesting modules with suitable battery pack for operation. We tested the proposed embedded planogram compliance control system on two different datasets to provide valuable insights on its strengths and weaknesses. The results show that the proposed method achieves F1 scores of 0.997 and 1.0 in object detection and planogram compliance control blocks, respectively. Furthermore, we calculated that the complete embedded system can work in stand-alone form up to 2 years based on battery. This duration can be further extended with the integration of the proposed solar and RF energy-harvesting options.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"58 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141942214","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}