Adaptive Safe Braking and Distance Prediction for Overhead Cranes With Multivariation Using MLP
Tenglong Zhang, Guoliang Liu, Huili Chen, Guohui Tian, Qingqiang Guo
IET Cybersystems and Robotics, vol. 7, no. 1, 2025. DOI: 10.1049/csy2.70007

The emergency braking and braking-distance prediction of an overhead crane pose challenging issues for its safe operation. This paper employs a multilayer perceptron (MLP) to implement adaptive safe-distance prediction for an overhead crane with multiple variations. First, a discrete model of the overhead crane is constructed, and a model predictive control (MPC) scheme with angle constraints is applied for safe braking. Second, we analysed and selected the input variables of the safe-distance prediction model. We then permuted the inputs to the MLP and separately analysed the effect of each input on the accuracy of the predicted safe distance. We constructed a training dataset and a test dataset, and optimised the safe-distance prediction model on the training dataset. Finally, we conducted a comparative analysis between the MLP and the nlinfit algorithm, highlighting the superiority of MLP-based adaptive safe-distance prediction for overhead cranes. Experiments confirm that the method keeps the swing angle minimal throughout the braking process, achieving safe braking. The results underscore the practical utility and novelty of the proposed algorithm.
Move to See More: Approaching Object With Partial Occlusion Using Large Multimodal Model and Active Object Detection
Aoqi Wang, Guohui Tian, Yuhao Wang, Zhongyang Li
IET Cybersystems and Robotics, vol. 7, no. 1, 2025. DOI: 10.1049/csy2.70008

Active object detection (AOD) is a crucial task in robotics. A key challenge for AOD in household environments is that the target object is often undetectable due to partial occlusion, which causes traditional methods to fail. To address the occlusion problem, this paper first proposes a novel occlusion handling method based on a large multimodal model (LMM). The method uses the LMM to detect and analyse input RGB images and generates adjustment actions that progressively eliminate the occlusion. Once the occlusion is handled, an improved AOD method based on a deep Q-learning network (DQN) completes the task. We introduce an attention mechanism to process image features, enabling the model to focus on critical regions of the input images. Additionally, a new reward function is proposed that jointly considers the bounding box of the target object, the robot's distance to the object, and the actions performed by the robot. Experiments on the dataset and in real-world scenarios validate the effectiveness of the proposed method for AOD tasks under partial occlusion.
Bioinspired framework for real-time collision detection with dynamic obstacles in cluttered outdoor environments using event cameras
Meriem Ben Miled, Wenwen Liu, Yuanchang Liu
IET Cybersystems and Robotics, vol. 7, no. 1, 2025. DOI: 10.1049/csy2.70006

In the field of robotics and vision-based navigation, event cameras are gaining popularity due to their exceptional dynamic range, low power consumption, and rapid response. These neuromorphic devices facilitate the efficient detection and avoidance of fast-moving obstacles and address common limitations of traditional hardware. However, the majority of state-of-the-art event-based algorithms still rely on conventional computer-vision strategies. The goal is to move beyond standard protocols for dynamic obstacle detection by drawing inspiration from the time-computational paradigm of biological vision systems. In this paper, the authors present a framework inspired by a biological response mechanism triggered by approaching objects, enabling the perception and identification of potential collision threats. The method, validated through both simulation and real-world experimentation, charts a new path for the application of event cameras to dynamic obstacle detection and avoidance in autonomous unmanned aerial vehicles. Compared with conventional methods, the proposed approach achieves a success rate of 97% in detecting obstacles in real-world outdoor settings.
Novel vision-LiDAR fusion framework for human action recognition based on dynamic lateral connection
Fei Yan, Guangyao Jin, Zheng Mu, Shouxing Zhang, Yinghao Cai, Tao Lu, Yan Zhuang
IET Cybersystems and Robotics, vol. 6, no. 4, 2024. DOI: 10.1049/csy2.70005

Over the past decades, substantial progress has been made in human action recognition. However, most existing studies and datasets for human action recognition use still images or videos as the primary modality, and image-based approaches are easily affected by adverse environmental conditions. In this paper, the authors propose combining RGB images and point clouds from LiDAR sensors for human action recognition. A dynamic lateral convolutional network (DLCN) is proposed to fuse features from the two modalities. The RGB features and the geometric information from the point clouds interact closely within the DLCN, providing complementary cues for action recognition. Experimental results on the JRDB-Act dataset demonstrate that the proposed DLCN outperforms state-of-the-art approaches to human action recognition. The authors also show the potential of the DLCN in various complex scenarios, which is highly valuable for real-world applications.
Big2Small: Learning from masked image modelling with heterogeneous self-supervised knowledge distillation
Ziming Wang, Shumin Han, Xiaodi Wang, Jing Hao, Xianbin Cao, Baochang Zhang
IET Cybersystems and Robotics, vol. 6, no. 4, 2024. DOI: 10.1049/csy2.70002

Small convolutional neural network (CNN)-based models usually require transferring knowledge from a large model before they are deployed on computationally resource-limited edge devices. Masked image modelling (MIM) methods achieve great success in various visual tasks but remain largely unexplored in knowledge distillation for heterogeneous deep models, mainly because of the significant discrepancy between transformer-based large models and CNN-based small networks. In this paper, the authors develop the first heterogeneous self-supervised knowledge distillation (HSKD) method based on MIM, which can efficiently transfer knowledge from large transformer models to small CNN-based models in a self-supervised fashion. The method builds a bridge between transformer-based models and CNNs by training a UNet-style student with sparse convolution, which can effectively mimic the visual representation inferred by the teacher under masked modelling. It is a simple yet effective learning paradigm for learning the visual representation and data distribution from heterogeneous teacher models, which can be pre-trained using advanced self-supervised methods. Extensive experiments show that HSKD adapts well to various models and sizes, consistently achieving state-of-the-art performance in image classification, object detection, and semantic segmentation. For example, on the ImageNet-1K dataset, HSKD improves the accuracy of ResNet-50 (sparse) from 76.98% to 80.01%.
Automatic feature-based markerless calibration and navigation method for augmented reality assisted dental treatment
Faizan Ahmad, Jing Xiong, Zeyang Xia
IET Cybersystems and Robotics, vol. 6, no. 4, 2024. DOI: 10.1049/csy2.70003

Augmented reality (AR) is gaining traction in computer-assisted treatment (CAT). Head-mounted display (HMD)-based AR in CAT provides dentists with enhanced visualisation by overlaying a three-dimensional (3D) model directly on the real patient during dental treatment. However, conventional AR-based treatments rely on optical markers and trackers, which makes them tedious, expensive, and uncomfortable for dentists. A markerless image-to-patient tracking system is therefore needed to overcome these challenges and improve system efficiency. This paper proposes a novel feature-based markerless calibration and navigation method for an HMD-based AR visualisation system. The authors address three sub-problems: first, synthetic RGB-D data for anatomical landmark detection are generated to train a deep convolutional neural network (DCNN); second, the HMD is automatically calibrated using the detected anatomical landmarks, eliminating the need for user input or optical trackers; and third, a multi-iterative closest point (ICP) algorithm is developed for effective 3D-3D real-time navigation. Several experiments are conducted on a commercially available HMD (HoloLens 2), and the approach is compared against state-of-the-art methods that employ the HoloLens. The proposed method achieves a calibration virtual-to-real re-projection distance of (1.09 ± 0.23) mm, a navigation projection error of approximately (0.53 ± 0.19) mm, and a navigation accuracy of 93.87%.
Enhancing stability and safety: A novel multi-constraint model predictive control approach for forklift trajectory
Yizhen Sun, Junyou Yang, Donghui Zhao, Moses Chukwuka Okonkwo, Jianmin Zhang, Shuoyu Wang, Yang Liu
IET Cybersystems and Robotics, vol. 6, no. 4, 2024. DOI: 10.1049/csy2.70004

Advances in intelligent manufacturing have made high-precision trajectory tracking crucial for improving the efficiency and safety of in-factory cargo transportation. This study addresses the limitations of current forklift navigation systems in trajectory-control accuracy and stability by proposing the Enhanced Stability and Safety Model Predictive Control (ESS-MPC) method, which incorporates a multi-constraint strategy for improved stability and safety. The kinematic model of a forklift with a single steered front wheel is constructed with all state quantities known, including the steering angle, yielding a more accurate model description and trajectory prediction. To ensure vehicle safety, the spatial safety boundary obtained from the trajectory-planning module is imposed as a hard constraint on ESS-MPC tracking, and the optimisation constraints are updated with the key kinematic and dynamic parameters of the forklift. In experiments in both simulation and real-world environments, ESS-MPC improved position accuracy, pose accuracy, and stability by 57.93%, 37.83%, and 57.51%, respectively. This study provides significant support for the development of autonomous navigation systems for industrial forklifts.