Novel vision-LiDAR fusion framework for human action recognition based on dynamic lateral connection
Fei Yan, Guangyao Jin, Zheng Mu, Shouxing Zhang, Yinghao Cai, Tao Lu, Yan Zhuang
Over the past decades, substantial progress has been made in human action recognition. However, most existing studies and datasets rely on still images or videos as the primary modality, and image-based approaches are easily affected by adverse environmental conditions. In this paper, the authors propose combining RGB images with point clouds from LiDAR sensors for human action recognition. A dynamic lateral convolutional network (DLCN) is proposed to fuse features from the two modalities: the RGB features and the geometric information from the point clouds interact closely within the DLCN and complement each other for action recognition. Experimental results on the JRDB-Act dataset demonstrate that the proposed DLCN outperforms state-of-the-art approaches to human action recognition. The authors further show the potential of the DLCN in various complex scenarios, which is highly valuable for real-world applications.
{"title":"Novel vision-LiDAR fusion framework for human action recognition based on dynamic lateral connection","authors":"Fei Yan, Guangyao Jin, Zheng Mu, Shouxing Zhang, Yinghao Cai, Tao Lu, Yan Zhuang","doi":"10.1049/csy2.70005","DOIUrl":"https://doi.org/10.1049/csy2.70005","url":null,"abstract":"<p>In the past decades, substantial progress has been made in human action recognition. However, most existing studies and datasets for human action recognition utilise still images or videos as the primary modality. Image-based approaches can be easily impacted by adverse environmental conditions. In this paper, the authors propose combining RGB images and point clouds from LiDAR sensors for human action recognition. A dynamic lateral convolutional network (DLCN) is proposed to fuse features from multi-modalities. The RGB features and the geometric information from the point clouds closely interact with each other in the DLCN, which is complementary in action recognition. The experimental results on the JRDB-Act dataset demonstrate that the proposed DLCN outperforms the state-of-the-art approaches of human action recognition. The authors show the potential of the proposed DLCN in various complex scenarios, which is highly valuable in real-world applications.</p>","PeriodicalId":34110,"journal":{"name":"IET Cybersystems and Robotics","volume":"6 4","pages":""},"PeriodicalIF":1.5,"publicationDate":"2024-12-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/csy2.70005","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143121447","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Big2Small: Learning from masked image modelling with heterogeneous self-supervised knowledge distillation
Ziming Wang, Shumin Han, Xiaodi Wang, Jing Hao, Xianbin Cao, Baochang Zhang
Small convolutional neural network (CNN)-based models usually require knowledge transferred from a large model before they are deployed on computationally resource-limited edge devices. Masked image modelling (MIM) methods have achieved great success in various visual tasks but remain largely unexplored in knowledge distillation for heterogeneous deep models, mainly because of the significant discrepancy between the transformer-based large model and the CNN-based small network. In this paper, the authors develop the first heterogeneous self-supervised knowledge distillation (HSKD) method based on MIM, which efficiently transfers knowledge from large transformer models to small CNN-based models in a self-supervised fashion. The method builds a bridge between transformer-based models and CNNs by training a UNet-style student with sparse convolution that effectively mimics the visual representations inferred by the teacher under masked modelling. It is a simple yet effective learning paradigm for learning the visual representation and data distribution from heterogeneous teacher models, which can be pre-trained using advanced self-supervised methods. Extensive experiments show that the approach adapts well to various models and sizes, consistently achieving state-of-the-art performance in image classification, object detection, and semantic segmentation tasks. For example, on the ImageNet-1K dataset, HSKD improves the accuracy of ResNet-50 (sparse) from 76.98% to 80.01%.
{"title":"Big2Small: Learning from masked image modelling with heterogeneous self-supervised knowledge distillation","authors":"Ziming Wang, Shumin Han, Xiaodi Wang, Jing Hao, Xianbin Cao, Baochang Zhang","doi":"10.1049/csy2.70002","DOIUrl":"https://doi.org/10.1049/csy2.70002","url":null,"abstract":"<p>Small convolutional neural network (CNN)-based models usually require transferring knowledge from a large model before they are deployed in computationally resource-limited edge devices. Masked image modelling (MIM) methods achieve great success in various visual tasks but remain largely unexplored in knowledge distillation for heterogeneous deep models. The reason is mainly due to the significant discrepancy between the transformer-based large model and the CNN-based small network. In this paper, the authors develop the first heterogeneous self-supervised knowledge distillation (HSKD) based on MIM, which can efficiently transfer knowledge from large transformer models to small CNN-based models in a self-supervised fashion. Our method builds a bridge between transformer-based models and CNNs by training a UNet-style student with sparse convolution, which can effectively mimic the visual representation inferred by a teacher over masked modelling. Our method is a simple yet effective learning paradigm to learn the visual representation and distribution of data from heterogeneous teacher models, which can be pre-trained using advanced self-supervised methods. Extensive experiments show that it adapts well to various models and sizes, consistently achieving state-of-the-art performance in image classification, object detection, and semantic segmentation tasks. For example, in the Imagenet 1K dataset, HSKD improves the accuracy of Resnet-50 (sparse) from 76.98% to 80.01%.</p>","PeriodicalId":34110,"journal":{"name":"IET Cybersystems and Robotics","volume":"6 4","pages":""},"PeriodicalIF":1.5,"publicationDate":"2024-12-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/csy2.70002","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143121450","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Automatic feature-based markerless calibration and navigation method for augmented reality assisted dental treatment
Faizan Ahmad, Jing Xiong, Zeyang Xia
Augmented reality (AR) is gaining traction in the field of computer-assisted treatment (CAT). Head-mounted display (HMD)-based AR in CAT provides dentists with enhanced visualisation by directly overlaying a three-dimensional (3D) model on the real patient during dental treatment. However, conventional AR-based treatments rely on optical markers and trackers, which makes them tedious, expensive, and uncomfortable for dentists. A markerless image-to-patient tracking system is therefore necessary to overcome these challenges and enhance system efficiency. This paper proposes a novel feature-based markerless calibration and navigation method for an HMD-based AR visualisation system. The authors address three sub-challenges: firstly, synthetic RGB-D data are generated to train a deep convolutional neural network (DCNN) for anatomical landmark detection; secondly, the HMD is automatically calibrated using the detected anatomical landmarks, eliminating the need for user input or optical trackers; and thirdly, a multi-iterative closest point (multi-ICP) algorithm is developed for effective real-time 3D-3D navigation. The authors conduct several experiments on a commercially available HMD (HoloLens 2) and compare the approach against state-of-the-art methods that employ the HoloLens. The proposed method achieves a calibration virtual-to-real re-projection distance of (1.09 ± 0.23) mm and navigation projection errors and accuracies of approximately (0.53 ± 0.19) mm and 93.87%, respectively.
{"title":"Automatic feature-based markerless calibration and navigation method for augmented reality assisted dental treatment","authors":"Faizan Ahmad, Jing Xiong, Zeyang Xia","doi":"10.1049/csy2.70003","DOIUrl":"https://doi.org/10.1049/csy2.70003","url":null,"abstract":"<p>Augmented reality (AR) is gaining traction in the field of computer-assisted treatment (CAT). Head-mounted display (HMD)-based AR in CAT provides dentists with enhanced visualisation by directly overlaying a three-dimensional (3D) model on a real patient during dental treatment. However, conventional AR-based treatments rely on optical markers and trackers, which makes them tedious, expensive, and uncomfortable for dentists. Therefore, a markerless image-to-patient tracking system is necessary to overcome these challenges and enhance system efficiency. This paper proposes a novel feature-based markerless calibration and navigation method for an HMD-based AR visualisation system. The authors address three sub-challenges: firstly, synthetic RGB-D data for anatomical landmark detection is generated to train a deep convolutional neural network (DCNN); secondly, the HMD is automatically calibrated using detected anatomical landmarks, eliminating the need for user input or optical trackers; and thirdly, a multi-iterative closest point (ICP) algorithm is developed for effective 3D-3D real-time navigation. The authors conduct several experiments on a commercially available HMD (HoloLens 2). Finally, the authors compare and evaluate the approach against state-of-the-art methods that employ HoloLens. The proposed method achieves a calibration virtual-to-real re-projection distance of (1.09 ± 0.23) mm and navigation projection errors and accuracies of approximately (0.53 ± 0.19) mm and 93.87%, respectively.</p>","PeriodicalId":34110,"journal":{"name":"IET Cybersystems and Robotics","volume":"6 4","pages":""},"PeriodicalIF":1.5,"publicationDate":"2024-12-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/csy2.70003","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143121449","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Enhancing stability and safety: A novel multi-constraint model predictive control approach for forklift trajectory
Yizhen Sun, Junyou Yang, Donghui Zhao, Moses Chukwuka Okonkwo, Jianmin Zhang, Shuoyu Wang, Yang Liu
Advancements in intelligent manufacturing have made high-precision trajectory tracking crucial for improving the efficiency and safety of in-factory cargo transportation. This study addresses the limitations of current forklift navigation systems in trajectory control accuracy and stability by proposing the Enhanced Stability and Safety Model Predictive Control (ESS-MPC) method, which incorporates a multi-constraint strategy for improved stability and safety. The kinematic model of a forklift with a single steered front wheel is constructed with all state quantities known, including the steering angle, resulting in a more accurate model description and trajectory prediction. To ensure vehicle safety, the spatial safety boundary obtained from the trajectory planning module is imposed as a hard constraint on ESS-MPC tracking, and the optimisation constraints are further updated with the key kinematic and dynamic parameters of the forklift. In experimental validation in both simulated and real-world environments, ESS-MPC improved position accuracy, pose accuracy, and stability by 57.93%, 37.83%, and 57.51%, respectively. This study provides significant support for the development of autonomous navigation systems for industrial forklifts.
{"title":"Enhancing stability and safety: A novel multi-constraint model predictive control approach for forklift trajectory","authors":"Yizhen Sun, Junyou Yang, Donghui Zhao, Moses Chukwuka Okonkwo, Jianmin Zhang, Shuoyu Wang, Yang Liu","doi":"10.1049/csy2.70004","DOIUrl":"https://doi.org/10.1049/csy2.70004","url":null,"abstract":"<p>The advancements in intelligent manufacturing have made high-precision trajectory tracking technology crucial for improving the efficiency and safety of in-factory cargo transportation. This study addresses the limitations of current forklift navigation systems in trajectory control accuracy and stability by proposing the Enhanced Stability and Safety Model Predictive Control (ESS-MPC) method. This approach includes a multi-constraint strategy for improved stability and safety. The kinematic model for a single front steering-wheel forklift vehicle is constructed with all known state quantities, including the steering angle, resulting in a more accurate model description and trajectory prediction. To ensure vehicle safety, the spatial safety boundary obtained from the trajectory planning module is established as a hard constraint for ESS-MPC tracking. The optimisation constraints are also updated with the key kinematic and dynamic parameters of the forklift. The ESS-MPC method improved the position and pose accuracy and stability by 57.93%, 37.83%, and 57.51%, respectively, as demonstrated through experimental validation using simulation and real-world environments. This study provides significant support for the development of autonomous navigation systems for industrial forklifts.</p>","PeriodicalId":34110,"journal":{"name":"IET Cybersystems and Robotics","volume":"6 4","pages":""},"PeriodicalIF":1.5,"publicationDate":"2024-12-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/csy2.70004","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143121448","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}