Pub Date: 2023-12-04. DOI: 10.1109/ROBIO58561.2023.10354793
Jiangtao Luo, Dongbo Zhang, Tao Yi
As a representative bunch-type fruit, the collision-free and damage-free harvesting of grapes is of great significance. To obtain accurate 3D spatial semantic information, this paper proposes a multi-feature enhanced semantic segmentation method based on Mask R-CNN and PointNet++. First, a depth camera is used to obtain RGB-D images. The RGB images are then input into the Mask R-CNN network for fast detection of grape bunches. The color and depth information are fused and transformed into point cloud data, followed by the estimation of normal vectors. Finally, the nine-dimensional point cloud, which includes spatial location, color information, and normal vectors, is input into the improved PointNet++ network to achieve semantic segmentation of grape bunches, peduncles, and leaves, extracting spatial semantic information from the area surrounding the bunches. The experimental results show that by incorporating normal-vector and color features, the overall accuracy of point cloud segmentation increases to 93.7%, with a mean accuracy of 81.8%, improvements of 12.1% and 13.5% over using only positional features. The results demonstrate that the proposed method can effectively provide precise 3D semantic information to the robot while ensuring both speed and accuracy. This lays the groundwork for subsequent collision-free and damage-free picking.
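As a sketch of how such a nine-dimensional input could be assembled (the paper's exact pipeline is not given in the abstract; the function names and the PCA-based normal estimation below are illustrative assumptions, not the authors' implementation):

```python
import numpy as np

def estimate_normals(xyz, k=8):
    """Per-point unit normal via PCA of the k nearest neighbors:
    the eigenvector of the smallest covariance eigenvalue."""
    # brute-force pairwise distances; use a KD-tree for large clouds
    d2 = ((xyz[:, None, :] - xyz[None, :, :]) ** 2).sum(-1)
    normals = np.zeros_like(xyz)
    for i in range(xyz.shape[0]):
        nbrs = xyz[np.argsort(d2[i])[:k]]
        _, eigvec = np.linalg.eigh(np.cov(nbrs.T))
        normals[i] = eigvec[:, 0]  # eigh sorts eigenvalues ascending
    return normals

def build_features(xyz, rgb, k=8):
    """Stack location (3), color (3), and normals (3) into 9-D features."""
    return np.hstack([xyz, rgb, estimate_normals(xyz, k)])
```

The normal channel is what lets the network separate thin peduncles from the berries they touch, since their surface orientations differ even where positions and colors are similar.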
{"title":"3D Semantic Segmentation for Grape Bunch Point Cloud Based on Feature Enhancement","authors":"Jiangtao Luo, Dongbo Zhang, Tao Yi","doi":"10.1109/ROBIO58561.2023.10354793","DOIUrl":"https://doi.org/10.1109/ROBIO58561.2023.10354793","url":null,"abstract":"As a representative bunch-type fruit,the collision-free and undamaged harvesting of grapes is of great significance. To obtain accurate 3D spatial semantic information,this paper proposes a method for multi-feature enhanced semantic segmentation model based on Mask R-CNN and PointNet++. Firstly, a depth camera is used to obtain RGBD images. The RGB images are then inputted into the Mask-RCNN network for fast detection of grape bunches. The color and depth information are fused and transformed into point cloud data, followed by the estimation of normal vectors. Finally, the nine-dimensional point cloud,which include spatial location, color information, and normal vectors, are inputted into the improved PointNet++ network to achieve semantic segmentation of grape bunches, peduncles, and leaves. This process obtains the extraction of spatial semantic information from the surrounding area of the bunches. The experimental results show that by incorporating normal vector and color features, the overall accuracy of point cloud segmentation increases to 93.7%, with a mean accuracy of 81.8%. This represents a significant improvement of 12.1% and 13.5% compared to using only positional features. The results demonstrate that the model method presented in this paper can effectively provide precise 3D semantic information to the robot while ensuring both speed and accuracy. 
This lays the groundwork for subsequent collision-free and damage-free picking.","PeriodicalId":505134,"journal":{"name":"2023 IEEE International Conference on Robotics and Biomimetics (ROBIO)","volume":"63 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2023-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139187023","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-12-04. DOI: 10.1109/ROBIO58561.2023.10354726
W. Ng, Han Yi Wang, Zheng Li
With the increasing surgical needs of our aging society, there is a shortage of experienced surgical assistants, such as scrub nurses. To facilitate the training of junior scrub nurses and to reduce human errors, e.g., missing surgical items, we develop a speech-image based multimodal AI framework to assist scrub nurses in the operating room. The proposed framework allows real-time instrument type identification and instance detection, which enables junior scrub nurses to become more familiar with the surgical instruments and guides them throughout the surgical procedure. We construct an ex-vivo video-assisted thoracoscopic surgery dataset and benchmark it on common object detection models, reaching an average precision of 98.5% and an average recall of 98.9% on the state-of-the-art YOLO-v7. Additionally, we implement an oriented bounding box version of YOLO-v7 to address the undesired bounding-box suppression that occurs when instruments cross over. By achieving an average precision of 95.6% and an average recall of 97.4%, we improve the average recall by up to 9.2% compared to the previous oriented bounding box version of YOLO-v5. To minimize distraction during surgery, we adopt a deep learning-based automatic speech recognition model to allow surgeons to concentrate on the procedure. Our physical demonstration substantiates the feasibility of the proposed framework in providing real-time guidance and assistance for scrub nurses.
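The suppression problem that the oriented-box variant addresses shows up in ordinary axis-aligned NMS: two thin instruments crossing in an X share almost the same axis-aligned box, so the lower-scoring detection is discarded even though the instruments barely touch. A minimal sketch of that failure mode (not the paper's implementation):

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def nms(boxes, scores, thr=0.5):
    """Greedy NMS: keep boxes by descending score, drop overlaps above thr."""
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= thr for j in keep):
            keep.append(i)
    return keep
```

With oriented boxes, each instrument's box hugs its own shaft, the pairwise IoU stays small at the crossing, and both detections survive.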
{"title":"Speech-image based Multimodal AI Interaction for Scrub Nurse Assistance in the Operating Room","authors":"W. Ng, Han Yi Wang, Zheng Li","doi":"10.1109/ROBIO58561.2023.10354726","DOIUrl":"https://doi.org/10.1109/ROBIO58561.2023.10354726","url":null,"abstract":"With the increasing surgical need in our aging society, there is a lack of experienced surgical assistants, such as scrub nurses. To facilitate the training of junior scrub nurses and to reduce human errors, e.g., missing surgical items, we develop a speech-image based multimodal AI framework to assist scrub nurses in the operating room. The proposed framework allows real-time instrument type identification and instance detection, which enables junior scrub nurses to become more familiar with the surgical instruments and guides them throughout the surgical procedure. We construct an ex-vivo video-assisted thorascopic surgery dataset and benchmark it on common object detection models, reaching an average precision of 98.5% and an average recall of 98.9% on the state-of-the-art YOLO-v7. Additionally, we implement an oriented bounding box version of YOLO-v7 to address the undesired bounding box suppression in instrument crossing over. By achieving an average precision of 95.6% and an average recall of 97.4%, we improve the average recall by up to 9.2% compared to the previous oriented bounding box version of YOLO-v5. To minimize distraction during surgery, we adopt a deep learning-based automatic speech recognition model to allow surgeons to concentrate on the procedure. 
Our physical demonstration substantiates the feasibility of the proposed framework in providing real-time guidance and assistance for scrub nurses.","PeriodicalId":505134,"journal":{"name":"2023 IEEE International Conference on Robotics and Biomimetics (ROBIO)","volume":"73 2","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2023-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139187027","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-12-04. DOI: 10.1109/ROBIO58561.2023.10354662
Maosheng Yang, Lin Xiao, Ce Chen, Yangyi Hu, Yi Sun, Huayan Pu, Wenchuan Jia
In this paper, we propose and design an underwater delta robot for fast seafood grasping. First, the hardware structure of the robot is described in detail. Then, a visual servo control method for fast catching with this underwater delta robot is proposed. The method generates real-time radial trajectories to realize catching despite the swaying of the robot body and the movement of the object to be caught. In actual grasping tests, the moving platform and the slave arm can occlude the target, resulting in the loss of target position information. Therefore, we propose a position prediction method to predict the position of the grasped object when occlusion occurs, improving the success rate of grasping and ensuring a smooth robot trajectory. Finally, several land and underwater experiments were conducted with good results, verifying the feasibility of the robot structure and algorithm.
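The abstract does not specify the predictor, but the simplest stand-in for bridging a short occlusion is constant-velocity extrapolation from the last two observations, which keeps the commanded trajectory smooth while the target is hidden:

```python
def predict_position(track, steps=1):
    """Constant-velocity extrapolation of an occluded target.

    track: list of (x, y) observations, newest last; used when the
    moving platform or slave arm hides the target from the camera.
    steps: how many frames ahead to predict.
    """
    (x0, y0), (x1, y1) = track[-2], track[-1]
    vx, vy = x1 - x0, y1 - y0  # displacement per frame
    return (x1 + steps * vx, y1 + steps * vy)
```

A Kalman filter would add noise smoothing on top of this, at the cost of tuning process and measurement covariances.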
{"title":"Fast Visual Servo for Rapidly Seafood Capturing of Underwater Delta Robots","authors":"Maosheng Yang, Lin Xiao, Ce Chen, Yangyi Hu, Yi Sun, Huayan Pu, Wenchuan Jia","doi":"10.1109/ROBIO58561.2023.10354662","DOIUrl":"https://doi.org/10.1109/ROBIO58561.2023.10354662","url":null,"abstract":"In this paper, we propose and design an underwater delta robot for fast seafood grasping. First, the hardware structure of the robot is described in detail. After that, a visual servo control method for fast catching of this underwater delta robot is proposed. The method is able to generate real-time radial trajectories so as to realize the catching based on the swaying of the robot body as well as the movement of the object to be caught. In the actual grasping test, the moving platform and the slave arm can occlude to the target resulting in the loss of target position information. Therefore, we propose a position prediction method to predict the position of the grasped object when occlusion occurs, thus improving the success rate of grasping and ensuring a smooth robot trajectory. Finally, several land and underwater experiments were conducted with good results, which verified the feasibility of the robot structure and algorithm.","PeriodicalId":505134,"journal":{"name":"2023 IEEE International Conference on Robotics and Biomimetics (ROBIO)","volume":"71 11","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2023-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139187052","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-12-04. DOI: 10.1109/ROBIO58561.2023.10354989
Jingcheng Jiang, Yifang Zhang, N. Tsagarakis
The timing belt transmission offers numerous advantages for legged robots, including high efficiency, impact absorption, and a large range of joint motion. However, the transmission error under high load remains a challenge for locomotion control and further applications of belt transmissions. Traditional linear models cannot effectively capture the belt deformation under a wide range of tension variations due to its nonlinearity. In this paper, we propose a compensation model for the belt transmission error based on the pretension and the torque of the pulley. The adopted approach bypasses the complexity of elaborate physical model derivations, yielding a nonlinear model of the transmission error through straightforward fitting. Based on the proposed model, an error compensation controller is investigated and tested on a one-DoF leg prototype of a legged robot. The agreement between experimental results and theoretical analysis demonstrates the accuracy of the modeling and the effectiveness of the error compensation control method. The proposed model provides a convenient and straightforward way to effectively compensate for belt transmission errors in legged robots.
{"title":"Modelling and Compensation for Transmission Error of Timing Belt in Legged Robots","authors":"Jingcheng Jiang, Yifang Zhang, N. Tsagarakis","doi":"10.1109/ROBIO58561.2023.10354989","DOIUrl":"https://doi.org/10.1109/ROBIO58561.2023.10354989","url":null,"abstract":"The timing belt transmission offers numerous advantages for legged robots, including high efficiency, impact absorption and large range of joint motion. However, the transmission error under high load remains challenging to locomotion control and further applications of belt transmission. Traditional linear models cannot effectively model the belt deformation under a wide range of tension variations due to the nonlinearity. In this paper, we propose a model of the compensation for the belt transmission error based on the pretension and torque of the pully. The adopted approach bypasses the complexity of elaborate physical model derivations, yielding a non-linear model for transmission system errors through straightforward fitting. Based on the proposed model, an error compensation control is investigated and tested with an one-DoF leg prototype of legged robot. The alignment between experimental results and theoretical analysis demonstrates the accuracy of the modeling and the effectiveness of the error compensation control method. 
The proposed model provides a convenient and straightforward solution to effectively compensate for the belt transmission errors in legged robots.","PeriodicalId":505134,"journal":{"name":"2023 IEEE International Conference on Robotics and Biomimetics (ROBIO)","volume":"60 9","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2023-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139187064","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This article presents a new visual servoing method based on a cosine similarity metric, which uses the cosine distance defined by cosine similarity as the optimization objective of histogram-based direct visual servoing (HDVS) to design the servoing control law. As a more compact global descriptor, the histogram makes direct visual servoing more robust against noise than directly using image intensity. Cosine similarity is the cosine of the angle between two vectors and has been widely employed to measure the similarity between multidimensional data. The cosine distance derived from it is more sensitive to the directional difference between histograms, giving the proposed method a faster convergence rate than the existing Matusita distance-based servoing method. This advantage is verified in simulations, and experiments on a manipulator further verify the effectiveness of the proposed method in practical situations.
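The objective itself is simple to state: the cosine distance between the current and desired image histograms, driven toward zero by the control law. A minimal sketch (the grayscale histogram and bin count here are illustrative choices, not necessarily the paper's exact descriptor):

```python
import numpy as np

def image_histogram(img, bins=32):
    """Grayscale intensity histogram used as the compact global descriptor."""
    hist, _ = np.histogram(img, bins=bins, range=(0, 256))
    return hist.astype(float)

def cosine_distance(h1, h2):
    """Cosine distance 1 - cos(h1, h2); zero iff the histograms are parallel,
    i.e. the current view matches the desired view up to scale."""
    h1, h2 = np.asarray(h1, float), np.asarray(h2, float)
    return 1.0 - h1 @ h2 / (np.linalg.norm(h1) * np.linalg.norm(h2))
```

Because cosine distance depends only on the angle between the histogram vectors, it reacts to shifts in the intensity distribution's shape rather than its overall magnitude, which is the directional sensitivity the article credits for the faster convergence.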
{"title":"Visual Servoing Using Cosine Similarity Metric","authors":"Wenbo Ning, Yecan Yin, Xiangfei Li, Huan Zhao, Yunfeng Fu, Han Ding","doi":"10.1109/ROBIO58561.2023.10354973","DOIUrl":"https://doi.org/10.1109/ROBIO58561.2023.10354973","url":null,"abstract":"This article presents a new visual servoing method based on cosine similarity metric, which focuses on utilizing cosine distance defined by cosine similarity as the optimization objective of histogram-based direct visual servoing (HDVS) to design the servoing control law. As a more compact global descriptor, the histogram makes direct visual servoing more robust against noise than directly using image intensity. Cosine similarity is the cosine value between two vectors, which has been widely employed to calculate the similarity between multidimensional information. The cosine distance derived from the cosine similarity is more sensitive to the directional difference between the histograms, making the proposed method have a larger convergence rate than the existing Matusita distance-based servoing method. This advantage is verified by simulations, and experiments are conducted on a manipulator to further verify the effectiveness of the proposed method in practical situations.","PeriodicalId":505134,"journal":{"name":"2023 IEEE International Conference on Robotics and Biomimetics (ROBIO)","volume":"70 9","pages":"1-7"},"PeriodicalIF":0.0,"publicationDate":"2023-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139187090","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-12-04. DOI: 10.1109/ROBIO58561.2023.10354651
Jinsong. Zhang, Deling. Wang, Huadan. Hao, Liangwen. Yan
In two-phase flow experiments with different materials and process parameters, the collected image dataset, with its low similarity and small size, made it difficult for common deep learning algorithms to achieve high-precision recognition of the flow pattern, owing to their limited ability to extract global features. In this article, we propose a new deep learning algorithm that enhances the Swin-T network with a CNN, combining the advantages of the Swin-T network with Dynamic Region-Aware Convolution. The new algorithm retains the window multi-head self-attention mechanism and adds a self-attention adjustment module to enhance the extraction of image features and the convergence speed of the network. It significantly improves the recognition accuracy for the different flow patterns in both sharp and blurred images. The enhanced Swin-T-by-CNN network is highly applicable to the classification of image datasets with low similarity and small size.
{"title":"The Enhanced Network Swin-T by CNN on Flow Pattern Recognition for Two-phase Image Dataset with Low Similarity","authors":"Jinsong. Zhang, Deling. Wang, Huadan. Hao, Liangwen. Yan","doi":"10.1109/ROBIO58561.2023.10354651","DOIUrl":"https://doi.org/10.1109/ROBIO58561.2023.10354651","url":null,"abstract":"In the two-phase flow experiments with different conditions of materials and process parameters, the collected image dataset with the low similarity and small amount was difficult for the common deep learning algorithms to achieve a high-precision recognition of flow pattern. due to the low extraction capability of global features. In this article, we proposed a new deep learning algorithm to enhance Swin-T network by CNN which combined the advantages of Swin-T network with Dynamic Region-Aware Convolution. The new algorithm retained the window multi-head self-attention mechanism and added the self-attention adjustment module to enhance the extraction of image features and the convergence speed of network. It significantly improved the recognition accuracy of the different flow patterns in the sharp and blurred images. The enhanced network Swin-T by CNN had the high applicability to the classification of image dataset with low similarity and small amount.","PeriodicalId":505134,"journal":{"name":"2023 IEEE International Conference on Robotics and Biomimetics (ROBIO)","volume":"59 3","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2023-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139187097","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-12-04. DOI: 10.1109/ROBIO58561.2023.10354861
Qiang Fu, Muxuan Han, Yunjiang Lou, Ke Li, Zhiyuan Yu
When a quadruped robot is engaged in logistics transportation tasks, it encounters the challenge that the distribution of the center of mass (CoM) of the loaded items is not only random but also time-varying. Consequently, the robot becomes subject to non-zero resultant torques, which inevitably affect its body posture while walking. This paper proposes a method to estimate the CoM and inertia of the load using four one-dimensional force sensors, together with a walking control strategy for complex urban terrain. The inertia tensor and CoM of the load are first estimated, then the robot's dynamics are compensated, and foothold adjustments are made for underactuated orientations to counteract the extra moment generated by the CoM offset. For uneven terrain, a terrain estimator and an event-based gait are used to adjust the robot's gait and reduce the impact of terrain changes on the robot. The effectiveness of the proposed method and the feasibility of load walking in urban terrain are verified through comparative experiments, complex-terrain load-walking experiments in Webots, and real prototype experiments.
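For the static case, the CoM's ground projection follows from moment balance over the four vertical force readings: it is the force-weighted mean of the sensor positions. This is a simplified sketch of that balance, not the paper's full estimator (which also recovers the inertia tensor and handles the time-varying load):

```python
import numpy as np

def estimate_com_xy(sensor_xy, forces):
    """Static CoM ground projection from four one-dimensional force sensors.

    sensor_xy: (4, 2) array of sensor positions in the body frame (m)
    forces:    (4,) vertical force readings (N)

    Moment balance about each horizontal axis gives
    x_com = sum(f_i * x_i) / sum(f_i), and likewise for y.
    """
    sensor_xy = np.asarray(sensor_xy, float)
    f = np.asarray(forces, float)
    return (f[:, None] * sensor_xy).sum(axis=0) / f.sum()
```

An offset estimate like this is what drives the foothold adjustment: the support polygon is shifted toward the heavier side so the extra moment is cancelled.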
{"title":"Inertia Estimation of Quadruped Robot under Load and Its Walking Control Strategy in Urban Complex Terrain","authors":"Qiang Fu, Muxuan Han, Yunjiang Lou, Ke Li, Zhiyuan Yu","doi":"10.1109/ROBIO58561.2023.10354861","DOIUrl":"https://doi.org/10.1109/ROBIO58561.2023.10354861","url":null,"abstract":"When the quadruped robot is engaged in logistics transportation tasks, it encounters a challenge where the distribution of the center of mass (CoM) of the loaded items is not only random but also subject to time variations. Consequently, the robot becomes susceptible to non-zero resultant torques, which inevitably impact its body posture during the walking process. This paper proposes a method to estimate the CoM inertia using four one-dimensional force sensors and a walking control strategy for complex urban terrain. The inertia tensor and CoM of the load are first estimated, then the robot’s dynamics are compensated, and foothold adjustments are made for underactuated orientations to compensate for the extra moment generated by the CoM offset. For uneven terrain, the terrain estimator and event-based gait are used to adjust the robot’s gait to reduce the impact of terrain changes on the robot. 
The effectiveness of the proposed method and the feasibility of load walking in urban terrain are verified through comparative experiments, complex terrain load walking experiments in Webots, and real prototype experiments.","PeriodicalId":505134,"journal":{"name":"2023 IEEE International Conference on Robotics and Biomimetics (ROBIO)","volume":"69 11","pages":"1-7"},"PeriodicalIF":0.0,"publicationDate":"2023-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139187104","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-12-04. DOI: 10.1109/ROBIO58561.2023.10354797
T. Rautio, M. Jaskari, Haider Ali Bhatti, Aappo Mustakangas, M. Keskitalo, A. Järvenpää
This study investigates the fatigue performance and impact toughness of laser powder bed fusion (PBF-LB) manufactured Inconel 718. Inconel 718 is a nickel-based superalloy known for its high-temperature properties. The PBF-LB process offers accuracy and the ability to produce parts in their final geometry, eliminating the need for expensive machining. These features make it a tempting material for robotic applications as well, such as structural components or service in environments with elevated temperatures or corrosive conditions. However, the influence of heat treatment on the mechanical and dynamic properties of Inconel 718 is not yet fully understood. The study aims to characterize Inconel 718 specimens through tensile, impact, and fatigue testing, as well as microstructural analysis using Field-Emission Scanning Electron Microscopy (FESEM) with Electron Backscatter Diffraction (EBSD). The results will provide insights into the mechanical behavior of PBF-LB-manufactured Inconel 718, considering printing orientation, mechanical properties, and surface quality. The findings will contribute to the understanding of this material's dynamic properties, crucial for the design and utilization of components produced through PBF-LB.
{"title":"Fatigue Performance and Impact Toughness of PBF-LB Manufactured Inconel 718","authors":"T. Rautio, M. Jaskari, Haider Ali Bhatti, Aappo Mustakangas, M. Keskitalo, A. Järvenpää","doi":"10.1109/ROBIO58561.2023.10354797","DOIUrl":"https://doi.org/10.1109/ROBIO58561.2023.10354797","url":null,"abstract":"This study investigates the fatigue performance and impact toughness of laser powder bed fusion (PBF-LB) manufactured Inconel 718. Inconel 718 is a nickel-based superalloy known for its high-temperature properties. The PBF-LB process offers accuracy and the ability to produce parts with the final geometry, eliminating the need for expensive machining. These features make it an tempting material also for robotic applications, such as structural components or environments that have elevated temperature or are corrosive. However, the influence of heat treatment on the mechanical and dynamic properties of Inconel 718 is not yet fully understood. The study aims to characterize Inconel 718 specimens through tensile, impact, and fatigue testing, as well as microstructural analysis using Field-Emission Scanning Electron Microscopy (FESEM) with Electron Backscatter Diffraction (EBSD). The results will provide insights into the mechanical behavior of PBF-LB-manufactured Inconel 718, considering printing orientation, mechanical properties, and surface quality. 
The findings will contribute to the understanding of this material’s dynamic properties, crucial for the design and utilization of components produced through PBF-LB.","PeriodicalId":505134,"journal":{"name":"2023 IEEE International Conference on Robotics and Biomimetics (ROBIO)","volume":"69 4","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2023-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139187109","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-12-04. DOI: 10.1109/ROBIO58561.2023.10354731
Wen Wang, Zheyuan Lin, Shanshan Ji, Te Li, J. Gu, Minhong Wan, Chunlong Zhang
Transformer-based visual technologies have witnessed remarkable progress in recent years, and person re-identification (ReID) is one of the active research areas adopting transformers to improve performance. However, a major challenge in applying transformers to ReID is the high computational cost, which hinders the real-time deployment of such methods. To address this issue, this paper proposes two simple yet effective techniques to reduce the computation of transformers for ReID. The first is to eliminate invalid patches that do not contain any person information, thereby reducing the number of tokens fed into the transformer. Since computational complexity is quadratic in the number of input tokens, the second technique partitions the image into multiple windows, applies a separate transformer to each window, and merges the class tokens from each window, which reduces the complexity of the self-attention mechanism.
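The quadratic-in-tokens argument can be made concrete with a back-of-the-envelope FLOPs count (the constant factors below are illustrative and only the scaling matters; real layers add projection and MLP costs):

```python
def attention_flops(tokens, dim):
    """Dominant self-attention cost for n tokens of width d:
    the QK^T score matrix plus the attention-weighted sum of V,
    each roughly n^2 * d multiply-adds."""
    return 2 * tokens**2 * dim

def windowed_attention_flops(tokens, dim, windows):
    """Split the tokens across independent per-window attention blocks
    (assumes tokens divide evenly into the windows)."""
    per_window = tokens // windows
    return windows * attention_flops(per_window, dim)
```

Splitting n tokens into w windows turns one n^2 term into w terms of (n/w)^2, an exact w-fold reduction of the attention cost, which is why windowing (plus dropping person-free patches up front) cuts overall FLOPs.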
{"title":"Reducing the Computational Cost of Transformers for Person Re-identification","authors":"Wen Wang, Zheyuan Lin, Shanshan Ji, Te Li, J. Gu, Minhong Wan, Chunlong Zhang","doi":"10.1109/ROBIO58561.2023.10354731","DOIUrl":"https://doi.org/10.1109/ROBIO58561.2023.10354731","url":null,"abstract":"Transformer-based visual technologies have witnessed remarkable progress in recent years, and person re-identification (ReID) is one of the active research areas that adopts transformers to improve the performance. However, a major challenge of applying transformers to ReID is the high computational cost, which hinders the real-time deployment of such methods. To address this issue, this paper proposes two simple yet effective techniques to reduce the computation of transformers for ReID. The first technique is to eliminate the invalid patches that do not contain any person information, thereby reducing the number of tokens fed into the transformer. Considering that computational complexity is quadratic with respect to input tokens, the second technique partitions the image into multiple windows, applies separate transformers to each window, and merges class tokens from each window, which can reduce the complexity of the self-attention mechanism. 
By combining these two techniques, our proposed method reduces the SOTA baseline model by 12.2% FLOPs, while slightly improving the rank-1 accuracy and only sacrificing 1.1% mAP on DukeMTMC-ReID dataset.","PeriodicalId":505134,"journal":{"name":"2023 IEEE International Conference on Robotics and Biomimetics (ROBIO)","volume":"69 11","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2023-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139187117","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-12-04. DOI: 10.1109/ROBIO58561.2023.10354860
Jingwu Li, Zhijun Sun, Zhongqing Sun, Xing Gao, C. Cao, Yingtian Li
When magnetic surgical instruments are used to perform surgical operations, two situations must be strictly avoided to ensure safety: 1) the magnetic surgical instrument falls down into the abdominal cavity; 2) the pushing forces between the inner wall of the abdominal cavity and the magnetic surgical instruments become so high that they harm the human body. However, when calculating the magnetic force applied to the magnetic surgical instruments, the variation of the magnetic field within the space occupied by the internal permanent magnets (IPMs) placed inside the surgical instrument is normally omitted. In this paper, to calculate the magnetic field generated by the external permanent magnets (EPMs), a multi-dipole model is proposed that accounts for the variation of the magnetic field in the region where the IPMs are located, and a numerical integration method for calculating the magnetic force is introduced. The experimental results showed that the multi-dipole model could predict the magnetic flux density within 20-50 mm of the permanent magnet, and the magnetic force calculation model predicts the trend of the magnetic force variation well.
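The multi-dipole model rests on the point-dipole field, superposed over several dipoles distributed through the EPM volume. A minimal sketch under that assumption (the dipole placement and moments are user inputs for illustration, not the paper's calibration):

```python
import numpy as np

MU0 = 4e-7 * np.pi  # vacuum permeability (T*m/A)

def dipole_field(m, r):
    """Flux density of a point dipole with moment m (A*m^2) at offset r (m):
    B = mu0/(4*pi) * (3*(m.r_hat)*r_hat - m) / |r|^3."""
    m, r = np.asarray(m, float), np.asarray(r, float)
    d = np.linalg.norm(r)
    rhat = r / d
    return MU0 / (4 * np.pi) * (3.0 * np.dot(m, rhat) * rhat - m) / d**3

def multi_dipole_field(moments, positions, r):
    """Superpose several dipoles approximating an extended magnet; the
    field is then integrated numerically over the IPM volume to get force."""
    return sum(dipole_field(m, np.asarray(r) - np.asarray(p))
               for m, p in zip(moments, positions))
```

Capturing the field's spatial variation across the IPM region is exactly what the single-dipole shortcut omits, and it is why the force integral rather than a point evaluation is needed at the 20-50 mm working distances reported.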
{"title":"A Magnetic Force Calculation of Permanent Magnet for Magnetic Surgical Instruments","authors":"Jingwu Li, Zhijun Sun, Zhongqing Sun, Xing Gao, C. Cao, Yingtian Li","doi":"10.1109/ROBIO58561.2023.10354860","DOIUrl":"https://doi.org/10.1109/ROBIO58561.2023.10354860","url":null,"abstract":"When magnetic surgical instruments are used to perform surgical operations, two situations must be strictly avoided to ensure the safety: 1) the magnetic surgical instrument fall down in the abdominal cavity; 2) the pushing forces between the inner wall of the abdominal cavity and the magnetic surgical instruments are too high to harm human body. However, when calculating the magnetic force applied to the magnetic surgical instruments, the variation of the magnetic field within the space which is occupied by the internal permanent magnets (IPMs), placed inside the surgical instrument, is normally omitted. In this paper, to calculate the magnetic field generated by the external permanent magnets (EPMs), a multi-dipole model is proposed considering the variation of the magnetic field in the region where IPMs locate, and a numerical integration method to calculate the magnetic force is introduced. The experimental results showed that the multi-dipole model could predict the magnetic flux density within the distance of 20 - 50 mm away from the permanent magnet. 
And the magnetic force calculation model can predict the magnetic force variation trend well.","PeriodicalId":505134,"journal":{"name":"2023 IEEE International Conference on Robotics and Biomimetics (ROBIO)","volume":"59 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2023-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139187124","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}