{"title":"基于语音图像的多模态人工智能交互,为手术室内的洗刷护士提供帮助","authors":"W. Ng, Han Yi Wang, Zheng Li","doi":"10.1109/ROBIO58561.2023.10354726","DOIUrl":null,"url":null,"abstract":"With the increasing surgical need in our aging society, there is a lack of experienced surgical assistants, such as scrub nurses. To facilitate the training of junior scrub nurses and to reduce human errors, e.g., missing surgical items, we develop a speech-image based multimodal AI framework to assist scrub nurses in the operating room. The proposed framework allows real-time instrument type identification and instance detection, which enables junior scrub nurses to become more familiar with the surgical instruments and guides them throughout the surgical procedure. We construct an ex-vivo video-assisted thorascopic surgery dataset and benchmark it on common object detection models, reaching an average precision of 98.5% and an average recall of 98.9% on the state-of-the-art YOLO-v7. Additionally, we implement an oriented bounding box version of YOLO-v7 to address the undesired bounding box suppression in instrument crossing over. By achieving an average precision of 95.6% and an average recall of 97.4%, we improve the average recall by up to 9.2% compared to the previous oriented bounding box version of YOLO-v5. To minimize distraction during surgery, we adopt a deep learning-based automatic speech recognition model to allow surgeons to concentrate on the procedure. Our physical demonstration substantiates the feasibility of the proposed framework in providing real-time guidance and assistance for scrub nurses.","PeriodicalId":505134,"journal":{"name":"2023 IEEE International Conference on Robotics and Biomimetics (ROBIO)","volume":"73 2","pages":"1-6"},"PeriodicalIF":0.0000,"publicationDate":"2023-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Speech-image based Multimodal AI Interaction for Scrub Nurse Assistance in the Operating Room\",\"authors\":\"W. Ng, Han Yi Wang, Zheng Li\",\"doi\":\"10.1109/ROBIO58561.2023.10354726\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"With the increasing surgical need in our aging society, there is a lack of experienced surgical assistants, such as scrub nurses. To facilitate the training of junior scrub nurses and to reduce human errors, e.g., missing surgical items, we develop a speech-image based multimodal AI framework to assist scrub nurses in the operating room. The proposed framework allows real-time instrument type identification and instance detection, which enables junior scrub nurses to become more familiar with the surgical instruments and guides them throughout the surgical procedure. We construct an ex-vivo video-assisted thorascopic surgery dataset and benchmark it on common object detection models, reaching an average precision of 98.5% and an average recall of 98.9% on the state-of-the-art YOLO-v7. Additionally, we implement an oriented bounding box version of YOLO-v7 to address the undesired bounding box suppression in instrument crossing over. By achieving an average precision of 95.6% and an average recall of 97.4%, we improve the average recall by up to 9.2% compared to the previous oriented bounding box version of YOLO-v5. To minimize distraction during surgery, we adopt a deep learning-based automatic speech recognition model to allow surgeons to concentrate on the procedure. 
Our physical demonstration substantiates the feasibility of the proposed framework in providing real-time guidance and assistance for scrub nurses.\",\"PeriodicalId\":505134,\"journal\":{\"name\":\"2023 IEEE International Conference on Robotics and Biomimetics (ROBIO)\",\"volume\":\"73 2\",\"pages\":\"1-6\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-12-04\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2023 IEEE International Conference on Robotics and Biomimetics (ROBIO)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ROBIO58561.2023.10354726\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 IEEE International Conference on Robotics and Biomimetics (ROBIO)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ROBIO58561.2023.10354726","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
With the increasing demand for surgery in our aging society, there is a shortage of experienced surgical assistants, such as scrub nurses. To facilitate the training of junior scrub nurses and to reduce human errors, such as missing surgical items, we develop a speech-image based multimodal AI framework to assist scrub nurses in the operating room. The proposed framework provides real-time instrument type identification and instance detection, which helps junior scrub nurses become more familiar with the surgical instruments and guides them throughout the surgical procedure. We construct an ex-vivo video-assisted thoracoscopic surgery dataset and benchmark common object detection models on it, reaching an average precision of 98.5% and an average recall of 98.9% with the state-of-the-art YOLO-v7. Additionally, we implement an oriented bounding box version of YOLO-v7 to address the undesired bounding box suppression that occurs when instruments cross over. This variant achieves an average precision of 95.6% and an average recall of 97.4%, improving the average recall by up to 9.2% over the earlier oriented bounding box version of YOLO-v5. To minimize distraction during surgery, we adopt a deep learning-based automatic speech recognition model so that surgeons can concentrate on the procedure. Our physical demonstration substantiates the feasibility of the proposed framework in providing real-time guidance and assistance for scrub nurses.
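To illustrate how such a speech-image interaction loop could be wired together, the sketch below combines an object detector over a tray camera with an automatic speech recognition model that handles a spoken instrument request. It is a minimal sketch under explicit assumptions, not the authors' implementation: the Ultralytics YOLO API and OpenAI Whisper are stand-ins for the paper's fine-tuned YOLO-v7 detector and its unspecified deep-learning ASR model, and the instrument label set and file names are hypothetical.

import cv2                      # video capture of the instrument tray
import whisper                  # ASR stand-in for the paper's speech model
from ultralytics import YOLO    # detector stand-in for the paper's YOLO-v7

# Hypothetical instrument labels; the paper's dataset defines its own class set.
INSTRUMENTS = {"scissors", "forceps", "needle holder", "retractor"}

asr = whisper.load_model("base")
detector = YOLO("yolov8n.pt")   # would be a model fine-tuned on the surgical dataset

def detect_instruments(frame):
    """Return the set of instrument class names detected in one video frame."""
    result = detector(frame, verbose=False)[0]
    return {result.names[int(c)] for c in result.boxes.cls}

def handle_request(audio_path, frame):
    """Transcribe a spoken request and report whether the instrument is on the tray."""
    text = asr.transcribe(audio_path)["text"].lower()
    requested = next((name for name in INSTRUMENTS if name in text), None)
    if requested is None:
        return "No known instrument mentioned."
    if requested in detect_instruments(frame):
        return f"{requested} is on the tray."
    return f"{requested} not detected; please check before proceeding."

if __name__ == "__main__":
    cap = cv2.VideoCapture(0)   # overhead camera facing the instrument tray
    ok, frame = cap.read()
    cap.release()
    if ok:
        print(handle_request("request.wav", frame))

In the paper's setting the detection branch would additionally use oriented bounding boxes so that crossing instruments are not suppressed by axis-aligned non-maximum suppression; the sketch above only shows the interaction logic, not that detector modification.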