Xiaofan Li, Bo Peng, Yuan Yao, Guangchao Zhang, Zhuyang Xie, Muhammad Usman Saleem
Expert Systems with Applications, Volume 273, Article 126870. Published 2025-05-10 (Epub 2025-02-18). DOI: 10.1016/j.eswa.2025.126870. https://www.sciencedirect.com/science/article/pii/S0957417425004920
Reliable multi-modal prototypical contrastive learning for difficult airway assessment
Recent advancements in facial image-based prediction for difficult airway assessment show significant clinical promise. However, existing methods often struggle to accurately distinguish subtle facial features, contend with limited label information, and address the uncertainty in correlating facial features with airway difficulty. In this study, we propose a Reliable Multimodal Prototypical Contrastive Learning Network (RMP-Net) for difficult airway assessment, which aims to overcome these challenges. RMP-Net integrates multiple modalities, including facial images processed by a Convolutional Neural Network (CNN) and keypoint graphs processed by a Graph Convolutional Network (GCN). In addition to the commonly used image modality, we build a graph from the facial keypoint modality for prediction. This graph not only captures comprehensive facial information but also targets critical anatomical features, enhancing feature representation and model interpretability. During training, features extracted from laryngoscopic images serve as a priori prototypes, which are further aligned with facial image and keypoint features for a clearer feature representation. Importantly, the laryngoscopic modality is used exclusively during training, since it is obtained intraoperatively. This ensures that RMP-Net remains a preoperative prediction method while leveraging detailed anatomical insights during learning. Furthermore, we introduce an uncertainty learning process to validate the correlation between facial features and airway difficulty, improving the model’s robustness by focusing on reliable data. We construct a comprehensive multi-modal dataset, including facial images, laryngoscopic images, and facial keypoints. Five-fold cross-validation experiments demonstrate that RMP-Net achieves significant improvements in diagnostic AUC, sensitivity, and specificity compared to traditional and state-of-the-art (SoTA) methods.
The code for this study is available at https://github.com/a6177738/RMP-Net.
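The abstract does not specify the form of the alignment objective, so the following is only a minimal sketch of the general idea of prototype-based contrastive alignment, not the RMP-Net implementation. It assumes that class prototypes are the mean of training-only (here, laryngoscopic) feature vectors and that the loss is a generic InfoNCE-style cross-entropy over cosine similarities to those prototypes. All function names, the toy two-dimensional features, and the temperature value are hypothetical; the uncertainty learning step is not modeled.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(a, b):
    """Cosine similarity between two vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def class_prototypes(features, labels):
    """Assumed prototype definition: the mean feature vector of each class."""
    sums, counts = {}, {}
    for f, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(f)
            counts[y] = 0
        sums[y] = [s + x for s, x in zip(sums[y], f)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def prototype_contrastive_loss(features, labels, prototypes, temperature=0.1):
    """InfoNCE-style loss: pull each feature toward its own class prototype
    and push it away from the prototypes of the other classes."""
    total = 0.0
    for f, y in zip(features, labels):
        logits = {c: cosine(f, p) / temperature for c, p in prototypes.items()}
        m = max(logits.values())  # subtract the max for numerical stability
        log_denom = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        total += log_denom - logits[y]
    return total / len(features)


# Toy data: training-only features for two classes (0 = easy, 1 = difficult).
laryngo_feats = [[1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [0.1, 0.9]]
laryngo_labels = [0, 0, 1, 1]
prototypes = class_prototypes(laryngo_feats, laryngo_labels)

# Stand-ins for facial image or keypoint embeddings to be aligned.
face_feats = [[1.0, 0.0], [0.0, 1.0]]
aligned = prototype_contrastive_loss(face_feats, [0, 1], prototypes)
misaligned = prototype_contrastive_loss(face_feats, [1, 0], prototypes)
print(aligned < misaligned)
```

In this sketch the prototypes are needed only to compute the training loss, which mirrors the abstract's point that the laryngoscopic modality is used during training and is not required for preoperative prediction.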
About the journal:
Expert Systems With Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.