MLST-Net: Multi-Task Learning based SpatialTemporal Disentanglement Scheme for Video Facial Paralysis Severity Grading.
Authors: Zehui Feng, Tongtong Zhou, Ting Han
Journal: IEEE Journal of Biomedical and Health Informatics (impact factor 6.7; JCR Q1, Computer Science, Information Systems)
Publication type: Journal Article
Publication date: 2025-02-26
DOI: 10.1109/JBHI.2025.3546019 (https://doi.org/10.1109/JBHI.2025.3546019)
Citation count: 0
Abstract
Facial paralysis is a common nervous system disease that seriously impairs patients' facial muscle function and appearance. Accurate grading of facial paralysis is important for formulating personalized treatment. Existing artificial-intelligence-based grading methods focus largely on static image classification, which fails to capture dynamic facial movements. In addition, privacy concerns make it difficult to build comprehensive facial paralysis datasets, so fully training a robust model from scratch is impractical. Finally, maintaining both precision and inference speed on edge devices remains a key challenge. To address these shortcomings, we propose MLST-Net, a novel and explainable three-stage deep-learning method based on multi-task learning. In the first stage, a pre-trained model extracts the static appearance structure of the face and its dynamic texture changes. The second stage fuses the proxy task results into a unified semantic representation of the face and outputs the result of the simple task, "with or without facial paralysis". The third stage uses spatial-temporal disentanglement to capture the combinatorial spatial-temporal dependencies in video sequences. Finally, the output is fed to a classifier to obtain the result of the complex task, facial paralysis classification. Compared with other advanced methods, MLST-Net is computationally inexpensive and achieves state-of-the-art results on the 1241 public dataset videos. It benefits the digital diagnosis of facial palsy and offers innovative and explainable ideas for video-based digital medical treatment.
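The abstract pairs a simple task (paralysis present or not) with a complex task (classification of the paralysis) under multi-task learning, but does not state how the two objectives are combined. The sketch below shows one common way to do that, a weighted sum of a binary cross-entropy term and a multi-class cross-entropy term. It is an illustration only, not the authors' loss: the function names, the `aux_weight` parameter, and the four-class example are all assumptions made here.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def binary_cross_entropy(logit, label):
    """Loss for the simple task; label is 1 (paralysis) or 0 (no paralysis).

    Uses the numerically stable form max(z, 0) - z*y + log(1 + exp(-|z|)).
    """
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def cross_entropy(logits, label):
    """Loss for the complex task; label is the index of the true class."""
    return -math.log(softmax(logits)[label])


def multi_task_loss(presence_logit, has_paralysis, class_logits, true_class,
                    aux_weight=0.5):
    """Hypothetical joint objective: complex-task loss plus a weighted
    simple-task loss. aux_weight is an assumed hyperparameter."""
    simple = binary_cross_entropy(presence_logit, has_paralysis)
    complex_task = cross_entropy(class_logits, true_class)
    return complex_task + aux_weight * simple


if __name__ == "__main__":
    # One video: presence head is fairly confident, class head favors class 2.
    # The number of classes (4) is an assumption for this example.
    loss = multi_task_loss(presence_logit=2.0, has_paralysis=1,
                           class_logits=[0.1, 0.3, 2.5, 0.2], true_class=2)
    print(f"joint loss: {loss:.4f}")
```

In a setup like this, the simple task acts as an auxiliary signal: both heads share the same upstream features, so errors on the easier presence decision also shape the representation used for the harder classification.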
About the journal:
IEEE Journal of Biomedical and Health Informatics publishes original papers presenting recent advances where information and communication technologies intersect with health, healthcare, life sciences, and biomedicine. Topics include acquisition, transmission, storage, retrieval, management, and analysis of biomedical and health information. The journal covers applications of information technologies in healthcare, patient monitoring, preventive care, early disease diagnosis, therapy discovery, and personalized treatment protocols. It explores electronic medical and health records, clinical information systems, decision support systems, medical and biological imaging informatics, wearable systems, body area/sensor networks, and more. Integration-related topics like interoperability, evidence-based medicine, and secure patient data are also addressed.