MLST-Net: Multi-Task Learning based SpatialTemporal Disentanglement Scheme for Video Facial Paralysis Severity Grading.

Impact factor: 6.7 · CAS Zone 2 (Medicine) · JCR Q1 (Computer Science, Information Systems) · IEEE Journal of Biomedical and Health Informatics · Pub Date: 2025-02-26 · DOI: 10.1109/JBHI.2025.3546019

Zehui Feng, Tongtong Zhou, Ting Han


Citations: 0

Abstract

Facial paralysis, a common disease of the nervous system, seriously impairs patients' facial muscle function and appearance. Accurate grading of facial paralysis is important for planning personalized treatment. Existing artificial-intelligence-based grading methods focus largely on static image classification, which fails to capture dynamic facial movements. In addition, privacy concerns make it difficult to build comprehensive facial paralysis datasets, so fully training a robust model from scratch is impractical. Finally, maintaining both precision and inference speed on edge devices remains a key challenge. To address these shortcomings, we propose MLST-Net, a novel and explainable three-stage deep-learning method based on multi-task learning. In the first stage, a pre-trained model extracts the static appearance structure of the face and its dynamic texture changes. The second stage fuses the proxy-task results into a unified semantic representation of the face and outputs the result of the simple task: whether facial paralysis is present. In the third stage, spatial-temporal disentanglement captures the combinatorial spatial-temporal dependencies in video sequences. Finally, the resulting features are passed to a classifier to obtain the result of the complex task, facial paralysis classification. Compared with existing advanced methods, MLST-Net is computationally inexpensive and achieves state-of-the-art results on the 1,241 public dataset videos. It significantly benefits the digital diagnosis of facial palsy and offers innovative, explainable ideas for video-based digital medical treatment.
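The data flow of the three stages described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: every function name below is hypothetical, frame differencing stands in for the pre-trained extractors, pooling along each axis is only a crude stand-in for spatial-temporal disentanglement, and both the number of severity grades (4) and the form of the joint loss (binary cross-entropy plus weighted cross-entropy) are assumptions that the abstract does not state.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]  # one flattened feature vector (hypothetical representation)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def softmax(logits: Sequence[float]) -> Vector:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def stage1_extract(frames: List[Vector]) -> Tuple[List[Vector], List[Vector]]:
    """Stand-in for pre-trained extractors (assumption): appearance is the
    frame itself, motion is the difference between consecutive frames."""
    appearance = [list(f) for f in frames]
    motion = [[b - a for a, b in zip(f0, f1)] for f0, f1 in zip(frames, frames[1:])]
    motion.insert(0, [0.0] * len(frames[0]))  # pad so both streams have T steps
    return appearance, motion


def stage2_fuse(appearance: List[Vector], motion: List[Vector],
                w: Vector, bias: float) -> Tuple[List[Vector], float]:
    """Concatenate both streams per frame, then score the simple binary task."""
    fused = [a + m for a, m in zip(appearance, motion)]  # list concatenation
    pooled = [sum(col) / len(fused) for col in zip(*fused)]
    return fused, sigmoid(dot(w, pooled) + bias)


def stage3_disentangle(fused: List[Vector]) -> Tuple[Vector, Vector]:
    """Crude stand-in (assumption): summarize the feature axis and the time
    axis separately instead of mixing them in one pooled vector."""
    spatial = [sum(col) / len(fused) for col in zip(*fused)]  # mean over time
    temporal = [sum(row) / len(row) for row in fused]         # mean over features
    return spatial, temporal


def grade_severity(spatial: Vector, temporal: Vector,
                   weights: List[Vector]) -> Vector:
    """Linear classifier over the concatenated summaries (the complex task)."""
    x = spatial + temporal
    return softmax([dot(row, x) for row in weights])


def multitask_loss(p: float, grade_probs: Vector, has_paralysis: int,
                   grade: int, lam: float = 1.0) -> float:
    """Assumed joint objective: binary cross-entropy + lam * cross-entropy."""
    bce = -(has_paralysis * math.log(p) + (1 - has_paralysis) * math.log(1.0 - p))
    return bce - lam * math.log(grade_probs[grade])


if __name__ == "__main__":
    video = [[0.0, 1.0], [0.5, 1.0], [1.0, 0.0]]  # 3 frames, 2 toy features each
    app, mot = stage1_extract(video)
    fused, p = stage2_fuse(app, mot, w=[0.1] * 4, bias=0.0)
    spatial, temporal = stage3_disentangle(fused)
    probs = grade_severity(spatial, temporal,
                           [[0.1 * (k + 1)] * 7 for k in range(4)])
    print(p, probs, multitask_loss(p, probs, has_paralysis=1, grade=2))
```

The point of the sketch is the shape of the computation: the binary "with or without facial paralysis" output is produced from the fused representation before the grading classifier runs, so both tasks can share features and be trained with one combined loss.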


Source journal

IEEE Journal of Biomedical and Health Informatics (Computer Science, Information Systems; Computer Science, Interdisciplinary Applications)

CiteScore: 13.60

Self-citation rate: 6.50%

Articles published: 1151

Journal description: IEEE Journal of Biomedical and Health Informatics publishes original papers presenting recent advances where information and communication technologies intersect with health, healthcare, life sciences, and biomedicine. Topics include acquisition, transmission, storage, retrieval, management, and analysis of biomedical and health information. The journal covers applications of information technologies in healthcare, patient monitoring, preventive care, early disease diagnosis, therapy discovery, and personalized treatment protocols. It explores electronic medical and health records, clinical information systems, decision support systems, medical and biological imaging informatics, wearable systems, body area/sensor networks, and more. Integration-related topics such as interoperability, evidence-based medicine, and secure patient data are also addressed.
