{"title":"TongueTransUNet: toward effective tongue contour segmentation using well-managed dataset.","authors":"Khalid Al-Hammuri, Fayez Gebali, Awos Kanan","doi":"10.1007/s11517-024-03278-7","DOIUrl":null,"url":null,"abstract":"<p><p>In modern telehealth and healthcare information systems medical image analysis is essential to understand the context of the images and its complex structure from large, inconsistent-quality, and distributed datasets. Achieving desired results faces a few challenges for deep learning. Examples of these challenges are date size, labeling, balancing, training, and feature extraction. These challenges made the AI model complex and expensive to be built and difficult to understand which made it a black box and produce hysteresis and irrelevant, illegal, and unethical output in some cases. In this article, lingual ultrasound is studied to extract tongue contour to understand language behavior and language signature and utilize it as biofeedback for different applications. This article introduces a design strategy that can work effectively using a well-managed dynamic-size dataset. It includes a hybrid architecture using UNet, Vision Transformer (ViT), and contrastive loss in latent space to build a foundation model cumulatively. The process starts with building a reference representation in the embedding space using human experts to validate any new input for training data. UNet and ViT encoders are used to extract the input feature representations. The contrastive loss was then compared to the new feature embedding with the reference in the embedding space. The UNet-based decoder is used to reconstruct the image to its original size. Before releasing the final results, quality control is used to assess the segmented contour, and if rejected, the algorithm requests an action from a human expert to annotate it manually. The results show an improved accuracy over the traditional techniques as it contains only high quality and relevant features.</p>","PeriodicalId":49840,"journal":{"name":"Medical & Biological Engineering & Computing","volume":" ","pages":""},"PeriodicalIF":2.6000,"publicationDate":"2025-02-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Medical & Biological Engineering & Computing","FirstCategoryId":"5","ListUrlMain":"https://doi.org/10.1007/s11517-024-03278-7","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}
引用次数: 0
Abstract
In modern telehealth and healthcare information systems medical image analysis is essential to understand the context of the images and its complex structure from large, inconsistent-quality, and distributed datasets. Achieving desired results faces a few challenges for deep learning. Examples of these challenges are date size, labeling, balancing, training, and feature extraction. These challenges made the AI model complex and expensive to be built and difficult to understand which made it a black box and produce hysteresis and irrelevant, illegal, and unethical output in some cases. In this article, lingual ultrasound is studied to extract tongue contour to understand language behavior and language signature and utilize it as biofeedback for different applications. This article introduces a design strategy that can work effectively using a well-managed dynamic-size dataset. It includes a hybrid architecture using UNet, Vision Transformer (ViT), and contrastive loss in latent space to build a foundation model cumulatively. The process starts with building a reference representation in the embedding space using human experts to validate any new input for training data. UNet and ViT encoders are used to extract the input feature representations. The contrastive loss was then compared to the new feature embedding with the reference in the embedding space. The UNet-based decoder is used to reconstruct the image to its original size. Before releasing the final results, quality control is used to assess the segmented contour, and if rejected, the algorithm requests an action from a human expert to annotate it manually. The results show an improved accuracy over the traditional techniques as it contains only high quality and relevant features.
期刊介绍:
Founded in 1963, Medical & Biological Engineering & Computing (MBEC) continues to serve the biomedical engineering community, covering the entire spectrum of biomedical and clinical engineering. The journal presents exciting and vital experimental and theoretical developments in biomedical science and technology, and reports on advances in computer-based methodologies in these multidisciplinary subjects. The journal also incorporates new and evolving technologies including cellular engineering and molecular imaging.
MBEC publishes original research articles as well as reviews and technical notes. Its Rapid Communications category focuses on material of immediate value to the readership, while the Controversies section provides a forum to exchange views on selected issues, stimulating a vigorous and informed debate in this exciting and high profile field.
MBEC is an official journal of the International Federation of Medical and Biological Engineering (IFMBE).