Lip Reading modeling with Temporal Convolutional Networks for medical support applications

Dimitris Kastaniotis, Dimitrios Tsourounis, S. Fotopoulos
2020 13th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI), published 2020-10-17
DOI: 10.1109/CISP-BMEI51763.2020.9263634
Citations: 6

Abstract

Automated Lip Reading (LR) is the task of predicting a spoken word using only the visual information in a sequence of frames. This sequence modeling task has been approached with Convolutional Neural Networks (CNNs) combined with Long Short-Term Memory (LSTM) networks. This work presents a novel scheme for modeling LR sequences in which a Temporal Convolutional Network (TCN) is driven by the feature vectors produced by a CNN. The contribution is two-fold. First, the TCN topology is used as an alternative way to handle the sequential data of the LR task. Second, the approach is evaluated on a new, challenging real-world dataset designed specifically for LR of Greek words related to biomedical and clinical conditions. The words in the dataset were selected as ones a patient would want to communicate while receiving medical treatment, and were recorded with the frontal camera of a mobile phone. Experimental results indicate that the proposed CNN-TCN architecture can surpass recurrent approaches based on CNN-LSTM, while also offering major benefits for deployment on hardware architectures and more stability during training.
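The abstract does not give the layer configuration, so the following is only a minimal sketch of the general TCN idea: a stack of dilated, causal 1D convolutions applied over the per-frame feature vectors that a CNN would produce. The function names, the kernel size, the dilation schedule, the ReLU activation, and the choice of causal (left-only) padding are all assumptions made for illustration, not details taken from the paper.

```python
def dilated_causal_conv(seq, weights, dilation):
    """Apply one dilated causal temporal convolution followed by ReLU.

    seq:     list of T frame feature vectors, each a list of c_in floats
             (stand-ins for CNN outputs; hypothetical here).
    weights: list of K taps; tap k is a c_out x c_in matrix applied to the
             frame located k * dilation steps in the past.
    """
    c_in = len(seq[0])
    c_out = len(weights[0])
    out = []
    for t in range(len(seq)):
        acc = [0.0] * c_out
        for k, w in enumerate(weights):
            src = t - k * dilation
            if src < 0:
                continue  # implicit zero padding on the left keeps the layer causal
            x = seq[src]
            for o in range(c_out):
                acc[o] += sum(w[o][i] * x[i] for i in range(c_in))
        out.append([max(0.0, a) for a in acc])  # ReLU
    return out


def tcn_forward(seq, layers):
    """Run a stack of (weights, dilation) layers; sequence length is preserved."""
    for weights, dilation in layers:
        seq = dilated_causal_conv(seq, weights, dilation)
    return seq


def receptive_field(kernel_size, dilations):
    """Number of input frames that can influence one output frame."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)
```

Unlike an LSTM, every output frame in such a stack can be computed independently of the others, which is one common argument for why convolutional sequence models parallelize well on hardware. Doubling the dilation at each layer makes the receptive field grow quickly with depth: `receptive_field(3, [1, 2, 4])` covers 15 frames with only three layers.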