S. Chowdhury, Mushfiqur Rahman, M. T. Oyshi, Md. Arid Hasan
Published in: 2019 8th International Conference System Modeling and Advancement in Research Trends (SMART), 2019-11-01. DOI: 10.1109/SMART46866.2019.9117224 (https://doi.org/10.1109/SMART46866.2019.9117224)
Text Extraction through Video Lip Reading Using Deep Learning
Automated text extraction from video data through lip reading can overcome language barriers and open up opportunities in security, connectivity, and support for people with physical challenges. The conversion is made possible by analyzing facial expressions with deep learning methods. It remains a challenging task, however, because variations in pronunciation and accent give the same word different facial appearances. This research proposes a method for converting video data to text data through lip reading. The proposed method comprises a test dataset, image frame analysis, and text output generated from the identified words. In the proposed technique, the test dataset will be organized by combining all the possible facial expressions of different words.
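The three stages named in the abstract (a reference dataset, frame analysis, and text output from identified words) can be illustrated with a toy sketch. This is not the authors' implementation: a simple nearest-template match stands in for the deep learning model, and the names `clip_descriptor`, `identify_word`, `video_to_text`, and `TEMPLATES`, along with the two-dimensional feature values, are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence

Frame = Sequence[float]   # hypothetical per-frame lip feature vector
Clip = Sequence[Frame]    # one spoken word, as a sequence of frames


def clip_descriptor(clip: Clip) -> List[float]:
    """Average the frame features into one fixed-length vector per clip."""
    if not clip:
        raise ValueError("clip must contain at least one frame")
    dim = len(clip[0])
    return [sum(frame[i] for frame in clip) / len(clip) for i in range(dim)]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def identify_word(clip: Clip, templates: Dict[str, List[float]]) -> str:
    """Return the word whose template is closest to the clip descriptor.

    Assumption: this nearest-template rule is a stand-in for a trained
    deep learning classifier, which the abstract does not specify.
    """
    descriptor = clip_descriptor(clip)
    return min(templates, key=lambda w: squared_distance(descriptor, templates[w]))


def video_to_text(clips: Sequence[Clip], templates: Dict[str, List[float]]) -> str:
    """Identify each word clip in order and join the words into text."""
    return " ".join(identify_word(clip, templates) for clip in clips)


# Toy reference set: made-up 2-D features, not real lip data.
TEMPLATES = {"hello": [0.0, 1.0], "world": [1.0, 0.0]}

if __name__ == "__main__":
    clips = [
        [[0.1, 0.9], [0.0, 1.1]],    # frames closest to the "hello" template
        [[0.9, 0.1], [1.1, -0.1]],   # frames closest to the "world" template
    ]
    print(video_to_text(clips, TEMPLATES))
```

In a real system the per-frame features would come from cropped mouth regions of video frames, and the template lookup would be replaced by a trained network; the sketch only shows how identified words are assembled into text output.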