Omprakash Yadav, Atharva Jadhav, Abdul Hannan Sunsara, Idris Vohra
International Journal Of Trendy Research In Engineering And Technology. DOI: 10.54473/ijtret.2021.5101 (https://doi.org/10.54473/ijtret.2021.5101)
AUDIO CAPTION GENERATION FROM IMAGES USING DEEP LEARNING
Visually impaired individuals face many difficulties because they cannot see their natural environment. To address this problem, the proposed system automatically generates a caption for an input image and converts the generated caption to audio, so that visually impaired individuals can listen to it. Captioning is performed using the deep learning algorithms Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), and Long Short-Term Memory (LSTM).
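The abstract describes a three-stage flow: image features from a CNN, word-by-word caption generation with an RNN/LSTM, and conversion of the caption to audio. The sketch below only illustrates how those stages could be wired together, using greedy decoding as one common choice; the abstract does not say which decoding strategy, models, or speech engine the authors use. All names (`generate_caption`, `caption_to_audio`, the `toy_*` stand-ins, the `<start>`/`<end>` tokens) are hypothetical and are not taken from the paper. The toy functions stand in for a trained CNN encoder, a trained LSTM decoder step, and a text-to-speech engine.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical type aliases for illustration only.
Features = Sequence[float]
# A decoder step maps (image features, tokens so far) to the next token.
DecoderStep = Callable[[Features, List[str]], str]


def generate_caption(
    features: Features,
    step: DecoderStep,
    max_len: int = 20,
    start: str = "<start>",
    end: str = "<end>",
) -> str:
    """Greedy decoding: ask the decoder for one token at a time until it
    emits the end token or the length limit is reached."""
    tokens = [start]
    for _ in range(max_len):
        next_token = step(features, tokens)
        if next_token == end:
            break
        tokens.append(next_token)
    return " ".join(tokens[1:])  # drop the start token


def caption_to_audio(
    image: Sequence[int],
    encode: Callable[[Sequence[int]], Features],
    step: DecoderStep,
    speak: Callable[[str], bytes],
) -> Tuple[str, bytes]:
    """Run the full pipeline: image -> features -> caption -> audio bytes."""
    features = encode(image)
    caption = generate_caption(features, step)
    return caption, speak(caption)


# --- Toy stand-ins (assumptions, not real models) ---

def toy_encode(image: Sequence[int]) -> Features:
    # Stand-in for a CNN encoder: one "feature" holding the mean pixel value.
    return [sum(image) / max(len(image), 1)]


def toy_step(features: Features, tokens: List[str]) -> str:
    # Stand-in for an LSTM decoder step: replays a canned caption.
    canned = ["a", "dog", "on", "grass", "<end>"]
    return canned[min(len(tokens) - 1, len(canned) - 1)]


def toy_speak(text: str) -> bytes:
    # Stand-in for a text-to-speech engine: returns the UTF-8 bytes.
    return text.encode("utf-8")


if __name__ == "__main__":
    caption, audio = caption_to_audio([0, 255, 128], toy_encode, toy_step, toy_speak)
    print(caption)
    print(len(audio), "bytes of placeholder audio")
```

In a real system, each stand-in would be replaced by a trained component; the pipeline function itself would not need to change, since it only depends on the three callables.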