{"title":"Imaging Description Production by Means of Deeper Neural Networks","authors":"Velichala Sucharitha, Kancharla Sneha, Dhandu Sravani, Sravani Jhade, Pochaboina Sravani, Sarangapur Sreeja","doi":"10.1109/ACCAI58221.2023.10200618","DOIUrl":null,"url":null,"abstract":"Natural language processing (NLP) and computer vision (CV) methods may be used to provide a textual interpretation and explanation of an image's meaning. The human brain can provide a detailed description of a picture, but can a computer do the same? Captioning images is considered difficult work in the realm of artificial intelligence. Contextualizing an image and converting it into grammatically correct text requires the use of both natural language processing and computer vision methods. Improved deep learning techniques and a wealth of publicly available datasets have made it possible to construct a variety of models for automatically generating picture descriptions. Picture classification based on the greatest number of items in the image is the first step in creating an acceptable description of an image provided as input. We can do this with the help of a neural network and certain NLP principles. In this study, we explain in depth how a \"Convolutional Neural Network\" and Long Short-Term Memory were used to build a visual description.Both pictures and texts may be classified with the help of \"Convolutional Neural Networks\" (or \"CNNs\") and \"Recurrent Neural Networks\" (or \"RNNs\") (RNNs). We trained the model to draw from a bigger lexical set when describing the images it has seen in order to increase the precision of its predictions. We conducted a number of trials across many picture datasets, and found that visual description is the single most important factor in determining a model's accuracy. In general, the outcome improves as the amount of the dataset grows.","PeriodicalId":382104,"journal":{"name":"2023 International Conference on Advances in Computing, Communication and Applied Informatics (ACCAI)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-05-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 International Conference on Advances in Computing, Communication and Applied Informatics (ACCAI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ACCAI58221.2023.10200618","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Natural language processing (NLP) and computer vision (CV) methods can be combined to produce a textual interpretation and explanation of an image's meaning. The human brain can describe a picture in detail, but can a computer do the same? Image captioning is considered a difficult task in artificial intelligence: contextualizing an image and converting it into grammatically correct text requires both NLP and CV techniques. Improved deep learning methods and a wealth of publicly available datasets have made it possible to construct a variety of models that generate picture descriptions automatically. The first step in producing an acceptable description of an input image is to identify the objects that appear most prominently in it, which can be done with a neural network together with certain NLP principles. In this study, we explain in depth how a Convolutional Neural Network (CNN) and a Long Short-Term Memory (LSTM) network were combined to generate visual descriptions; CNNs and Recurrent Neural Networks (RNNs) can be used to classify both images and text. We trained the model to draw from a larger vocabulary when describing the images it has seen, in order to increase the precision of its predictions. We conducted a number of trials across several image datasets and found that the quality of the visual description is the single most important factor in determining a model's accuracy; in general, the results improve as the size of the dataset grows.
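As a rough illustration of the CNN+LSTM pipeline the abstract describes, the sketch below wires a pretrained CNN encoder to an LSTM decoder that predicts the next caption word. It is a minimal sketch of one common "merge" captioning architecture, not the paper's exact model; the choice of InceptionV3 and all hyperparameters (vocabulary size, caption length, embedding width) are illustrative assumptions.

```python
# Minimal CNN-encoder + LSTM-decoder captioning sketch (tf.keras).
# All sizes below are assumed for illustration, not taken from the paper.
import tensorflow as tf
from tensorflow.keras import layers, Model

VOCAB_SIZE = 5000   # assumed vocabulary size
MAX_LEN = 30        # assumed maximum caption length
EMBED_DIM = 256     # shared width for image features and word embeddings

# Encoder: a frozen pretrained CNN reduces the image to a feature vector.
cnn = tf.keras.applications.InceptionV3(include_top=False, pooling="avg")
cnn.trainable = False

image_in = layers.Input(shape=(299, 299, 3), name="image")
img_feat = layers.Dense(EMBED_DIM, activation="relu")(cnn(image_in))

# Decoder: an LSTM encodes the partial caption; its state is merged with
# the image feature to predict the next word over the vocabulary.
caption_in = layers.Input(shape=(MAX_LEN,), name="caption")
word_emb = layers.Embedding(VOCAB_SIZE, EMBED_DIM, mask_zero=True)(caption_in)
lstm_out = layers.LSTM(EMBED_DIM)(word_emb)

merged = layers.add([img_feat, lstm_out])
hidden = layers.Dense(EMBED_DIM, activation="relu")(merged)
next_word = layers.Dense(VOCAB_SIZE, activation="softmax")(hidden)

model = Model(inputs=[image_in, caption_in], outputs=next_word)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()
```

At inference time, such a model is run autoregressively: starting from a start-of-sequence token, the predicted word is appended to the caption and fed back in until an end token or MAX_LEN is reached.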