Authors: Mehedi Hasan Bijoy, Nirob Hasan, Md. Tahrim Faroque Tushar, Shafin Rahmany
Venue: 2021 24th International Conference on Computer and Information Technology (ICCIT)
Publication date: 2021-12-18
DOI: https://doi.org/10.1109/ICCIT54785.2021.9689793
Image Tagging by Fine-tuning Class Semantics Using Text Data from Web Scraping
Image tagging aims to assign relevant tags from a known vocabulary to an image. It is an active research topic in computer vision and machine learning because of its diverse applications in semantic search and image retrieval. Earlier work treats image tagging as a multi-label classification problem, using visual features from images and semantic word vectors of tags. In most cases, a pre-trained language model such as word2vec or GloVe supplies those word vectors. Because the language model is used as pre-trained, the tagging approach cannot adapt to the context of the target application. This paper fine-tunes a language model (BERT) on text descriptions scraped from the web (Wikipedia) to learn a rich distributed representation of tags. We then use the tag word vectors extracted from the fine-tuned BERT model to solve the image tagging task. By incorporating context information between the target tags and images, our method is more specialized to the particular application. As a result, word vectors obtained from the fine-tuned model perform better than those from pre-trained language models. We evaluate our method on the widely used NUS-WIDE dataset and achieve results competitive with state-of-the-art methods.
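The abstract does not specify how tag word vectors and image features are combined, so the following is only a minimal sketch of one common option: rank tags by cosine similarity between an image embedding and each tag vector. It assumes the image feature has already been projected into the same space as the tag vectors. The names `rank_tags` and `TAG_VECTORS` and the three-dimensional toy values are hypothetical; in the paper's setting the vectors would come from the fine-tuned BERT model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_tags(image_embedding, tag_vectors, k=3):
    """Return the k (tag, score) pairs with the highest cosine similarity."""
    scored = [(tag, cosine(image_embedding, vec)) for tag, vec in tag_vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy tag vectors; stand-ins for vectors extracted from a
# fine-tuned language model (assumption, not data from the paper).
TAG_VECTORS = {
    "beach": [0.9, 0.1, 0.0],
    "ocean": [0.8, 0.3, 0.1],
    "city": [0.0, 0.2, 0.9],
    "street": [0.1, 0.1, 0.8],
}

# Hypothetical image embedding, assumed to be already projected into tag space.
image_embedding = [0.9, 0.15, 0.0]

top = rank_tags(image_embedding, TAG_VECTORS, k=2)
for tag, score in top:
    print(f"{tag}: {score:.3f}")
```

Under this scoring choice, swapping pre-trained vectors for fine-tuned ones changes only the contents of `TAG_VECTORS`; the ranking step stays the same, which is why the quality of the tag representation matters directly for tagging accuracy.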