R. Shah, Yi Yu, Suhua Tang, S. Satoh, Akshay Verma, Roger Zimmermann
Proceedings of the 2016 ACM Workshop on Multimedia COMMONS, published 2016-10-16. DOI: 10.1145/2983554.2983555 (https://doi.org/10.1145/2983554.2983555)
Concept-Level Multimodal Ranking of Flickr Photo Tags via Recall Based Weighting
Social media platforms allow users to annotate photos with tags, which significantly facilitates semantic understanding, search, and retrieval of photos. However, due to the manual, ambiguous, and personalized nature of user tagging, many tags of a photo appear in random order and some are even irrelevant to the visual content. To automatically compute tag relevance for a given photo, we propose a tag ranking scheme based on voting from photo neighbors derived from multimodal information. Specifically, we determine photo neighbors by leveraging geo, visual, and semantic concepts derived from spatial information, visual content, and textual metadata, respectively. We leverage high-level features instead of traditional low-level features to compute tag relevance. Experimental results on a representative set of 203,840 photos from the YFCC100M dataset confirm that the above-mentioned multimodal concepts complement each other in computing tag relevance. Moreover, we explore the fusion of multimodal information to refine tag ranking by leveraging recall-based weighting. Experimental results on the representative set confirm that the proposed algorithm outperforms state-of-the-art methods.
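The abstract does not give formulas, so the following is only a minimal sketch of how neighbor voting and recall-based fusion might fit together, not the authors' implementation. It assumes that each modality (geo, visual, semantic) has already produced a list of neighbor photos, that a tag's per-modality score is the fraction of neighbors carrying that tag, and that the fusion weights are per-modality recall values normalized to sum to one. The function names, the toy neighbor sets, and the recall numbers are all hypothetical.

```python
from collections import Counter


def neighbor_votes(photo_tags, neighbor_tag_sets):
    """Score each tag of the photo by the fraction of neighbors that share it.

    Assumption: plain vote counting normalized by the number of neighbors;
    the paper may use a different voting rule.
    """
    votes = Counter()
    for tags in neighbor_tag_sets:
        for tag in photo_tags:
            if tag in tags:
                votes[tag] += 1
    n = max(len(neighbor_tag_sets), 1)  # avoid division by zero
    return {tag: votes[tag] / n for tag in photo_tags}


def fuse_by_recall(scores_by_modality, recall_by_modality):
    """Combine per-modality scores with weights proportional to recall.

    Assumption: a weighted sum with recall values normalized to sum to 1.
    """
    total = sum(recall_by_modality.values())
    weights = {m: r / total for m, r in recall_by_modality.items()}
    tags = list(next(iter(scores_by_modality.values())))
    return {
        t: sum(weights[m] * scores_by_modality[m][t] for m in scores_by_modality)
        for t in tags
    }


def rank_tags(fused_scores):
    """Order tags by descending fused score, breaking ties alphabetically."""
    return sorted(fused_scores, key=lambda t: (-fused_scores[t], t))


# Toy data (hypothetical): tags of one photo and of its neighbors per modality.
photo_tags = ["beach", "sunset", "me"]
neighbors = {
    "geo": [{"beach", "sea"}, {"beach", "sunset"}, {"city"}, {"beach"}],
    "visual": [{"sunset", "sky"}, {"sunset", "beach"}, {"sunset"}, {"dog"}],
    "semantic": [{"beach", "sunset"}, {"beach"}, {"holiday"}, {"sunset", "me"}],
}
# Hypothetical recall values, e.g. measured per modality on a validation set.
recall = {"geo": 0.5, "visual": 0.3, "semantic": 0.2}

scores = {m: neighbor_votes(photo_tags, nbrs) for m, nbrs in neighbors.items()}
fused = fuse_by_recall(scores, recall)
ranking = rank_tags(fused)
print(ranking)
```

In this toy case the personal tag "me" receives a vote from only one semantic neighbor, so it ends up last, which is the behavior the abstract describes for tags that are irrelevant to the visual content.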