Predicting image caption by a unified hierarchical model

2015 IEEE International Conference on Multimedia and Expo (ICME). Pub Date: 2015-06-01. DOI: 10.1109/ICME.2015.7177427

Lin Bai, Kan Li


Citations: 4

Abstract


Automatically describing the content of an image is a challenging task in artificial intelligence. The difficulty is particularly pronounced in activity recognition and in deriving an image caption from an analysis of the relationships among the activities in the image. This paper presents a unified hierarchical model that models the interaction activity between a human and a nearby object, and then infers the image content by analyzing the logical relationships among the interaction activities. In the model, a first-layer factored three-way interaction machine models the 3D spatial context between the human and the relevant object to directly aid the prediction of human-object interaction activities. The activities are then processed by a top-layer factored three-way interaction machine, which learns the image content with the help of the 3D spatial context among the activities. Experiments on a joint dataset show that the unified hierarchical model outperforms state-of-the-art methods in predicting human-object interaction activities and in describing the image caption.
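The abstract gives no equations, so the following is only a minimal sketch of what a generic factored three-way interaction looks like, not the authors' model. It assumes three input vectors (here hypothetically named `x` for human features, `y` for object features, and `h` for activity units) and three projection matrices `wx`, `wy`, `wh` that map each vector onto a shared set of factors; all names, sizes, and values are illustrative assumptions. The score multiplies the three projections factor by factor, which is equivalent to contracting the inputs with a full three-way weight tensor of low rank.

```python
def project(vec, weights):
    """Project vec onto the factors: returns one value per column of weights."""
    n_factors = len(weights[0])
    return [sum(v * row[f] for v, row in zip(vec, weights))
            for f in range(n_factors)]


def factored_three_way_score(x, y, h, wx, wy, wh):
    """Sum over factors of the product of the three per-factor projections."""
    fx, fy, fh = project(x, wx), project(y, wy), project(h, wh)
    return sum(a * b * c for a, b, c in zip(fx, fy, fh))


def full_tensor_score(x, y, h, wx, wy, wh):
    """Same score via the explicit tensor W[i][j][k] = sum_f wx[i][f]*wy[j][f]*wh[k][f]."""
    n_factors = len(wx[0])
    total = 0.0
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            for k, hk in enumerate(h):
                w_ijk = sum(wx[i][f] * wy[j][f] * wh[k][f]
                            for f in range(n_factors))
                total += w_ijk * xi * yj * hk
    return total


# Hypothetical toy inputs (not taken from the paper).
x = [1.0, 2.0]            # assumed human features
y = [1.0, 0.0]            # assumed object features
h = [0.0, 1.0]            # assumed activity units
wx = [[1.0, 0.0], [0.0, 1.0]]
wy = [[1.0, 1.0], [0.0, 1.0]]
wh = [[1.0, 0.0], [2.0, 3.0]]

score = factored_three_way_score(x, y, h, wx, wy, wh)
print(score)  # 8.0
```

The factored form needs only three small matrices instead of one weight per (i, j, k) triple, which is the usual motivation for factoring a three-way interaction. How the paper stacks two such layers and encodes 3D spatial context is not specified in the abstract and is not modeled here.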

