CoBjeason: Reasoning Covered Object in Image by Multi-Agent Collaboration Based on Informed Knowledge Graph

IF 4.0 | Zone 3 (Computer Science) | JCR Q1 (Computer Science, Information Systems) | ACM Transactions on Knowledge Discovery from Data | Pub Date: 2024-01-26 | DOI: 10.1145/3643565

Huan Rong, Minfeng Qian, Tinghuai Ma, Di Jin, Victor S. Sheng


Citations: 0

Abstract

Object detection has been widely studied. In this paper, we turn to a more challenging problem, “Covered Object Reasoning”: inferring the category label of a target object in a given image when it is totally covered (or invisible). To address this problem, we propose CoBjeason, which brings visual reasoning together with a knowledge graph: “empirical cognition” of common visual contexts is encoded as a knowledge graph, over which two collaborative agents conduct reinforced multi-hop reasoning. On the one hand, the agents stand at the covered object (the unknown entity), observe the surrounding visual cues in the given image, and gradually select entities and relations from the global gallery-level knowledge graph, which contains entity pairs that frequently co-occur across the entire image collection, so as to infer the main structure of an image-level knowledge graph expanded forward from the unknown entity. On the other hand, based on the reasoned image-level knowledge graph, the semantic context among entities is aggregated backward into the unknown entity in order to select an appropriate entity from the global gallery-level knowledge graph as the reasoning result. Moreover, the two agents collaborate with each other, ensuring that the Forward and Backward Reasoning steps move toward the same goal of higher performance on covered object reasoning. To the best of our knowledge, this is the first work on Covered Object Reasoning with Knowledge Graphs and reinforced Multi-Agent collaboration.
In particular, our study of Covered Object Reasoning and the proposed CoBjeason model could offer novel insights into more basic Computer Vision (CV) tasks: Semantic Segmentation, through a better understanding of the current scene when some objects are blurred or covered; Visual Question Answering, through enhanced inference in more complicated visual contexts when some objects are covered or invisible; and Image Caption Generation, through richer visual context for images containing partially visible objects. Improvements on these basic CV tasks can in turn benefit more complicated ones that involve nuanced visual interpretation, such as Autonomous Driving, where recognizing and reasoning about partially visible or covered objects is critical. According to the experimental results, CoBjeason achieves the best overall ranking performance on covered object reasoning among the compared models, while also offering a lower “exploration cost”, insensitivity to long-tail covered objects, and acceptable time complexity.
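
The abstract describes a gallery-level knowledge graph built from entity pairs that frequently co-occur across an image collection, and a reasoning step that ranks candidate entities for the covered object. The sketch below illustrates only that data structure together with a plain neighbor-voting baseline. It is not the paper's reinforced multi-agent method, and the function names, the `min_count` threshold, and the toy labels are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_gallery_graph(image_annotations, min_count=2):
    """Build an undirected co-occurrence graph over object labels.

    `image_annotations` is a list of label lists, one per image.
    An edge is kept only if the pair appears together in at least
    `min_count` images (hypothetical stand-in for "frequently occurring").
    """
    pair_counts = Counter()
    for labels in image_annotations:
        # Count each unordered pair once per image.
        for a, b in combinations(sorted(set(labels)), 2):
            pair_counts[(a, b)] += 1

    graph = defaultdict(dict)
    for (a, b), n in pair_counts.items():
        if n >= min_count:
            graph[a][b] = n
            graph[b][a] = n
    return graph


def rank_covered_object(graph, visible_labels):
    """Rank candidate labels for the covered object.

    Each visible cue votes for its graph neighbors, weighted by the
    co-occurrence count. This is a simple baseline, not multi-hop
    reinforcement learning.
    """
    scores = Counter()
    for cue in set(visible_labels):
        for neighbor, n in graph.get(cue, {}).items():
            if neighbor not in visible_labels:
                scores[neighbor] += n
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [label for label, _ in ranked]


# Toy image collection (invented labels, for illustration only).
gallery = [
    ["table", "chair", "cup"],
    ["table", "chair", "plate"],
    ["table", "cup", "plate"],
    ["road", "car", "traffic light"],
    ["road", "car", "person"],
    ["road", "traffic light", "car"],
]

gallery_graph = build_gallery_graph(gallery, min_count=2)
print(rank_covered_object(gallery_graph, ["road", "traffic light"]))
```

A ranked list, rather than a single prediction, matches the abstract's framing of the results as "ranking performance"; how CoBjeason actually scores and selects entities is specified in the paper itself, not here.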




Source journal

ACM Transactions on Knowledge Discovery from Data (Computer Science, Information Systems; Computer Science, Software Engineering)

CiteScore: 6.70

Self-citation rate: 5.60%

Articles published: 172

Review time: 3 months

Journal description: TKDD welcomes papers on a full range of research in the knowledge discovery and analysis of diverse forms of data. Such subjects include, but are not limited to: scalable and effective algorithms for data mining and big data analysis, mining brain networks, mining data streams, mining multi-media data, mining high-dimensional data, mining text, Web, and semi-structured data, mining spatial and temporal data, data mining for community generation, social network analysis, and graph structured data, security and privacy issues in data mining, visual, interactive and online data mining, pre-processing and post-processing for data mining, robust and scalable statistical methods, data mining languages, foundations of data mining, KDD framework and process, and novel applications and infrastructures exploiting data mining technology including massively parallel processing and cloud computing platforms. TKDD encourages papers that explore the above subjects in the context of large distributed networks of computers, parallel or multiprocessing computers, or new data devices. TKDD also encourages papers that describe emerging data mining applications that cannot be satisfied by the current data mining technology.