部分场景文本检索

Hao Wang;Minghui Liao;Zhouyi Xie;Wenyu Liu;Xiang Bai
{"title":"部分场景文本检索","authors":"Hao Wang;Minghui Liao;Zhouyi Xie;Wenyu Liu;Xiang Bai","doi":"10.1109/TPAMI.2024.3496576","DOIUrl":null,"url":null,"abstract":"The task of partial scene text retrieval involves localizing and searching for text instances that are the same or similar to a given query text from an image gallery. However, existing methods can only handle text-line instances, leaving the problem of searching for partial patches within these text-line instances unsolved due to a lack of patch annotations in the training data. To address this issue, we propose a network that can simultaneously retrieve both text-line instances and their partial patches. Our method embeds the two types of data (query text and scene text instances) into a shared feature space and measures their cross-modal similarities. To handle partial patches, our proposed approach adopts a Multiple Instance Learning (MIL) approach to learn their similarities with query text, without requiring extra annotations. However, constructing bags, which is a standard step of conventional MIL approaches, can introduce numerous noisy samples for training, and lower inference speed. To address this issue, we propose a Ranking MIL (RankMIL) approach to adaptively filter those noisy samples. Additionally, we present a Dynamic Partial Match Algorithm (DPMA) that can directly search for the target partial patch from a text-line instance during the inference stage, without requiring bags. This greatly improves the search efficiency and the performance of retrieving partial patches. We evaluate the proposed method on both English and Chinese datasets in two tasks: retrieving text-line instances and partial patches. For English text retrieval, our method outperforms state-of-the-art approaches by 8.04% mAP and 12.71% mAP on average, respectively, among three datasets for the two tasks. For Chinese text retrieval, our approach surpasses state-of-the-art approaches by 24.45% mAP and 38.06% mAP on average, respectively, among three datasets for the two tasks.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 3","pages":"1548-1563"},"PeriodicalIF":18.6000,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Partial Scene Text Retrieval\",\"authors\":\"Hao Wang;Minghui Liao;Zhouyi Xie;Wenyu Liu;Xiang Bai\",\"doi\":\"10.1109/TPAMI.2024.3496576\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The task of partial scene text retrieval involves localizing and searching for text instances that are the same or similar to a given query text from an image gallery. However, existing methods can only handle text-line instances, leaving the problem of searching for partial patches within these text-line instances unsolved due to a lack of patch annotations in the training data. To address this issue, we propose a network that can simultaneously retrieve both text-line instances and their partial patches. Our method embeds the two types of data (query text and scene text instances) into a shared feature space and measures their cross-modal similarities. To handle partial patches, our proposed approach adopts a Multiple Instance Learning (MIL) approach to learn their similarities with query text, without requiring extra annotations. However, constructing bags, which is a standard step of conventional MIL approaches, can introduce numerous noisy samples for training, and lower inference speed. To address this issue, we propose a Ranking MIL (RankMIL) approach to adaptively filter those noisy samples. Additionally, we present a Dynamic Partial Match Algorithm (DPMA) that can directly search for the target partial patch from a text-line instance during the inference stage, without requiring bags. This greatly improves the search efficiency and the performance of retrieving partial patches. We evaluate the proposed method on both English and Chinese datasets in two tasks: retrieving text-line instances and partial patches. For English text retrieval, our method outperforms state-of-the-art approaches by 8.04% mAP and 12.71% mAP on average, respectively, among three datasets for the two tasks. For Chinese text retrieval, our approach surpasses state-of-the-art approaches by 24.45% mAP and 38.06% mAP on average, respectively, among three datasets for the two tasks.\",\"PeriodicalId\":94034,\"journal\":{\"name\":\"IEEE transactions on pattern analysis and machine intelligence\",\"volume\":\"47 3\",\"pages\":\"1548-1563\"},\"PeriodicalIF\":18.6000,\"publicationDate\":\"2024-11-19\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE transactions on pattern analysis and machine intelligence\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10758313/\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on pattern analysis and machine intelligence","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10758313/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

摘要

部分场景文本检索的任务涉及本地化和搜索与图片库中给定查询文本相同或相似的文本实例。然而,现有的方法只能处理文本行实例,由于训练数据中缺乏补丁注释,无法解决在这些文本行实例中搜索部分补丁的问题。为了解决这个问题,我们提出了一个可以同时检索文本行实例及其部分补丁的网络。我们的方法将两种类型的数据(查询文本和场景文本实例)嵌入到共享的特征空间中,并测量它们的跨模态相似性。为了处理部分补丁,我们提出的方法采用多实例学习(MIL)方法来学习它们与查询文本的相似性,而不需要额外的注释。然而,作为传统MIL方法的标准步骤,构造袋可能会引入大量的噪声样本用于训练,并降低推理速度。为了解决这个问题,我们提出了一种排序MIL (RankMIL)方法来自适应过滤这些有噪声的样本。此外,我们提出了一种动态部分匹配算法(DPMA),该算法可以在推理阶段直接从文本行实例中搜索目标部分补丁,而不需要包。这大大提高了搜索效率和检索部分补丁的性能。我们在检索文本行实例和部分补丁两个任务上对该方法进行了评估。对于英语文本检索,我们的方法在两个任务的三个数据集中平均比最先进的方法分别高出8.04% mAP和12.71% mAP。对于中文文本检索,我们的方法在两个任务的三个数据集上平均比现有的方法分别高出24.45%和38.06%的mAP。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Partial Scene Text Retrieval
The task of partial scene text retrieval involves localizing and searching for text instances that are the same or similar to a given query text from an image gallery. However, existing methods can only handle text-line instances, leaving the problem of searching for partial patches within these text-line instances unsolved due to a lack of patch annotations in the training data. To address this issue, we propose a network that can simultaneously retrieve both text-line instances and their partial patches. Our method embeds the two types of data (query text and scene text instances) into a shared feature space and measures their cross-modal similarities. To handle partial patches, our proposed approach adopts a Multiple Instance Learning (MIL) approach to learn their similarities with query text, without requiring extra annotations. However, constructing bags, which is a standard step of conventional MIL approaches, can introduce numerous noisy samples for training, and lower inference speed. To address this issue, we propose a Ranking MIL (RankMIL) approach to adaptively filter those noisy samples. Additionally, we present a Dynamic Partial Match Algorithm (DPMA) that can directly search for the target partial patch from a text-line instance during the inference stage, without requiring bags. This greatly improves the search efficiency and the performance of retrieving partial patches. We evaluate the proposed method on both English and Chinese datasets in two tasks: retrieving text-line instances and partial patches. For English text retrieval, our method outperforms state-of-the-art approaches by 8.04% mAP and 12.71% mAP on average, respectively, among three datasets for the two tasks. For Chinese text retrieval, our approach surpasses state-of-the-art approaches by 24.45% mAP and 38.06% mAP on average, respectively, among three datasets for the two tasks.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Calibrating Biased Distribution in VFM-Derived Latent Space via Cross-Domain Geometric Consistency. Penny-Wise and Pound-Foolish in AI-Generated Image Detection. 50 Years of Automated Face Recognition. Soft Label Pruning and Quantization for Large-Scale Dataset Distillation. On the Adversarial Transferability of Generalized "Skip Connections".
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1