Learning Search Path for Region-level Image Matching

ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Pub Date: 2019-05-20
DOI: 10.1109/ICASSP.2019.8682714

Onkar Krishna, Go Irie, Xiaomeng Wu, T. Kawanishi, K. Kashino

Pages: 1967-1971

Citations: 3

Abstract



Finding the region of an image that matches a query among a large number of candidates is a fundamental problem in image processing. The exhaustive nature of the sliding window approach has motivated work that reduces run time by skipping unnecessary windows or pixels that play no substantial role in the search results. However, such pruning-based approaches still need to evaluate a non-negligible number of candidates, which limits the efficiency gain. We propose an approach that learns efficient search paths from data. Our model is based on a CNN-LSTM architecture designed to sequentially determine the next prospective location to search, based on the history of the locations already attended. We propose a reinforcement learning algorithm that trains the model end to end, which allows the search paths and the deep image features used for matching to be learned jointly. Together, these properties significantly reduce the number of windows to be evaluated and make the model robust to background clutter. On the MNIST and FlickrLogos-32 datasets, our model achieves remarkable matching accuracy with a reduced number of windows and a reduced run time.
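The exhaustive sliding window baseline that the abstract contrasts against can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' code: the function name `sliding_window_match`, the sum-of-squared-differences score on raw pixel values, and the toy data are all hypothetical choices made for clarity, whereas the paper matches learned deep features. The point of the sketch is only the window count, which grows with the image area.

```python
def sliding_window_match(image, query, stride=1):
    """Exhaustively compare `query` against every window of `image`.

    `image` and `query` are 2-D lists of numbers. Returns the top-left
    (row, col) of the best window, its sum of squared differences (SSD),
    and the number of windows evaluated.
    """
    img_h, img_w = len(image), len(image[0])
    q_h, q_w = len(query), len(query[0])
    best_pos, best_cost, evaluated = None, float("inf"), 0
    for top in range(0, img_h - q_h + 1, stride):
        for left in range(0, img_w - q_w + 1, stride):
            # SSD between the query and the window at (top, left);
            # an assumed score, chosen only to keep the sketch short.
            cost = sum(
                (image[top + r][left + c] - query[r][c]) ** 2
                for r in range(q_h)
                for c in range(q_w)
            )
            evaluated += 1
            if cost < best_cost:
                best_pos, best_cost = (top, left), cost
    return best_pos, best_cost, evaluated


# Toy data: a 6x6 image of zeros with a 2x2 pattern placed at row 3, column 2.
image = [[0] * 6 for _ in range(6)]
query = [[1, 2], [3, 4]]
for r in range(2):
    for c in range(2):
        image[3 + r][2 + c] = query[r][c]

position, cost, evaluated = sliding_window_match(image, query)
print(position, cost, evaluated)  # (3, 2) 0 25
```

Even on this 6x6 toy image, all 25 windows are scored to find one match. Raising `stride` evaluates fewer windows but can step over the true location, which is the kind of trade-off that motivates learning where to look next instead of scanning every position.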
