
International Journal of Multimedia Information Retrieval: Latest Publications

A local representation-enhanced recurrent convolutional network for image captioning
IF 5.6, CAS Tier 3 (Computer Science), Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2022-04-12. DOI: 10.1007/s13735-022-00231-y
Xiaoyi Wang, Jun Huang
{"title":"A local representation-enhanced recurrent convolutional network for image captioning","authors":"Xiaoyi Wang, Jun Huang","doi":"10.1007/s13735-022-00231-y","DOIUrl":"https://doi.org/10.1007/s13735-022-00231-y","url":null,"abstract":"","PeriodicalId":48501,"journal":{"name":"International Journal of Multimedia Information Retrieval","volume":"30 1","pages":"149 - 157"},"PeriodicalIF":5.6,"publicationDate":"2022-04-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78893857","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Multimodal Quasi-AutoRegression: forecasting the visual popularity of new fashion products
IF 5.6, CAS Tier 3 (Computer Science), Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2022-04-08. DOI: 10.48550/arXiv.2204.04014
Stefanos Papadopoulos, C. Koutlis, S. Papadopoulos, Y. Kompatsiaris
Estimating the preferences of consumers is of utmost importance for the fashion industry as appropriately leveraging this information can be beneficial in terms of profit. Trend detection in fashion is a challenging task due to the fast pace of change in the fashion industry. Moreover, forecasting the visual popularity of new garment designs is even more demanding due to lack of historical data. To this end, we propose MuQAR, a Multimodal Quasi-AutoRegressive deep learning architecture that combines two modules: (1) a multimodal multilayer perceptron processing categorical, visual and textual features of the product and (2) a Quasi-AutoRegressive neural network modelling the “target” time series of the product’s attributes along with the “exogenous” time series of all other attributes. We utilize computer vision, image classification and image captioning, for automatically extracting visual features and textual descriptions from the images of new products. Product design in fashion is initially expressed visually and these features represent the products’ unique characteristics without interfering with the creative process of its designers by requiring additional inputs (e.g. manually written texts). We employ the product’s target attributes time series as a proxy of temporal popularity patterns, mitigating the lack of historical data, while exogenous time series help capture trends among interrelated attributes. We perform an extensive ablation analysis on two large-scale image fashion datasets, Mallzee-P and SHIFT15m to assess the adequacy of MuQAR and also use the Amazon Reviews: Home and Kitchen dataset to assess generalization to other domains. A comparative study on the VISUELLE dataset shows that MuQAR is capable of competing and surpassing the domain’s current state of the art by 4.65% and 4.8% in terms of WAPE and MAE, respectively.
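
For readers who want a concrete picture of the two-module design described above, here is a minimal PyTorch sketch of how a multimodal MLP and a quasi-autoregressive recurrent module could be combined. The class name MuQARSketch, the choice of a GRU, and all layer sizes are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only: a multimodal MLP over static product features plus a
# recurrent module over the target and exogenous attribute time series.
import torch
import torch.nn as nn

class MuQARSketch(nn.Module):
    def __init__(self, static_dim, n_series, hidden=64):
        super().__init__()
        # (1) multimodal MLP over concatenated categorical/visual/textual features
        self.mlp = nn.Sequential(
            nn.Linear(static_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # (2) quasi-autoregressive module: GRU over the target series stacked with
        # the exogenous attribute series (one channel per attribute)
        self.gru = nn.GRU(input_size=n_series, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(2 * hidden, 1)  # popularity score for the new product

    def forward(self, static_feats, series):
        # static_feats: (batch, static_dim); series: (batch, time, n_series)
        s = self.mlp(static_feats)
        _, h = self.gru(series)
        return self.head(torch.cat([s, h[-1]], dim=-1))

# toy forward pass with random data
model = MuQARSketch(static_dim=532, n_series=8)
static = torch.randn(4, 532)        # e.g. image + text + category embeddings
series = torch.randn(4, 52, 8)      # 52 weeks of 8 attribute popularity series
print(model(static, series).shape)  # torch.Size([4, 1])
```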
{"title":"Multimodal Quasi-AutoRegression: forecasting the visual popularity of new fashion products","authors":"Stefanos Papadopoulos, C. Koutlis, S. Papadopoulos, Y. Kompatsiaris","doi":"10.48550/arXiv.2204.04014","DOIUrl":"https://doi.org/10.48550/arXiv.2204.04014","url":null,"abstract":"Estimating the preferences of consumers is of utmost importance for the fashion industry as appropriately leveraging this information can be beneficial in terms of profit. Trend detection in fashion is a challenging task due to the fast pace of change in the fashion industry. Moreover, forecasting the visual popularity of new garment designs is even more demanding due to lack of historical data. To this end, we propose MuQAR, a Multimodal Quasi-AutoRegressive deep learning architecture that combines two modules: (1) a multimodal multilayer perceptron processing categorical, visual and textual features of the product and (2) a Quasi-AutoRegressive neural network modelling the “target” time series of the product’s attributes along with the “exogenous” time series of all other attributes. We utilize computer vision, image classification and image captioning, for automatically extracting visual features and textual descriptions from the images of new products. Product design in fashion is initially expressed visually and these features represent the products’ unique characteristics without interfering with the creative process of its designers by requiring additional inputs (e.g. manually written texts). We employ the product’s target attributes time series as a proxy of temporal popularity patterns, mitigating the lack of historical data, while exogenous time series help capture trends among interrelated attributes. We perform an extensive ablation analysis on two large-scale image fashion datasets, Mallzee-P and SHIFT15m to assess the adequacy of MuQAR and also use the Amazon Reviews: Home and Kitchen dataset to assess generalization to other domains. A comparative study on the VISUELLE dataset shows that MuQAR is capable of competing and surpassing the domain’s current state of the art by 4.65% and 4.8% in terms of WAPE and MAE, respectively.","PeriodicalId":48501,"journal":{"name":"International Journal of Multimedia Information Retrieval","volume":"61 1","pages":"717-729"},"PeriodicalIF":5.6,"publicationDate":"2022-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84560561","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
PDS-Net: A novel point and depth-wise separable convolution for real-time object detection
IF 5.6, CAS Tier 3 (Computer Science), Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2022-03-24. DOI: 10.1007/s13735-022-00229-6
M. Junayed, Md Baharul Islam, H. Imani, Tarkan Aydin
{"title":"PDS-Net: A novel point and depth-wise separable convolution for real-time object detection","authors":"M. Junayed, Md Baharul Islam, H. Imani, Tarkan Aydin","doi":"10.1007/s13735-022-00229-6","DOIUrl":"https://doi.org/10.1007/s13735-022-00229-6","url":null,"abstract":"","PeriodicalId":48501,"journal":{"name":"International Journal of Multimedia Information Retrieval","volume":"8 1","pages":"171 - 188"},"PeriodicalIF":5.6,"publicationDate":"2022-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76468263","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Caption TLSTMs: combining transformer with LSTMs for image captioning
IF 5.6, CAS Tier 3 (Computer Science), Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2022-03-23. DOI: 10.1007/s13735-022-00228-7
Jie Yan, Yuxiang Xie, Xidao Luan, Yanming Guo, Quanzhi Gong, Suru Feng
{"title":"Caption TLSTMs: combining transformer with LSTMs for image captioning","authors":"Jie Yan, Yuxiang Xie, Xidao Luan, Yanming Guo, Quanzhi Gong, Suru Feng","doi":"10.1007/s13735-022-00228-7","DOIUrl":"https://doi.org/10.1007/s13735-022-00228-7","url":null,"abstract":"","PeriodicalId":48501,"journal":{"name":"International Journal of Multimedia Information Retrieval","volume":"9 1","pages":"111 - 121"},"PeriodicalIF":5.6,"publicationDate":"2022-03-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79134772","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Few2Decide: towards a robust model via using few neuron connections to decide
IF 5.6, CAS Tier 3 (Computer Science), Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2022-01-30. DOI: 10.1007/s13735-021-00223-4
Jian Li, Yanming Guo, Songyang Lao, Xiang Zhao, Liang Bai, Haoran Wang
{"title":"Few2Decide: towards a robust model via using few neuron connections to decide","authors":"Jian Li, Yanming Guo, Songyang Lao, Xiang Zhao, Liang Bai, Haoran Wang","doi":"10.1007/s13735-021-00223-4","DOIUrl":"https://doi.org/10.1007/s13735-021-00223-4","url":null,"abstract":"","PeriodicalId":48501,"journal":{"name":"International Journal of Multimedia Information Retrieval","volume":"45 1","pages":"189 - 198"},"PeriodicalIF":5.6,"publicationDate":"2022-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79015916","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Enhancing the performance of 3D auto-correlation gradient features in depth action classification
IF 5.6, CAS Tier 3 (Computer Science), Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2022-01-16. DOI: 10.1007/s13735-021-00226-1
Mohammad Farhad Bulbul, S. Islam, Zannatul Azme, Preksha Pareek, Md. Humaun Kabir, Hazrat Ali
{"title":"Enhancing the performance of 3D auto-correlation gradient features in depth action classification","authors":"Mohammad Farhad Bulbul, S. Islam, Zannatul Azme, Preksha Pareek, Md. Humaun Kabir, Hazrat Ali","doi":"10.1007/s13735-021-00226-1","DOIUrl":"https://doi.org/10.1007/s13735-021-00226-1","url":null,"abstract":"","PeriodicalId":48501,"journal":{"name":"International Journal of Multimedia Information Retrieval","volume":"360 1","pages":"61 - 76"},"PeriodicalIF":5.6,"publicationDate":"2022-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78106335","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Generative adversarial networks and its applications in the biomedical image segmentation: a comprehensive survey.
IF 5.6, CAS Tier 3 (Computer Science), Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2022-01-01. DOI: 10.1007/s13735-022-00240-x
Ahmed Iqbal, Muhammad Sharif, Mussarat Yasmin, Mudassar Raza, Shabib Aftab

Recent advancements with deep generative models have proven significant potential in the task of image synthesis, detection, segmentation, and classification. Segmenting the medical images is considered a primary challenge in the biomedical imaging field. There have been various GANs-based models proposed in the literature to resolve medical segmentation challenges. Our research outcome has identified 151 papers; after the twofold screening, 138 papers are selected for the final survey. A comprehensive survey is conducted on GANs network application to medical image segmentation, primarily focused on various GANs-based models, performance metrics, loss function, datasets, augmentation methods, paper implementation, and source codes. Secondly, this paper provides a detailed overview of GANs network application in different human diseases segmentation. We conclude our research with critical discussion, limitations of GANs, and suggestions for future directions. We hope this survey is beneficial and increases awareness of GANs network implementations for biomedical image segmentation tasks.
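
As background for the family of methods this survey covers, the sketch below illustrates one generic adversarial segmentation setup in PyTorch: a segmentation network serves as the generator and a discriminator scores (image, mask) pairs. The tiny networks, loss combination, and single training step are illustrative assumptions, not any specific surveyed model.

```python
# Generic adversarial segmentation sketch: generator predicts masks,
# discriminator distinguishes real from predicted (image, mask) pairs.
import torch
import torch.nn as nn

G = nn.Sequential(                      # toy "segmentation generator"
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid(),
)
D = nn.Sequential(                      # discriminator over image + mask channels
    nn.Conv2d(2, 16, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(16, 1, 3, stride=2, padding=1),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)
bce = nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)

img = torch.randn(8, 1, 64, 64)
gt = torch.randint(0, 2, (8, 1, 64, 64)).float()

# discriminator step: real (image, ground-truth mask) vs fake (image, predicted mask)
pred = G(img).detach()
d_loss = bce(D(torch.cat([img, gt], 1)), torch.ones(8, 1)) + \
         bce(D(torch.cat([img, pred], 1)), torch.zeros(8, 1))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# generator step: supervised segmentation loss plus adversarial term
pred = G(img)
g_loss = nn.functional.binary_cross_entropy(pred, gt) + \
         bce(D(torch.cat([img, pred], 1)), torch.ones(8, 1))
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
print(float(d_loss), float(g_loss))
```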

{"title":"Generative adversarial networks and its applications in the biomedical image segmentation: a comprehensive survey.","authors":"Ahmed Iqbal,&nbsp;Muhammad Sharif,&nbsp;Mussarat Yasmin,&nbsp;Mudassar Raza,&nbsp;Shabib Aftab","doi":"10.1007/s13735-022-00240-x","DOIUrl":"https://doi.org/10.1007/s13735-022-00240-x","url":null,"abstract":"<p><p>Recent advancements with deep generative models have proven significant potential in the task of image synthesis, detection, segmentation, and classification. Segmenting the medical images is considered a primary challenge in the biomedical imaging field. There have been various GANs-based models proposed in the literature to resolve medical segmentation challenges. Our research outcome has identified 151 papers; after the twofold screening, 138 papers are selected for the final survey. A comprehensive survey is conducted on GANs network application to medical image segmentation, primarily focused on various GANs-based models, performance metrics, loss function, datasets, augmentation methods, paper implementation, and source codes. Secondly, this paper provides a detailed overview of GANs network application in different human diseases segmentation. We conclude our research with critical discussion, limitations of GANs, and suggestions for future directions. We hope this survey is beneficial and increases awareness of GANs network implementations for biomedical image segmentation tasks.</p>","PeriodicalId":48501,"journal":{"name":"International Journal of Multimedia Information Retrieval","volume":"11 3","pages":"333-368"},"PeriodicalIF":5.6,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9264294/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10253310","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 17
A review on deep learning in medical image analysis.
IF 5.6, CAS Tier 3 (Computer Science), Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2022-01-01. Epub Date: 2021-09-04. DOI: 10.1007/s13735-021-00218-1
S Suganyadevi, V Seethalakshmi, K Balasamy

Ongoing improvements in AI, particularly concerning deep learning techniques, are assisting to identify, classify, and quantify patterns in clinical images. Deep learning is the quickest developing field in artificial intelligence and is effectively utilized lately in numerous areas, including medication. A brief outline is given on studies carried out on the region of application: neuro, brain, retinal, pneumonic, computerized pathology, bosom, heart, breast, bone, stomach, and musculoskeletal. For information exploration, knowledge deployment, and knowledge-based prediction, deep learning networks can be successfully applied to big data. In the field of medical image processing methods and analysis, fundamental information and state-of-the-art approaches with deep learning are presented in this paper. The primary goals of this paper are to present research on medical image processing as well as to define and implement the key guidelines that are identified and addressed.

{"title":"A review on deep learning in medical image analysis.","authors":"S Suganyadevi,&nbsp;V Seethalakshmi,&nbsp;K Balasamy","doi":"10.1007/s13735-021-00218-1","DOIUrl":"https://doi.org/10.1007/s13735-021-00218-1","url":null,"abstract":"<p><p>Ongoing improvements in AI, particularly concerning deep learning techniques, are assisting to identify, classify, and quantify patterns in clinical images. Deep learning is the quickest developing field in artificial intelligence and is effectively utilized lately in numerous areas, including medication. A brief outline is given on studies carried out on the region of application: neuro, brain, retinal, pneumonic, computerized pathology, bosom, heart, breast, bone, stomach, and musculoskeletal. For information exploration, knowledge deployment, and knowledge-based prediction, deep learning networks can be successfully applied to big data. In the field of medical image processing methods and analysis, fundamental information and state-of-the-art approaches with deep learning are presented in this paper. The primary goals of this paper are to present research on medical image processing as well as to define and implement the key guidelines that are identified and addressed.</p>","PeriodicalId":48501,"journal":{"name":"International Journal of Multimedia Information Retrieval","volume":"11 1","pages":"19-38"},"PeriodicalIF":5.6,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8417661/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39409372","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 77
Interactive video retrieval evaluation at a distance: comparing sixteen interactive video search systems in a remote setting at the 10th Video Browser Showdown.
IF 3.6, CAS Tier 3 (Computer Science), Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2022-01-01. Epub Date: 2022-01-26. DOI: 10.1007/s13735-021-00225-2
Silvan Heller, Viktor Gsteiger, Werner Bailer, Cathal Gurrin, Björn Þór Jónsson, Jakub Lokoč, Andreas Leibetseder, František Mejzlík, Ladislav Peška, Luca Rossetto, Konstantin Schall, Klaus Schoeffmann, Heiko Schuldt, Florian Spiess, Ly-Duyen Tran, Lucia Vadicamo, Patrik Veselý, Stefanos Vrochidis, Jiaxin Wu

The Video Browser Showdown addresses difficult video search challenges through an annual interactive evaluation campaign attracting research teams focusing on interactive video retrieval. The campaign aims to provide insights into the performance of participating interactive video retrieval systems, tested by selected search tasks on large video collections. For the first time in its ten year history, the Video Browser Showdown 2021 was organized in a fully remote setting and hosted a record number of sixteen scoring systems. In this paper, we describe the competition setting, tasks and results and give an overview of state-of-the-art methods used by the competing systems. By looking at query result logs provided by ten systems, we analyze differences in retrieval model performances and browsing times before a correct submission. Through advances in data gathering methodology and tools, we provide a comprehensive analysis of ad-hoc video search tasks, discuss results, task design and methodological challenges. We highlight that almost all top performing systems utilize some sort of joint embedding for text-image retrieval and enable specification of temporal context in queries for known-item search. Whereas a combination of these techniques drive the currently top performing systems, we identify several future challenges for interactive video search engines and the Video Browser Showdown competition itself.
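
The joint text-image embedding retrieval highlighted above boils down, at query time, to ranking precomputed keyframe embeddings by similarity to an embedded text query. A minimal sketch follows; the random unit vectors stand in for the output of a real text/image encoder (CLIP-style or similar), which is an assumption made purely for illustration.

```python
# Minimal sketch of known-item search over a joint text-image embedding space.
# Real systems embed keyframes and queries with a learned model; here random
# unit vectors are placeholders for those embeddings.
import numpy as np

rng = np.random.default_rng(0)
dim, n_shots = 512, 100_000

shot_emb = rng.normal(size=(n_shots, dim)).astype(np.float32)
shot_emb /= np.linalg.norm(shot_emb, axis=1, keepdims=True)   # L2-normalize once

def search(query_emb: np.ndarray, k: int = 10) -> np.ndarray:
    """Return indices of the k shots most similar to the query (cosine similarity)."""
    q = query_emb / np.linalg.norm(query_emb)
    scores = shot_emb @ q                      # cosine similarity on unit vectors
    return np.argsort(-scores)[:k]

query = rng.normal(size=dim).astype(np.float32)  # stand-in for an encoded text query
print(search(query, k=5))
```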

{"title":"Interactive video retrieval evaluation at a distance: comparing sixteen interactive video search systems in a remote setting at the 10th Video Browser Showdown.","authors":"Silvan Heller, Viktor Gsteiger, Werner Bailer, Cathal Gurrin, Björn Þór Jónsson, Jakub Lokoč, Andreas Leibetseder, František Mejzlík, Ladislav Peška, Luca Rossetto, Konstantin Schall, Klaus Schoeffmann, Heiko Schuldt, Florian Spiess, Ly-Duyen Tran, Lucia Vadicamo, Patrik Veselý, Stefanos Vrochidis, Jiaxin Wu","doi":"10.1007/s13735-021-00225-2","DOIUrl":"10.1007/s13735-021-00225-2","url":null,"abstract":"<p><p>The Video Browser Showdown addresses difficult video search challenges through an annual interactive evaluation campaign attracting research teams focusing on interactive video retrieval. The campaign aims to provide insights into the performance of participating interactive video retrieval systems, tested by selected search tasks on large video collections. For the first time in its ten year history, the Video Browser Showdown 2021 was organized in a fully remote setting and hosted a record number of sixteen scoring systems. In this paper, we describe the competition setting, tasks and results and give an overview of state-of-the-art methods used by the competing systems. By looking at query result logs provided by ten systems, we analyze differences in retrieval model performances and browsing times before a correct submission. Through advances in data gathering methodology and tools, we provide a comprehensive analysis of ad-hoc video search tasks, discuss results, task design and methodological challenges. We highlight that almost all top performing systems utilize some sort of joint embedding for text-image retrieval and enable specification of temporal context in queries for known-item search. Whereas a combination of these techniques drive the currently top performing systems, we identify several future challenges for interactive video search engines and the Video Browser Showdown competition itself.</p>","PeriodicalId":48501,"journal":{"name":"International Journal of Multimedia Information Retrieval","volume":"11 1","pages":"1-18"},"PeriodicalIF":3.6,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8791088/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39872573","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A fast and robust affine-invariant method for shape registration under partial occlusion
IF 5.6, CAS Tier 3 (Computer Science), Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2021-11-30. DOI: 10.1007/s13735-021-00224-3
Sinda Elghoul, F. Ghorbel
{"title":"A fast and robust affine-invariant method for shape registration under partial occlusion","authors":"Sinda Elghoul, F. Ghorbel","doi":"10.1007/s13735-021-00224-3","DOIUrl":"https://doi.org/10.1007/s13735-021-00224-3","url":null,"abstract":"","PeriodicalId":48501,"journal":{"name":"International Journal of Multimedia Information Retrieval","volume":"136 1","pages":"39 - 59"},"PeriodicalIF":5.6,"publicationDate":"2021-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75080508","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2