A Robust Ensemble of ResNets for Character Level End-to-end Text Detection in Natural Scene Images
Jinsu Kim, Yoonhyung Kim, Changick Kim
DOI: 10.1145/3095713.3095724

Detecting text in natural scene images is a challenging task. In this paper, we propose a character-level end-to-end text detection algorithm for natural scene images. In general, text detection tasks are categorized into three parts: text localization, text segmentation, and text recognition. The proposed method aims not only to localize but also to recognize text. To accomplish these tasks, the method consists of four steps: character candidate patch extraction, patch classification using an ensemble of ResNets, non-character region elimination, and character region grouping via self-tuning spectral clustering. In the character candidate patch extraction step, candidate patches are extracted from the image using both edge information from multi-scale images and Maximally Stable Extremal Regions (MSERs). Each patch is then classified as either a character patch or a non-character patch by a deep network composed of three ResNets with different hyper-parameters. Text regions are determined by filtering out non-character patches. To further reduce classification errors, character-specific characteristics are used to refine the results of the ResNet ensemble. To evaluate text detection performance, character regions are grouped via self-tuning spectral clustering. The proposed method shows competitive performance on the ICDAR 2013 dataset.
Learning Selection of User Generated Event Videos
W. Bailer, M. Winter, Stefanie Wechtitsch
DOI: 10.1145/3095713.3095715

User-generated images and videos can enhance the coverage of live events on social and online media, as well as in broadcasts. However, the quality, relevance, and complementarity of the received contributions vary greatly. In a live scenario, it is often not feasible for the editorial team to review all content and make selections. We propose to support this work by automatic selection based on captured metadata and extracted quality and content features. Since a human in the loop is usually desired, the automatic system does not make a final decision but provides a ranked list of content items. As the operator makes selections, the automatic system learns from these decisions, which may change over time. Due to the need for online learning and quick adaptation, we propose the use of online random forests for this task. We show on data from three real live events that the approach is able to provide a ranking based on the predicted selection likelihood after an initial adjustment phase.
Connecting the Dots: Enhancing the Usability of Indexed Multimedia Data for AR Cultural Heritage Applications through Storytelling
Jae-eun Shin, Hyerim Park, Woontack Woo
DOI: 10.1145/3095713.3095725

This paper proposes a method to effectively utilize multimedia databases created and indexed by a metadata schema designed specifically for AR applications used at cultural heritage sites. We do so by incorporating storytelling principles that employ video data to provide useful and meaningful guidance at Changdeokgung Palace, a UNESCO World Heritage Site that contains multiple points of interest (PoIs). We designed a themed narrative that embeds video data related to the PoIs, creating a guide route that connects each PoI in a fixed order. An extensive between-group user evaluation comparing a search-based AR experience with the proposed narrative-based one was conducted to assess the validity and effectiveness of our approach. Our results show that storytelling is a powerful tool for enhancing the level of immersion users experience through video data in an AR environment at cultural heritage sites.
FuseMe: Classification of sMRI images by fusion of Deep CNNs in 2D+ε projections
Karim Aderghal, J. Benois-Pineau, K. Afdel, G. Catheline
DOI: 10.1145/3095713.3095749

Methods for content-based visual information indexing and retrieval are increasingly entering healthcare and becoming popular in computer-aided diagnosis. Multimedia in medical imaging means not only different imaging modalities but also multiple views of the same physiological object, such as the human brain. In this paper we propose a multi-projection fusion approach with CNNs for the diagnosis of Alzheimer's Disease. Instead of working with the whole brain volume, it fuses CNNs for each brain projection (sagittal, coronal, and axial), each ingesting the 2D+ε limited volume we have previously proposed. Three binary classification tasks are considered, separating Alzheimer's Disease (AD) patients, Mild Cognitive Impairment (MCI) patients, and normal control subjects (NC). Two fusion methods, one at the FC layer and one on the single-projection CNN outputs, improve performance, reaching up to 91%, and are competitive with state-of-the-art approaches that rely on heavier algorithmic chains.
Harvesting Deep Models for Cross-Lingual Image Annotation
Qijie Wei, Xiaoxu Wang, Xirong Li
DOI: 10.1145/3095713.3095751

This paper considers cross-lingual image annotation, harvesting deep visual models from one language to annotate images with labels from another language. This task cannot be accomplished by machine translation, as labels can be ambiguous and a translated vocabulary leaves limited freedom to annotate images with appropriate labels. Given non-overlapping vocabularies between two languages, we formulate cross-lingual image annotation as a zero-shot learning problem. For cross-lingual label matching, we adapt zero-shot learning by replacing the usual monolingual semantic embedding space with a bilingual alternative. To reduce both label ambiguity and redundancy, we propose a simple yet effective approach called label-enhanced zero-shot learning. Using three state-of-the-art deep visual models, i.e., ResNet-152, GoogleNet-Shuffle, and OpenImages, experiments on the test set of Flickr8k-CN demonstrate the viability of the proposed approach for cross-lingual image annotation.
Question Part Relevance and Editing for Cooperative and Context-Aware VQA (C2VQA)
Andeep S. Toor, H. Wechsler, M. Nappi
DOI: 10.1145/3095713.3095718

Visual Question Answering (VQA), a task that requires the ability to provide an answer to a question given an image, has recently become an important benchmark for computer vision. However, current VQA approaches are unable to adequately handle questions that are "irrelevant", such as asking about a cat in an image that contains no cat. To date, only one paper has examined the idea of question relevance in VQA, using a binary classification model to assign a relevance label to the entire question/image pair. Truly robust VQA models, however, must not only identify potentially irrelevant questions but also discover the source of irrelevance and seek to correct it. We therefore introduce two novel problems, question part relevance and question editing, and approaches for solving each. In question part relevance, our models go beyond binary question relevance by assigning a classification probability to the portion of the question that is irrelevant. The best question part relevance classifier is then used in question editing to rank possible corrections to the irrelevant portion of a given question. Two custom datasets are developed for these problems using the Visual Genome dataset as a source. Our best models show promising results on these novel tasks over baseline approaches and models adapted from whole-question relevance classification. This work contributes directly to the development of more context-aware and cooperative VQA models, dubbed C2VQA.
The 3D-Pitoti Dataset: A Dataset for high-resolution 3D Surface Segmentation
Georg Poier, Markus Seidl, M. Zeppelzauer, Christian Reinbacher, M. Schaich, G. Bellandi, A. Marretta, H. Bischof
DOI: 10.1145/3095713.3095719

The development of powerful 3D scanning hardware and reconstruction algorithms has strongly promoted the generation of 3D surface reconstructions in different domains. An area of special interest for such reconstructions is the cultural heritage domain, where surface reconstructions are generated to digitally preserve historical artifacts. While reconstruction quality is nowadays sufficient in many cases, the robust analysis (e.g., segmentation, matching, and classification) of reconstructed 3D data is still an open topic. In this paper, we target the automatic segmentation of high-resolution 3D surface reconstructions of petroglyphs. To foster research in this field, we introduce a fully annotated, large-scale benchmark dataset of 3D surfaces, including high-resolution meshes, depth maps, and point clouds, which we make publicly available. Additionally, we provide baseline results for a random forest as well as a convolutional neural network based approach. The results show the complementary strengths and weaknesses of both approaches and point out that the dataset represents an open challenge for future research.
{"title":"Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing","authors":"","doi":"10.1145/3095713","DOIUrl":"https://doi.org/10.1145/3095713","url":null,"abstract":"","PeriodicalId":310224,"journal":{"name":"Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126656551","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}