
2012 10th International Workshop on Content-Based Multimedia Indexing (CBMI): Latest Publications

Music sparse decomposition onto a MIDI dictionary of musical words and its application to music mood classification
Pub Date : 2012-06-27 DOI: 10.1109/CBMI.2012.6269798
Boyang Gao, E. Dellandréa, Liming Chen
Most of the automated music analysis methods available in the literature rely on representing music through a set of low-level audio features related to temporal and frequential properties. Identifying high-level concepts, such as music mood, from this "black-box" representation is particularly challenging. We therefore present in this paper a novel music representation that allows an in-depth understanding of the music structure. Its principle is to sparsely decompose the music over a basis of elementary audio elements, called musical words, which represent notes played by various instruments and generated through a MIDI synthesizer. From this representation, a music feature is also proposed to enable automatic music classification. Experiments on two music datasets have shown the effectiveness of this approach at accurately representing music signals and enabling efficient classification for the complex problem of music mood classification.
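The sparse decomposition described above can be sketched with a greedy pursuit. This is a minimal illustration only: the dictionary below is a small orthonormal toy basis, not the paper's MIDI-synthesized musical-word atoms, and Orthogonal Matching Pursuit is one standard sparse decomposition algorithm, not necessarily the authors' choice.

```python
import numpy as np

def omp(signal, dictionary, n_nonzero):
    """Greedy Orthogonal Matching Pursuit: approximate `signal` as a sparse
    combination of dictionary atoms (columns of `dictionary`)."""
    residual = signal.astype(float).copy()
    selected = []
    coeffs = np.zeros(dictionary.shape[1])
    for _ in range(n_nonzero):
        # pick the atom most correlated with the current residual
        idx = int(np.argmax(np.abs(dictionary.T @ residual)))
        if idx not in selected:
            selected.append(idx)
        # least-squares refit on all selected atoms, then update the residual
        sub = dictionary[:, selected]
        sol, *_ = np.linalg.lstsq(sub, signal, rcond=None)
        coeffs[:] = 0.0
        coeffs[selected] = sol
        residual = signal - dictionary @ coeffs
    return coeffs

# toy orthonormal "dictionary" of 4 atoms over an 8-sample frame
# (real musical-word atoms would be overcomplete and non-orthogonal)
rng = np.random.default_rng(0)
D, _ = np.linalg.qr(rng.normal(size=(8, 4)))
x = 2.0 * D[:, 1] + 0.5 * D[:, 3]          # signal built from atoms 1 and 3
c = omp(x, D, n_nonzero=2)
```

With an orthonormal toy basis the pursuit recovers the two active atoms exactly; on a real overcomplete dictionary the selection is only greedy-optimal.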
Citations: 7
Supervised acoustic topic model with a consequent classifier for unstructured audio classification
Pub Date : 2012-06-27 DOI: 10.1109/CBMI.2012.6269853
Samuel Kim, P. Georgiou, Shrikanth S. Narayanan
In the problem of classifying unstructured audio signals, we have reported promising results using acoustic topic models that assume an audio signal consists of latent acoustic topics [1, 2]. In this paper, we introduce a two-step method that performs supervised acoustic topic modeling on audio features followed by a classification process. Experimental results on classifying audio signals with respect to onomatopoeias and semantic labels using the BBC Sound Effects library show that the proposed method improves classification accuracy by a relative 10-14% over the baseline supervised acoustic topic model. We also show that the proposed method is compatible with different label sets, so that the topic models can be trained with one set of labels and used to classify another.
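The two-step idea (infer a topic mixture first, classify second) can be sketched as follows. The topic matrix, the simple EM inference for a mixture of multinomials, and the argmax "consequent classifier" are all simplified stand-ins for the paper's supervised acoustic topic model, chosen only for illustration.

```python
import numpy as np

# Assumed toy setup: each clip is a histogram over 6 "acoustic words".
# Topics (rows) are word distributions; in the paper these would be learned
# by supervised topic modeling, here they are fixed by hand.
topics = np.array([
    [0.4, 0.4, 0.1, 0.05, 0.03, 0.02],   # topic 0: low-index words dominate
    [0.02, 0.03, 0.05, 0.1, 0.4, 0.4],   # topic 1: high-index words dominate
])

def topic_mixture(word_hist, topics, iters=50):
    """Estimate topic proportions for one clip by EM for a mixture of
    multinomials (a lightweight stand-in for full topic-model inference)."""
    theta = np.full(len(topics), 1.0 / len(topics))
    for _ in range(iters):
        # E-step: responsibility of each topic for each word type
        resp = theta[:, None] * topics
        resp /= resp.sum(axis=0, keepdims=True)
        # M-step: re-estimate proportions from the word counts
        theta = (resp * word_hist).sum(axis=1)
        theta /= theta.sum()
    return theta

clip = np.array([9, 8, 2, 1, 0, 0], dtype=float)   # mostly "topic 0" words
theta = topic_mixture(clip, topics)
label = int(np.argmax(theta))   # step 2: classify from the topic mixture
```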
Citations: 5
Water flow detection from a wearable device with a new feature, the spectral cover
Pub Date : 2012-06-27 DOI: 10.1109/CBMI.2012.6269814
Patrice Guyot, J. Pinquier, R. André-Obrecht
This paper presents a new system for water flow detection in real-life recordings and its application to a medical context. The recognition system is based on an original feature for sound event detection in real-life conditions. This feature, called "spectral cover", shows an interesting ability to recognize water flow in a noisy environment. The system is based only on thresholds: it is simple, robust, and can be used on any corpus without training. An experiment was carried out on more than 7 hours of video recorded by a wearable device. Our system obtains good results for water flow event recognition (F-measure of 66%). A comparison with classical approaches using MFCC or low-level descriptors with GMM classifiers attests to the good performance of our system. Adding the spectral cover to the low-level descriptors also improves their performance and confirms that this feature is relevant.
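The abstract names the feature but not its formula, so the sketch below uses a hypothetical stand-in with the stated behaviour: it measures how much of the spectrum is "covered" by significant energy (broadband water noise covers many bins, tonal sounds few) and makes a threshold-only decision, as the paper describes. The window, the 10% significance level and the 0.5 decision threshold are invented for the example.

```python
import numpy as np

def spectral_cover_like(frame, frac=0.1):
    """Hypothetical stand-in for the paper's 'spectral cover': the fraction
    of frequency bins whose magnitude exceeds `frac` of the peak magnitude.
    Broadband noise (water flow) covers many bins; tonal sounds cover few."""
    mag = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    return np.mean(mag > frac * mag.max())

rng = np.random.default_rng(1)
sr, n = 8000, 1024
t = np.arange(n) / sr
tone = np.sin(2 * np.pi * 440 * t)          # narrowband: low cover
noise = rng.normal(size=n)                  # broadband: high cover

detected_tone = spectral_cover_like(tone) > 0.5    # threshold-only decision
detected_noise = spectral_cover_like(noise) > 0.5
```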
Citations: 9
Formula 1 onboard camera shot detector using motion activity areas
Pub Date : 2012-06-27 DOI: 10.1109/CBMI.2012.6269800
Arnau Raventos, F. Tarrés
Shot detection in sports video sequences has attracted great interest in recent years. In this paper, a new approach to detecting onboard camera shots in compressed Formula 1 video sequences is presented. To that end, and after studying the characteristics of such shots, a technique based on a thresholded comparison between a high-motion area and a stationary one has been devised. Efficient computation is achieved by directly decoding the motion vectors in the MPEG stream. The shot detection process is performed through a frame-by-frame hysteresis thresholding analysis. To enhance the results, an SVD shot boundary detector is applied. Promising results are presented that show the validity of the approach.
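The core test, comparing motion activity between a high-motion area and a stationary one, can be illustrated on a synthetic motion-vector field. The region coordinates and the decision threshold are invented for the example; in the paper the magnitudes would come directly from decoded MPEG motion vectors.

```python
import numpy as np

def onboard_shot_score(mv_mag, active_box, static_box):
    """Toy version of the motion-activity test: compare mean motion-vector
    magnitude in a high-motion area against a stationary one (e.g. the car
    body visible at the frame edge in onboard shots)."""
    ay0, ay1, ax0, ax1 = active_box
    sy0, sy1, sx0, sx1 = static_box
    active = mv_mag[ay0:ay1, ax0:ax1].mean()
    static = mv_mag[sy0:sy1, sx0:sx1].mean()
    return active - static

# synthetic 8x8 grid of per-macroblock motion magnitudes
mv = np.zeros((8, 8))
mv[:6, :] = 5.0        # top of frame: fast-moving track
mv[6:, :] = 0.2        # bottom of frame: static car body
is_onboard = onboard_shot_score(mv, (0, 6, 0, 8), (6, 8, 0, 8)) > 2.0
```

A per-frame score like this would then feed the hysteresis thresholding the abstract mentions, rather than a single fixed cut-off.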
Citations: 0
An improved algorithm on Viola-Jones object detector
Pub Date : 2012-06-27 DOI: 10.1109/CBMI.2012.6269796
Qian Li, U. Niaz, B. Mérialdo
In image processing, the Viola-Jones object detector [1] is one of the most successful and widely used object detectors. A popular implementation used by the community is the one in OpenCV. The detector shows its strength in detecting faces, but we found it difficult to extend to other kinds of objects: the convergence of the training phase depends heavily on the training data, and the prediction precision stays low. In this paper, we propose new ideas to improve its performance for diverse object categories. We incorporate six different types of feature images into the Viola and Jones framework. The integral image [1] used by the Viola-Jones detector is then computed on each of these feature images, instead of only on the gray image, and each stage classifier is trained on one of them. We also present a new stopping criterion for stage training. In addition, we integrate a keypoint-based SVM [2] predictor into the prediction phase to improve the confidence of the detection result.
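The integral image that makes per-feature-image evaluation cheap is the standard summed-area table; a minimal sketch follows, computed on an arbitrary array standing in for one of the six feature images. Any rectangle sum then costs four lookups regardless of its size.

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a zero border: ii[y, x] holds the sum of
    img[:y, :x], so any rectangle sum needs only four lookups, as in the
    Viola-Jones detector."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def rect_sum(ii, y0, x0, y1, x1):
    # sum of img[y0:y1, x0:x1] recovered from the integral image
    return ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0]

img = np.arange(16, dtype=float).reshape(4, 4)   # stand-in feature image
ii = integral_image(img)
total = rect_sum(ii, 1, 1, 3, 3)                 # sum of the central 2x2 block
```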
Citations: 22
Detecting and labeling folk literature in spoken cultural heritage archives using structural and prosodic features
Pub Date : 2012-06-27 DOI: 10.1109/CBMI.2012.6269839
F. Valente, P. Motlícek
Spoken cultural heritage can present considerably heterogeneous content, such as tales, stories, recitals, poems, theatrical representations and other forms of folk literature. This work investigates the automatic detection and classification of these data types in large spoken audio archives. The corpus used for this study consists of 90 radio broadcast shows collected to preserve a large variety of Swiss French dialects. Given the variability of the language spoken in the recordings, the paper proposes a language-independent system based on structural features obtained using a speaker diarization system together with various acoustic/prosodic features. Results reveal that such a system can achieve an F-measure of 0.85 (Precision 0.88/Recall 0.84) in retrieving folk literature from these archives. Prosodic features appear more effective than, and complementary to, structural features. Furthermore, the paper investigates whether the same approach can be used to label speech segments into five broad classes (Storytelling, Poetry, Theatre, Interviews, Functionals), yielding F-measures ranging from 0.52 to 0.88. As a last contribution, prosodic features for disambiguating spoken prose from spoken poetry are investigated. In summary, the study shows that simple structural and acoustic/prosodic features can be used to effectively retrieve and label folk literature in broadcast archives.
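As a toy illustration of the kind of prosodic descriptors involved (the abstract does not list the exact feature set), one might compute pause and pitch statistics from a frame-level pitch track; the thresholds and the synthetic contour below are invented for the example.

```python
import numpy as np

def prosodic_stats(f0, voiced_thresh=0.0):
    """Toy prosodic descriptors of the kind used to separate read poetry
    from prose: pitch variability and pause (unvoiced) ratio over a segment.
    `f0` is a frame-level pitch track in Hz, with 0 for unvoiced frames."""
    voiced = f0[f0 > voiced_thresh]
    return {
        "pause_ratio": 1.0 - len(voiced) / len(f0),
        "pitch_std": float(np.std(voiced)) if len(voiced) else 0.0,
    }

# synthetic track: regular pauses and a wide pitch range (poetry-like)
f0 = np.array([120, 180, 240, 0, 0, 130, 200, 0, 0, 150], dtype=float)
stats = prosodic_stats(f0)
```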
Citations: 0
Analyzing the behavior of professional video searchers using RAI query logs
Pub Date : 2012-06-27 DOI: 10.1109/CBMI.2012.6269795
Claudio Carpineto, Giovanni Romano, Andrea Bernardini
A large number of studies have investigated the query logs of Web search engines, but analogous studies are lacking for the multimedia database management systems (MDBMSs) used by professional searchers. In this paper we perform an extensive analysis of the query logs of the RAI multimedia catalogue, both at the query level and at the session level. Based on the observation that a large proportion of the queries returned zero or, conversely, too many hits, we identified three query reformulation strategies used to reduce or enlarge the set of results. Our study indicates that the desire to control the amount of output may have a relatively limited (moderate-to-little) impact on the user's behavior, while some counter-intuitive findings suggest a suboptimal utilization of the system. The findings are useful for MDBMS developers and trainers of professional searchers to improve the performance of interactive searches, and for researchers to conduct further work.
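The three reformulation strategies are not spelled out in the abstract; a plausible sketch, treating them as term addition (to narrow an over-long result list), term removal (to broaden after zero hits) and term substitution between consecutive queries in a session, could look like this. The labels and the sample session are illustrative assumptions, not the paper's taxonomy.

```python
def classify_reformulation(prev_query, next_query):
    """Label a consecutive query pair from a session log with a hypothetical
    reformulation strategy: narrowing (terms added), broadening (terms
    removed), or substitution (terms swapped)."""
    prev_terms = set(prev_query.split())
    next_terms = set(next_query.split())
    if prev_terms < next_terms:
        return "narrowing"
    if next_terms < prev_terms:
        return "broadening"
    return "substitution" if prev_terms != next_terms else "repeat"

# invented session: too many hits -> narrow; zero hits -> broaden; then swap
session = [("rome interview", "rome interview 1960"),
           ("rome interview 1960", "rome interview"),
           ("rome interview", "milan interview")]
labels = [classify_reformulation(p, n) for p, n in session]
```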
Citations: 1
Two-layers re-ranking approach based on contextual information for visual concepts detection in videos
Pub Date : 2012-06-27 DOI: 10.1109/CBMI.2012.6269837
Abdelkader Hamadi, G. Quénot, P. Mulhem
Context helps to understand the meaning of a word and allows the disambiguation of polysemous terms. Much research has taken advantage of this notion in information retrieval. For concept-based video indexing and retrieval, this idea seems a priori valid. One of the major problems is then to provide a definition of the context and to choose the most appropriate methods for using it. Two kinds of context have been exploited in the past to improve concept detection: in some works, inter-concept relations are used as a semantic context, while other approaches use the temporal features of videos. Results of these works showed that both the "temporal" and the "semantic" contexts can improve concept detection. In this work we use the semantic context through an ontology and exploit the efficiency of the temporal context in a "two-layers" re-ranking approach. Experiments conducted on TRECVID 2010 data show that the proposed approach always improves over the initial results obtained using either MSVM or KNN classifiers or their late fusion, achieving relative gains between 9% and 33% in the MAP measure.
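A minimal sketch of a two-layer re-ranking of this kind: a temporal layer that averages each shot's detector score with its neighbours, followed by a semantic layer that adds a bonus from a related concept's score. The weights and scores are invented; the paper's actual layers operate on classifier outputs and an ontology rather than these toy arrays.

```python
import numpy as np

def rerank(scores, related, alpha=0.5, beta=0.3):
    """Toy two-layer re-ranking: a temporal layer smooths each interior
    shot's score with its two neighbours, then a semantic layer adds a
    bonus proportional to a related concept's score on the same shot.
    The weights alpha and beta are illustrative, not the paper's."""
    temporal = scores.copy()
    temporal[1:-1] = ((1 - alpha) * scores[1:-1]
                      + alpha * 0.5 * (scores[:-2] + scores[2:]))
    return temporal + beta * related

concept = np.array([0.1, 0.9, 0.2, 0.8, 0.1])    # noisy per-shot detector
context = np.array([0.0, 0.8, 0.8, 0.8, 0.0])    # related concept's scores
final = rerank(concept, context)
```

Note how shot 2, weak in isolation, is promoted because its neighbours and a related concept both support it.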
Citations: 7
Comprehensive wavelet-based image characterization for Content-Based Image Retrieval
Pub Date : 2012-06-27 DOI: 10.1109/CBMI.2012.6269840
G. Quellec, M. Lamard, B. Cochener, C. Roux, G. Cazuguel
A novel image characterization based on the wavelet transform is presented in this paper. Previous works on wavelet-based image characterization have focused on adapting a wavelet basis to an image or an image dataset. We propose to take this one step further: images are characterized with all possible wavelet bases of a given support. A simple image signature based on the standardized moments of the wavelet coefficient distributions is proposed. This signature can be computed quickly for each possible wavelet filter, and an image signature map is thus obtained. We propose to use this signature map as an image characterization for Content-Based Image Retrieval (CBIR). High retrieval performance was achieved on a medical, a face detection and a texture dataset, with a precision at five of 62.5%, 97.8% and 64.0%, respectively.
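The per-filter signature can be sketched for the simplest case, one level of the Haar wavelet, with four standardized moments (mean, standard deviation, skewness, kurtosis) computed per subband. The paper sweeps all wavelet filters of a given support to build the full signature map; only a single filter is shown here, and the moment set is an assumption about which standardized moments are used.

```python
import numpy as np

def haar_level(img):
    """One level of the 2-D Haar wavelet transform (the simplest filter the
    signature map would include): returns the LL, LH, HL, HH subbands."""
    a = (img[0::2, :] + img[1::2, :]) / 2     # vertical average
    d = (img[0::2, :] - img[1::2, :]) / 2     # vertical difference
    ll, lh = (a[:, 0::2] + a[:, 1::2]) / 2, (a[:, 0::2] - a[:, 1::2]) / 2
    hl, hh = (d[:, 0::2] + d[:, 1::2]) / 2, (d[:, 0::2] - d[:, 1::2]) / 2
    return ll, lh, hl, hh

def band_signature(band):
    """Standardized moments of one subband's coefficient distribution:
    mean, std, skewness, kurtosis."""
    c = band.ravel().astype(float)
    mu, sigma = c.mean(), c.std()
    z = (c - mu) / sigma if sigma > 0 else np.zeros_like(c)
    return np.array([mu, sigma, (z ** 3).mean(), (z ** 4).mean()])

rng = np.random.default_rng(2)
img = rng.normal(size=(16, 16))               # stand-in image patch
signature = np.concatenate([band_signature(b) for b in haar_level(img)])
```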
Citations: 4
A generative model for concurrent image retrieval and ROI segmentation
Pub Date : 2012-06-27 DOI: 10.1109/CBMI.2012.6269844
I. González-Díaz, Carlos E. Baz-Hormigos, Moises Berdonces, F. Díaz-de-María
This paper proposes a probabilistic generative model that concurrently tackles the problems of image retrieval and detection of the region of interest (ROI). By introducing a latent variable that classifies the matches as true or false, we specifically focus on applying geometric constraints to the keypoint matching process and on achieving robust estimates of the geometric transformation between two images showing the same object. Our experiments on a challenging image retrieval database demonstrate that our approach outperforms the most prevalent approach for geometrically constrained matching and compares favorably to other state-of-the-art methods. Furthermore, the proposed technique concurrently provides very good segmentations of the region of interest.
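The latent true/false match variable can be illustrated with a hard-assignment EM-style loop: alternately fit a geometric model to the matches currently labelled true, then relabel each match by its reprojection error. For clarity the geometric model here is reduced to a pure 2-D translation and the threshold is invented; the paper's generative model estimates a richer transformation with soft probabilistic assignments.

```python
import numpy as np

def classify_matches(src, dst, thresh=2.5, iters=4):
    """Toy stand-in for the latent true/false match variable: alternate
    between fitting a geometric model (here just a 2-D translation) to the
    matches labelled true, and relabelling matches by reprojection error."""
    inlier = np.ones(len(src), dtype=bool)
    for _ in range(iters):
        # least-squares translation fit on the current "true" matches
        t = (dst[inlier] - src[inlier]).mean(axis=0)
        err = np.linalg.norm(src + t - dst, axis=1)
        inlier = err < thresh
    return t, inlier

rng = np.random.default_rng(3)
src = rng.uniform(0, 10, size=(20, 2))
dst = src + np.array([3.0, -2.0])                 # true transform
dst[:4] += rng.uniform(5, 8, size=(4, 2))         # 4 false matches
t, inlier = classify_matches(src, dst)
```

After a couple of iterations the false matches are labelled out and the transform estimate becomes exact on the remaining true matches.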
Citations: 2