
Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing — Latest Publications

The effect of different video summarization models on the quality of video recommendation based on low-level visual features
Yashar Deldjoo, P. Cremonesi, M. Schedl, Massimo Quadrana
Video summarization is a powerful tool for video understanding and browsing and is considered an enabler for many video analysis tasks. While the effect of video summarization models has been studied extensively in video retrieval and indexing applications over the last decade, its impact has not been well investigated in content-based video recommendation systems (RSs) based on low-level visual features, where the goal is to recommend items/videos to users based on the visual content of videos. This work reveals specific problems related to video summarization and their impact on video recommendation. We present preliminary results of an analysis in which we apply different video summarization models to the problem of video recommendation on a real-world RS dataset (MovieLens-10M), and show how temporal feature aggregation and video segmentation granularity can significantly influence and improve the quality of recommendation.
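As a hedged illustration of the temporal feature aggregation and segmentation granularity the abstract refers to, the Python sketch below pools per-frame low-level features segment by segment. The function name, the mean/std pooling choice and the dimensions are illustrative assumptions, not the authors' actual pipeline.

```python
import numpy as np

def aggregate_features(frame_features: np.ndarray, n_segments: int) -> np.ndarray:
    """Pool per-frame features into a fixed-size video descriptor.

    frame_features: (n_frames, dim) array of low-level visual features
    n_segments:     segmentation granularity; each segment is pooled separately
    """
    segments = np.array_split(frame_features, n_segments)
    # Mean and standard deviation pooling per segment, concatenated.
    pooled = [np.concatenate([s.mean(axis=0), s.std(axis=0)]) for s in segments]
    return np.concatenate(pooled)

# Example: 300 frames of 64-dim features pooled over 5 segments -> 640-dim descriptor.
descriptor = aggregate_features(np.random.rand(300, 64), n_segments=5)
assert descriptor.shape == (5 * 2 * 64,)
```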
{"title":"The effect of different video summarization models on the quality of video recommendation based on low-level visual features","authors":"Yashar Deldjoo, P. Cremonesi, M. Schedl, Massimo Quadrana","doi":"10.1145/3095713.3095734","DOIUrl":"https://doi.org/10.1145/3095713.3095734","url":null,"abstract":"Video summarization is a powerful tool for video understanding and browsing and is considered as an enabler for many video analysis tasks. While the effect of video summarization models has been largely studied in video retrieval and indexing applications over the last decade, its impact has not been well investigated in content-based video recommendation systems (RSs) based on low-level visual features, where the goal is to recommend items/videos to users based on visual content of videos. This work reveals specific problems related to video summarization and their impact on video recommendation. We present preliminary results of an analysis involving applying different video summarization models for the problem of video recommendation on a real-world RS dataset (MovieLens-10M) and show how temporal feature aggregation and video segmentation granularity can significantly influence/improve the quality of recommendation.","PeriodicalId":310224,"journal":{"name":"Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing","volume":"79 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125929439","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 14
Tag Propagation Approaches within Speaking Face Graphs for Multimodal Person Discovery
G. B. Fonseca, I. Freire, Zenilton K. G. Patrocínio, S. Guimarães, G. Sargent, R. Sicre, G. Gravier
The indexing of broadcast TV archives is a current problem in multimedia research. As the size of these databases grows continuously, meaningful features are needed to describe and connect their elements efficiently, such as the identification of speaking faces. In this context, this paper focuses on two approaches for unsupervised person discovery. Initial tagging of speaking faces is provided by an OCR-based method, and these tags propagate through a graph model based on audiovisual relations between speaking faces. Two propagation methods are proposed, one based on random walks and the other on a hierarchical approach. To evaluate their performance, these methods were compared with two graph-clustering baselines. We also study the impact of different modality fusions on the graph-based tag propagation scenario. From a quantitative analysis, we observed that the graph propagation techniques always outperform the baselines. Among all compared strategies, the methods based on hierarchical propagation with late fusion and on random walks with score fusion obtained the highest MAP values. Finally, even though these two methods produce highly equivalent results according to the Kappa coefficient, the random walk method performs better according to a paired t-test, while the computing time of hierarchical propagation is more than four times lower than that of random walk propagation.
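A minimal sketch of random-walk-with-restart tag propagation over a similarity graph, in the spirit of the random-walk variant described above; the matrix formulation, the restart weight alpha and all names are assumptions for illustration, not the paper's exact method.

```python
import numpy as np

def propagate_tags(adjacency, seed_scores, alpha=0.85, n_iter=50):
    """Random-walk-with-restart tag propagation over a speaking-face graph.

    adjacency:   (n, n) non-negative audiovisual similarity matrix
    seed_scores: (n, k) initial tag scores, e.g. 1 where OCR detected a name
    alpha:       probability of following an edge rather than restarting
    """
    # Row-normalize so each node distributes its score over its neighbours.
    row_sums = adjacency.sum(axis=1, keepdims=True)
    P = np.divide(adjacency, row_sums,
                  out=np.zeros_like(adjacency, dtype=float), where=row_sums > 0)
    scores = seed_scores.astype(float)
    for _ in range(n_iter):
        scores = alpha * (P.T @ scores) + (1 - alpha) * seed_scores
    return scores.argmax(axis=1)  # most likely tag per speaking face
```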
{"title":"Tag Propagation Approaches within Speaking Face Graphs for Multimodal Person Discovery","authors":"G. B. Fonseca, I. Freire, Zenilton K. G. Patrocínio, S. Guimarães, G. Sargent, R. Sicre, G. Gravier","doi":"10.1145/3095713.3095729","DOIUrl":"https://doi.org/10.1145/3095713.3095729","url":null,"abstract":"The indexing of broadcast TV archives is a current problem in multimedia research. As the size of these databases grows continuously, meaningful features are needed to describe and connect their elements efficiently, such as the identification of speaking faces. In this context, this paper focuses on two approaches for unsupervised person discovery. Initial tagging of speaking faces is provided by an OCR-based method, and these tags propagate through a graph model based on audiovisual relations between speaking faces. Two propagation methods are proposed, one based on random walks and the other based on a hierarchical approach. To better evaluate their performances, these methods were compared with two graph clustering baselines. We also study the impact of different modality fusions on the graph-based tag propagation scenario. From a quantitative analysis, we observed that the graph propagation techniques always outperform the baselines. Among all compared strategies, the methods based on hierarchical propagation with late fusion and random walk with score-fusion obtained the highest MAP values. Finally, even though these two methods produce highly equivalent results according to Kappa coefficient, the random walk method performs better according to a paired t-test, and the computing time for the hierarchical propagation is more than 4 times lower than the one for the random walk propagation.","PeriodicalId":310224,"journal":{"name":"Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing","volume":"141 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114457048","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Semi-automatic Video Assessment System
Pedro Martins, N. Correia
This paper describes a system for semi-automatic quality assessment of user-generated content (UGC) from large events. It uses image and video processing techniques combined with a computational quality model that takes into account aesthetics and how human visual perception and attention mechanisms discriminate visual interest. We describe the approach and show that the developed system allows sorting and filtering a large stream of UGC in an efficient and timely manner.
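As a hedged illustration of what a computational quality model can look like (not the authors' model), the sketch below combines three simple per-frame cues, sharpness, exposure and colorfulness, into one weighted score; the cues, weights and normalization constants are arbitrary assumptions.

```python
import cv2
import numpy as np

def frame_quality(frame_bgr, weights=(0.5, 0.3, 0.2)):
    """Toy quality score in [0, 1] for ranking and filtering UGC frames."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    # Sharpness: variance of the Laplacian (higher = sharper), capped at 1.
    sharpness = min(cv2.Laplacian(gray, cv2.CV_64F).var() / 1000.0, 1.0)
    # Exposure: penalize mean luminance far from mid-gray.
    exposure = 1.0 - abs(gray.mean() / 255.0 - 0.5) * 2.0
    # Colorfulness: opponent-channel spread (Hasler & Suesstrunk style).
    b, g, r = cv2.split(frame_bgr.astype(np.float32))
    rg, yb = r - g, 0.5 * (r + g) - b
    colorfulness = min(np.hypot(rg.std(), yb.std()) / 255.0, 1.0)
    w1, w2, w3 = weights
    return w1 * sharpness + w2 * exposure + w3 * colorfulness

frame = np.random.randint(0, 256, (120, 160, 3), np.uint8)
print(frame_quality(frame))
```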
{"title":"Semi-automatic Video Assessment System","authors":"Pedro Martins, N. Correia","doi":"10.1145/3095713.3095748","DOIUrl":"https://doi.org/10.1145/3095713.3095748","url":null,"abstract":"This paper describes a system for semi-automatic quality assessment of user generated content (UGC) from large events. It uses image and video processing techniques1 combined with a computational quality model that takes in account aesthetics and how human visual perception and attention mechanisms discriminate visual interest. We describe the approach and show that the developed system allows to sort and filter a large stream of UGC in an efficient and timely manner.","PeriodicalId":310224,"journal":{"name":"Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122206396","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Efficient Approximate Medoids of Temporal Sequences
W. Bailer
In order to compactly represent a set of data, its medoid (the element with minimum summed distance to all other elements) is a useful choice. This has applications in clustering, compression and visualisation of data. In multimedia data, the set of data is often sampled as a sequence in time or space, such as a video shot or views of a scene. The exact calculation of the medoid may be costly, especially if the distance function between elements is not trivial. While approximation methods for medoid selection exist, we show in this work that they do not perform well on sequences of images. We thus propose a novel algorithm for efficiently selecting an approximate medoid of a temporal sequence and assess its performance on two large-scale video data sets.
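For reference, the exact medoid the abstract defines can be computed as below; this is the costly O(n²)-distance baseline that motivates the paper's approximation, not the proposed algorithm itself, and all names are illustrative.

```python
import numpy as np

def exact_medoid(elements, dist):
    """Element with minimum summed distance to all others; needs n(n-1)/2 dist calls."""
    n = len(elements)
    sums = np.zeros(n)
    for i in range(n):
        for j in range(i + 1, n):
            d = dist(elements[i], elements[j])  # exploit symmetry of the distance
            sums[i] += d
            sums[j] += d
    return elements[int(np.argmin(sums))]

# Usage: pick a representative frame descriptor from a shot.
frames = [np.random.rand(128) for _ in range(200)]
representative = exact_medoid(frames, lambda a, b: np.linalg.norm(a - b))
```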
{"title":"Efficient Approximate Medoids of Temporal Sequences","authors":"W. Bailer","doi":"10.1145/3095713.3095717","DOIUrl":"https://doi.org/10.1145/3095713.3095717","url":null,"abstract":"In order to compactly represent a set of data, its medoid (the element with minimum summed distance to all other elements) is a useful choice. This has applications in clustering, compression and visualisation of data. In multimedia data, the set of data is often sampled as a sequence in time or space, such as a video shot or views of a scene. The exact calculation of the medoid may be costly, especially if the distance function between elements is not trivial. While approximation methods for medoid selection exist, we show in this work that they do not perform well on sequences of images. We thus propose a novel algorithm for efficiently selecting an approximate medoid of a temporal sequence and assess its performance on two large-scale video data sets.","PeriodicalId":310224,"journal":{"name":"Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129522915","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Detection and Classification of Bleeding Region in WCE Images using Color Feature
S. Suman, F. Hussin, A. Malik, Konstantin Pogorelov, M. Riegler, Shiaw-Hooi Ho, I. Hilmi, K. Goh
Wireless capsule endoscopy (WCE) is a modern and efficient technology for diagnosing various abnormalities of the complete gastrointestinal tract (GIT). Due to the long recording time of WCE, it acquires a huge number of images, making it very tedious for clinical experts to inspect each and every frame of a complete video recording. In this paper, an automated color-feature-based technique for bleeding detection is proposed. For bleeding, color is a very important feature for efficient information extraction. Our algorithm is based on statistical color feature analysis, and we use a support vector machine (SVM) to classify WCE video frames into bleeding and non-bleeding classes at a high processing speed. An experimental evaluation shows that our method has promising bleeding detection performance, with sensitivity and specificity higher than existing approaches.
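A minimal sketch of the general pipeline described above, statistical color features fed to an SVM; the chosen statistics, the synthetic stand-in data and all names are illustrative assumptions, not the paper's feature set.

```python
import numpy as np
from sklearn.svm import SVC

def color_features(frame_rgb):
    """Per-channel statistical color features: mean, std and a third-moment term."""
    feats = []
    for c in range(3):
        ch = frame_rgb[..., c].astype(np.float32).ravel() / 255.0
        feats += [ch.mean(), ch.std(), float(np.mean((ch - ch.mean()) ** 3))]
    return np.array(feats)

# Synthetic stand-in data; real training would use labelled WCE frames.
rng = np.random.default_rng(0)
frames = rng.integers(0, 256, size=(40, 64, 64, 3), dtype=np.uint8)
labels = rng.integers(0, 2, size=40)  # 1 = bleeding, 0 = non-bleeding

X = np.stack([color_features(f) for f in frames])
clf = SVC(kernel="rbf").fit(X, labels)
prediction = clf.predict(color_features(frames[0]).reshape(1, -1))
```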
{"title":"Detection and Classification of Bleeding Region in WCE Images using Color Feature","authors":"S. Suman, F. Hussin, A. Malik, Konstantin Pogorelov, M. Riegler, Shiaw-Hooi Ho, I. Hilmi, K. Goh","doi":"10.1145/3095713.3095731","DOIUrl":"https://doi.org/10.1145/3095713.3095731","url":null,"abstract":"Wireless capsule endoscopy (WCE) is a modern and efficient technology to diagnose complete gastrointestinal tract (GIT) for various abnormalities. Due to long recording time of WCE, it acquires a huge amount of images, which is very tedious for clinical expertise to inspect each and every frame of a complete video footage. In this paper, an automated color feature based technique of bleeding detection is proposed. In case of bleeding, color is a very important feature for an efficient information extraction. Our algorithm is based on statistical color feature analysis and we use support vector machine (SVM) to classify WCE video frames into bleeding and non-bleeding classes with a high processing speed. An experimental evaluation shows that our method has promising bleeding detection performance with sensitivity and specificity higher than existing approaches.","PeriodicalId":310224,"journal":{"name":"Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124539572","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 35
Detecting adversarial example attacks to deep neural networks
F. Carrara, F. Falchi, R. Caldelli, Giuseppe Amato, Roberta Fumarola, Rudy Becarelli
Deep learning has recently become the state of the art in many computer vision applications, and in image classification in particular. However, recent works have shown that it is quite easy to create adversarial examples, i.e., images intentionally created or modified to cause the deep neural network to make a mistake. They are like optical illusions for machines, containing changes unnoticeable to the human eye. This represents a serious threat to machine learning methods. In this paper, we investigate the robustness of the representations learned by the fooled neural network, analyzing the activations of its hidden layers. Specifically, we tested scoring approaches used for kNN classification in order to distinguish between correctly classified authentic images and adversarial examples. The results show that hidden-layer activations can be used to detect incorrect classifications caused by adversarial attacks.
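A hedged sketch of one way to score a prediction by kNN agreement among hidden-layer activations, in the spirit of the detection idea described above; the agreement score, the value of k and all names are assumptions, not the paper's exact scoring approaches.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_agreement(train_acts, train_labels, query_act, k=10):
    """Fraction of the k nearest training activations sharing the majority label.

    Low agreement around an input's hidden-layer activation flags a possible
    adversarial example, even when the network's output looks confident.
    """
    nn = NearestNeighbors(n_neighbors=k).fit(train_acts)
    _, idx = nn.kneighbors(query_act.reshape(1, -1))
    _, counts = np.unique(train_labels[idx[0]], return_counts=True)
    return counts.max() / k

# Synthetic example: 100 training activations of dimension 512.
rng = np.random.default_rng(0)
acts, labels = rng.normal(size=(100, 512)), rng.integers(0, 10, size=100)
score = knn_agreement(acts, labels, rng.normal(size=512))
```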
{"title":"Detecting adversarial example attacks to deep neural networks","authors":"F. Carrara, F. Falchi, R. Caldelli, Giuseppe Amato, Roberta Fumarola, Rudy Becarelli","doi":"10.1145/3095713.3095753","DOIUrl":"https://doi.org/10.1145/3095713.3095753","url":null,"abstract":"Deep learning has recently become the state of the art in many computer vision applications and in image classification in particular. However, recent works have shown that it is quite easy to create adversarial examples, i.e., images intentionally created or modified to cause the deep neural network to make a mistake. They are like optical illusions for machines containing changes unnoticeable to the human eye. This represents a serious threat for machine learning methods. In this paper, we investigate the robustness of the representations learned by the fooled neural network, analyzing the activations of its hidden layers. Specifically, we tested scoring approaches used for kNN classification, in order to distinguishing between correctly classified authentic images and adversarial examples. The results show that hidden layers activations can be used to detect incorrect classifications caused by adversarial attacks.","PeriodicalId":310224,"journal":{"name":"Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116289143","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 33
Connoisseur: classification of styles of Mexican architectural heritage with deep learning and visual attention prediction
A. M. Obeso, M. García-Vázquez, A. A. Ramírez-Acosta, J. Benois-Pineau
The automatic description of multimedia content was mainly developed for classification tasks, retrieval systems and the massive ordering of data. Preservation of cultural heritage is an important field of application for these methods. Our problem is the classification of architectural styles of buildings in digital photographs of Mexican cultural heritage. Selecting the relevant content in the scene when training classification models allows them to be more precise in the classification task. Here we use a saliency-driven approach to predict visual attention in images and use it to train a Convolutional Neural Network to identify the architectural style of Mexican buildings. We also present an analysis of the behavior of models trained on traditionally cropped images versus saliency maps. In this sense, we show that the performance of the saliency-based CNNs is better than traditional training, reaching a classification rate of 97% on the validation dataset. We consider that style identification with this technique can contribute widely to video description tasks, specifically the automatic documentation of Mexican cultural heritage.
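A minimal sketch of one plausible way to feed predicted visual attention into CNN training: weight pixels by saliency so the network sees salient regions emphasized. The masking scheme and the floor parameter are assumptions for illustration, not necessarily how the authors combine saliency prediction with their CNN.

```python
import numpy as np

def saliency_weighted_input(image, saliency, floor=0.2):
    """Emphasize salient regions of a training image before it enters the CNN.

    image:    (H, W, 3) float array in [0, 1]
    saliency: (H, W) predicted visual-attention map in [0, 1]
    floor:    minimum weight, keeping some context from non-salient regions
    """
    weights = floor + (1.0 - floor) * saliency
    return image * weights[..., None]

img = np.random.rand(224, 224, 3)
sal = np.random.rand(224, 224)
masked = saliency_weighted_input(img, sal)
```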
{"title":"Connoisseur: classification of styles of Mexican architectural heritage with deep learning and visual attention prediction","authors":"A. M. Obeso, M. García-Vázquez, A. A. Ramírez-Acosta, J. Benois-Pineau","doi":"10.1145/3095713.3095730","DOIUrl":"https://doi.org/10.1145/3095713.3095730","url":null,"abstract":"The automatic description of multimedia content was mainly developed for classification tasks, retrieval systems and massive ordering of data. Preservation of cultural heritage is a field of high importance for application to this method. Our problem is classification of architectural styles of buildings in digital photographs of Mexican cultural heritage. The selection of relevant content in the scene for training classification models allows them to be more precise in the classification task. Here we use a saliency-driven approach to predict visual attention in images and use it to train a Convolutional Neural Network to identify the architectural style of Mexican buildings. Also, we present an analysis of the behavior of the models trained under the traditional cropped image and the prominence maps. In this sense, we show that the performance of the saliency-based CNNs is better than the traditional training reaching a classification rate of 97% in validation dataset. It is considered that style identification with this technique can make a wide contribution in video description tasks, specifically in the automatic documentation of Mexican cultural heritage.","PeriodicalId":310224,"journal":{"name":"Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116367970","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 28
Towards large scale multimedia indexing: A case study on person discovery in broadcast news
N. Le, H. Bredin, G. Sargent, Miquel India, Paula Lopez-Otero, C. Barras, Camille Guinaudeau, G. Gravier, G. B. Fonseca, I. Freire, Zenilton K. G. Patrocínio, S. Guimarães, Gerard Martí, J. Morros, J. Hernando, Laura Docío Fernández, C. García-Mateo, S. Meignier, J. Odobez
The rapid growth of multimedia databases and the human interest in their peers make indices representing the location and identity of people in audio-visual documents essential for searching archives. Person discovery in the absence of prior identity knowledge requires accurate association of audio-visual cues and detected names. To this end, we present three different strategies to approach this problem: clustering-based naming, verification-based naming, and graph-based naming. Each of these strategies utilizes different recent advances in unsupervised face/speech representation, verification, and optimization. To give a better understanding of the approaches, this paper also provides a quantitative and qualitative comparative study of them using the associated corpus of the Person Discovery challenge at MediaEval 2016. From the results of our experiments, we can observe the pros and cons of each approach, thus paving the way for promising future research directions.
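A hedged sketch of the first strategy, clustering-based naming: cluster face/speech embeddings, then give every member of a cluster the OCR name most frequent inside it. The clustering algorithm, the distance threshold and all names are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def clustering_based_naming(embeddings, ocr_names):
    """Assign each speaking face the dominant OCR tag of its cluster (or None)."""
    labels = AgglomerativeClustering(
        n_clusters=None, distance_threshold=1.0).fit_predict(embeddings)
    cluster_name = {}
    for c in set(labels):
        tags = [ocr_names[i] for i in np.flatnonzero(labels == c) if ocr_names[i]]
        cluster_name[c] = max(set(tags), key=tags.count) if tags else None
    return [cluster_name[c] for c in labels]

rng = np.random.default_rng(0)
emb = rng.normal(size=(12, 128))
ocr = ["Alice", None, None, "Bob"] + [None] * 8
print(clustering_based_naming(emb, ocr))
```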
{"title":"Towards large scale multimedia indexing: A case study on person discovery in broadcast news","authors":"N. Le, H. Bredin, G. Sargent, Miquel India, Paula Lopez-Otero, C. Barras, Camille Guinaudeau, G. Gravier, G. B. Fonseca, I. Freire, Zenilton K. G. Patrocínio, S. Guimarães, Gerard Martí, J. Morros, J. Hernando, Laura Docío Fernández, C. García-Mateo, S. Meignier, J. Odobez","doi":"10.1145/3095713.3095732","DOIUrl":"https://doi.org/10.1145/3095713.3095732","url":null,"abstract":"The rapid growth of multimedia databases and the human interest in their peers make indices representing the location and identity of people in audio-visual documents essential for searching archives. Person discovery in the absence of prior identity knowledge requires accurate association of audio-visual cues and detected names. To this end, we present 3 different strategies to approach this problem: clustering-based naming, verification-based naming, and graph-based naming. Each of these strategies utilizes different recent advances in unsupervised face / speech representation, verification, and optimization. To have a better understanding of the approaches, this paper also provides a quantitative and qualitative comparative study of these approaches using the associated corpus of the Person Discovery challenge at MediaEval 2016. From the results of our experiments, we can observe the pros and cons of each approach, thus paving the way for future promising research directions.","PeriodicalId":310224,"journal":{"name":"Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129548910","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 13
Music Feature Maps with Convolutional Neural Networks for Music Genre Classification
Christine Sénac, Thomas Pellegrini, Florian Mouret, J. Pinquier
Nowadays, deep learning is used more and more for Music Genre Classification: particularly Convolutional Neural Networks (CNNs) that take as input a spectrogram treated as an image in which different types of structure are sought. However, facing the criticism that it is difficult to understand the underlying relationships that neural networks learn from a spectrogram, we propose to use as inputs to a CNN a small set of eight music features chosen along three main musical dimensions: dynamics, timbre and tonality. With CNNs trained in such a way that filter dimensions are interpretable in time and frequency, results show that only eight music features are more efficient than the 513 frequency bins of a spectrogram, and that late score fusion between systems based on both feature types reaches 91% accuracy on the GTZAN database.
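A minimal PyTorch sketch of a CNN over an eight-feature "music feature map", with a first convolution spanning the whole feature axis so that filters remain interpretable along time; the layer sizes, kernel widths and names are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class GenreCNN(nn.Module):
    """Classify a clip from 8 frame-level music features over T frames."""
    def __init__(self, n_features=8, n_genres=10):
        super().__init__()
        # The kernel covers all 8 features at once; only the time axis remains.
        self.conv = nn.Conv2d(1, 32, kernel_size=(n_features, 9), padding=(0, 4))
        self.pool = nn.AdaptiveMaxPool2d((1, 1))
        self.fc = nn.Linear(32, n_genres)

    def forward(self, x):             # x: (batch, 1, n_features, T)
        h = torch.relu(self.conv(x))  # -> (batch, 32, 1, T)
        h = self.pool(h).flatten(1)   # -> (batch, 32)
        return self.fc(h)

logits = GenreCNN()(torch.randn(4, 1, 8, 200))  # 4 clips, 200 frames each
```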
{"title":"Music Feature Maps with Convolutional Neural Networks for Music Genre Classification","authors":"Christine Sénac, Thomas Pellegrini, Florian Mouret, J. Pinquier","doi":"10.1145/3095713.3095733","DOIUrl":"https://doi.org/10.1145/3095713.3095733","url":null,"abstract":"Nowadays, deep learning is more and more used for Music Genre Classification: particularly Convolutional Neural Networks (CNN) taking as entry a spectrogram considered as an image on which are sought different types of structure. But, facing the criticism relating to the difficulty in understanding the underlying relationships that neural networks learn in presence of a spectrogram, we propose to use, as entries of a CNN, a small set of eight music features chosen along three main music dimensions: dynamics, timbre and tonality. With CNNs trained in such a way that filter dimensions are interpretable in time and frequency, results show that only eight music features are more efficient than 513 frequency bins of a spectrogram and that late score fusion between systems based on both feature types reaches 91% accuracy on the GTZAN database.","PeriodicalId":310224,"journal":{"name":"Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130636934","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 51
Live Collaborative Social-Media Video Timelines
Rui Queiros, N. Correia, João Magalhães
In this paper, we propose a collaborative system that lets users share their own videos and interact among themselves to collaboratively produce video coverage of live events. Our intention is to motivate users to make positive contributions to the comprehensiveness of available videos about an event. To achieve this we propose a collaborative video framework, named LiveTime, allowing users to share information timelines of real-world events. With this solution we offer collaboration features that go beyond existing systems like Youtube and Vimeo. The paper describes the rationale and main concepts, the implementation, and the results of a preliminary user study.
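As a hedged illustration of the kind of data structure such a framework might build on (not LiveTime's actual design), the sketch below orders user-contributed clips on an event timeline and reports coverage gaps, i.e. the spots where new contributions would most improve comprehensiveness.

```python
from dataclasses import dataclass, field

@dataclass(order=True)
class Clip:
    start: float  # event-relative timestamp, in seconds
    end: float
    user: str = field(compare=False)
    url: str = field(compare=False)

def merge_timeline(clips):
    """Sort clips into one shared timeline and list uncovered time ranges."""
    clips = sorted(clips)
    gaps, covered_until = [], 0.0
    for c in clips:
        if c.start > covered_until:
            gaps.append((covered_until, c.start))
        covered_until = max(covered_until, c.end)
    return clips, gaps

timeline, gaps = merge_timeline([
    Clip(0.0, 40.0, "ana", "http://example.org/a"),
    Clip(70.0, 95.0, "rui", "http://example.org/b"),
])
print(gaps)  # [(40.0, 70.0)] -> nobody filmed this interval yet
```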
{"title":"Live Collaborative Social-Media Video Timelines","authors":"Rui Queiros, N. Correia, João Magalhães","doi":"10.1145/3095713.3095750","DOIUrl":"https://doi.org/10.1145/3095713.3095750","url":null,"abstract":"In this paper, we propose a collaborative system to let users share their own videos and interact among themselves to collaboratively do a video coverage of live events. Our intention is to motivate users to make positive contributions to the comprehensiveness of available videos about that event. To achieve this we propose a collaborative video framework, named LiveTime, allowing users to shared information timelines of real-world events. With this solution we offer collaboration features that go beyond existing systems like Youtube and Vimeo. The paper describes the rational and main concepts, the implementation and the results of a preliminary user study.","PeriodicalId":310224,"journal":{"name":"Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125381654","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1