Compressed Domain Copy Detection of Scalable SVC Videos
Christian Käs, H. Nicolas
DOI: 10.1109/CBMI.2009.26

We propose a novel approach for compressed-domain copy detection of scalable videos stored in a database. We analyze compressed H.264/SVC streams and form different scalable low-level and mid-level feature vectors that are robust to multiple transformations. The features are based on readily available information such as the encoding bit rate over time and the motion vectors found in the stream. The focus of this paper lies on the scalability and robustness of the features. A combination of different descriptors is used to perform copy detection on a database of scalable, SVC-coded High-Definition (HD) video clips.
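As a concrete illustration of the bit-rate-over-time idea mentioned in this abstract, here is a minimal sketch, not the authors' method: frame sizes (assumed to be already parsed from the stream) are turned into a windowed, normalized bit-rate curve, and a query clip is matched against a reference by a sliding comparison. All function names and the synthetic data are assumptions for illustration.

```python
# Minimal sketch (not the authors' implementation): a bit-rate-over-time
# descriptor compared with a sliding window. Frame sizes would normally be
# read from the SVC bitstream; here they are passed in as plain arrays.
import numpy as np

def bitrate_descriptor(frame_sizes_bytes, fps=25.0, window_s=1.0):
    """Average bit rate per window, z-normalized so the descriptor is
    robust to global re-encoding at a different target bit rate."""
    sizes = np.asarray(frame_sizes_bytes, dtype=float)
    win = max(1, int(round(window_s * fps)))
    n = len(sizes) // win
    rate = sizes[:n * win].reshape(n, win).sum(axis=1) * 8.0 / window_s  # bits/s
    return (rate - rate.mean()) / (rate.std() + 1e-9)

def best_match_distance(query_desc, ref_desc):
    """Slide the (shorter) query descriptor over the reference and return
    the smallest mean squared distance and its temporal offset."""
    q, r = np.asarray(query_desc), np.asarray(ref_desc)
    if len(q) > len(r):
        q, r = r, q
    dists = [np.mean((r[o:o + len(q)] - q) ** 2) for o in range(len(r) - len(q) + 1)]
    o = int(np.argmin(dists))
    return dists[o], o

# Usage with synthetic frame sizes: the "copy" is a 30 s excerpt of the
# reference with noise added to mimic re-encoding.
rng = np.random.default_rng(0)
ref_sizes = rng.integers(2000, 20000, size=25 * 60)        # one minute at 25 fps
copy_sizes = ref_sizes[25 * 10:25 * 40] + rng.integers(-200, 200, size=25 * 30)
d, offset = best_match_distance(bitrate_descriptor(copy_sizes),
                                bitrate_descriptor(ref_sizes))
print(f"best distance {d:.3f} at offset {offset} s")
```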
AudioCycle: Browsing Musical Loop Libraries
S. Dupont, Thomas Dubuisson, J. Urbain, R. Sebbe, N. D'Alessandro, Christian Frisson
DOI: 10.1109/CBMI.2009.19

This paper presents AudioCycle, a prototype application for browsing music loop libraries. AudioCycle provides the user with a graphical view in which the audio extracts are visualized and organized according to their similarity in terms of musical properties such as timbre, harmony, and rhythm. The user is able to navigate in this visual representation and listen to individual audio extracts, searching for those of interest. AudioCycle draws on a range of technologies, including audio analysis from music information retrieval research, 3D visualization, spatial auditory rendering, and audio time-scaling and pitch modification. The proposed approach extends previously described music and audio browsers. Concepts developed here will be of interest to DJs, remixers, musicians, and soundtrack composers, as well as sound designers and foley artists. Possible extensions to multimedia libraries are also suggested.
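One ingredient of such a browser, similarity-based layout, can be sketched as follows. This is not the AudioCycle implementation; it assumes the librosa and scikit-learn packages and uses only MFCC-based timbre, whereas AudioCycle also exploits harmony and rhythm and renders the result with 3D visualization and spatial audio.

```python
# Hypothetical sketch of a timbre-based similarity layout for audio loops,
# assuming librosa and scikit-learn are available.
import numpy as np
import librosa
from sklearn.decomposition import PCA

def timbre_descriptor(path, n_mfcc=13):
    """Summarize a loop's timbre as the mean and std of its MFCCs."""
    y, sr = librosa.load(path, mono=True)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

def layout_2d(loop_paths):
    """Project the loop descriptors to 2D so similar loops appear close."""
    feats = np.vstack([timbre_descriptor(p) for p in loop_paths])
    feats = (feats - feats.mean(axis=0)) / (feats.std(axis=0) + 1e-9)
    return PCA(n_components=2).fit_transform(feats)

# loop_paths below is a placeholder list of audio files on disk.
# positions = layout_2d(["loop_a.wav", "loop_b.wav", "loop_c.wav"])
```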
Sperm Whales Records Indexation Using Passive Acoustics Localization
F. Bénard, H. Glotin
DOI: 10.1109/CBMI.2009.39

This paper focuses on the robust indexing of sperm whale hydrophone recordings based on a set of features extracted from a real-time passive underwater acoustic tracking algorithm for multiple emitting whales. In recent years, interest in marine mammals has increased, leading to the development of robust, real-time systems. Acoustic localization makes it possible to study whale behavior in deep water (several hundred meters) without interfering with the environment. In this paper, we recall and use a recently developed real-time multiple-tracking algorithm, which provides the localization of one or several sperm whales. Given the position coordinates, we are able to analyse features such as speed, click energy, Inter-Click Interval (ICI), etc. These features allow us to construct markers that lead to the indexing and structuring of the audio files. Behavior studies are thus facilitated by choosing and accessing the corresponding index in the audio file. The complete indexing algorithm is run on real data from the NUWC (Naval Undersea Warfare Center of the US Navy) and AUTEC (Atlantic Undersea Test & Evaluation Center, Bahamas). Our model is validated by similar results from the US Navy (NUWC) and SOEST (School of Ocean and Earth Science and Technology, University of Hawaii) labs in the single-whale case. Finally, as an illustration, we index a single-whale sound file using the whale features extracted by the tracking algorithm, and we present an example of an XML script structuring it.
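The features named above (speed, Inter-Click Interval) can be derived directly from the tracker's output. The following sketch shows one plausible way to compute them from synthetic placeholder data rather than the NUWC/AUTEC recordings; it is not the authors' code.

```python
# Minimal sketch: deriving swim speed and Inter-Click Interval (ICI) from the
# output of a tracking algorithm. Positions and click times are synthetic.
import numpy as np

def speeds(times_s, positions_m):
    """Instantaneous speed (m/s) between consecutive 3D position estimates."""
    t = np.asarray(times_s, dtype=float)
    p = np.asarray(positions_m, dtype=float)           # shape (N, 3): x, y, depth
    return np.linalg.norm(np.diff(p, axis=0), axis=1) / np.diff(t)

def inter_click_intervals(click_times_s):
    """ICI sequence (s) between consecutive detected clicks."""
    return np.diff(np.asarray(click_times_s, dtype=float))

# Synthetic example: a whale swimming at roughly 1.5 m/s at 400 m depth,
# clicking about once per second.
t = np.arange(0.0, 10.0, 1.0)
pos = np.c_[1.5 * t, 0.2 * t, -400.0 + 0.0 * t]
print(speeds(t, pos))                                  # ~1.51 m/s per interval
print(inter_click_intervals([0.0, 0.9, 1.85, 2.7]))
```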
Monophony vs Polyphony: A New Method Based on Weibull Bivariate Models
H. Lachambre, R. André-Obrecht, J. Pinquier
DOI: 10.1109/CBMI.2009.24

Our contribution takes place in the context of music indexing. In many applications, such as multipitch estimation, it can be useful to know the number of notes played at a time. In this work, we aim to distinguish monophonies (one note at a time) from polyphonies (several notes at a time). We analyze an indicator that gives the confidence in the estimated pitch. In the case of a monophony, the pitch is relatively easy to determine, so this indicator is low. In the case of a polyphony, the pitch is much more difficult to determine, so the indicator is higher and varies more. Based on these two observations, we compute the short-term mean and variance of the indicator and model the bivariate distribution of these two parameters with a bivariate Weibull distribution for each class (monophony and polyphony). Classification is performed by computing the likelihood over one second for each class and taking the best one. Models are learned with 25 seconds of each kind of signal. Our best results give a global error rate of 6.3%, obtained on a balanced corpus containing approximately 18 minutes of signal.
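The decision scheme is concrete enough to sketch. The example below follows the same steps, short-term mean and variance of a confidence indicator, per-class models, and a one-second likelihood decision, but simplifies the paper's bivariate Weibull model to independent Weibull marginals and runs on synthetic data, so it is an illustration rather than the authors' method.

```python
# Sketch of the monophony/polyphony decision scheme, with one deliberate
# simplification: each class is modeled by independent Weibull marginals on
# the short-term mean and variance of the pitch-confidence indicator instead
# of the paper's bivariate Weibull model.
import numpy as np
from scipy.stats import weibull_min

def short_term_stats(indicator, frame_rate=100, win_s=0.1):
    """Short-term mean and variance of the confidence indicator."""
    win = max(2, int(win_s * frame_rate))
    n = len(indicator) // win
    x = np.asarray(indicator[:n * win], dtype=float).reshape(n, win)
    return np.c_[x.mean(axis=1), x.var(axis=1) + 1e-12]

def fit_class(stats):
    """Fit one Weibull distribution per dimension (location fixed at 0)."""
    return [weibull_min.fit(stats[:, d], floc=0) for d in range(stats.shape[1])]

def log_likelihood(stats, params):
    return sum(weibull_min.logpdf(stats[:, d], *params[d]).sum()
               for d in range(stats.shape[1]))

# Synthetic training data: monophony -> low, stable indicator;
# polyphony -> higher, noisier indicator. 25 s of each at 100 frames/s.
rng = np.random.default_rng(1)
mono = rng.gamma(2.0, 0.05, 2500)
poly = rng.gamma(4.0, 0.15, 2500)
models = {"mono": fit_class(short_term_stats(mono)),
          "poly": fit_class(short_term_stats(poly))}

test = rng.gamma(4.0, 0.15, 100)            # one second of unknown signal
scores = {c: log_likelihood(short_term_stats(test), m) for c, m in models.items()}
print(max(scores, key=scores.get))          # expected: "poly"
```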
Biometric Responses to Music-Rich Segments in Films: The CDVPlex
A. Smeaton, S. Rothwell
DOI: 10.1109/CBMI.2009.21

Summarising or generating trailers for films involves finding the highlights within those films: the segments where we become most afraid, happy, sad, annoyed, excited, and so on. In this paper we explore three questions related to the automatic detection of film highlights by measuring the physiological responses of viewers of those films: first, whether emotional highlights can be detected through viewer biometrics; second, whether individuals watching a film in a group experience emotional reactions similar to others in the group; and third, whether the presence of music in a film correlates with the occurrence of emotional highlights. We analyse the results of an experiment known as the CDVPlex, in which we monitored and recorded physiological reactions from people as they viewed films in a controlled, cinema-like environment. A selection of films was manually annotated for the locations of their emotive content. We then studied the physiological peaks identified among participants while viewing the same film and how these correlated with emotion tags and with music. We conclude that these are highly correlated and that music-rich segments of a film do act as a catalyst in stimulating viewer response, though we do not know which exact emotions the viewers were experiencing. The results of this work could affect the way we index movie content on PVRs, for example, by attaching special significance to movie segments that are most likely to be highlights.
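As a rough illustration of the kind of analysis involved (not the CDVPlex protocol or data), the sketch below detects prominent peaks in a physiological signal and measures how often they fall inside annotated music-rich segments; the signal, sampling rate, and segment list are synthetic assumptions.

```python
# Hypothetical sketch: peaks in a physiological signal (e.g., galvanic skin
# response) checked against annotated music-rich segments. All data synthetic.
import numpy as np
from scipy.signal import find_peaks

def peak_times(signal, fs_hz, min_prominence=1.0):
    """Times (s) of prominent peaks in a physiological signal."""
    peaks, _ = find_peaks(signal, prominence=min_prominence)
    return peaks / fs_hz

def fraction_in_segments(times_s, segments_s):
    """Fraction of peak times falling inside any (start, end) segment."""
    inside = [any(a <= t <= b for a, b in segments_s) for t in times_s]
    return sum(inside) / max(1, len(inside))

fs = 4.0                                    # GSR sampled at 4 Hz (assumption)
t = np.arange(0, 600, 1 / fs)               # a 10-minute viewing session
gsr = np.sin(t / 30.0) + 2.0 * np.exp(-((t - 200) ** 2) / 50.0)  # one strong response
music = [(180.0, 240.0), (400.0, 450.0)]    # annotated music-rich segments (s)
print(fraction_in_segments(peak_times(gsr, fs), music))
```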
Combining Cohort and UBM Models in Open Set Speaker Identification
Anthony Brew, P. Cunningham
DOI: 10.1109/CBMI.2009.30

In open-set speaker identification it is important to build an alternative model against which to compare scores from the 'target' speaker model. Two strategies for building such a model are to build a single global model by sampling from a pool of training data, the Universal Background Model (UBM), or to build a cohort of models from selected individuals in the training data for the target speaker. The main contribution of this paper is to show that these approaches can be unified by using a Support Vector Machine (SVM) to learn a decision rule in the score space made up of the output scores of the client, cohort, and UBM models.
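The unification described here is straightforward to sketch: an SVM is trained on three-dimensional score vectors. The example below uses synthetic scores in place of real speaker-model log-likelihoods and assumes scikit-learn; it illustrates the idea rather than reproducing the paper's setup.

```python
# Minimal sketch: an SVM accept/reject rule in the space of
# (client score, best cohort score, UBM score). Scores are synthetic stand-ins
# for log-likelihoods produced by speaker models.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def score_vectors(n, genuine):
    """(client, cohort, UBM) scores; genuine trials score higher on the client model."""
    client = rng.normal(1.5 if genuine else 0.0, 0.5, n)
    cohort = rng.normal(0.5, 0.5, n)        # best competing cohort model
    ubm = rng.normal(0.0, 0.5, n)           # universal background model
    return np.c_[client, cohort, ubm]

X = np.vstack([score_vectors(200, True), score_vectors(200, False)])
y = np.r_[np.ones(200), np.zeros(200)]      # 1 = target speaker, 0 = impostor

clf = SVC(kernel="rbf", C=1.0).fit(X, y)
trial = score_vectors(1, True)
print("accept" if clf.predict(trial)[0] == 1 else "reject")
```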
A Database Architecture for Real-Time Motion Retrieval
Charly Awad, N. Courty, S. Gibet
DOI: 10.1109/CBMI.2009.20

Over the past decade, many research fields have realized the benefits of motion capture data, leading to an exponential growth in the size of motion databases. Consequently, indexing, querying, and retrieving motion capture data have become important considerations for the usability of such databases. Our aim is to efficiently retrieve motion from such databases in order to produce real-time animation. For that purpose, we propose a new database architecture that structures both the semantic data and the raw data contained in motion capture data. The performance of the overall architecture is evaluated by measuring the efficiency of the motion retrieval process in terms of the mean access time to the data.
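One simple way to picture the separation of semantic and raw data (purely our illustrative reading, not the authors' schema) is a two-table layout in which retrieval first filters on semantics and only then touches raw frames, with access time measured around the query:

```python
# Toy illustration of semantic/raw separation for motion retrieval; the table
# names and columns are assumptions, not the architecture proposed in the paper.
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE clip  (id INTEGER PRIMARY KEY, label TEXT, duration_s REAL);
CREATE TABLE frame (clip_id INTEGER REFERENCES clip(id),
                    t_s REAL, joint_angles BLOB);
CREATE INDEX idx_clip_label ON clip(label);
""")
conn.execute("INSERT INTO clip VALUES (1, 'walk', 2.0)")
conn.execute("INSERT INTO clip VALUES (2, 'wave', 1.5)")
conn.executemany("INSERT INTO frame VALUES (1, ?, ?)",
                 [(i / 30.0, b"\x00" * 240) for i in range(60)])   # 2 s at 30 fps

start = time.perf_counter()
rows = conn.execute("""SELECT f.t_s FROM frame f
                       JOIN clip c ON c.id = f.clip_id
                       WHERE c.label = 'walk' ORDER BY f.t_s""").fetchall()
print(len(rows), "frames retrieved in", time.perf_counter() - start, "s")
```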
Music Mood Annotator Design and Integration
C. Laurier, O. Meyers, J. Serrà, Martin Blech, P. Herrera
DOI: 10.1109/CBMI.2009.45

A robust and efficient technique for automatic music mood annotation is presented. A song's mood is predicted by a supervised machine learning approach based on musical features extracted from the raw audio signal. A ground truth, used for training, is created using both social network information systems and individual experts. Tests of seven different classification configurations have been performed, showing that Support Vector Machines perform best for the task at hand. Moreover, we evaluate the algorithm's robustness to different audio compression schemes. This aspect, often neglected, is fundamental to building a system that is usable in real conditions. In addition, the integration of a fast and scalable version of this technique with the European project PHAROS is discussed.
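A minimal sketch of such a supervised pipeline is given below; it assumes librosa and scikit-learn, and its feature set, mood taxonomy, and file names are placeholders rather than the paper's feature extraction or ground truth.

```python
# Hypothetical sketch of a supervised mood annotator: audio-derived feature
# vectors and mood labels train an SVM. Not the paper's implementation.
import numpy as np
import librosa
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def mood_features(path):
    """Compact timbre/energy summary of one track (first 30 s)."""
    y, sr = librosa.load(path, mono=True, duration=30.0)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    rms = librosa.feature.rms(y=y)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1),
                           [rms.mean(), rms.std()]])

def train_mood_annotator(paths, labels):
    """labels: e.g. 'happy', 'sad', 'angry', 'relaxed' (placeholder taxonomy)."""
    X = np.vstack([mood_features(p) for p in paths])
    return make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0)).fit(X, labels)

# File names below are placeholders for an annotated training collection.
# annotator = train_mood_annotator(["a.mp3", "b.mp3"], ["happy", "sad"])
# print(annotator.predict([mood_features("new_song.mp3")]))
```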
BlockWeb: An IR Model for Block Structured Web Pages
Emmanuel Bruno, Nicolas Faessel, J. Maitre, M. Scholl
DOI: 10.1109/CBMI.2009.36

BlockWeb is a model that we have developed for indexing and querying web pages according to their content as well as their visual rendering. Pages are split up into blocks, which has several advantages for page indexing and querying: (i) the blocks of a page most similar to a query may be returned instead of the page as a whole; (ii) the importance of a block can be taken into account; and (iii) so can the permeability of a block to the content of its neighbor blocks. In this paper, we present the BlockWeb model and show its usefulness for indexing images in Web pages, through an experiment performed on electronic versions of French daily newspapers. We also present the engine we have implemented for block extraction, indexing, and querying according to the BlockWeb model.
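The notions of block importance and permeability suggest a simple weighting scheme. The sketch below is our own illustrative reading, not the BlockWeb engine: a block's index terms are weighted by its importance and augmented by a permeability-scaled share of its neighbors' terms.

```python
# Illustrative reading of block importance and permeability (not BlockWeb):
# each block's term weights combine its own importance with a permeability-
# scaled contribution from neighboring blocks.
from collections import Counter

def block_index(blocks, neighbors, importance, permeability):
    """blocks: {id: list of terms}; neighbors: {id: [neighbor ids]};
    importance, permeability: {id: float in [0, 1]}."""
    index = {}
    for b, terms in blocks.items():
        weights = Counter()
        for t in terms:
            weights[t] += importance[b]
        for n in neighbors.get(b, []):
            for t in blocks[n]:
                weights[t] += permeability[b] * importance[n]
        index[b] = dict(weights)
    return index

# Toy newspaper page: a caption block is highly permeable to its headline.
blocks = {"headline": ["election", "results"],
          "photo_caption": ["candidate", "election"],
          "ad": ["sale"]}
neighbors = {"photo_caption": ["headline"]}
importance = {"headline": 1.0, "photo_caption": 0.6, "ad": 0.1}
permeability = {"headline": 0.0, "photo_caption": 0.8, "ad": 0.0}
print(block_index(blocks, neighbors, importance, permeability)["photo_caption"])
```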
Structured Named Entity Retrieval in Audio Broadcast News
Azeddine Zidouni, M. Quafafou, H. Glotin
DOI: 10.1109/CBMI.2009.41

This paper focuses on the role of structure in named entity retrieval from audio transcriptions. We consider the transcription document structures that guide the parsing process, and from them we deduce an optimal hierarchical structure of the space of concepts. A concept (named entity) is thus represented by a node or any sub-path in this hierarchy. We show the benefit of such a structure for the recognition of named entities using Conditional Random Fields (CRFs). A comparison of our approach with the Hidden Markov Model (HMM) method shows a significant improvement in recognition when combining CRFs. We also show the impact of the time axis on the prediction process.
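A CRF tagger over transcribed words can be sketched as below; it assumes the sklearn-crfsuite package and toy data, and it does not reproduce the hierarchical concept structure that is the paper's actual contribution.

```python
# Hypothetical sketch of CRF-based named entity tagging on transcribed words,
# assuming the sklearn-crfsuite package. Features and data are placeholders.
import sklearn_crfsuite

def word_features(sent, i):
    w = sent[i]
    feats = {"word.lower": w.lower(), "is_title": w.istitle(), "position": i}
    if i > 0:
        feats["prev.lower"] = sent[i - 1].lower()   # simple left context
    return feats

def featurize(sentences):
    return [[word_features(s, i) for i in range(len(s))] for s in sentences]

# Tiny toy corpus of transcribed broadcast-news fragments (placeholder data).
train_sents = [["paris", "mayor", "spoke", "today"],
               ["rain", "expected", "in", "london"]]
train_tags = [["LOC", "O", "O", "O"],
              ["O", "O", "O", "LOC"]]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(featurize(train_sents), train_tags)
print(crf.predict(featurize([["storm", "hits", "paris"]])))
```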