
Latest publications: 2013 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB)

Comparative Study on Dimensionality Reduction in Large-Scale Image Retrieval
Bo Cheng, L. Zhuo, Jing Zhang
Dimensionality reduction plays a significant role in the performance of large-scale image retrieval. In this paper, various dimensionality reduction methods are compared to evaluate their performance in image retrieval. For this purpose, first, Scale Invariant Feature Transform (SIFT) features and HSV (Hue, Saturation, Value) histograms are extracted as image features. Second, Principal Component Analysis (PCA), Fisher Linear Discriminant Analysis (FLDA), Local Fisher Discriminant Analysis (LFDA), Isometric Mapping (ISOMAP), Locally Linear Embedding (LLE), and Locality Preserving Projections (LPP) are applied in turn to reduce the dimensions of the SIFT descriptors and color information, which are then used to generate vocabulary trees. Finally, by setting the match weights of the vocabulary trees, a large-scale image retrieval scheme is implemented. Comparing multiple sets of experimental data from several platforms shows that the LLE and LPP dimensionality reduction methods can effectively reduce the computational cost of image features while maintaining high retrieval performance.
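Of the six reduction methods the paper compares, PCA is the simplest to state. As a minimal illustration (not the authors' code), PCA can be computed from the SVD of the centered feature matrix, e.g. reducing 128-D SIFT-like descriptors to 16 dimensions:

```python
import numpy as np

def pca_reduce(X, k):
    """Project rows of X onto the top-k principal directions."""
    Xc = X - X.mean(axis=0)                        # center the data
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                           # rows of Vt are principal axes

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 128))   # e.g. 200 descriptors of SIFT's 128 dims
Z = pca_reduce(X, 16)             # reduced features for vocabulary-tree building
```

Singular values come out of `np.linalg.svd` in descending order, so the first output column carries the most variance.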
Citations: 11
A Novel Method for Identifying Exact Sensor Using Multiplicative Noise Component
B. Mahdian, S. Saic
In this paper, we analyze and analytically describe the multiplicative deterministic noise component of imaging sensors in a novel way. Specifically, we show how to use the multiplicative nature of this component to derive a method for estimating it. Since this noise component is unique per sensor, the derived method is applied to digital image ballistics tasks in order to pinpoint the exact device that created a specific digital photo. Moreover, we enhance the method to be resistant to optical zoom and JPEG compression.
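The key property exploited here is that the pattern scales with scene intensity. A toy sketch of estimating a per-pixel multiplicative pattern K from several frames, assuming an idealized denoiser that returns the clean content (this models the general idea, not the authors' estimator):

```python
import numpy as np

# Model: frame = clean * (1 + K) + noise, so the residual W = frame - clean
# is approximately clean * K, and K can be estimated with the weighted
# average sum(W * frame) / sum(frame * frame) over many frames.
rng = np.random.default_rng(1)
K = 0.02 * rng.standard_normal((32, 32))      # unknown sensor pattern
num = np.zeros((32, 32))
den = np.zeros((32, 32))
for _ in range(200):
    clean = rng.uniform(50, 200, (32, 32))    # scene content
    frame = clean * (1.0 + K) + rng.normal(0, 1.0, (32, 32))
    W = frame - clean                          # residual from the ideal denoiser
    num += W * frame
    den += frame * frame
K_hat = num / den                              # estimate of the pattern
corr = np.corrcoef(K.ravel(), K_hat.ravel())[0, 1]
```

With 200 frames the estimate correlates strongly with the true pattern; matching `K_hat` against a suspect camera's reference pattern is the ballistics step.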
Citations: 0
Network Coding for Streaming Video over P2P Networks
F. A. López-Fuentes, C. Cabrera-Medina
In this contribution we simulate network coding and evaluate its benefits for streaming video over P2P networks. Network coding has emerged as a promising technique in the information theory field, and has shown several benefits in communication networks with respect to throughput, security, and resource optimization. In this work, we implement network coding for a multi-source P2P scenario. The video is encoded at the sources, while the intermediate nodes apply network coding before forwarding the encoded packets to the end nodes. The received packets are decoded at each receiving peer in order to recover the original video. Our scheme is implemented under the H.264/MPEG-4 AVC compression standard using the network simulator NS-2. We evaluate the scheme in terms of overall throughput, packet loss, and video quality. Results show that these parameters can be improved in P2P video streaming systems by using network coding.
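The classic "butterfly" example conveys why coding at intermediate nodes helps: the relay forwards the XOR of two packets over the bottleneck link, and each receiver recovers the packet it is missing by XOR-ing with the one it already has. A minimal sketch of this principle (not the paper's NS-2 implementation):

```python
def xor_packets(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two equal-length packets."""
    return bytes(x ^ y for x, y in zip(a, b))

p1 = b"video-pkt-A"
p2 = b"video-pkt-B"
coded = xor_packets(p1, p2)             # sent once over the shared link

recovered_p2 = xor_packets(coded, p1)   # receiver that already holds p1
recovered_p1 = xor_packets(coded, p2)   # receiver that already holds p2
```

One coded transmission thus serves both receivers, which is where the throughput gain over plain forwarding comes from.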
Citations: 3
Secure Steganography Technique Based on Bitplane Indexes
A. A. Abdulla, S. Jassim, H. Sellahewa
This paper is concerned with secret hiding in multiple image bitplanes for increased security without undermining capacity. A secure steganographic algorithm based on bitplane index manipulation is proposed. The index manipulation is confined to the first two least significant bitplanes of the cover image. The proposed algorithm achieves undetectability with respect to stego quality and payload capacity. Experimental results demonstrate that the proposed technique is secure against statistical attacks such as pairs of values (PoV), the Weighted Stego steganalyser (WS), and the Multi Bitplane Weighted Stego steganalyser (MLSB-WS).
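For background, the baseline that such schemes improve on is plain least-significant-bit (LSB) embedding. The sketch below shows that baseline only; the paper's actual contribution, manipulating bitplane indexes across the two lowest bitplanes, is not reproduced here:

```python
def embed(pixels, bits):
    """Overwrite the lowest bit of the first len(bits) pixels."""
    out = list(pixels)
    for i, b in enumerate(bits):
        out[i] = (out[i] & ~1) | b
    return out

def extract(pixels, n):
    """Read back the lowest bit of the first n pixels."""
    return [p & 1 for p in pixels[:n]]

cover = [120, 121, 122, 123, 124, 125, 126, 127]
secret = [1, 0, 1, 1, 0, 0, 1, 0]
stego = embed(cover, secret)
```

Each pixel changes by at most 1 grey level, which is what keeps the stego image visually indistinguishable, yet the predictable LSB statistics are exactly what PoV and WS steganalysers attack.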
Citations: 29
Impression Estimation of Video and Application to Video Creation
Kiyoshi Tokunaga, Takahiro Hayashi
Adding BGM (background music) to a video is an important process in video creation because BGM determines the impression of the video. We model impression estimation of a video as a mapping from computer-measurable audio and visual features to impression degrees. As an application of impression estimation, we propose OtoPittan, a system that recommends BGM to help users make impressive videos. OtoPittan regards the problem of selecting BGM from a music collection as a partial inverse problem of impression estimation: given an input video and a desired impression, it recommends BGM that, when added to the video, produces a good match to that impression. To implement impression estimation, we use a static user model and a dynamic user model. The first model statically constructs a mapping function learnt from training data; the second dynamically optimizes a mapping function through user interaction. Experimental results show that the static user model has high estimation accuracy and that the dynamic user model can efficiently perform optimization without much user interaction.
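The static user model is described only as a mapping from measurable features to impression degrees learnt from training data. A minimal stand-in for such a mapping is ridge-regularized linear least squares; the model family, feature count, and impression axes below are assumptions for illustration, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 8))          # 8 audio/visual features per clip (assumed)
true_w = rng.normal(size=(8, 2))       # synthetic ground-truth mapping
Y = X @ true_w + 0.01 * rng.normal(size=(100, 2))  # 2 impression axes (assumed)

# Ridge solution W = (X^T X + lam I)^{-1} X^T Y
lam = 1e-3
W = np.linalg.solve(X.T @ X + lam * np.eye(8), X.T @ Y)
pred = X @ W                           # estimated impression degrees
```

The inverse problem of BGM selection then amounts to scoring each candidate track by how close its predicted impression lands to the desired one.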
Citations: 2
Cellular GPU Model for Structured Mesh Generation and Its Application to the Stereo-Matching Disparity Map
N. Zhang, Hongjian Wang, Jean-Charles Créput, Julien Moreau, Y. Ruichek
This paper presents a cellular GPU model for structured mesh generation from an input stereo-matching disparity map. Here, the disparity map stands for a density distribution that reflects the proximity of objects to the camera in 3D space. The meshing process consists of covering this density distribution with a topologically structured hexagonal grid that adapts and deforms according to the density values. The goal is to generate a compressed mesh in which the nearest objects receive more detail than objects far from the camera. The solution we propose is based on Kohonen's Self-Organizing Map (SOM) learning algorithm, chosen for its ability to generate a topological map according to a probability distribution and its nature as a naturally massively parallel algorithm. We propose a GPU parallel model and implementation of the standard SOM algorithm, and present experiments on a set of standard stereo-matching disparity map benchmarks.
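One update step of the standard SOM conveys why the grid "adapts and deforms" toward dense regions: find the best-matching unit (BMU) for a sample, then pull it and its grid neighbourhood toward the sample. This is a CPU sketch of the textbook algorithm only; the paper's contribution is its GPU parallelization, which is not shown:

```python
import numpy as np

def som_step(weights, x, lr=0.5, sigma=1.0):
    """One SOM update: move the BMU and its neighbours toward sample x."""
    d2 = ((weights - x) ** 2).sum(axis=-1)              # distance to every unit
    bi, bj = np.unravel_index(d2.argmin(), d2.shape)    # best-matching unit
    ii, jj = np.indices(d2.shape)
    # Gaussian neighbourhood centred on the BMU in *grid* coordinates
    h = np.exp(-((ii - bi) ** 2 + (jj - bj) ** 2) / (2 * sigma ** 2))
    return weights + lr * h[..., None] * (x - weights)

rng = np.random.default_rng(3)
grid = rng.normal(size=(8, 8, 2))    # 8x8 hexagonal-like grid over 2-D data
x = np.array([2.0, 2.0])             # one sample drawn from the density map
updated = som_step(grid, x)
```

Because every unit's distance and update can be computed independently, the inner step maps naturally onto one GPU thread per cell, which is the parallelism the paper exploits.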
Citations: 5
Towards Portable Multi-camera High Definition Video Capture Using Smartphones
Surendar Chandra, Patrick Chiu, Maribeth Back
Real-time tele-immersion requires low-latency, synchronized multi-camera capture. Prior high definition (HD) capture systems were bulky. We investigate the suitability of using flocks of smartphone cameras for tele-immersion. Smartphones integrate capture and streaming into a single portable package; however, they archive the captured video into a movie. Hence, we create and stream a sequence of short H.264 movies. Capture delay is reduced by minimizing the number of frames in each movie segment, although fewer frames reduce compression efficiency. Also, smartphone video encoders do not sacrifice video quality to lower the compression latency or the stream size. On an iPhone 4S, our application, which uses only published APIs, streams 1920×1080 video at 16.5 fps with a delay of 712 ms between a real-life event and the display of an uncompressed bitmap of that event on a local laptop. Note that the bulky Cisco Tandberg required a 300 ms delay. Stereoscopic video from two unsynchronized smartphones also showed minimal visual artifacts in an indoor setting.
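The latency trick above is simply chunking: shorter movie segments can be flushed and streamed sooner, at the cost of per-segment compression overhead. A sketch of the segmentation step (names hypothetical; the paper's capture pipeline is not reproduced):

```python
def segment_frames(frames, seg_len):
    """Split a captured frame sequence into segments of at most seg_len frames.

    Each segment would be encoded as its own short H.264 movie and streamed
    as soon as it is closed, so smaller seg_len means lower capture delay.
    """
    return [frames[i:i + seg_len] for i in range(0, len(frames), seg_len)]

frames = list(range(10))              # stand-in for captured frames
segments = segment_frames(frames, 4)  # three segments: 4 + 4 + 2 frames
```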
Citations: 0
Interactive Event Recognition in Video
Mennan Güder, N. Cicekli
In this paper, we propose a multi-modal decision-level fusion framework to recognize events in videos. The main parts of the proposed framework are ontology-based event definition, structural video decomposition, temporal rule discovery, and event classification. Various decision sources such as audio continuity, content similarity, and shot sequence characteristics, together with visual video feature sets, are combined with event descriptors during decision-level fusion. The method is considered interactive because of its user-directed ontology connection and temporal rule extraction strategies. It enables users to integrate available ontologies such as ImageNet and WordNet while defining new event types. Temporal rules are discovered by association rule mining. In the proposed approach, the computational and I/O demands of association rule mining are reduced by a one-pass frequent item set extractor and the proposed rule definition strategy. The accuracy of the proposed methodology is evaluated on the TRECVid 2007 high-level feature detection data set by comparing the results with a C4.5 decision tree, SVM classifiers, and Multiple Correspondence Analysis.
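To make the frequent item set idea concrete, here is a simplified single-pass counter for co-occurring item pairs with a minimum-support threshold. It illustrates the general technique only; the paper's one-pass extractor and rule definition strategy are not reproduced:

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(transactions, min_support):
    """Count co-occurring item pairs in one pass; keep those above support."""
    counts = Counter()
    for t in transactions:                          # single pass over the data
        for pair in combinations(sorted(set(t)), 2):
            counts[pair] += 1
    n = len(transactions)
    return {p: c / n for p, c in counts.items() if c / n >= min_support}

# Hypothetical per-shot decision sources standing in for video descriptors
events = [["speech", "indoor"], ["speech", "indoor", "crowd"], ["music"]]
rules = frequent_pairs(events, min_support=0.5)
```

Pairs surviving the support threshold become candidates for the temporal rules that drive event classification.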
Citations: 0
Structural Segmentation of Music Based on Repeated Harmonies
W. B. Haas, A. Volk, F. Wiering
In this paper we present FORM, a simple yet powerful method for deriving the structural segmentation of a musical piece based on repetitions in chord sequences. Repetition in harmony is a fundamental factor in constituting musical form. However, repeated pattern discovery in music remains an open problem, and it has not previously been addressed for chord sequences. FORM relies on a suffix tree based algorithm to find repeated patterns in symbolic chord sequences provided either by machine transcription or by musical experts. This novel approach complements other segmentation methods, which generally use a self-distance matrix based on other musical features describing timbre, instrumentation, rhythm, or melody. We evaluate the segmentation quality of FORM on 649 popular songs and show that FORM outperforms two baseline approaches. With FORM we explore new ways of exploiting musical repetition for structural segmentation, yielding a flexible and practical algorithm and a better understanding of musical repetition.
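To see what "repeated patterns in chord sequences" means concretely, the naive scan below enumerates every repeated chord n-gram in a toy progression. FORM itself uses a suffix tree to do this efficiently; this quadratic sketch is a stand-in for illustration only:

```python
def repeated_patterns(chords, min_len=2):
    """Return every chord subsequence (>= min_len) that occurs more than once,
    mapped to its start positions. Naive O(n^2) stand-in for a suffix tree."""
    hits = {}
    n = len(chords)
    for length in range(min_len, n // 2 + 1):
        for i in range(n - length + 1):
            pat = tuple(chords[i:i + length])
            hits.setdefault(pat, []).append(i)
    return {p: pos for p, pos in hits.items() if len(pos) > 1}

verse = ["C", "Am", "F", "G", "C", "Am", "F", "G"]
reps = repeated_patterns(verse)
```

The repeated four-chord loop at positions 0 and 4 is exactly the kind of evidence FORM turns into segment boundaries.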
Citations: 8
Detecting Musical Genre Borders for Multi-label Genre Classification
Hiroki Nakamura, Hung-Hsuan Huang, K. Kawagoe
In this paper, we propose a novel method to detect genre borders within a piece of music for music genre classification. Genre classification is becoming more important because music is influenced by an increasing number of different musical styles. The general approach is single-genre labeling, which attributes the inherent stylistic elements of a musical piece to one genre; this is ambiguous for a piece spanning multiple genres. To solve the problem, we separate the multi-label classification task into single-label genre classification tasks. The proposed method can find borderlines between different genres within a piece of music, so that multi-label genre classification can be realized by applying single-label classification to each detected segment.
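Once each short segment of a piece carries a single genre label, a border is simply a position where consecutive labels differ. A toy sketch of that final border-extraction step (the paper's segment classifier and border detector are not shown):

```python
def genre_borders(segment_labels):
    """Indexes where the genre label changes between consecutive segments."""
    return [i for i in range(1, len(segment_labels))
            if segment_labels[i] != segment_labels[i - 1]]

# Hypothetical per-segment labels from a single-label classifier
labels = ["rock", "rock", "jazz", "jazz", "jazz", "rock"]
borders = genre_borders(labels)
```

The segments between borders are exactly the units to which single-label classification is applied, and the union of their labels gives the piece's multi-label genre set.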
Citations: 4