
2005 IEEE International Conference on Multimedia and Expo: Latest Publications

Non-linear image enhancement for digital TV applications using Gabor filters
Pub Date : 2005-07-06 DOI: 10.1109/ICME.2005.1521597
Yue Yang, Baoxin Li
We propose a non-linear image enhancement method based on Gabor filters, which allows selective enhancement based on the contrast sensitivity function of the human visual system. We also propose an evaluation method for measuring the performance of the algorithm and for comparing it with existing approaches. The selective enhancement of the proposed approach is especially suitable for digital television applications, improving the perceived visual quality of images whose source contains an unsatisfactory amount of high-frequency content for various reasons, including the interpolation used to convert standard-definition sources into high-definition images.
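As an illustration of the general technique (not the authors' implementation), the minimal Python sketch below applies a small bank of zero-mean Gabor filters and adds the extracted band-pass detail back to the image. The kernel parameters and the single `gain` factor, which stands in for the contrast-sensitivity-based weighting described above, are assumptions.

```python
import numpy as np
from scipy.ndimage import convolve

def gabor_kernel(ksize=15, sigma=3.0, theta=0.0, lambd=6.0, gamma=0.5):
    """Real part of a 2-D Gabor kernel (illustrative parameters)."""
    half = ksize // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr**2 + gamma**2 * yr**2) / (2 * sigma**2))
    carrier = np.cos(2 * np.pi * xr / lambd)
    return envelope * carrier

def enhance(image, orientations=4, gain=0.8):
    """Boost detail in an 8-bit grayscale image with a small Gabor filter bank."""
    image = image.astype(np.float64)
    detail = np.zeros_like(image)
    for k in range(orientations):
        kern = gabor_kernel(theta=k * np.pi / orientations)
        kern -= kern.mean()                  # zero-DC so only band-pass detail is extracted
        detail += convolve(image, kern, mode='nearest')
    out = image + gain * detail / orientations
    return np.clip(out, 0, 255).astype(np.uint8)
```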
Citations: 11
Visual/Acoustic Emotion Recognition
Pub Date : 2005-07-06 DOI: 10.1109/ICME.2005.1521709
Cheng-Yao Chen, Yue Huang, P. Cook
Recognizing and understanding a person's emotions is known to be one of the most important issues in human-computer interaction. In this paper, we present a multimodal system that supports emotion recognition from both visual and acoustic feature analysis. Our main achievement is that, with this bimodal method, we can effectively extend the set of recognized emotion categories compared with using visual or acoustic feature analysis alone. We also show that by carefully combining bimodal features, the recognition precision for each emotion category can exceed the limit imposed by either single modality, visual or acoustic. Moreover, we believe our system is closer to real human perception and experience and hence will bring emotion recognition closer to practical application in the future.
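A hedged sketch of the feature-level fusion idea: visual and acoustic descriptors are concatenated before classification. The random placeholder features, the four emotion classes, and the SVM classifier are illustrative assumptions, not the paper's actual features or model.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Placeholder features: rows are clips, columns are extracted descriptors.
rng = np.random.default_rng(0)
visual_feats = rng.normal(size=(200, 30))    # e.g. facial-expression measurements
acoustic_feats = rng.normal(size=(200, 20))  # e.g. pitch/energy statistics
labels = rng.integers(0, 4, size=200)        # four hypothetical emotion classes

# Feature-level fusion: concatenate the two modalities before classification.
fused = np.hstack([visual_feats, acoustic_feats])

clf = make_pipeline(StandardScaler(), SVC(kernel='rbf'))
clf.fit(fused[:150], labels[:150])
print("held-out accuracy:", clf.score(fused[150:], labels[150:]))
```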
Citations: 31
Optimized wireless video transmission using classification
Pub Date : 2005-07-06 DOI: 10.1109/ICME.2005.1521591
R. Wong, M. Schaar, D. Turaga
Cross-protocol-layer optimizations have recently been proposed for improving the performance of real-time video transmission over 802.11 WLANs. However, performing such cross-layer optimizations is difficult, since the video data and channel characteristics are time-varying and it is hard to analytically derive the relationships between quality and channel characteristics under given delay and power constraints. Furthermore, these relationships are often non-linear and non-deterministic (only worst- or average-case values can be determined). Complex Lagrangian or multi-objective optimization problems are thus often faced. In this paper, we propose a novel framework for solving cross MAC-application layer optimization problems. More specifically, we employ classification techniques to find an optimized cross-layer strategy for wireless multimedia transmission. Our solution uses both content- and channel-related features to select a joint application-MAC strategy from the different strategies available at the various layers. Preliminary results indicate that considerable improvements can be obtained through the proposed classification-based cross-layer techniques, as opposed to ad-hoc solutions. The improvements are especially important at high packet-loss rates (5% and higher), where deploying a judicious mixture of strategies at the various layers becomes essential.
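A minimal sketch of the classification idea under stated assumptions: offline simulation results (here invented) map content and channel features to the best cross-layer strategy, and a trained classifier then selects a strategy at run time. The feature set, strategy labels, and decision-tree learner are illustrative, not the paper's.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical offline training data: each row holds content/channel features
# (motion level, packet size, SNR in dB, packet-loss rate) and the strategy
# that gave the best simulated quality under the delay/power constraints.
X = np.array([
    [0.2, 1500, 25.0, 0.01],
    [0.8, 1500, 12.0, 0.08],
    [0.5,  800, 18.0, 0.05],
    [0.9,  800, 10.0, 0.10],
])
best_strategy = ["no_retry_low_protection", "retry_high_protection",
                 "retry_medium_protection", "retry_high_protection"]

clf = DecisionTreeClassifier(max_depth=3).fit(X, best_strategy)

# At run time the sender measures the current features and looks up a strategy.
current = [[0.7, 1200, 11.5, 0.07]]
print(clf.predict(current)[0])
```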
Citations: 1
A spatial-temporal de-interlacing algorithm
Pub Date : 2005-07-06 DOI: 10.1109/ICME.2005.1521407
T. Chong, O. Au, Tai-Wai Chan, Wing-San Chau
In this paper, we propose a spatial-temporal de-interlacing algorithm for converting interlaced video to progressive video. The proposed algorithm estimates the motion trajectory across three consecutive fields and interpolates the missing field along that trajectory. In the motion estimator, the unidirectional and bidirectional motion estimation processes are combined by a multiple-objective minimization technique. Unidirectional motion estimation estimates the motion trajectory by comparing blocks from opposite-parity fields, while bidirectional motion estimation compares blocks from same-parity fields. By combining the two motion estimates, the motion trajectory can be accurately predicted. In addition, a quality analyzer is proposed to evaluate the visual quality of the reconstructed frame and choose the appropriate interpolation scheme so as to provide maximum de-interlacing performance. Simulation results show that the proposed algorithm outperforms existing de-interlacing algorithms.
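For reference, the sketch below implements plain intra-field line averaging, the simple spatial baseline that motion-compensated de-interlacers such as the one proposed here aim to improve upon; it is not the proposed algorithm.

```python
import numpy as np

def line_average_deinterlace(field, top_field=True):
    """Rebuild a progressive frame from one field by vertical line averaging.

    `field` has shape (H/2, W) and holds only the transmitted scan lines.
    This is the plain spatial fallback against which motion-compensated
    de-interlacing methods are usually compared.
    """
    h2, w = field.shape
    frame = np.zeros((2 * h2, w), dtype=np.float64)
    offset = 0 if top_field else 1
    frame[offset::2] = field                    # existing lines go in place
    for y in range(1 - offset, 2 * h2, 2):      # fill the missing lines
        above = frame[y - 1] if y > 0 else frame[y + 1]
        below = frame[y + 1] if y + 1 < 2 * h2 else frame[y - 1]
        frame[y] = 0.5 * (above + below)
    return frame
```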
Citations: 7
A Probabilistic Description of Man-Machine Spoken Communication
Pub Date : 2005-07-06 DOI: 10.1109/ICME.2005.1521447
O. Pietquin
Speech-enabled interfaces and spoken dialog systems are mostly based on statistical speech and language processing modules. Their behavior is therefore not deterministic and is hardly predictable. This makes it difficult to simulate and optimize the performance of such systems, as well as to reuse previous work to build new systems. With the aim of partially automating the optimization of such systems, this paper presents an attempt at a formalism for describing man-machine spoken communication in the framework of spoken dialog systems. This formalization is based partly on a probabilistic description of the information processing occurring in each module composing a spoken dialog system, and partly on stochastic user modeling. Finally, some possible applications of this theoretical framework are proposed.
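A toy illustration of the kind of probabilistic chaining such a formalism describes: the probability that a concept reaches the dialog manager given the user's intent is obtained by marginalizing over the recognizer's stochastic output. All distributions below are invented for illustration, not taken from the paper.

```python
# Chained probabilistic view of a two-module pipeline (ASR -> understanding).

# P(recognized word sequence w | user intent g)
p_w_given_g = {"book_flight": {"book a flight": 0.7, "book a fight": 0.3}}

# P(extracted concept c | recognized word sequence w)
p_c_given_w = {"book a flight": {"FLIGHT_BOOKING": 0.95, "OTHER": 0.05},
               "book a fight":  {"FLIGHT_BOOKING": 0.40, "OTHER": 0.60}}

def p_concept_given_intent(concept, intent):
    """Marginalize over the recognizer output: P(c|g) = sum_w P(c|w) P(w|g)."""
    return sum(p_c_given_w[w].get(concept, 0.0) * pw
               for w, pw in p_w_given_g[intent].items())

print(p_concept_given_intent("FLIGHT_BOOKING", "book_flight"))  # 0.785
```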
Citations: 23
Personally Customizable Group Navigation System Using Cellular Phones and Wireless Ad-Hoc Communication
Pub Date : 2005-07-06 DOI: 10.1109/ICME.2005.1521678
Yoshitaka Nakamura, Guiquan Ren, Masatoshi Nakamura, T. Umedu, T. Higashino
With the progress of portable computing devices such as PDAs, cellular phones, and small-sized PCs, many personal navigation systems have been developed that display routes to guide their users to given destinations. These navigation systems mainly focus on guidance for personal use. In this paper, we develop a group navigation system that provides facilities for (1) personally customizable route navigation to a given destination, (2) management of group movement, and (3) rehearsal use when creating the personally customized route navigation. In our system, using wireless ad-hoc communication, a few leaders of a group can collect and distribute information about its members' current positions and give each member a suitable suggestion when that member loses his/her way. The personalized route navigation scenario (program) that runs on portable devices can be generated automatically simply by clicking intersections sequentially on a given map and attaching pictures and comments. A rehearsal mode is also provided for creating the personalized route navigation.
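A rough sketch, under simplified assumptions, of one piece of such a system: a leader checking which members have strayed from the planned route using collected positions and a straight-line distance threshold. The threshold and the equirectangular distance approximation are illustrative, not the system's actual logic.

```python
import math

def distance_m(p, q):
    """Approximate metres between two (lat, lon) points (equirectangular)."""
    lat = math.radians((p[0] + q[0]) / 2)
    dx = math.radians(q[1] - p[1]) * math.cos(lat) * 6_371_000
    dy = math.radians(q[0] - p[0]) * 6_371_000
    return math.hypot(dx, dy)

def members_off_route(member_positions, route_points, threshold_m=80):
    """Return members whose reported position is far from every route point."""
    return [name for name, pos in member_positions.items()
            if min(distance_m(pos, rp) for rp in route_points) > threshold_m]

route = [(34.702, 135.495), (34.704, 135.498), (34.706, 135.501)]
positions = {"alice": (34.7041, 135.4981), "bob": (34.7100, 135.5100)}
print(members_off_route(positions, route))   # ['bob']
```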
Citations: 5
Gender identification using frontal facial images
Pub Date : 2005-07-06 DOI: 10.1109/ICME.2005.1521613
Amith K Jain, Jeffrey R. Huang, S. Fang
Computer vision and pattern recognition systems play an important role in our lives through automated face detection, face and gesture recognition, and estimation of gender and age. This paper addresses the problem of gender classification using frontal facial images. We have developed gender classifiers whose performance is superior to that of existing gender classifiers. We experiment on 500 images (250 females and 250 males) randomly drawn from the FERET facial database. Independent component analysis (ICA) is used to represent each image as a feature vector in a low-dimensional subspace, and different classifiers are studied in this lower-dimensional space. Our experimental results show the superior performance of our approach over existing gender classifiers: we achieve 96% accuracy using a support vector machine (SVM) in the ICA space.
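A hedged scikit-learn sketch of the ICA-plus-SVM pipeline described above; the face data is a random placeholder and the number of independent components is an assumption.

```python
import numpy as np
from sklearn.decomposition import FastICA
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

# Placeholder data standing in for vectorized frontal face images
# (rows = images, columns = pixels); labels: 0 = female, 1 = male.
rng = np.random.default_rng(42)
faces = rng.normal(size=(500, 64 * 64))
labels = rng.integers(0, 2, size=500)

# Project into a low-dimensional ICA subspace, then classify with an SVM,
# mirroring the pipeline described in the abstract (component count assumed).
pipeline = make_pipeline(FastICA(n_components=50, max_iter=500, random_state=0),
                         SVC(kernel='rbf'))
print(cross_val_score(pipeline, faces, labels, cv=5).mean())
```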
Citations: 79
Fuzzy image segmentation using shape information
Pub Date : 2005-07-06 DOI: 10.1109/ICME.2005.1521529
Mohammed Ameer Ali, G. Karmakar, L. Dooley
The results of any clustering algorithm are highly sensitive to features that limit their generalization, which provides a strong motivation to integrate shape information into the algorithm. Existing fuzzy shape-based clustering algorithms consider only circular and elliptical shape information and consequently do not segment arbitrarily shaped objects well. To address this issue, this paper introduces a new shape-based algorithm, called fuzzy image segmentation using shape information (FISS), which incorporates general shape information. Both qualitative and quantitative analyses prove the superiority of the new FISS algorithm over other well-established shape-based fuzzy clustering algorithms, including Gustafson-Kessel, ring-shaped, circular shell, c-ellipsoidal shells, and elliptic ring-shaped clusters.
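For background, the sketch below implements standard fuzzy c-means in NumPy, the generic clustering core that shape-based methods build on; the shape-information term that distinguishes FISS is deliberately omitted.

```python
import numpy as np

def fuzzy_c_means(X, n_clusters=2, m=2.0, n_iter=100, seed=0):
    """Standard fuzzy c-means on feature vectors X of shape (n_samples, n_features).

    Returns the membership matrix U (n_samples, n_clusters) and the centres.
    FISS additionally folds a shape-information term into the objective,
    which is not shown here.
    """
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        Um = U ** m
        centres = (Um.T @ X) / Um.sum(axis=0)[:, None]          # weighted means
        dist = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2) + 1e-12
        U = 1.0 / (dist ** (2 / (m - 1)))                        # membership update
        U /= U.sum(axis=1, keepdims=True)
    return U, centres

# Toy usage: cluster pixel intensities of a synthetic two-region image.
pixels = np.concatenate([np.full(200, 40.0), np.full(200, 200.0)])[:, None]
pixels += np.random.default_rng(1).normal(0, 5, pixels.shape)
U, centres = fuzzy_c_means(pixels)
print(np.sort(centres.ravel()))   # roughly [40, 200]
```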
Citations: 4
Feature Selection and Stacking for Robust Discrimination of Speech, Monophonic Singing, and Polyphonic Music
Pub Date : 2005-07-06 DOI: 10.1109/ICME.2005.1521554
Björn Schuller, Brüning J. B. Schmitt, D. Arsic, S. Reiter, M. Lang, G. Rigoll
In this work we strive to find an optimal set of acoustic features for discriminating speech, monophonic singing, and polyphonic music, in order to robustly segment acoustic media streams for annotation and interaction purposes. Furthermore, we introduce ensemble-based classification approaches for this task. From a basis of 276 attributes, we select the most efficient set by SVM-SFFS. Additionally, the relevance of single features is assessed by calculating the information gain ratio. As a basis for comparison, we reduce dimensionality by PCA. We present an extensive analysis of different classifiers on this task, among them kernel machines, decision trees, and Bayesian classifiers. Moreover, we improve single-classifier performance by bagging and boosting, and finally combine the strengths of the classifiers by StackingC. The database consists of 2,114 samples of speech and singing from 58 persons; 1,000 music clips were taken from the MTV-Europe-Top-20, 1980-2000. The outstanding discrimination results of a working real-time-capable implementation demonstrate the practicability of the proposed ideas.
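A minimal scikit-learn sketch of stacking heterogeneous base learners behind a meta-learner, in the spirit of the ensemble approach described; the placeholder features, the choice of base learners, and the meta-learner are assumptions, and the SVM-SFFS feature selection and 276-attribute set are not reproduced.

```python
import numpy as np
from sklearn.ensemble import StackingClassifier, BaggingClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Placeholder acoustic features for three classes:
# speech / monophonic singing / polyphonic music.
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 40))
y = rng.integers(0, 3, size=600)

# Stack heterogeneous base learners (kernel machine, bagged trees, Bayes)
# behind a meta-learner.
stack = StackingClassifier(
    estimators=[('svm', SVC(probability=True)),
                ('bagged_trees', BaggingClassifier(DecisionTreeClassifier(), n_estimators=25)),
                ('nb', GaussianNB())],
    final_estimator=LogisticRegression(max_iter=1000))
print(cross_val_score(stack, X, y, cv=3).mean())
```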
Citations: 21
Automatic Annotation of Location Information for WWW Images
Pub Date : 2005-07-06 DOI: 10.1109/ICME.2005.1521537
Zhigang Hua, Chuang Wang, Xing Xie, Hanqing Lu, Wei-Ying Ma
A crucial current challenge is how to manage the large number of images on the Web. Given the real synergy between an image and its location, we propose an automatic solution for annotating WWW images with contextual location information. We construct an image importance model to identify the dominant images in a page together with their contextual surrounding text. For each such image, we develop an effective algorithm to compute its location from the contextual text. We apply our approach to 1,000 pages from various websites for image location annotation. The experiments demonstrate that more than 30% of WWW images are associated with geographic location information, and that our solution achieves satisfactory results. Finally, we present some potential applications that utilize image location information.
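A toy sketch of extracting location mentions from an image's surrounding text via a gazetteer lookup; the tiny gazetteer and the frequency-based ordering are illustrative assumptions, not the paper's algorithm.

```python
import re

# Tiny illustrative gazetteer; a real system would use a full location database.
GAZETTEER = {"beijing", "paris", "new york", "london", "tokyo"}

def extract_location(surrounding_text):
    """Return gazetteer locations mentioned in an image's contextual text,
    ordered by how often they occur (a crude stand-in for a scoring model)."""
    text = surrounding_text.lower()
    counts = {place: len(re.findall(r'\b' + re.escape(place) + r'\b', text))
              for place in GAZETTEER}
    hits = [(n, place) for place, n in counts.items() if n > 0]
    return [place for n, place in sorted(hits, reverse=True)]

caption = "Sunset over the Seine in Paris. More photos from our Paris and London trip."
print(extract_location(caption))   # ['paris', 'london']
```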
Citations: 3