Proceedings of the ... ACM International Conference on Multimedia, with co-located Symposium & Workshops. ACM International Conference on Multimedia: Latest Articles

Mutual Correlation Attentive Factors in Dyadic Fusion Networks for Speech Emotion Recognition.

Proceedings of the ... ACM International Conference on Multimedia, with co-located Symposium & Workshops. ACM International Conference on Multimedia

Pub Date : 2019-10-01 DOI: 10.1145/3343031.3351039

Yue Gu, Xinyu Lyu, Weijia Sun, Weitian Li, Shuhong Chen, Xinyu Li, Ivan Marsic

Emotion recognition in dyadic communication is challenging because: 1. Extracting informative modality-specific representations requires disparate feature extractor designs due to the heterogeneous input data formats. 2. Effectively and efficiently fusing unimodal features and learning associations between dyadic utterances is critical to model generalization in real-world scenarios. 3. Disagreeing annotations prevent previous approaches from precisely predicting emotions in context. To address these issues, we propose an efficient dyadic fusion network that relies only on an attention mechanism to select representative vectors, fuse modality-specific features, and learn the sequence information. Our approach has three distinct characteristics: 1. Instead of using a recurrent neural network to extract temporal associations as in most previous research, we introduce multiple sub-view attention layers to compute the relevant dependencies among sequential utterances; this significantly improves model efficiency. 2. To improve fusion performance, we design a learnable mutual correlation factor inside each attention layer to compute associations across different modalities. 3. To overcome the label disagreement issue, we embed the labels from all annotators into a k-dimensional vector and transform the categorical problem into a regression problem; this method provides more accurate annotation information and fully uses the entire dataset. We evaluate the proposed model on two published multimodal emotion recognition datasets: IEMOCAP and MELD. Our model significantly outperforms previous state-of-the-art approaches by 3.8%-7.5% in accuracy while being more efficient.
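
The label-embedding idea in point 3 can be illustrated with a minimal sketch. The abstract does not say how the annotator labels are embedded, so the vote-fraction encoding, the class list, and the function names `soft_label` and `mse` below are assumptions made purely for illustration, not the authors' implementation.

```python
from collections import Counter

def soft_label(annotations, classes):
    """Turn one utterance's annotator votes into a k-dimensional target vector.

    Assumption: each entry is the fraction of annotators who chose that class.
    """
    if not annotations:
        raise ValueError("at least one annotation is required")
    counts = Counter(annotations)
    return [counts[c] / len(annotations) for c in classes]

def mse(pred, target):
    """Mean squared error, a typical regression loss for vector targets."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)

# Hypothetical class list (k = 4) and three disagreeing annotators.
CLASSES = ["angry", "happy", "neutral", "sad"]
target = soft_label(["happy", "happy", "neutral"], CLASSES)
loss = mse([0.1, 0.6, 0.2, 0.1], target)
```

Under this encoding, an utterance on which annotators disagree still contributes a training target instead of being discarded, which is one plausible reading of "fully uses the entire dataset".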

Volume : 2019 Pages : 157-166 PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7085887/pdf/nihms-1571671.pdf

Citations: 0

Human Conversation Analysis Using Attentive Multimodal Networks with Hierarchical Encoder-Decoder.

Proceedings of the ... ACM International Conference on Multimedia, with co-located Symposium & Workshops. ACM International Conference on Multimedia

Pub Date : 2018-10-01 DOI: 10.1145/3240508.3240714

Yue Gu, Xinyu Li, Kaixiang Huang, Shiyu Fu, Kangning Yang, Shuhong Chen, Moliang Zhou, Ivan Marsic

Human conversation analysis is challenging because meaning can be expressed through words, intonation, or even body language and facial expression. We introduce a hierarchical encoder-decoder structure with an attention mechanism for conversation analysis. The hierarchical encoder learns word-level features from video, audio, and text data that are then formulated into conversation-level features. The corresponding hierarchical decoder is able to predict different attributes at given time instances. To integrate multiple sensory inputs, we introduce a novel fusion strategy with modality attention. We evaluated our system on published emotion recognition, sentiment analysis, and speaker trait analysis datasets. Our system outperformed previous state-of-the-art approaches in both classification and regression tasks on three datasets. We also outperformed previous approaches in generalization tests on two commonly used datasets. We achieved comparable performance in predicting co-existing labels using the proposed model instead of multiple individual models. In addition, the easily visualized modality and temporal attention demonstrated that the proposed attention mechanism helps feature selection and improves model interpretability.
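
As a rough illustration of fusing modalities with attention, the sketch below scores each modality's feature vector, normalizes the scores with a softmax, and returns the weighted sum. The abstract gives no details of the fusion strategy, so the dot-product scoring, the fixed `score_vector` (which would be learned in a real model), and the function names are all assumptions.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scalars."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def modality_attention_fusion(features, score_vector):
    """Fuse per-modality feature vectors with attention weights.

    features: dict mapping modality name -> list of floats (equal lengths).
    score_vector: hypothetical scoring parameters, same length as a feature.
    Returns (weights per modality, fused feature vector).
    """
    names = sorted(features)
    scores = [sum(f * w for f, w in zip(features[n], score_vector)) for n in names]
    weights = softmax(scores)
    fused = [
        sum(w * features[n][i] for w, n in zip(weights, names))
        for i in range(len(score_vector))
    ]
    return dict(zip(names, weights)), fused

# Toy example with made-up two-dimensional features.
weights, fused = modality_attention_fusion(
    {"audio": [0.2, 0.1], "text": [0.9, 0.4], "video": [0.1, 0.3]},
    score_vector=[1.0, 1.0],
)
```

The per-modality weights are directly inspectable, which is the kind of property the abstract points to when it mentions easily visualized modality attention.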


Volume : 2018 Pages : 537-545 PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7085889/pdf/nihms-1571718.pdf

Citations: 0

Cross-Modal Health State Estimation.

Proceedings of the ... ACM International Conference on Multimedia, with co-located Symposium & Workshops. ACM International Conference on Multimedia

Pub Date : 2018-10-01 DOI: 10.1145/3240508.3241913

Nitish Nag, Vaibhav Pandey, Preston J Putzel, Hari Bhimaraju, Srikanth Krishnan, Ramesh Jain

Individuals create and consume more diverse data about themselves today than at any time in history. Sources of this data include wearable devices, images, social media, geo-spatial information, and more. A tremendous opportunity rests within cross-modal data analysis that leverages existing domain knowledge methods to understand and guide human health. Especially in chronic diseases, current medical practice uses a combination of sparse hospital-based biological metrics (blood tests, expensive imaging, etc.) to understand the evolving health status of an individual. Future health systems must integrate data created at the individual level to better understand health status perpetually, especially in a cybernetic framework. In this work, we fuse multiple user-created and open-source data streams along with established biomedical domain knowledge to give two types of quantitative state estimates of cardiovascular health. First, we use wearable devices to calculate cardiorespiratory fitness (CRF), a known quantitative leading predictor of heart disease that is not routinely collected in clinical settings. Second, we estimate inherent genetic traits, living environmental risks, circadian rhythm, and biological metrics from a diverse dataset. Our experimental results on 24 subjects demonstrate how multi-modal data can provide personalized health insight. Understanding the dynamic nature of health status will pave the way for better health-based recommendation engines, better clinical decision making, and positive lifestyle changes.


Volume : 2018 Pages : 1993-2002 PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6530992/pdf/nihms-1026575.pdf

Citations: 0

Region-based Activity Recognition Using Conditional GAN.

Proceedings of the ... ACM International Conference on Multimedia, with co-located Symposium & Workshops. ACM International Conference on Multimedia

Pub Date : 2017-10-01 DOI: 10.1145/3123266.3123365

Xinyu Li, Yanyi Zhang, Jianyu Zhang, Yueyang Chen, Huangcan Li, Ivan Marsic, Randall S Burd

We present a method for activity recognition that first estimates the activity performer's location and uses it together with the input data for activity recognition. Existing approaches directly take video frames or an entire video for feature extraction and recognition, and treat the classifier as a black box. Our method first locates the activities in each input video frame by generating an activity mask using a conditional generative adversarial network (cGAN). The generated mask is appended to the color channels of the input images and fed into a VGG-LSTM network for activity recognition. To test our system, we produced two datasets with manually created masks, one containing Olympic sports activities and the other containing trauma resuscitation activities. Our system makes an activity prediction for each video frame and achieves performance comparable to state-of-the-art systems while simultaneously outlining the location of the activity. We show how the generated masks facilitate the learning of features that are representative of the activity rather than accidental surrounding information.
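
The step of appending the generated mask to the color channels can be sketched as follows. This is only an illustration of the data layout, assuming frames stored as height x width x 3 nested lists and a mask of matching height and width; the function name `append_mask_channel` is hypothetical, and neither the cGAN nor the VGG-LSTM network is modeled here.

```python
def append_mask_channel(image, mask):
    """Append a per-pixel mask as a fourth channel to an RGB frame.

    image: list of rows, each row a list of [r, g, b] pixels.
    mask:  list of rows, each row a list of scalar mask values.
    Returns a new frame whose pixels are [r, g, b, mask_value].
    """
    if len(image) != len(mask) or any(
        len(img_row) != len(mask_row) for img_row, mask_row in zip(image, mask)
    ):
        raise ValueError("image and mask must share height and width")
    return [
        [list(pixel) + [m] for pixel, m in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]

# Toy 2x2 frame; mask value 1 marks where the activity is assumed to occur.
frame = [[[255, 0, 0], [0, 255, 0]],
         [[0, 0, 255], [10, 10, 10]]]
mask = [[0, 1],
        [1, 0]]
rgbm = append_mask_channel(frame, mask)
```

A network consuming such input would need its first layer to accept four channels instead of three; how the authors handled this is not stated in the abstract.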


Volume : 2017 Pages : 1059-1067

Citations: 22

On Shape and the Computability of Emotions.

Proceedings of the ... ACM International Conference on Multimedia, with co-located Symposium & Workshops. ACM International Conference on Multimedia

Pub Date : 2012-10-01 DOI: 10.1145/2393347.2393384

Xin Lu, Poonam Suryanarayan, Reginald B Adams, Jia Li, Michelle G Newman, James Z Wang

We investigated how shape features in natural images influence emotions aroused in human beings. Shapes and their characteristics, such as roundness, angularity, simplicity, and complexity, have been postulated to affect the emotional responses of human beings in the fields of visual arts and psychology. However, no prior research has modeled the dimensionality of emotions aroused by roundness and angularity. Our contributions include an in-depth statistical analysis to understand the relationship between shapes and emotions. Through experimental results on the International Affective Picture System (IAPS) dataset, we provide evidence for the significance of roundness-angularity and simplicity-complexity in predicting emotional content in images. We combine our shape features with other state-of-the-art features to show a gain in prediction and classification accuracy. We model emotions from a dimensional perspective to predict valence and arousal ratings, an approach that has advantages over modeling traditional discrete emotional categories. Finally, we distinguish images with strong emotional content from emotionally neutral images with high accuracy.


Volume : 2012 Pages : 229-238

Citations: 139

Recognizing Clothes Patterns for Blind People by Confidence Margin based Feature Combination.

Proceedings of the ... ACM International Conference on Multimedia, with co-located Symposium & Workshops. ACM International Conference on Multimedia

Pub Date : 2011-01-01 DOI: 10.1145/2072298.2071947

Xiaodong Yang, Shuai Yuan, YingLi Tian

Clothes pattern recognition is a challenging task for blind or visually impaired people. Automatic clothes pattern recognition is also a challenging problem in computer vision due to the large pattern variations. In this paper, we present a new method to classify clothes patterns into 4 categories: stripe, lattice, special, and patternless. Existing texture analysis methods mainly focus on textures that vary with distinctive pattern changes, so they cannot achieve the same level of accuracy for clothes pattern recognition because of the large intra-class variations in each clothes pattern category. To solve this problem, we extract both structural and statistical features from image wavelet subbands. Furthermore, we develop a new feature combination scheme based on the confidence margin of a classifier to combine the two types of features into a novel local image descriptor in a compact and discriminative format. The recognition experiment is conducted on a database of 627 clothes images in 4 pattern categories. Experimental results demonstrate that the proposed method significantly outperforms state-of-the-art texture analysis methods in the context of clothes pattern recognition.


Volume : 2011 Pages : 1097-1100

Citations: 34
