Video-mediated group communication is quickly moving from the office to the home, where network conditions may fluctuate. If we are to provide a software component that can monitor the Quality of Experience (QoE) in real time, we have to carry out extensive experiments under varying (but controllable) conditions. Unfortunately, no available tools provide the required fine-grained level of control. This paper reports on our efforts implementing such a test bed. The test bed gives the experiment conductor full control over the complete media pipeline and the ability to modify network and media conditions in real time. Additionally, it provides facilities for easily developing experiments with custom layouts, task integration, and assessment of subjective ratings through questionnaires. We have already used the test bed in a number of evaluations, reported in this paper to discuss the benefits and drawbacks of our solution. The test bed has proven to be a flexible and effective canvas for better understanding QoE in video-mediated group communication.
{"title":"A Quality of Experience Testbed for Video-Mediated Group Communication","authors":"Marwin Schmitt, S. Gunkel, Pablo César","doi":"10.1109/ISM.2013.102","DOIUrl":"https://doi.org/10.1109/ISM.2013.102","url":null,"abstract":"Video-Mediated group communication is quickly moving from the office to the home, where network conditions might fluctuate. If we are to provide a software component that can, in real-time, monitor the Quality of Experience (QoE), we would have to carry out extensive experiments under different varying (but controllable) conditions. Unfortunately, there are no tools available that provide us the required fined-grained level of control. This paper reports on our efforts implementing such a test bed. The test bed provides the experiment conductor full control over the complete media pipeline, and the possibility of modifying in real-time network and media conditions. Additionally, it has facilities to easily develop an experiment with custom layouts, task integration, and assessment of subjective ratings through questionnaires. We have already used the test bed in a number of evaluations, reported in this paper for discussing the benefits and drawbacks of our solution. The test bed have been proven to be a flexible and effective canvas for better understanding QoE on video-mediated group communication.","PeriodicalId":6311,"journal":{"name":"2013 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB)","volume":"19 1","pages":"514-515"},"PeriodicalIF":0.0,"publicationDate":"2013-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90243245","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Elisardo González-Agulla, J. Alba-Castro, Hector Canto, Vicente Goyanes
This paper describes a fully automated Real-Time Lecturer-Tracking (RTLT) module and its seamless integration into a Matterhorn-based Lecture Capturing System (LCS). The main purpose of the RTLT module is to obtain a lecturer's portrait image for creating an integrated slides-plus-lecturer single stream, ready to distribute and consume on portable devices, where the displayed content must be optimized. The module robustly tracks any number of presenters in real time using a set of visual cues and delivers frame-rate metadata to plug into a Virtual Cinematographer module. The resulting GaliTracker RTLT module can broadcast live in conjunction with the LCS, GaliCaster, or run off-line as a video-production engine inserted into the Matterhorn workflow.
{"title":"GaliTracker: Real-Time Lecturer-Tracking for Lecture Capturing","authors":"Elisardo González-Agulla, J. Alba-Castro, Hector Canto, Vicente Goyanes","doi":"10.1109/ISM.2013.89","DOIUrl":"https://doi.org/10.1109/ISM.2013.89","url":null,"abstract":"This paper describes a fully automated Real-Time Lecturer-Tracking module (RTLT) and the seamless integration into a Matter horn-based Lecture Capturing System (LCS). The main purpose of the RTLT module is obtaining a lecturer's portrait image for creating an integrated slides lecturer single-stream ready to distribute and consume in portable devices, where displayed contents must be optimized. The module robustly tracks any number of presenters in real-time using a set of visual cues and delivers frame-rate metadata to plug into a Virtual Cinematographer module. The so-called Gal tracker RTLT module allows broadcasting live in conjunction with the LCS, Gal caster, or processing off-line as a video-production engine inserted into the Matter horn workflow.","PeriodicalId":6311,"journal":{"name":"2013 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB)","volume":"128 1","pages":"462-467"},"PeriodicalIF":0.0,"publicationDate":"2013-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88706063","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The term "shitsukan" refers to the perception of materials and surface qualities of natural and manmade objects. The main goal of our study is to extend the current content-based image retrieval (CBIR) systems to develop a shitsukan-based image retrieval (SBIR) system. This paper focuses on Japanese onomatopoeias as a feature for SBIR and verifies their suitability based on psychophysical experiments. In this study, we conducted two different experiments. In the first experiment, subjects assigned suitable onomatopoeias to 50 test images. In the second experiment, they selected perceptually similar images for each test image from the other 49 test images. Then, we investigated the relationship between the assigned onomatopoeias and the selected similar images. The results indicate that perceptually similar images were assigned to the same onomatopoeias with correlation, and that onomatopoeias were effective as a feature for SBIR.
{"title":"Investigation of Japanese Onomatopoeias as Features for SHITSUKAN-Based Image Retrieval","authors":"Yuxing Wu, K. Hirai, T. Horiuchi","doi":"10.1109/ISM.2013.75","DOIUrl":"https://doi.org/10.1109/ISM.2013.75","url":null,"abstract":"The term \"shitsukan\" refers to the perception of materials and surface qualities of natural and manmade objects. The main goal of our study is to extend the current content-based image retrieval (CBIR) systems to develop a shitsukan-based image retrieval (SBIR) system. This paper focuses on Japanese onomatopoeias as a feature for SBIR and verifies their suitability based on psychophysical experiments. In this study, we conducted two different experiments. In the first experiment, subjects assigned suitable onomatopoeias to 50 test images. In the second experiment, they selected perceptually similar images for each test image from the other 49 test images. Then, we investigated the relationship between the assigned onomatopoeias and the selected similar images. The results indicate that perceptually similar images were assigned to the same onomatopoeias with correlation, and that onomatopoeias were effective as a feature for SBIR.","PeriodicalId":6311,"journal":{"name":"2013 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB)","volume":"50 1","pages":"399-400"},"PeriodicalIF":0.0,"publicationDate":"2013-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73682368","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper, we address the challenges of applying three-dimensional virtual worlds to learning. Despite numerous positive conclusions, this technology is far from becoming mainstream in education. The most common problems with applying it in everyday teaching and learning are its steep learning curve and its demand for computational and network resources. To address these problems, we developed a stream processors texture generation model for displaying educational content in 3D virtual worlds. The model conducts image-processing tasks on stream processors in order to reduce the load on the CPU. It allows designing convenient and sophisticated tools for collaborative work with graphics inside a 3D environment. Such tools simplify the use of a 3D virtual environment and therefore mitigate the steep learning curve. We present the methods for generating images based on the suggested model, and the design and implementation of a set of tools for collaborative work with 2D graphical content in the vAcademia virtual world. In addition, we evaluate the suggested model with a series of tests applied to the whole system and to specific algorithms, and we present initial results of a user evaluation.
{"title":"Stream Processors Texture Generation Model for 3D Virtual Worlds: Learning Tools in vAcademia","authors":"A. Smorkalov, Mikhail Fominykh, M. Morozov","doi":"10.1109/ISM.2013.13","DOIUrl":"https://doi.org/10.1109/ISM.2013.13","url":null,"abstract":"In this paper, we address the challenges of applying three-dimensional virtual worlds for learning. Despite the numerous positive conclusions, this technology is far from becoming mainstream in education. The most common problems with applying it in everyday teaching and learning are steep learning curve and demand for computational and network resources. In order to address these problems, we developed a stream processors texture generation model for displaying educational content in 3D virtual worlds. The model suggests conducting image-processing tasks on stream processors in order to reduce the load on CPU. It allows designing convenient and sophisticated tools for collaborative work with graphics inside a 3D environment. Such tools simplify the use of a 3D virtual environment, and therefore, improve the negative learning curve effect. We present the methods of generating images based on the suggested model, the design and implementation of a set of tools for collaborative work with 2D graphical content in vAcademia virtual world. In addition, we provide the evaluation of the suggested model based on a series of tests which we applied to the whole system and specific algorithms. We also present the initial result of user evaluation.","PeriodicalId":6311,"journal":{"name":"2013 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB)","volume":"94 1","pages":"17-24"},"PeriodicalIF":0.0,"publicationDate":"2013-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80335124","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Michael J. Henry, Shawn D. Hampton, A. Endert, Ian Roberts, D. Payne
Faceted browsing is a common technique for exploring collections whose data can be grouped into a number of pre-defined categories, most often generated from textual metadata. Historically, faceted browsing has been applied to a single data type, such as text or image data. However, typical collections contain multiple data types, such as information from web pages that contain text, images, and video. Additionally, when browsing a collection of images and videos, facets are often created from metadata, which may be incomplete, inaccurate, or missing altogether, rather than from the actual visual content of those images and videos. In this work we address these limitations by presenting MultiFacet, a faceted browsing interface that supports multiple data types. MultiFacet constructs facets for the images and videos in a collection from their visual content using computer vision techniques. These visual facets can then be browsed in conjunction with text facets within a single interface to reveal relationships and phenomena within multimedia collections. Additionally, we present a use case based on real-world data, demonstrating the utility of this approach for browsing a large multimedia collection.
{"title":"MultiFacet: A Faceted Interface for Browsing Large Multimedia Collections","authors":"Michael J. Henry, Shawn D. Hampton, A. Endert, Ian Roberts, D. Payne","doi":"10.1109/ISM.2013.66","DOIUrl":"https://doi.org/10.1109/ISM.2013.66","url":null,"abstract":"Faceted browsing is a common technique for exploring collections where the data can be grouped into a number of pre-defined categories, most often generated from textual metadata. Historically, faceted browsing has been applied to a single data type such as text or image data. However, typical collections contain multiple data types, such as information from web pages that contain text, images, and video. Additionally, when browsing a collection of images and video, facets are often created based on the metadata which may be incomplete, inaccurate, or missing altogether instead of the actual visual content contained within those images and video. In this work we address these limitations by presenting MultiFacet, a faceted browsing interface that supports multiple data types. MultiFacet constructs facets for images and video in a collection from the visual content using computer vision techniques. These visual facets can then be browsed in conjunction with text facets within a single interface to reveal relationships and phenomena within multimedia collections. Additionally, we present a use case based on real-world data, demonstrating the utility of this approach towards browsing a large multimedia data collection.","PeriodicalId":6311,"journal":{"name":"2013 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB)","volume":"34 1","pages":"347-350"},"PeriodicalIF":0.0,"publicationDate":"2013-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81190132","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The usage and number of available video conferencing (VC) applications are rising as the high-bandwidth, low-latency networks on which they depend become increasingly prevalent. Since VC applications support real-time human interaction, performance problems that impair interactivity are social issues. Currently, performance measurements cannot easily be obtained due to the proprietary nature of VC applications. However, such measurements would be useful because they enable researchers to easily evaluate the performance impact of architectural and design decisions, quantitatively compare VC applications, and determine service level agreement (SLA) compliance. In this paper, we present a tool called AvCloak that is capable of measuring several key performance metrics in proprietary VC applications: mouth-to-ear latency and jitter, capture-to-display latency and jitter, and audio-visual synchronization skew. AvCloak takes these measurements by wrapping ("cloaking") the VC application's audio/video inputs and outputs and transmitting timestamp data through them. At the sender side, AvCloak synthesizes media data encoding timestamps and feeds them to the VC application's media inputs, while at the receiver side, AvCloak decodes timestamps from the VC application's media outputs. Since AvCloak interacts with the target VC application only through its media inputs and outputs, it treats the target application as a black box and is thus applicable to arbitrary VC applications. We provide extensive analyses to measure AvCloak's overhead and show how to improve measurement accuracy using two popular VC applications: Skype and Google+ Hangouts.
{"title":"AvCloak: A Tool for Black Box Latency Measurements in Video Conferencing Applications","authors":"Andrew Kryczka, A. Arefin, K. Nahrstedt","doi":"10.1109/ISM.2013.52","DOIUrl":"https://doi.org/10.1109/ISM.2013.52","url":null,"abstract":"The usage and number of available video conferencing (VC) applications are rising as the high-bandwidth, low latency networks on which they depend become increasingly prevalent. Since VC applications support real-time human interaction, problems with performance that impair interactivity are social issues. Currently, performance measurements cannot easily be obtained due to the proprietary nature of VC applications, however, such measurements would be useful because they enable researchers to easily evaluate the performance impact of architectural and design decisions, quantitatively compare VC applications, and determine service level agreement (SLA) compliance. In this paper, we present a tool called Av Cloak that is capable of measuring several key performance metrics in proprietary VC applications: mouth-to-ear latency and jitter, capture-to-display latency and jitter, and audio-visual synchronization skew. AvCloak takes these measurements by wrapping (\"cloaking\") the VC application's audio/video inputs/outputs and transmitting timestamp data through them. At the sender side, AvCloak synthesizes media data encoding timestamps and feeds them to the VC application's media inputs, while at the receiver side, AvCloak decodes timestamps from the VC application's media outputs. Since AvCloak interacts with the target VC application only through its media inputs and outputs, it treats the target application as a black box and is thus applicable to arbitrary VC applications. We provide extensive analyses to measure AvCloak's overhead and show how to improve accuracy in measurements using two popular VC applications: Skype and Google+ Hangouts.","PeriodicalId":6311,"journal":{"name":"2013 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB)","volume":"69 1","pages":"271-278"},"PeriodicalIF":0.0,"publicationDate":"2013-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81483165","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
M. Alshamrani, H. Cruickshank, Zhili Sun, Vahid Fami, B. Elmasri, Emad Danish
The unstable nature of MANETs over different types of wireless topologies and mobility models affects the Quality of Service (QoS) of real-time applications such as Voice over IP (VoIP). One of the most efficient signaling systems for VoIP applications is the Session Initiation Protocol (SIP), which is mainly used to initiate, manage, and terminate VoIP calls over different types of IP-based networks. As part of the upgrade to Next Generation Networks, MANETs will adopt IPv6 for different types of applications and devices. Therefore, SIP signaling over IPv6 MANETs needs to be investigated with respect to QoS performance metrics such as bandwidth, packet loss, delay, and jitter. In this paper, SIP signaling is evaluated for SIP-based VoIP calls using the GSM voice codec over MANETs with Static, Uniform, and Random mobility models. The evaluation considers AODV as a reactive routing protocol and OLSR as a proactive routing protocol, over both IPv4 and IPv6. The study examines call setup time, number of active calls, number of rejected calls, and call duration. The results show that, in general, IPv4 performs better across the different mobility models, while IPv6 exhibits longer delays and poor performance under Random mobility models.
{"title":"Signaling Performance for SIP over IPv6 Mobile Ad-Hoc Network (MANET)","authors":"M. Alshamrani, H. Cruickshank, Zhili Sun, Vahid Fami, B. Elmasri, Emad Danish","doi":"10.1109/ISM.2013.44","DOIUrl":"https://doi.org/10.1109/ISM.2013.44","url":null,"abstract":"The unstable nature of MANETs over different types of wireless topologies and mobility models affects the Quality of Service (QoS) for real time applications such as Voice over IP (VoIP). One of the most efficient signaling systems for VoIP applications is the Session Initiation Protocol (SIP) which is mainly used to initiate, manage, and terminate VoIP calls over different types of IP based network systems. As a part of upgrading to Next Generation Network, MANETs will be considering IPv6 for different types of applications and devices. Therefore, SIP signaling over IPv6 MANETs needs to be investigated with different QoS performance metrics such as bandwidth, packet loss, delay and jitter. In this paper, an evaluation of SIP signaling is conducted for SIP based VoIP calls using GSM voice codec system over MANETs with Static, Uniform, and Random mobility models. This evaluation considered AODV as a reactive routing protocol and OLSR as a proactive routing protocol over both IPv4 as well as IPv6. The evaluation study of SIP signaling examined call setup time, number of active calls, number of rejected calls and calls duration. The results of this study show that, in general, IPv4 has better performance over different types of mobility models, while IPv6 upholds longer delays and poor performance over Random mobility models.","PeriodicalId":6311,"journal":{"name":"2013 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB)","volume":"17 1","pages":"231-236"},"PeriodicalIF":0.0,"publicationDate":"2013-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90225919","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper, we present a histogram-based real-time object tracking system using distributed smart cameras. Each smart camera module consists of a camera and an embedded device that is capable of performing the task of object tracking entirely by itself. The module recognizes and tracks the object in real time. The processed video stream containing the marked object is then transmitted to a central server for display. The embedded device runs the novel block-division-based CAMShift algorithm proposed in this paper. We show that this technique reduces the number of computations required and is hence better suited to embedded platforms. The solution is implemented using a central server and multiple camera modules with non-overlapping fields of view in indoor settings. We validate the improvement in performance by comparing our experimental results with existing solutions.
{"title":"Block Division Based CAMShift Algorithm for Real-Time Object Tracking Using Distributed Smart Cameras","authors":"Manjunath Kulkarni, Paras Wadekar, Haresh Dagale","doi":"10.1109/ISM.2013.56","DOIUrl":"https://doi.org/10.1109/ISM.2013.56","url":null,"abstract":"In this paper, we present a histogram based real-time object tracking system using distributed smart cameras. Each such smart camera module consists of a camera and an embedded device that is capable of performing the task of object tracking entirely by itself. The module recognizes and tracks the object in real time. The processed video stream containing the marked object is then transmitted to a central server for display. The embedded device runs a novel block division based CAMShift algorithm proposed in this paper. We show that this technique reduces the number of computations required and hence is more suitable for embedded platforms. The solution is implemented using a central server and multiple camera modules with non-overlapping fields of view in indoor settings. We validate the improvement in the performance by comparing the experimental results with existing solutions.","PeriodicalId":6311,"journal":{"name":"2013 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB)","volume":"6 1","pages":"292-296"},"PeriodicalIF":0.0,"publicationDate":"2013-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80080498","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper presents a fast similarity retrieval method for vector images. To reduce the computational cost of similarity matching, the proposed method uses pre-calculated similarity matching results, obtained in advance by matching DB images against previously selected images called representative queries. At runtime, the proposed method matches only the actual query (the user-input query) against the representative queries. By comparing these similarities with the precalculated ones, the proposed method quickly estimates the actual similarities of the DB images to the actual query. Experimental results show that the proposed method greatly reduces retrieval time without much deterioration of retrieval accuracy.
{"title":"Fast Similarity Retrieval of Vector Images Using Representative Queries","authors":"Takahiro Hayashi, A. Sato","doi":"10.1109/ISM.2013.95","DOIUrl":"https://doi.org/10.1109/ISM.2013.95","url":null,"abstract":"This paper presents a fast similarity retrieval method for vector images. To reduce the computational cost of similarity matching, the proposed method uses pre-calculation results of similarity matching, which are obtained in advance by matching DB images with previously selected images called representative queries. At runtime the proposed method just matches the actual query (the user-inputted query) and the representative queries. Comparing the similarities with the precalculated similarities, the proposed method quickly estimates the actual similarities of DB images to the actual query. Experimental results have shown that the retrieval time is greatly reduced by the proposed method without much deterioration of retrieval accuracy.","PeriodicalId":6311,"journal":{"name":"2013 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB)","volume":"119 1","pages":"498-499"},"PeriodicalIF":0.0,"publicationDate":"2013-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77484254","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A. Smailagic, D. Siewiorek, A. Rudnicky, Sandeep Nallan Chakravarthula, Anshuman Kar, Nivedita Jagdale, Saksham Gautam, Rohit Vijayaraghavan, S. Jagtap
The paper presents an audio-based emotion recognition system that is able to classify emotions as anger, fear, happiness, neutrality, sadness, or disgust in real time. We use a virtual coach as an application example of how emotion recognition can be used to modulate an intelligent system's behavior. We introduce a novel minimum-error feature removal mechanism to reduce bandwidth and increase the accuracy of our emotion recognition system. A two-stage hierarchical classification approach is used along with a One-Against-All (OAA) framework. We obtained an average accuracy of 82.07% using the OAA approach, and 87.70% with the two-stage hierarchical approach, by pruning the feature set and using Support Vector Machines (SVMs) for classification.
{"title":"Emotion Recognition Modulating the Behavior of Intelligent Systems","authors":"A. Smailagic, D. Siewiorek, A. Rudnicky, Sandeep Nallan Chakravarthula, Anshuman Kar, Nivedita Jagdale, Saksham Gautam, Rohit Vijayaraghavan, S. Jagtap","doi":"10.1109/ISM.2013.72","DOIUrl":"https://doi.org/10.1109/ISM.2013.72","url":null,"abstract":"The paper presents an audio-based emotion recognition system that is able to classify emotions as anger, fear, happy, neutral, sadness or disgust in real time. We use the virtual coach as an application example of how emotion recognition can be used to modulate intelligent systems' behavior. A novel minimum-error feature removal mechanism to reduce bandwidth and increase accuracy of our emotion recognition system has been introduced. A two-stage hierarchical classification approach along with a One-Against-All (OAA) framework are used. We obtained an average accuracy of 82.07% using the OAA approach, and 87.70% with a two-stage hierarchical approach, by pruning the feature set and using Support Vector Machines (SVMs) for classification.","PeriodicalId":6311,"journal":{"name":"2013 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB)","volume":"8 1","pages":"378-383"},"PeriodicalIF":0.0,"publicationDate":"2013-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88586954","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}