A Novel SVM Based Food Recognition Method for Calorie Measurement Applications
P. Pouladzadeh, G. Villalobos, R. Almaghrabi, S. Shirmohammadi
DOI: 10.1109/ICMEW.2012.92
Food classification methods play an important role in today's food recognition applications. For this purpose, a new food recognition algorithm is presented that considers the shape, color, size, and texture characteristics of the food. By using various combinations of these features, better classification is achieved. Based on our simulation results, the proposed algorithm recognizes food categories with an average recognition rate of 92.6%.
Improvisational Construction of a Context for Dynamic Implementation of Arbitrary Smart Object Federation Scenarios
Jérémie Julia, Yuzuru Tanaka
DOI: 10.1109/ICMEW.2012.45
This paper extends a previously proposed approach to smart objects and smart mobile devices by proposing a middleware framework inspired by RNA mechanisms in molecular biology. The framework represents complex application scenarios of proximity-based federation of smart objects as catalytic reaction networks, with each catalytic reaction modeled as the expression of an RNA from a DNA. We introduce smart object subtype polymorphism and port subtype polymorphism into this framework to simplify scenario descriptions. We also add a new condition to the rule descriptions that allows physical user interactions with smart objects to be described. This approach is used to describe new rules that let the user improvisationally build the context for a reaction by physically interacting with simple mobile smart objects.
An Overview of Perceptual Processing for Digital Pictures
H. Wu, Weisi Lin, Lina Karam
DOI: 10.1109/ICMEW.2012.27
This paper presents an overview of state-of-the-art technologies for perceptual processing of digital pictures, as well as a discussion of the issues related to their implementation, optimization, and testing. The paper begins with a brief description of the main computational modules used in a perceptual-based visual signal processing framework. It then presents a number of perceptual-based visual processing techniques and the applications to which perceptual models have been applied, including image/video compression, visual signal quality evaluation, and computer graphics. The most significant research efforts are highlighted for each topic, and a number of issues and views on the related research and opportunities are put forward.
Real-Time Pitch Training System for Violin Learners
Jian-Heng Wang, Siang-An Wang, Wen-Chieh Chen, Ken-Ning Chang, Herng-Yow Chen
DOI: 10.1109/ICMEW.2012.35
This paper specifically targets violin learners who are working on their pitch accuracy. We employ a pitch tracking algorithm to extract the pitch played; through volume thresholding and region detection, only parts of the frames are processed, so the system can provide real-time feedback on whether the learner played the right pitch. The system also provides major scales and arpeggio scores as teaching materials, and learners can choose different tempos to practice at, depending on their level. The user-friendly interface allows learners to easily perceive the differential between the pitch of the target note and the pitch played, so they can precisely adjust their playing. Statistical feedback records progress and analyzes error patterns, enabling violin teachers to evaluate student progress precisely and correct common error patterns effectively.
A Demonstration of a Hierarchical Multi-Layout 3D Video Browser
Christopher Müller, Martin Smole, Klaus Schöffmann
DOI: 10.1109/ICMEW.2012.121
This paper demonstrates a novel 3D Video Browser (3VB) that enables interactive search within a single video as well as within video collections through 3D projection and intuitive interaction. The browsing approach is based on hierarchical search: the user can split a video into several segments. The 3VB provides a convenient interface that allows flexible arrangement of video segments in 3D space, concurrent playback of segments, and flexible inspection of these segments at any desired level of detail through convenient user interaction.
Infrared and Inertial Tracking in the Immersive Audio Environment for Enhanced Military Training
Pratik Shah, A. Faza, Raghavendra Nimmala, S. Grant, W. Chapin, Robert Montgomery
DOI: 10.1109/ICMEW.2012.38
The Immersive Audio Environment (IAE) was designed to provide an effective military training facility. Its efficacy at synthesizing sounds from desired directions, as well as its ability to synthesize moving sounds, has been reported previously. This paper discusses the addition of a tracking system to evaluate subject training performance. Numerous tracking systems have been developed for immersive environments, including head-mounted webcams, visible-light cameras mounted on the support structure, and the single-camera tracking found in commercially available entertainment systems. Our system combines an existing infrared tracking system with a specially designed inertial tracking system. This paper presents tests and results evaluating the accuracy of the tracking system with respect to our application, and verifies the efficacy of using the IAE for training enhancement.
Hidden Markov Model for Event Photo Stream Segmentation
Jesse Prabawa Gozali, Min-Yen Kan, H. Sundaram
DOI: 10.1109/ICMEW.2012.12
A photo stream is a chronological sequence of photos. Most existing photo stream segmentation methods assume that a photo stream comprises photos from multiple events, and their goal is to produce groups of photos each corresponding to an event, i.e. automatic albuming. Even when photos are grouped by event, sifting through the abundance of photos in each event is cumbersome. To make the photos of each event more manageable, we propose a segmentation method for an event photo stream (the chronological sequence of photos of a single event) that produces groups of photos, each corresponding to a photo-worthy moment in the event. Our method is based on a hidden Markov model whose parameters are learned from time, EXIF metadata, and visual information drawn from 1) training data of unlabelled, unsegmented event photo streams and 2) the event photo stream to be segmented. In an experiment with over 5000 photos from 28 personal photo sets, our method outperformed all six baselines with statistical significance (p < 0.10 against the best baseline and p < 0.005 against the others).
Motion Segmentation Based on 3D Histogram and Temporal Mode Selection
D. Mukherjee, Q. M. J. Wu
DOI: 10.1109/ICMEW.2012.90
Motion segmentation is a well-explored research topic due to its vast application area. This work proposes a real-time motion segmentation method based on a 3D histogram and temporal mode selection. The temporal distribution of a video sequence consists of motion in the foreground and the relatively immobile background; a 3D histogram provides a short-term memory of this distribution. The temporal mode selection process identifies the most frequent values in the distribution and constructs the background from them. This work provides a detailed analysis of the proposed method along with an easy-to-implement algorithm. Experimental results and comparisons with several leading algorithms show that the proposed method provides real-time, robust, and highly accurate results.
Depth Extraction from Monocular Video Using Bidirectional Energy Minimization and Initial Depth Segmentation
Chunyu Lin, J. D. Cock, Jürgen Slowack, P. Lambert, R. Walle
DOI: 10.1109/ICMEW.2012.94
In this paper, we propose a method to extract depth information from a monocular video sequence. When estimating the depth of the current frame, our bidirectional energy minimization considers both the previous and the next frame, which yields a much more robust depth map and reduces the problems associated with occlusion to a certain extent. After obtaining an initial depth map from the bidirectional energy minimization, we refine it using segmentation, under the assumption that depth values within one segmented region are similar. Unlike other segmentation algorithms, we use the initial depth information together with the original color image to obtain more reliable segmented regions. Finally, for outdoor video, the sky region is detected using a dark channel prior in order to correct possibly wrong depth values. Experimental results are considerably more accurate than those of state-of-the-art algorithms.
Surround Sound Using Variable-Ambisonics and Variable-Polar Pattern Theories
M. J. Morrell, J. Reiss, Sonia Wilkie
DOI: 10.1109/ICMEW.2012.124
This paper details the technologies demonstrated by the authors at ICME 2012: Variable-Ambisonics and Variable-Polar Pattern Reproduction. The technologies are demonstrated using virtual two-dimensional speaker layouts via binaural headphone reproduction. They offer benefits over standard pair-wise panning of surround sound, including on-the-fly changing of the speaker count, source width control, and per-source rendering. They also overcome problems of Ambisonics and Ambisonics-based reproduction methods by allowing each sound source to be rendered individually by decoder type, at an arbitrary rather than fixed order, and by allowing various orders to be mixed within the same reproduction system.