Yoshihiro Sejima, Tomio Watanabe, M. Jindai, Atsushi Osa
In our previous research, we proposed an eyeball movement model that consists of a saccade model and a group gaze model for enhancing group interaction and communication. In this study, in order to evaluate the effects of the proposed model, we develop an advanced communication system that combines the proposed model with a speech-driven embodied group entrained communication system. The effectiveness of the proposed model is demonstrated through communication experiments with a sensory evaluation using the developed system.
{"title":"Eyeball Movement Model for Lecturer Character in Speech-Driven Embodied Group Entrainment System","authors":"Yoshihiro Sejima, Tomio Watanabe, M. Jindai, Atsushi Osa","doi":"10.1109/ISM.2013.99","DOIUrl":"https://doi.org/10.1109/ISM.2013.99","url":null,"abstract":"In our previous research, we proposed an eyeball movement model that consists of a saccade model and a group gaze model for enhancing group interaction and communication. In this study, in order to evaluate the effects of the proposed model, we develop an advanced communication system in which the proposed model is used with a speech-driven embodied group entrained communication system. The effectiveness of the proposed model is demonstrated for performing the communication experiments with a sensory evaluation using the developed system.","PeriodicalId":6311,"journal":{"name":"2013 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB)","volume":"39 1","pages":"506-507"},"PeriodicalIF":0.0,"publicationDate":"2013-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77059376","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper, we address the problem of adapting video files to meet terminal file size and resolution constraints while maximizing visual quality. First, two new quality estimation models are proposed, which predict quality as a function of resolution, quantization step size, and frame rate parameters. The first model is generic and the second takes video motion into account. Then, we propose a video file size estimation model. Simulation results show a Pearson correlation coefficient (PCC) of 0.956 between the mean opinion score and our generic quality model (0.959 for the motion-conscious model). We obtain a PCC of 0.98 between actual and estimated file sizes. Using these models, we estimate the combination of parameters that yields the best video quality while meeting the target terminal's constraints. We obtain an average quality difference of 4.39% (generic model) and of 3.22% (motion-conscious model) when compared with the best theoretical transcoding possible. The proposed models can be applied to video transcoding for the Multimedia Messaging Service and for video on demand services such as YouTube and Netflix.
{"title":"Visual Quality and File Size Prediction of H.264 Videos and Its Application to Video Transcoding for the Multimedia Messaging Service and Video on Demand","authors":"Didier Joset, S. Coulombe","doi":"10.1109/ISM.2013.62","DOIUrl":"https://doi.org/10.1109/ISM.2013.62","url":null,"abstract":"In this paper, we address the problem of adapting video files to meet terminal file size and resolution constraints while maximizing visual quality. First, two new quality estimation models are proposed, which predict quality as function of resolution, quantization step size, and frame rate parameters. The first model is generic and the second takes video motion into account. Then, we propose a video file size estimation model. Simulation results show a Pearson correlation coefficient (PCC) of 0.956 between the mean opinion score and our generic quality model (0.959 for the motion-conscious model). We obtain a PCC of 0.98 between actual and estimated file sizes. Using these models, we estimate the combination of parameters that yields the best video quality while meeting the target terminal's constraints. We obtain an average quality difference of 4.39% (generic model) and of 3.22% (motion-conscious model) when compared with the best theoretical transcoding possible. The proposed models can be applied to video transcoding for the Multimedia Messaging Service and for video on demand services such as YouTube and Netflix.","PeriodicalId":6311,"journal":{"name":"2013 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB)","volume":"12 1","pages":"321-328"},"PeriodicalIF":0.0,"publicationDate":"2013-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84013046","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Marco A. Hudelist, Klaus Schöffmann, David Ahlström
Smartphones and tablets are popular devices. Lightweight and compact, with built-in high-quality cameras, they are ideal to carry around and use for snapshot photography. As photos quickly accumulate on a device, finding a particular photo can become tedious with the default grid-based photo browser. In this paper we investigate user performance in a photo-browsing task on an iPad and an iPod Touch. We present results from two user experiments comparing the standard grid interface to a pannable and zoomable grid, a 3D globe, and a 3D ring. In particular, we are interested in how the interfaces perform with large photo collections (100 to 400 photos). The results show the most promise for the pan-and-zoom grid and indicate that performance with the standard grid interface deteriorates quickly as collections grow.
{"title":"Evaluation of Image Browsing Interfaces for Smartphones and Tablets","authors":"Marco A. Hudelist, Klaus Schöffmann, David Ahlström","doi":"10.1109/ISM.2013.11","DOIUrl":"https://doi.org/10.1109/ISM.2013.11","url":null,"abstract":"Smartphones and tablets are popular devices. As lightweight, compact devices with built-in high-quality cameras, they are ideal to carry around and to use for snapshot photography. As the number of photos accumulate on the device quickly finding a particular photo can be tedious using the default grid-based photo browser installed on the device. In this paper we investigate user performance in a photo browsing task on an iPad and an iPod Touch device. We present results from two user experiments comparing the standard grid interface to a pan-and-zoom able grid, a 3D-globe and a 3D-ring. In particular we are interested in how the interfaces perform with large photo collections (100 to 400 photos). The results show most promise for the pan-and-zoom grid and that the performance with the standard grid interface quickly deteriorates with large collections.","PeriodicalId":6311,"journal":{"name":"2013 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB)","volume":"82 1","pages":"1-8"},"PeriodicalIF":0.0,"publicationDate":"2013-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78325220","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
André Klassen, Marcus Eibrink-Lunzenauer, Till Gloggler
Mobile learning has gained significant importance in e-learning and higher education in recent years. Especially for students' self-organization of learning, it is important that existing use cases can be transferred to, and enhanced on, mobile platforms. The first part of the paper presents a requirements analysis for an adapted mobile version of the learning management system (LMS) Stud.IP. The analysis combines an evaluation of existing approaches, focus group sessions, and student surveys, both to gain insights into the subject and to elicit students' specific requirements and usage scenarios. The latter part of the paper describes the implementation of an Android-based and a web-based app for the LMS Stud.IP.
{"title":"Requirements for Mobile Learning Applications in Higher Education","authors":"André Klassen, Marcus Eibrink-Lunzenauer, Till Gloggler","doi":"10.1109/ISM.2013.94","DOIUrl":"https://doi.org/10.1109/ISM.2013.94","url":null,"abstract":"Mobile learning has gained significant importance in the field of e-learning and higher education during the last years. Especially in student self-organization of learning, it is important that previous use-cases can be transferred and enhanced to mobile platforms. In the first part of the paper a requirements analysis for an adjusted mobile version of the learn management system (LMS) Stud.IP is presented. The analysis consists of an evaluation of existing approaches, focus group sessions and student surveys to obtain insights on the subject on the one hand and on the other to get specific requirements and usage scenarios of students. The latter part of the paper describes the implementation of an Android-based and web-based app for the LMS Stud.IP.","PeriodicalId":6311,"journal":{"name":"2013 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB)","volume":"3 1","pages":"492-497"},"PeriodicalIF":0.0,"publicationDate":"2013-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88965301","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This demo paper presents a system based on PostgreSQL and the AH-Tree that supports Content-Based Image Retrieval (CBIR) through similarity queries. The AH-Tree is a balanced, tree-based index structure that utilizes high-level semantic information to address the well-known problems of semantic gap and user perception subjectivity. The proposed system implements the AH-Tree inside PostgreSQL's kernel by internally modifying PostgreSQL's GiST access mechanism and thus provides a DBMS with a viable and efficient content-based multimedia retrieval functionality.
{"title":"Efficient Content-Based Multimedia Retrieval Using Novel Indexing Structure in PostgreSQL","authors":"Fausto Fleites, Shu‐Ching Chen","doi":"10.1109/ISM.2013.96","DOIUrl":"https://doi.org/10.1109/ISM.2013.96","url":null,"abstract":"This demo paper presents a system based on PostgreSQL and the AH-Tree that supports Content-Based Image Retrieval (CBIR) through similarity queries. The AH-Tree is a balanced, tree-based index structure that utilizes high-level semantic information to address the well-known problems of semantic gap and user perception subjectivity. The proposed system implements the AH-Tree inside PostgreSQL's kernel by internally modifying PostgreSQL's GiST access mechanism and thus provides a DBMS with a viable and efficient content-based multimedia retrieval functionality.","PeriodicalId":6311,"journal":{"name":"2013 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB)","volume":"39 1","pages":"500-501"},"PeriodicalIF":0.0,"publicationDate":"2013-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90536606","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dongdong Zhang, Lijing Gao, D. Zang, Yaoru Sun, Jiujun Cheng
Most traditional just-noticeable-distortion (JND) models in the pixel domain compute the JND threshold by combining the spatial luminance adaptation effect and the texture contrast masking effect. Recently, with the rapid development of computable models of visual attention, researchers have started to improve JND models by considering the visual saliency of images; for example, a foveated spatial JND model (FSJND) was proposed that incorporates the traditional visual characteristics and the fovea characteristic of the human eye to enhance JND thresholds. However, the thresholds computed by the FSJND model may be overestimated for some high-resolution images. In this paper, we propose a new JND profile in the pixel domain, in which a multi-level modulation function is built to reflect the effect of hierarchically selective visual attention on JND thresholds. Contrast masking is also considered in our modulation function to obtain more accurate JND thresholds. Compared with the latest JND profiles, the proposed model can tolerate more distortion while yielding much better perceptual quality. The proposed JND model can easily be applied in many areas, such as compression and error protection.
{"title":"A JND Profile Based on Hierarchically Selective Attention for Images","authors":"Dongdong Zhang, Lijing Gao, D. Zang, Yaoru Sun, Jiujun Cheng","doi":"10.1109/ISM.2013.50","DOIUrl":"https://doi.org/10.1109/ISM.2013.50","url":null,"abstract":"Most of the traditional just-noticeable-distortion (JND) models in pixel domain compute the JND threshold by incorporating the spatial luminance adaptation effect and the textures contrast masking effect. Recently, with the rapid development of the computable models of visual attention, researchers started to improve the JND model by considering visual saliency of images, a foveated spatial JND model (FSJND) was proposed by incorporating the traditional visual characteristics and fovea characteristic of human eyes to enhance JND thresholds. However, the thresholds computed by the FSJND model may be overestimated for some high resolution images. In this paper, we proposed a new JND profile in pixel domain, in which a multi-level modulation function is built to reflect the effect of hierarchically selective visual attention on JND thresholds. The contrast masking is also considered in our modulation function to obtain more accurate JND thresholds. Compared with the lasted JND profiles, the proposed model can tolerate more distortion and has much better perceptual quality. The proposed JND model can be easily applied in many areas, such as compression, error protection, and so on.","PeriodicalId":6311,"journal":{"name":"2013 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB)","volume":"39 1","pages":"263-266"},"PeriodicalIF":0.0,"publicationDate":"2013-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90747643","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Haze removal is the process by which horizontal obscuration is eliminated from hazy images captured during inclement weather. Sandstorms present a particularly challenging condition: images captured during sandstorms often exhibit color-shift effects due to inadequate spectrum absorption. In this paper, we present a new type of haze removal approach that combines hybrid spectrum analysis with the dark channel prior in order to repair color shifts and thereby achieve effective restoration of hazy images captured during sandstorms. The restoration results and a qualitative evaluation demonstrate that our approach provides superior restoration of images captured during sandstorms in comparison with the previous state-of-the-art approach.
{"title":"Improved Visibility of Single Hazy Images Captured in Inclement Weather Conditions","authors":"Bo-Hao Chen, Shih-Chia Huang","doi":"10.1109/ISM.2013.51","DOIUrl":"https://doi.org/10.1109/ISM.2013.51","url":null,"abstract":"Haze removal is the process by which horizontal obscuration is eliminated from hazy images captured during inclement weather. Sandstorms present a particularly challenging condition, images captured during sandstorms often exhibit color-shift effects due to inadequate spectrum absorption. In this paper, we present a new type of haze removal approach which uses a combination of hybrid spectrum analysis and dark channel prior in order to repair color shifts and thereby achieve effective restoration of hazy images captured during sandstorms. The restoration results and qualitative evaluation demonstrate that our proposed approach can provide superior restoration results for images captured during sandstorms in comparison with the previous state-of-the-art approach.","PeriodicalId":6311,"journal":{"name":"2013 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB)","volume":"60 1","pages":"267-270"},"PeriodicalIF":0.0,"publicationDate":"2013-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84788233","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In recent years, it has become common to record video footage of laparoscopic surgeries. This leads to large video archives that are very hard to manage. They often contain a considerable portion of completely irrelevant scenes that waste storage capacity and hamper efficient retrieval of relevant scenes. In this paper we (1) define three classes of irrelevant segments, (2) propose visual feature extraction methods to obtain irrelevance indicators for each class, and (3) present an extensible framework to detect irrelevant segments in laparoscopic videos. The framework includes a training component that learns a prediction model using nonlinear regression with a generalized logistic function, and a segment composition algorithm that derives segment boundaries from the fuzzy frame classifications. The experimental results show that our method performs very well, both for the classification of individual frames and for the detection of segment boundaries, and enables considerable storage space savings.
{"title":"Relevance Segmentation of Laparoscopic Videos","authors":"Bernd Münzer, Klaus Schöffmann, L. Böszörményi","doi":"10.1109/ISM.2013.22","DOIUrl":"https://doi.org/10.1109/ISM.2013.22","url":null,"abstract":"In recent years, it became common to record video footage of laparoscopic surgeries. This leads to large video archives that are very hard to manage. They often contain a considerable portion of completely irrelevant scenes which waste storage capacity and hamper an efficient retrieval of relevant scenes. In this paper we (1) define three classes of irrelevant segments, (2) propose visual feature extraction methods to obtain irrelevance indicators for each class and (3) present an extensible framework to detect irrelevant segments in laparoscopic videos. The framework includes a training component that learns a prediction model using nonlinear regression with a generalized logistic function and a segment composition algorithm that derives segment boundaries from the fuzzy frame classifications. The experimental results show that our method performs very good both for the classification of individual frames and the detection of segment boundaries in videos and enables considerable storage space savings.","PeriodicalId":6311,"journal":{"name":"2013 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB)","volume":"20 1","pages":"84-91"},"PeriodicalIF":0.0,"publicationDate":"2013-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89580226","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Digital video has become a very popular medium in several contexts, with an ever-expanding horizon of applications and uses, and the amount of available video data is growing almost without limit. For this reason, video summarization continues to attract a wide spectrum of research efforts. In this work we present a novel video summarization technique based on tracking local features across consecutive frames. Our approach operates in the uncompressed domain and requires only a small window of consecutive frames, so it can process the video stream directly and produce results on the fly. We tested our implementation on standard publicly available datasets and compared the results with the most recent published work in the field. The results show that our proposal produces summaries of similar quality to the best published proposals, with the additional advantage of processing the stream directly in the uncompressed domain.
{"title":"Speeded-Up Video Summarization Based on Local Features","authors":"Javier Iparraguirre, C. Delrieux","doi":"10.1109/ISM.2013.70","DOIUrl":"https://doi.org/10.1109/ISM.2013.70","url":null,"abstract":"Digital video has become a very popular media in several contexts, with an ever expanding horizon of applications and uses. Thus, the amount of available video data is growing almost limitless. For this reason, video summarization continues to attract the attention of a wide spectrum of research efforts. In this work we present a novel video summarization technique based on tracking local features among consecutive frames. Our approach operates on the uncompressed domain, and requires only a small set of consecutive frames to perform, thus being able to process the video stream directly and produce results on the fly. We tested our implementation on standard available datasets, and compared the results with the most recent published work in the field. The results achieved show that our proposal produces summarizations that have similar quality than the best published proposals, with the additional advantage of being able to process the stream directly in the uncompressed domain.","PeriodicalId":6311,"journal":{"name":"2013 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB)","volume":"15 1","pages":"370-373"},"PeriodicalIF":0.0,"publicationDate":"2013-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81019536","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Faced with increasingly large-scale video databases, retrieving videos quickly and efficiently has become a crucial problem. Video text, which carries high-level semantic information, is an important source of cues for this task. In this paper, we introduce a video text detection and tracking approach that produces clean binary text images which can be processed directly by OCR (Optical Character Recognition) software. Our approach consists of two parts: a stroke-model-based video text detection and localization method, and a SURF (Speeded Up Robust Features) based text region tracking method. In the detection and localization stage, we use a stroke model and morphological operations to roughly identify candidate text regions, combine the stroke map and edge response to localize text lines within each candidate region, and verify text blocks with several heuristics and an SVM (Support Vector Machine). The core of our text tracking method is a fast approximate nearest-neighbour search over the extracted SURF features: the text-ending frame is determined from the number of SURF feature points, while text motion is estimated from correct matches in adjacent frames. Experimental results on a large number of different video clips show that our approach can effectively detect and track both static and scrolling text.
{"title":"A Video Text Detection and Tracking System","authors":"Tuoerhongjiang Yusufu, Yiqing Wang, Xiangzhong Fang","doi":"10.1109/ISM.2013.106","DOIUrl":"https://doi.org/10.1109/ISM.2013.106","url":null,"abstract":"Faced with the increasing large scale video databases, retrieving videos quickly and efficiently has become a crucial problem. Video text, which carries high level semantic information, is a type of important source that is useful for this task. In this paper, we introduce a video text detecting and tracking approach. By these methods we can obtain clear binary text images, and these text images can be processed by OCR (Optical Character Recognition) software directly. Our approach including two parts, one is stroke-model based video text detection and localization method, the other is SURF (Speeded Up Robust Features) based text region tracking method. In our detection and localization approach, we use stroke model and morphological operation to roughly identify candidate text regions. Combine stroke-map and edge response to localize text lines in each candidate text regions. Several heuristics and SVM (Support Vector Machine) used to verifying text blocks. The core part of our text tracking method is fast approximate nearest-neighbour search algorithm for extracted SURF features. Text-ending frame is determined based on SURF feature point numbers, while, text motion estimation is based on correct matches in adjacent frames. Experimental result on large number of different video clips shows that our approach can effectively detect and track both static texts and scrolling texts.","PeriodicalId":6311,"journal":{"name":"2013 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB)","volume":"25 1","pages":"522-529"},"PeriodicalIF":0.0,"publicationDate":"2013-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81590513","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}