Sebastian Pospiech, N. Birnbaum, L. Knipping, R. Mertens
Web lectures can be employed in a variety of didactic scenarios, ranging from an add-on to a live lecture to stand-alone learning content. In all of these scenarios, though less so in the stand-alone one, indexing and navigation are crucial for real-world usability. As a consequence, many approaches have been devised, such as slide-based indexing, transcript-based indexing, collaborative manual indexing, and individual or social indexing based on viewing behavior. The approach proposed in this paper takes individual indexing based on viewing behavior two steps further in that it (a) indexes the recording at production time in the lecture hall and (b) actively analyzes the student's attention focus instead of passively recording viewing time as done in conventional footprinting. Tracking student attention during the lecture requires recording and analyzing the student's behaviour in parallel to the lecture and synchronizing both data streams. This paper discusses the architecture required for personalized attention-based indexing, possible problems, and strategies to tackle them.
{"title":"Personalized Indexing of Attention in Lectures -- Requirements and Concept","authors":"Sebastian Pospiech, N. Birnbaum, L. Knipping, R. Mertens","doi":"10.1109/ISM.2015.44","DOIUrl":"https://doi.org/10.1109/ISM.2015.44","url":null,"abstract":"Web lectures can be employed in a variety of didactic scenarios ranging from add-on for a live lecture to stand-alone learning content. In all of these scenarios, though less in the stand-alone one, indexing and navigation are crucial for real world usability. As a consequence, many approaches like slide based indexing, transcript based indexing, collaborative manual indexing as well as individual or social indexing based on viewing behavior have been devised. The approach proposed in this paper takes individual indexing based on viewing behavior two steps further in that (a) indexes the recording at production time in the lecture hall and (b) actively analyzes the students attention focus instead of passively recording viewing time as done in conventional footprinting. In order to track student attention during the lecture, recoding and analyzing the student's behaviour in parallel to the lecture as well as synchronizing both data streams is necessary. This paper discusses the architecture required for personalized attention based indexing, possible problems and strategies to tackle them.","PeriodicalId":250353,"journal":{"name":"2015 IEEE International Symposium on Multimedia (ISM)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123122036","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Seyed Vahid Hosseinioun, Hussein Al Osman, Abdulmotaleb El Saddik
With the remarkable increase in the use of sensors in our daily lives, various methods have been devised to detect events in a driving environment using smartphones, as they provide two main advantages: they eliminate the need for dedicated hardware in vehicles and they are widely accessible. Since rewarding safe driving is an important issue for insurance companies, some companies are implementing Usage-Based Insurance (UBI) as opposed to traditional history-based plans. The collection of driving events, such as acceleration and turning, is a prerequisite for the adoption of such plans. Mobile phone sensors are capable of detecting whether a car is accelerating or braking, while through service fusion we can detect other events such as speeding or instances of severe weather. We propose a new and robust hybrid classification algorithm that detects acceleration-based events with an F1-score of 0.9304 and turn events with an F1-score of 0.9038. We further propose a method for measuring a driving performance index using the detected events.
{"title":"Employing Sensors and Services Fusion to Detect and Assess Driving Events","authors":"Seyed Vahid Hosseinioun, Hussein Al Osman, Abdulmotaleb El Saddik","doi":"10.1109/ISM.2015.121","DOIUrl":"https://doi.org/10.1109/ISM.2015.121","url":null,"abstract":"With the remarkable increase in use of sensors in our daily lives, various methods have been devised to detect events in a driving environment using smart-phones as they provide two main advantages: they eliminate the need to have dedicated hardware in vehicles and they are widely accessible. Since rewarding safe driving is an important issue for insurance companies, some companies are implementing Usage-Based Insurance (UBI) as opposed to traditional History-Based plans. The collection of driving events, such as acceleration and turning, is a prerequisite requirement for the adoption of such plans. Mobile phone sensors are capable of detecting whether a car is accelerating or braking, while through service fusion we can detect other events like speeding or instances of severe weather. We propose a new and robust hybrid classification algorithm that detects acceleration-based events with an F1-score of 0.9304 and turn events with an F1-score of 0.9038. We further propose a method for measuring the driving performance index using the detected events.","PeriodicalId":250353,"journal":{"name":"2015 IEEE International Symposium on Multimedia (ISM)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120960570","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
G. Sargent, P. Hanna, H. Nicolas, F. Bimbot
This article proposes to analyze the structural regularities of the audio and video streams of TV programs and to explore their potential for the classification of videos into program collections. Our approach is based on the spectral analysis of distance matrices representing the short- and long-term dependencies within the audio and visual modalities of a video. We propose to compare two videos by their respective spectral features. We assess the benefits brought by each of the two modalities to the performance of a K-nearest-neighbor classification, and we test our approach in the context of an unsupervised clustering algorithm. These evaluations are performed on two datasets of French and Italian TV programs.
"Exploring the Complementarity of Audio-Visual Structural Regularities for the Classification of Videos into TV-Program Collections," in 2015 IEEE International Symposium on Multimedia (ISM), Dec. 2015. DOI: 10.1109/ISM.2015.133
M. VenkataPhaniKumar, K. C. R. C. Varma, S. Mahapatra
In this paper, a novel two-pass rate control scheme is proposed to achieve consistent visual quality for variable bit rate (VBR) video streaming. The rate-distortion (RD) characteristics of each frame are used to establish a frame complexity model, which is later used along with statistics collected in the first pass to derive an optimal quantization parameter for encoding the frame in the second pass. The experimental results demonstrate that the proposed rate control scheme significantly outperforms the existing rate control mechanism in the Joint Model (JM) reference software in terms of Peak Signal-to-Noise Ratio (PSNR) and consistent perceptual visual quality while achieving the target bit rate. Further, the proposed scheme is validated through implementation on a miniature test-bed.
{"title":"A Novel Two Pass Rate Control Scheme for Variable Bit Rate Video Streaming","authors":"M. VenkataPhaniKumar, K. C. R. C. Varma, S. Mahapatra","doi":"10.1109/ISM.2015.32","DOIUrl":"https://doi.org/10.1109/ISM.2015.32","url":null,"abstract":"In this paper, a novel two-pass rate control scheme is proposed to achieve a consistent visual quality media for variable bit rate (VBR) video streaming. The rate-distortion (RD) characteristics of each frame is used to establish a frame complexity model, which is later used along with statistics collected in the first-pass to derive an optimal quantization parameter for encoding the frame in the second-pass. The experimental results demonstrate that the proposed rate control scheme significantly outperforms the existing rate control mechanism in the Joint Model (JM) reference software in terms of the Peak Signal to Noise Ratio (PSNR) and consistent perceptual visual quality while achieving the target bit rate. Further, the proposed scheme is validated through implementation on a miniature test-bed.","PeriodicalId":250353,"journal":{"name":"2015 IEEE International Symposium on Multimedia (ISM)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133164391","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Chun-Fu Chen, G. Lee, Yinglong Xia, Wan-Yi Sabrina Lin, T. Suzumura, Ching-Yung Lin
In this paper, we develop a pipelining scheme for image deep learning on a GPU cluster to handle the heavy workload of the training procedure. In addition, it is usually necessary to train multiple models to obtain a good deep learning model, due to the limited a priori knowledge about the deep neural network structure. Adopting parallel and distributed computing therefore appears to be an obvious path forward, but the mileage varies depending on how amenable a deep network is to parallelization and on the availability of rapid prototyping capabilities with a low cost of entry. In this work, we propose a framework that organizes the training procedures of multiple deep learning models into a pipeline on a GPU cluster, where each stage is handled by a particular GPU with a partition of the training dataset. Instead of frequently migrating data among the disks, CPUs, and GPUs, our framework only moves partially trained models, which reduces bandwidth consumption and leverages the full computation capability of the cluster. We deploy the proposed framework on popular image recognition tasks using deep learning, and the experiments show that the proposed method reduces overall training time by up to dozens of hours compared to the baseline method.
{"title":"Efficient Multi-training Framework of Image Deep Learning on GPU Cluster","authors":"Chun-Fu Chen, G. Lee, Yinglong Xia, Wan-Yi Sabrina Lin, T. Suzumura, Ching-Yung Lin","doi":"10.1109/ISM.2015.119","DOIUrl":"https://doi.org/10.1109/ISM.2015.119","url":null,"abstract":"In this paper, we develop a pipelining schema for image deep learning on GPU cluster to leverage heavy workload of training procedure. In addition, it is usually necessary to train multiple models to obtain a good deep learning model due to the limited a priori knowledge on deep neural network structure. Therefore, adopting parallel and distributed computing appears is an obvious path forward, but the mileage varies depending on how amenable a deep network can be parallelized and the availability of rapid prototyping capabilities with low cost of entry. In this work, we propose a framework to organize the training procedures of multiple deep learning models into a pipeline on a GPU cluster, where each stage is handled by a particular GPU with a partition of the training dataset. Instead of frequently migrating data among the disks, CPUs, and GPUs, our framework only moves partially trained models to reduce bandwidth consumption and to leverage the full computation capability of the cluster. In this paper, we deploy the proposed framework on popular image recognition tasks using deep learning, and the experiments show that the proposed method reduces overall training time up to dozens of hours compared to the baseline method.","PeriodicalId":250353,"journal":{"name":"2015 IEEE International Symposium on Multimedia (ISM)","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125674733","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Nestor Z. Salamon, Julio C. S. Jacques Junior, S. Musse
In this work we propose a framework for group re-identification based on manually defined soft-biometric characteristics. Users choose colors that describe the soft-biometric attributes of each person belonging to the searched group. Our technique matches these structured attributes against image databases using color distance metrics, a novel adaptive threshold selection, and a high-level feature based on people's proximity. Experimental results show that the proposed approach helps the re-identification procedure by ranking the most likely results without training data, and that it can be extended to work without previous images.
{"title":"A User-Based Framework for Group Re-Identification in Still Images","authors":"Nestor Z. Salamon, Julio C. S. Jacques Junior, S. Musse","doi":"10.1109/ISM.2015.41","DOIUrl":"https://doi.org/10.1109/ISM.2015.41","url":null,"abstract":"In this work we propose a framework for group re-identification based on manually defined soft-biometric characteristics. Users are able to choose colors that describe the soft-biometric attributes of each person belonging to the searched group. Our technique matches these structured attributes against image databases using color distance metrics, a novel adaptive threshold selection and people's proximity high level feature. Experimental results show that the proposed approach is able to help the re-identification procedure ranking the most likely results without training data, and also being extensible to work without previous images.","PeriodicalId":250353,"journal":{"name":"2015 IEEE International Symposium on Multimedia (ISM)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131651283","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
K. M. Alam, Alex Sopena, Abdulmotaleb El Saddik
The Internet-of-Things (IoT) is considered the next big disruptive technology field, whose main goal is to achieve social good by enabling collaboration among physical things or sensors. We present a cloud-based cyber-physical architecture to leverage the Sensing-as-a-Service (SenAS) model, where every physical thing is complemented by a twin cyber process in the cloud. In this model, things can communicate using direct physical connections or through the cyber layer using peer-to-peer inter-process communications. The proposed model offers simultaneous communication channels among groups of things by uniquely tagging each group with a relationship ID. An intelligent service layer ensures custom privacy and access rights management for the sensor owners. We also present the implementation details of an IoT platform and demonstrate its practicality by developing case study applications for the Internet-of-Vehicles (IoV) and the connected smart home.
"Design and Development of a Cloud Based Cyber-Physical Architecture for the Internet-of-Things," in 2015 IEEE International Symposium on Multimedia (ISM), Dec. 2015. DOI: 10.1109/ISM.2015.96
Qiong Wu, P. Boulanger
With the exponential growth of web image data, image tagging is becoming crucial in many image-based applications such as object recognition and content-based image retrieval. Despite the great progress achieved in automatic recognition technologies, none has yet provided a satisfactory solution that is widely useful for solving generic image recognition problems. So far, only manual tagging can provide reliable tagging results. However, such work is tedious and costly, and workers have little motivation. In this paper, we propose an online image tagging system, EyeDentifyIt, driven by an image-click-ads framework, which motivates crowdsourcing workers as well as general web users to tag images at high quality for low cost and with a low workload. A series of usability studies is presented to demonstrate how EyeDentifyIt provides improved user motivation and requires less workload compared to state-of-the-art approaches.
"An Unified Image Tagging System Driven by Image-Click-Ads Framework," in 2015 IEEE International Symposium on Multimedia (ISM), Dec. 2015. DOI: 10.1109/ISM.2015.12
Christoph Jansen, Radek Mackowiak, N. Hezel, Moritz Ufer, Gregor Altstadt, K. U. Barthel
In this paper, we present a novel approach to reconstructing missing areas in facial images by using a series of Restricted Boltzmann Machines (RBMs). RBMs created with a low number of hidden neurons generalize well and are able to reconstruct basic structures in the missing areas. On the other hand, networks with many hidden neurons tend to emphasize details when using the reconstruction of the previous, more generalized RBMs as their input. Since trained RBMs are fast at encoding and decoding data by design, our method is also suitable for processing video streams.
{"title":"Reconstructing Missing Areas in Facial Images","authors":"Christoph Jansen, Radek Mackowiak, N. Hezel, Moritz Ufer, Gregor Altstadt, K. U. Barthel","doi":"10.1109/ISM.2015.68","DOIUrl":"https://doi.org/10.1109/ISM.2015.68","url":null,"abstract":"In this paper, we present a novel approach to reconstruct missing areas in facial images by using a series of Restricted Boltzman Machines (RBMs). RBMs created with a low number of hidden neurons generalize well and are able to reconstruct basic structures in the missing areas. On the other hand networks with many hidden neurons tend to emphasize details, when using the reconstruction of the previous, more generalized RBMs, as their input. Since trained RBMs are fast in encoding and decoding data by design, our method is also suitable for processing video streams.","PeriodicalId":250353,"journal":{"name":"2015 IEEE International Symposium on Multimedia (ISM)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131684597","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Wolfgang Hürst, R. V. D. Werken
The Video Browser Showdown (VBS) is an annual event where researchers evaluate their video search systems in a competitive setting. Searching in videos is often a two-step process: first, some sort of pre-filtering is done, where, for example, users query an indexed archive of files; this is followed by human-based browsing, where users skim the returned result set in search of the relevant file or portion of it. The VBS addresses this whole search process, focusing in particular on its interactive aspects. Encouraged by previous years' results, we created a system that purely addresses the latter issue, i.e., interface and interaction design. By eliminating all kinds of video indexing and query processing, we aimed to demonstrate the importance of good interface design for video search and that its relevance is often underestimated by today's systems. This claim is clearly supported by the results our system achieved in the VBS 2015 competition, where our approach was on a par with the top-performing ones. In this paper, we describe our system along with the related design decisions, present our results from the VBS event, and discuss them in further detail.
"Human-Based Video Browsing - Investigating Interface Design for Fast Video Browsing," in 2015 IEEE International Symposium on Multimedia (ISM), Dec. 2015. DOI: 10.1109/ISM.2015.104