{"title":"Efficient and Language Independent News Story Segmentation for Telecast News Videos","authors":"Anubha Jindal, Aditya Tiwari, Hiranmay Ghosh","doi":"10.1109/ISM.2011.81","DOIUrl":"https://doi.org/10.1109/ISM.2011.81","url":null,"abstract":"A TV news program comprises a continuous video stream containing a number of news stories, interspersed with commercials and headlines. This paper presents a method to detect story boundaries and to separate the stories from the other components and from each other. The method is based on the movement of ticker-text bands and the repetition of ticker texts during different parts of a news program. It does not use any language-processing tool and is independent of the language of the telecast. It uses a few simple features to distinguish news from advertisements and can be used for large-scale news indexing. We present test results on channels telecasting in English and a few Indian languages.","PeriodicalId":339410,"journal":{"name":"2011 IEEE International Symposium on Multimedia","volume":"480 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124409416","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using Chaotic Maps for Encrypting Image and Video Content","authors":"A. Pande, P. Mohapatra, Joseph Zambreno","doi":"10.1109/ISM.2011.35","DOIUrl":"https://doi.org/10.1109/ISM.2011.35","url":null,"abstract":"Arithmetic Coding (AC) is widely used for the entropy coding of text and multimedia data. It involves recursive partitioning of the range [0,1) in accordance with the relative probabilities of occurrence of the input symbols. In this paper, we present a data (image or video) encryption scheme based on arithmetic coding, which we refer to as Chaotic Arithmetic Coding (CAC). In CAC, a large number of chaotic maps can be used to perform coding, each achieving Shannon-optimal compression performance. The exact choice of map is governed by a key. CAC has the effect of scrambling the intervals without changing the width of the interval in which the codeword must lie, thereby allowing encryption without sacrificing any coding efficiency. We next describe Binary CAC (BCAC) with some simple Security Enhancement (SE) modes which strengthen the scheme against known cryptanalytic attacks on AC-based encryption techniques. These modes, namely Plaintext Modulation (PM), Pair-Wise Independent Keys (PWIK), and Key and ciphertext Mixing (MIX), incur insignificant computational overhead, while the BCAC decoder has lower hardware requirements than the BAC coder itself, making BCAC with SE modes an excellent choice for deployment in secure embedded multimedia systems. A bit sensitivity analysis for key and plaintext is presented along with experimental tests of compression performance.","PeriodicalId":339410,"journal":{"name":"2011 IEEE International Symposium on Multimedia","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128525429","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
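The interval-scrambling idea in the CAC abstract above can be sketched in a few lines: at each coding step a key bit decides the *order* of the two sub-intervals, which moves the codeword without changing the final interval width, so compression is untouched. This is a minimal illustration of the principle; the function name, the key format, and the probabilities are ours, not the paper's API.

```python
def encode(bits, p0, key):
    """Binary arithmetic encoding with key-driven sub-interval swapping.

    bits -- list of 0/1 symbols to encode
    p0   -- probability of symbol 0
    key  -- list of 0/1 values; 1 swaps the sub-interval order at that step
    Returns (codeword, final_interval_width).
    """
    low, width = 0.0, 1.0
    for b, k in zip(bits, key):
        w0 = width * p0                     # width reserved for symbol 0
        if k == 0:                          # normal order: [0-part][1-part]
            lo0, lo1 = low, low + w0
        else:                               # swapped order: [1-part][0-part]
            lo1, lo0 = low, low + (width - w0)
        low, width = (lo0, w0) if b == 0 else (lo1, width - w0)
    return low + width / 2, width           # any point inside the interval works

# Same plaintext, two different keys: identical width (same code length),
# different codeword position.
bits = [0, 1, 1, 0]
m_plain, w_plain = encode(bits, 0.6, [0, 0, 0, 0])
m_key, w_key = encode(bits, 0.6, [1, 0, 1, 1])
```

The width update sequence (`w0` or `width - w0`) is key-independent, which is exactly why encryption costs no coding efficiency.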
{"title":"A Novel Hierarchical Model-Based Frame Rate Up-Conversion via Spatio-temporal Conditional Random Fields","authors":"M. Shafiee, Z. Azimifar, A. Wong, P. Fieguth","doi":"10.1109/ISM.2011.44","DOIUrl":"https://doi.org/10.1109/ISM.2011.44","url":null,"abstract":"In this paper, a hierarchical model-based approach to frame rate up-conversion is presented. Given a sequence of consecutive video frames, a Spatio-Temporal Conditional Random Field (ST-CRF) is trained to capture both the motion and shape characteristics of objects within consecutive frames. A hierarchical tree is then constructed via hierarchical segmentation that sub-divides frames into regions based on color intensity and regional velocity. A hierarchical sampling approach is then introduced to construct new intermediate frames between adjacent video frames, where estimated intermediate frames are constructed at each level of the hierarchical tree such that the probability of the ST-CRF is maximized. Preliminary results using videos with different motion characteristics show that the proposed approach has potential for producing intermediate frames with high visual quality.","PeriodicalId":339410,"journal":{"name":"2011 IEEE International Symposium on Multimedia","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130745983","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exemplar-based Age Progression Prediction in Children Faces","authors":"C. Shen, Wan-Hua Lu, S. Shih, H. Liao","doi":"10.1109/ISM.2011.28","DOIUrl":"https://doi.org/10.1109/ISM.2011.28","url":null,"abstract":"This work aims to develop a system for predicting age progression in children's faces. Such prediction is critical for assisting the search for missing children. An integral module comprising feature extraction, distance measurement, and face synthesis is devised in this paper to predict faces at different ages. In the proposed method, a curvature-weighted plus bending-energy distance is employed for selecting similar facial components from an aging database. The growth curve of each facial component is used to predict the shape, size, and location of that component at a different age. The thin-plate spline method is employed to synthesize a 3-D face model from the predicted components by minimizing the bending energy. Experiments are conducted to test the proposed method with various subjects, and the results show that the proposed method is very promising.","PeriodicalId":339410,"journal":{"name":"2011 IEEE International Symposium on Multimedia","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127850569","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
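The growth-curve step in the abstract above amounts to reading a component's size off a per-component age curve. A minimal linear-interpolation sketch (the curve samples and the helper name are invented for illustration; the paper does not specify this exact form):

```python
def predict_size(growth_curve, age):
    """Linearly interpolate a facial component's size at the given age.

    growth_curve -- list of (age, size) samples, any order
    """
    pts = sorted(growth_curve)
    for (a0, s0), (a1, s1) in zip(pts, pts[1:]):
        if a0 <= age <= a1:
            t = (age - a0) / (a1 - a0)      # position within the segment
            return s0 + t * (s1 - s0)
    raise ValueError("age outside the curve's range")

# Toy growth curve for one component: (age in years, width in cm).
nose_curve = [(4, 2.0), (8, 2.6), (12, 3.0)]
```

The same lookup, done per component, yields the predicted shape, size, and location that the synthesis stage then fuses into a face model.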
{"title":"Blood Cell Image Classification Based on Hierarchical SVM","authors":"W. Tai, Rouh-Mei Hu, H. Hsiao, Rong-Ming Chen, J. Tsai","doi":"10.1109/ISM.2011.29","DOIUrl":"https://doi.org/10.1109/ISM.2011.29","url":null,"abstract":"The problem of identifying and counting blood cells within the blood smear is of both theoretical and practical interest. The differential counting of blood cells provides invaluable information to pathologists for the diagnosis and treatment of many diseases. In this paper we propose an efficient hierarchical blood cell image identification and classification method based on a multi-class support vector machine. In this automated process, segmentation and classification of blood cells are the most important stages. We segment the stained blood cells in digital microscopic images and extract geometric features for each segment to identify and classify the different types of blood cells. The experimental results are compared with the manual results obtained by a pathologist and demonstrate the effectiveness of the proposed method.","PeriodicalId":339410,"journal":{"name":"2011 IEEE International Symposium on Multimedia","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126447193","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
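The "hierarchical" part of the method above is a decision cascade: a first-level classifier splits coarse cell classes, and lower levels refine them. The sketch below shows only that control flow; the paper trains an SVM at each node, whereas here each node is a stand-in threshold rule on made-up geometric feature values (area, circularity), purely to illustrate the structure.

```python
def classify(area, circularity):
    """Two-level hierarchical decision over geometric features.

    All threshold values are illustrative, not from the paper.
    """
    # Level 1: coarse split -- white blood cells are markedly larger
    # than red blood cells in a stained smear.
    if area < 60.0:
        return "red blood cell"
    # Level 2: refine the white-cell class by shape regularity.
    if circularity > 0.9:
        return "lymphocyte"
    return "neutrophil"

# (area, circularity) per segmented cell, toy values.
cells = [(45.0, 0.95), (80.0, 0.95), (120.0, 0.7)]
labels = [classify(a, c) for a, c in cells]
```

Replacing each `if` with a trained binary SVM gives the hierarchical-SVM variant: errors at level 1 are cheap to avoid because the coarse classes are well separated, and each level-2 classifier only has to discriminate within one branch.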
{"title":"Automatic Lecture Video Indexing Using Video OCR Technology","authors":"Haojin Yang, Maria Siebert, Patrick Lühne, Harald Sack, C. Meinel","doi":"10.1109/ISM.2011.26","DOIUrl":"https://doi.org/10.1109/ISM.2011.26","url":null,"abstract":"In recent years, digital lecture libraries and lecture video portals have become more and more popular. However, finding efficient methods for indexing multimedia remains a challenging task. Since the text displayed in a lecture video is closely related to the lecture content, it provides a valuable source for indexing and retrieving lecture contents. In this paper, we present an approach for automatic lecture video indexing based on video OCR technology. We have developed a novel video segmenter for automated slide video structure analysis and a weighted DCT (discrete cosine transform) based text detector. A dynamic image contrast/brightness adaptation serves to enhance the text image quality and make it processable by common OCR software. Time-based text occurrence information as well as the analyzed text content are further used for indexing. We demonstrate the accuracy of the proposed approach by evaluation.","PeriodicalId":339410,"journal":{"name":"2011 IEEE International Symposium on Multimedia","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122960903","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
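The DCT-based text detection mentioned above exploits the fact that text regions carry strong high-frequency energy. A minimal blockwise sketch: compute the 2-D DCT of an 8x8 block and score it by a weighted sum of its AC coefficients. The weighting by frequency index `u + v` is an illustrative choice, not the paper's scheme, and the naive O(N^4) DCT is for clarity only.

```python
import math

N = 8  # standard 8x8 block size

def dct2(block):
    """Naive 2-D DCT-II of an NxN block (list of lists of floats)."""
    def c(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = c(u) * c(v) * s
    return out

def text_score(block):
    """Weighted AC energy: high for text-like high-contrast blocks."""
    coeffs = dct2(block)
    return sum((u + v) * abs(coeffs[u][v])
               for u in range(N) for v in range(N) if u + v > 0)

flat = [[128.0] * N for _ in range(N)]                  # uniform background
stripes = [[255.0 if y % 2 else 0.0 for y in range(N)]  # text-like strokes
           for _ in range(N)]
```

A flat block concentrates all its energy in the DC term and scores near zero, while stroke-like vertical transitions light up the high-frequency coefficients.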
{"title":"Searching for Sub-images Using Sequence Alignment","authors":"T. Homoľa, Vlastislav Dohnal, P. Zezula","doi":"10.1109/ISM.2011.19","DOIUrl":"https://doi.org/10.1109/ISM.2011.19","url":null,"abstract":"The availability of various photo archives and photo-sharing systems has made similarity search much more important, because photos are usually not conveniently tagged and therefore need to be searched by their content. Moreover, it is important not only to compare images with a query holistically but also to locate images that contain the query as a part. The query can be a picture of a person, a building, or an abstract object, and the task is to retrieve images of the query object from a different perspective or images capturing a global scene containing the query object. This retrieval task is called sub-image search. In this paper, we propose an algorithm for retrieving database images by their similarity to and containment of a query. Its novelty lies in the application of a sequence alignment algorithm, which is commonly used in text retrieval. This forms an orthogonal solution to currently used approaches based on inverted files. The proposed algorithm is evaluated on a real-life data set of photographs in which images of logos are searched. It was compared to a state-of-the-art method, and an improvement of 20% in mean average precision was obtained.","PeriodicalId":339410,"journal":{"name":"2011 IEEE International Symposium on Multimedia","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123028979","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
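The sub-image idea above can be sketched with classic Smith-Waterman local alignment over symbol sequences: if each image is serialized into a sequence of quantized feature "words", a query contained somewhere inside a database image shows up as a high-scoring local alignment. The scoring values and the one-character-per-word encoding below are generic alignment defaults for illustration, not the paper's parameters.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local-alignment score between sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    h = [[0] * cols for _ in range(rows)]   # DP matrix, clamped at 0
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            h[i][j] = max(0, diag, h[i - 1][j] + gap, h[i][j - 1] + gap)
            best = max(best, h[i][j])
    return best

query = "CADB"           # serialized query image (4 feature words)
db_hit = "XXCADBYY"      # database image containing the query region
db_miss = "QQQQQQQQ"     # unrelated database image

hit_score = smith_waterman(query, db_hit)
miss_score = smith_waterman(query, db_miss)
```

Because the alignment is local, unrelated surroundings (`XX…YY`) cost nothing, which is exactly the containment behavior that holistic comparison misses.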
{"title":"Sound Zone Control in an Interactive Table System Environment","authors":"Ryuji Yamaguchi, S. Sugihara, M. Hirakawa","doi":"10.1109/ISM.2011.40","DOIUrl":"https://doi.org/10.1109/ISM.2011.40","url":null,"abstract":"Tabletop interaction systems have been explored actively owing to their advantage of allowing participants to collaborate with each other, in many cases through multi-touch gestures over a single, shared display. However, less research has been done on localized auditory feedback. The authors have been investigating an interactive table system that is capable of presenting visual and auditory feedback as well as accepting gestural input. In particular, for auditory feedback, the system can position multiple sounds simultaneously by controlling the loudness of the 16 speakers mounted in the table. In this paper, we describe the design and experimental analysis of sound zone control, aiming to enhance the presence of sounds and the interaction among participants in the interactive table environment. User studies indicate that the sound zone can be broadened when a source signal is accompanied by delayed copies of itself, while the time intervals for exchanging sound positions over multiple speakers are not of primary importance in perception.","PeriodicalId":339410,"journal":{"name":"2011 IEEE International Symposium on Multimedia","volume":"102 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122437455","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"OpenTrack - Automated Camera Control for Lecture Recordings","authors":"Benjamin Wulff, Rüdiger Rolf","doi":"10.1109/ISM.2011.97","DOIUrl":"https://doi.org/10.1109/ISM.2011.97","url":null,"abstract":"In this paper we present the current state of our research project, which aims to develop an open-source framework for real-time scene analysis and automatic camera control in lecture recording scenarios. The system is designed to run in combination with the Opencast Matterhorn lecture capture system as well as stand-alone. A GPU-based scene segmentation technique using motion cues and background modeling has been implemented using OpenCL. Moving objects are tracked by their centroids and bounding boxes, and a dynamic appearance model is used to give persons a relative identity. Development has started on a scriptable virtual camera operator module that will drive PTZ cameras. Other applications of the scene analysis are possible.","PeriodicalId":339410,"journal":{"name":"2011 IEEE International Symposium on Multimedia","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122488011","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
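The background-modeling step named in the OpenTrack abstract is commonly done as an exponential running average: keep a per-pixel background estimate and flag pixels that deviate from it as moving foreground. A minimal single-channel sketch (the project runs this on the GPU with OpenCL; the learning rate and threshold here are illustrative values, not the project's):

```python
def update_background(bg, frame, alpha=0.1):
    """Exponential running average: bg <- (1 - alpha) * bg + alpha * frame."""
    return [(1 - alpha) * b + alpha * f for b, f in zip(bg, frame)]

def foreground_mask(bg, frame, thresh=30.0):
    """Pixels differing from the background by more than thresh are moving."""
    return [abs(f - b) > thresh for b, f in zip(bg, frame)]

# Toy 4-pixel grayscale "frames": an object enters pixels 1-2.
bg = [100.0, 100.0, 100.0, 100.0]
frame = [100.0, 240.0, 235.0, 100.0]
mask = foreground_mask(bg, frame)
bg = update_background(bg, frame)
```

Connected foreground pixels become the tracked blobs whose centroids and bounding boxes drive the virtual camera operator.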
{"title":"A Novel CBCD Approach Using MPEG-7 Motion Activity Descriptors","authors":"R. Roopalakshmi, G. R. M. Reddy","doi":"10.1109/ISM.2011.36","DOIUrl":"https://doi.org/10.1109/ISM.2011.36","url":null,"abstract":"Motion features contribute significant information about video content. This paper presents a novel CBCD (Content-Based Copy Detection) approach that incorporates several motion activity features. First, we extract both temporal and spatial motion features to describe the overall activity of a video sequence. Second, we combine these features in a feasible manner to generate robust video fingerprints. Third, clustering-based pruned search is utilized for similarity matching instead of a direct search of video fingerprints. The proposed system is tested on the TRECVID-2007 data set, and the results demonstrate its effectiveness against several transformations such as random noise, fast forward, pattern insertion, cropping, and picture-in-picture.","PeriodicalId":339410,"journal":{"name":"2011 IEEE International Symposium on Multimedia","volume":"89 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132816368","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
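The clustering-based pruned search in the third step above can be sketched as follows: fingerprints are grouped around centroids offline, and a query is compared only against the members of the cluster whose centroid is nearest, instead of scanning the whole database. The 1-D toy fingerprints and centroid values below are ours for illustration; real fingerprints are high-dimensional vectors.

```python
def nearest(query, candidates):
    """Return the candidate closest to the query (absolute distance)."""
    return min(candidates, key=lambda c: abs(c - query))

# Offline stage: fingerprints already partitioned by nearest centroid.
clusters = {
    0.0: [0.1, 0.2],
    5.0: [4.8, 5.1, 5.3],
    10.0: [9.7, 10.2],
}

def pruned_search(query):
    centroid = nearest(query, clusters)        # step 1: pick closest cluster
    return nearest(query, clusters[centroid])  # step 2: search only inside it
```

With k roughly balanced clusters this replaces one scan of n fingerprints by roughly k + n/k comparisons, which is the point of pruning the search space.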