Telling Stories with MyLifeBits
J. Gemmell, Aleks Aris, Roger Lueder
Pub Date: 2005-07-06 | DOI: 10.1109/ICME.2005.1521726
User-authored stories will always be the best stories, and authoring tools will continue to be developed. However, a digital lifetime capture permits storytelling via a lightweight markup structure, combined with location, sensor and usage data. In this paper, we describe support in the MyLifeBits system for such an approach, along with some simple authoring tools.
A method for extracting a musical unit to phrase music data in the compressed domain of TwinVQ audio compression
Motohiro Nakanishi, M. Kobayakawa, M. Hoshi, Tadashi Ohmori
Pub Date: 2005-07-06 | DOI: 10.1109/ICME.2005.1521491
A method for phrasing music data into meaningful musical pieces (e.g., bars and phrases) is an important function for analyzing music data. To realize this function, we propose a method for extracting a unit of music data (a musical unit) in the compressed domain of TwinVQ audio compression (MPEG-4 audio). Our key idea is to extract a musical unit from a sequence of autocorrelation coefficients computed in the encoding step of TwinVQ audio compression. We call this sequence the "autocorrelation sequence r". We use the k-th autocorrelation sequence r_k (k = 1, 2, ..., 20) of the music data to extract a musical unit. First, we calculate the j_k-th autocorrelation coefficient a_k^(j_k) of the k-th autocorrelation sequence r_k (j_k = 38, 39, ..., 208; k = 1, 2, ..., 20). Second, to detect the peak in the sequence (a_k^(38), a_k^(39), ..., a_k^(208)), a Laplacian filter is applied to the sequence. We then obtain the order p_k at which the maximum differential coefficient is attained. Finally, we compute the musical unit using p_k. To evaluate the performance of our extraction method, we collected 64 pieces of music data and obtained autocorrelation sequences by applying the TwinVQ encoder to each piece. We then applied our extraction algorithm to each autocorrelation sequence. The experimental results show very good performance in extracting musical units for phrasing music data.
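The peak-detection step above can be sketched in a few lines. The toy autocorrelation sequence and the second-difference form of the Laplacian filter are illustrative assumptions, not the authors' exact formulation:

```python
def laplacian_peak(seq):
    """Locate a peak in a 1-D sequence via a discrete Laplacian
    (second difference); a strongly negative response marks a peak.
    Illustrative sketch, not the paper's exact filter."""
    responses = [seq[i - 1] - 2 * seq[i] + seq[i + 1]
                 for i in range(1, len(seq) - 1)]
    # The order with the extreme (most negative) response is the candidate.
    i = min(range(len(responses)), key=lambda i: responses[i])
    return i + 1  # offset for the skipped first element

# Toy autocorrelation sequence with a single sharp peak at index 4.
seq = [0.1, 0.2, 0.3, 0.5, 0.9, 0.5, 0.3, 0.2, 0.1]
print(laplacian_peak(seq))  # → 4
```

In the paper's setting the input would be the coefficients (a_k^(38), ..., a_k^(208)) taken from the TwinVQ encoder, and the returned order would play the role of p_k.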
Fast Motion Estimation by Motion Vector Merging Procedure for H.264
Kai-Chung Hou, Mei-Juan Chen, Ching-Ting Hsu
Pub Date: 2005-07-06 | DOI: 10.1109/ICME.2005.1521703
In this paper, a fast motion estimation algorithm for variable block sizes, based on a motion vector merging procedure, is proposed for H.264. The motion vectors of adjacent small blocks are merged to predict the motion vectors of larger blocks, reducing computation. Experimental results show that our proposed method has lower computational complexity than the full search, fast full search and fast motion estimation of the H.264 reference software JM93, with a slight quality decrease and a small bit-rate increase.
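The merging idea can be illustrated as follows. The component-wise median rule and the example vectors are assumptions for illustration, not the merging rule actually used in the paper:

```python
def merge_motion_vectors(mvs):
    """Merge the motion vectors of adjacent small blocks into one
    predictor for the enclosing larger block, using a component-wise
    median (upper median for even counts). Illustrative choice only."""
    xs = sorted(mv[0] for mv in mvs)
    ys = sorted(mv[1] for mv in mvs)
    mid = len(mvs) // 2
    return (xs[mid], ys[mid])

# Four 8x8-block vectors predicting the enclosing 16x16 block's vector;
# the median damps the outlier (8, 7).
print(merge_motion_vectors([(2, 1), (3, 1), (2, 2), (8, 7)]))  # → (3, 2)
```

Starting the larger block's search from such a merged predictor, rather than from scratch, is what saves computation relative to a full search.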
A User-Oriented Multimodal-Interface Framework for General Content-Based Multimedia Retrieval
Jinchang Ren, T. Vlachos, V. Argyriou
Pub Date: 2005-07-06 | DOI: 10.1109/ICME.2005.1521547
A user-oriented multimodal interface (MMI) framework is proposed. Given the complexity of media connotations and the uncertainty of users' demands, content-based retrieval has intrinsic requirements on MMI for effective media-content interactions. By integrating knowledge-based conduction, learning of semantic concepts, natural language processing and analysis of user profiles, our framework can establish a solid basis for the design and implementation of general CBR systems satisfying extensibility, condensability and interoperability.
Robust learning-based TV commercial detection
Xiansheng Hua, Lie Lu, HongJiang Zhang
Pub Date: 2005-07-06 | DOI: 10.1109/ICME.2005.1521382
A robust learning-based TV commercial detection approach is proposed in this paper. First, a set of basic features that help distinguish commercials from general programming is analyzed. Then, a series of context-based features, which are more effective for identifying commercials, is derived from these basic features. Next, each shot is classified as commercial or general programming based on these features by a pre-trained SVM classifier. Finally, the detection results are refined by scene grouping and some heuristic rules. Experiments on around 10 hours of TV recordings of various genres show that the proposed scheme identifies commercial blocks with relatively high detection accuracy.
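The step of deriving context-based features from basic per-shot features can be illustrated with a simple neighbourhood average. The windowing rule and the feature values below are illustrative assumptions, not the paper's actual features:

```python
def context_features(basic, window=1):
    """Derive a context-based feature for each shot by averaging a basic
    per-shot feature over a window of neighbouring shots (an illustrative
    stand-in for the paper's context features)."""
    out = []
    for i in range(len(basic)):
        lo, hi = max(0, i - window), min(len(basic), i + window + 1)
        neigh = basic[lo:hi]
        out.append(sum(neigh) / len(neigh))
    return out

# Basic per-shot feature (e.g. a cut-rate score); commercial blocks tend
# to show sustained bursts, which the context feature smooths out.
print(context_features([0, 0, 6, 6, 6, 0]))  # → [0.0, 2.0, 4.0, 6.0, 4.0, 3.0]
```

Feature vectors built this way per shot would then be fed to the pre-trained SVM classifier mentioned in the abstract.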
Urban Traffic Control: A Streaming Multimedia Approach
C. Palau, M. Esteve, J. Martínez, B. Molina, I. Pérez-Llopis
Pub Date: 2005-07-06 | DOI: 10.1109/ICME.2005.1521499
Urban traffic control systems have based their technological infrastructure on advanced analog closed-circuit television (CCTV) systems and point-to-point links, resulting in poorly scalable and very expensive systems. The main goal of an urban traffic monitoring system is to capture, send, play and distribute video information from the streets of a city. The ongoing digitization of video networks, together with research in streaming media, has led vendors to offer proprietary hardware and software solutions that create a strong dependency among their customers. The existence of open standards for video encoding and protocols for streaming media transmission over IP networks has led us to propose this system. This work presents an open urban traffic control system whose design is based on a COTS philosophy for hardware and software, as well as open-source and standardized protocols. The proposed system is a suitable solution in terms of scalability, cost, interoperability and performance for traffic control systems. Furthermore, its architecture can easily be adapted to other video applications and tools.
Performance of Multiple Description Coding in Sensor Networks with Finite Buffers
E. Baccaglini, G. Barrenetxea, B. Beferull-Lozano
Pub Date: 2005-07-06 | DOI: 10.1109/ICME.2005.1521707
Sensor networks are usually dense networks whose diversity can be exploited to overcome failures. In this paper, we study the use of multiple description techniques in sensor networks where failures are due to the practical constraint of finite buffers in the sensors, instead of the link failures traditionally considered in previous research. Although from a theoretical point of view the use of more descriptions usually provides better performance, we show experimentally that this is not the case in practice once real constraints are introduced, such as finite buffers and the header information necessary for any real application. Our main result is that the optimal number of descriptions, in terms of average distortion, decreases as the fraction of header information increases for a given buffer size.
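The qualitative main result, that the optimal number of descriptions shrinks as header overhead grows, can be reproduced in a toy model. The exponential distortion law, the independence of losses, and all numbers below are our assumptions, not the paper's analysis:

```python
import math

def expected_distortion(n, total_bits, header_bits, p_loss, c=1e-3):
    """Toy model (our assumption, not the paper's): a bit budget is split
    across n descriptions, each paying a fixed header cost and being lost
    independently with probability p_loss; distortion decays exponentially
    in the payload actually received."""
    payload = max(total_bits / n - header_bits, 0.0)
    # E[ prod_i exp(-c * payload * received_i) ] under independent losses.
    return ((1 - p_loss) * math.exp(-c * payload) + p_loss) ** n

def best_n(total_bits, header_bits, p_loss):
    """Number of descriptions minimizing expected distortion (n = 1..8)."""
    return min(range(1, 9),
               key=lambda n: expected_distortion(n, total_bits,
                                                 header_bits, p_loss))

# Tripling the per-description header shifts the optimum toward
# fewer descriptions, mirroring the paper's qualitative finding.
print(best_n(8000, 500, 0.2), best_n(8000, 1500, 0.2))  # → 4 2
```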
Emotional Speech Classification Using Gaussian Mixture Models and the Sequential Floating Forward Selection Algorithm
D. Ververidis, Constantine Kotropoulos
Pub Date: 2005-07-06 | DOI: 10.1109/ICME.2005.1521717
Emotional speech classification can be treated as a supervised learning task where the statistical properties of emotional speech segments are the features and the emotional styles form the labels. The Akaike criterion is used to automatically estimate the number of Gaussian densities that model the probability density function of the emotional speech features. A procedure for reducing the computational burden of cross-validation in the sequential floating forward selection algorithm is proposed; it applies the t-test to the probability of correct classification of the Bayes classifier designed for various feature sets. For the Bayes classifier, the sequential floating forward selection algorithm is found to yield a probability of correct classification 3% higher than that of the sequential forward selection algorithm, whether or not gender information is taken into account. The experimental results indicate that utterances from isolated words and sentences are more emotionally colored than those from paragraphs. Without gender information, the probability of correct classification for the Bayes classifier reaches its maximum when the probability density function of the emotional speech features extracted from the aforementioned utterances is modeled as a mixture of two Gaussian densities.
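Akaike-based selection of the number of Gaussian densities can be sketched as follows. The log-likelihoods and parameter counts below are made-up illustrations; a real system would obtain them by fitting a GMM of each order (e.g. via EM) to the speech features:

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L; lower is better."""
    return 2 * n_params - 2 * log_likelihood

def select_components(fits):
    """Pick the mixture order with minimal AIC. `fits` maps a number of
    Gaussian components to (log-likelihood, parameter count); the values
    below are illustrative, not measurements from the paper."""
    return min(fits, key=lambda m: aic(*fits[m]))

# The fit improves with more components, but AIC penalizes the extra
# parameters, so the richest model does not automatically win.
fits = {1: (-1200.0, 3), 2: (-1100.0, 7), 3: (-1099.0, 11)}
print(select_components(fits))  # → 2
```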
An overview of technologies for e-meeting and e-lecture
B. Erol, Ying Li
Pub Date: 2005-07-06 | DOI: 10.1109/ICME.2005.1521593
Over the past few years, with the rapid adoption of broadband communication and advances in multimedia content capture and delivery, Web-based meetings and lectures, also referred to as e-meetings and e-lectures, have become popular among businesses and academic institutions because of their cost savings and their support for self-paced education and convenient content access and retrieval. In fact, technological achievements in the capture, analysis, access, and delivery of e-meeting and e-lecture media have already resulted in several working systems that are currently in regular use. This paper gives an overview of existing work and the state of the art in these two research areas, which are bound to affect the way we teach, learn, and collaborate.
Current and Emerging Topics in Sports Video Processing
Xinguo Yu, D. Farin
Pub Date: 2005-07-06 | DOI: 10.1109/ICME.2005.1521476
Sports video processing is an interesting research topic, since the clearly defined rules of sports provide rich domain knowledge for analysis. It is also interesting because many specialized applications for sports video processing are emerging. This paper gives an overview of sports video research, describing both basic algorithmic techniques and applications.