This article argues for the growing importance of quality metadata and the equation of that quality with precision and semantic grounding. Such semantic grounding requires metadata that derives from intentional human intervention as well as mechanistic measurement of content media. In both cases, one chief problem in the automatic generation of semantic metadata is ambiguity leading to the overgeneration of inaccurate annotations. We look at a particular richly annotated image collection to show how context dramatically reduces the problem of ambiguity over this particular corpus. In particular, we consider both the abstract measurement of "contextual ambiguity" over the collection and the application of a particular disambiguation algorithm to synthesized keyword searches across the selection.
{"title":"Context for semantic metadata","authors":"K. Haase","doi":"10.1145/1027527.1027574","DOIUrl":"https://doi.org/10.1145/1027527.1027574","url":null,"abstract":"This article argues for the growing importance of quality metadata and the equation of that quality with precision and semantic grounding. Such semantic grounding requires metadata that derives from intentional human intervention as well as mechanistic measurement of content media. In both cases, one chief problem in the automatic generation of semantic metadata is ambiguity leading to the overgeneration of inaccurate annotations. We look at a particular richly annotated image collection to show how context dramatically reduces the problem of ambiguity over this particular corpus. In particular, we consider both the abstract measurement of \"contextual ambiguity\" over the collection and the application of a particular disambiguation algorithm to synthesized keyword searches across the selection.","PeriodicalId":292207,"journal":{"name":"MULTIMEDIA '04","volume":"361 11","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113956197","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper, we propose an efficient scheme to transport video over wireless networks, specifically cdma2000® 1x. Speech transmission over cdma2000® uses a variable rate voice coder (vocoder) over a channel with multiple fixed rates. We apply these ideas to compressed video transmission over wireless IP networks. Explicit Bit Rate (EBR) video compression is designed to match the video encoder output to a set of fixed channel rates. We show that, in comparison with variable bit rate (VBR) video transmission over a fixed-rate wireless channel, EBR video transmission provides improved error resilience, reduced latency, and improved efficiency.
{"title":"Video transport over wireless networks","authors":"H. Garudadri, P. Sagetong, S. Nanda","doi":"10.1145/1027527.1027626","DOIUrl":"https://doi.org/10.1145/1027527.1027626","url":null,"abstract":"In this paper, we propose an efficient scheme to transport video over wireless networks, specifically cdma2000® 1x. Speech transmission over cdma2000® uses a variable rate voice coder (vocoder) over a channel with multiple fixed rates. We apply these ideas to compressed video transmission over wireless IP networks. Explicit Bit Rate (EBR) video compression is designed to match the video encoder output to a set of fixed channel rates. We show that in comparison with VBR video transmission over a fixed rate wireless channel, EBR video transmission provides improved error resilience, reduced latency and improved efficiency.","PeriodicalId":292207,"journal":{"name":"MULTIMEDIA '04","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125177922","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Colonoscopy is an important screening tool for colorectal cancer. During a colonoscopic procedure, a tiny video camera at the tip of the endoscope generates a video signal of the internal mucosa of the colon. The video data are displayed on a monitor for real-time analysis by the endoscopist. We call videos captured from colonoscopic procedures colonoscopy videos. Because these videos possess unique characteristics, new types of semantic units and parsing techniques are required. In this paper, we define new semantic units called operation shots, each of which is a segment of visual and audio data corresponding to a therapeutic or biopsy operation. We introduce a new spatio-temporal analysis technique to detect operation shots. Our experiments on colonoscopy videos demonstrate that the technique does not miss any meaningful operation shots and incurs only a small number of false operation shots. Our prototype parsing software implements the operation shot detection technique along with our other techniques previously developed for colonoscopy videos. Our browsing tool enables users to quickly locate operation shots of interest. The proposed technique and software are useful (1) for post-procedure reviews and analyses of the causes of complications due to biopsy or therapeutic operations, (2) for developing an effective content-based retrieval system for colonoscopy videos to facilitate endoscopic research and education, and (3) for developing a systematic approach to assessing endoscopists' procedural skills.
{"title":"Parsing and browsing tools for colonoscopy videos","authors":"Yu Cao, Dalei Li, Wallapak Tavanapong, Jung-Hwan Oh, J. Wong, P. C. Groen","doi":"10.1145/1027527.1027723","DOIUrl":"https://doi.org/10.1145/1027527.1027723","url":null,"abstract":"Colonoscopy is an important screening tool for colorectal cancer. During a colonoscopic procedure, a tiny video camera at the tip of the endoscope generates a video signal of the internal mucosa of the colon. The video data are displayed on a monitor for real-time analysis by the endoscopist. We call videos captured from colonoscopic procedures <i>colonoscopy videos</i>. Because these videos possess unique characteristics, new types of semantic units and parsing techniques are required. In this paper, we define new semantic units called <i>operation shots</i>, each is a segment of visual and audio data that correspond to a therapeutic or biopsy operation. We introduce a new spatio-temporal analysis technique to detect operation shots. Our experiments on colonoscopy videos demonstrate that the technique does not miss any meaningful operation shots and incurs a small number of false operation shots. Our prototype parsing software implements the operation shot detection technique along with our other techniques previously developed for colonoscopy videos. Our browsing tool enables users to quickly locate operation shots of interest. The proposed technique and software are useful (1) for post-procedure reviews and analyses for causes of complications due to biopsy or therapeutic operations, (2) for developing an effective content-based retrieval system for colonoscopy videos to facilitate endoscopic research and education, and (3) for development of a systematic approach to assess endoscopists' procedural skills.","PeriodicalId":292207,"journal":{"name":"MULTIMEDIA '04","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125100538","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The variation of facial texture and surface due to the change of expression is an important cue for analyzing and modeling facial expressions. In this paper, we propose a new approach to representing facial expressions using a so-called topographic feature. In order to capture the variation of facial surface structure, facial textures are processed by increasing the resolution. The topographical structure of the human face is analyzed based on the resolution-enhanced textures. We investigate the relationship between a facial expression and its topographic features, and propose to represent the facial expression by topographic labels. The detected topographic facial surface and the expressive regions reflect the status of facial skin movement. Based on the observation that the facial texture and its topographic features change along with facial expressions, we compare the disparity of these features between the neutral face and the expressive face to distinguish a number of universal expressions. The experiment demonstrates the feasibility of the proposed approach for facial expression representation and recognition.
{"title":"Facial expression representation and recognition based on texture augmentation and topographic masking","authors":"L. Yin, J. Loi, Wei Xiong","doi":"10.1145/1027527.1027580","DOIUrl":"https://doi.org/10.1145/1027527.1027580","url":null,"abstract":"The variation of facial texture and surface due to the change of expression is an important cue for analyzing and modeling facial expressions. In this paper, we propose a new approach to represent the facial expression by using a so-called topographic feature. In order to capture the variation of facial surface structure, facial textures are processed by increasing the resolution. The topographical structure of human face is analyzed based on the resolution-enhanced textures. We investigate the relationship between the facial expression and its topographic features, and propose to represent the facial expression by the topographic labels. The detected topographic facial surface and the expressive regions reflect the status of facial skin movement. Based on the observation that the facial texture and its topographic features change along with facial expressions, we compare the disparity of these features between the neutral face and the expressive face to distinguish a number of universal expressions. The experiment demonstrates the feasibility of the proposed approach for facial expression representation and recognition.","PeriodicalId":292207,"journal":{"name":"MULTIMEDIA '04","volume":"238 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131449945","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper describes a novel approach to the construction of a projector-based augmented reality environment. The approach is based on capturing the dynamic changes of surfaces and projecting images within a large real environment using a system that includes a laser range finder and a projector, whose optical axes are integrated by mirrors. The proposed method offers two distinct advances: (1) robust 3-D viewing point detection from consecutive range images, and (2) fast view-driven image generation and presentation with view frustum clipping to measured surfaces. A prototype system confirms the feasibility of the method; it generates view-driven images suited to the user's viewing position, which are then projected within the dynamic real environment in real time.
{"title":"Location-aware projection with robust 3-D viewing point detection and fast image deformation","authors":"J. Shimamura, K. Arakawa","doi":"10.1145/1027527.1027595","DOIUrl":"https://doi.org/10.1145/1027527.1027595","url":null,"abstract":"This paper describes a novel approach to the construction of a projector-based augmented reality environment. The approach is based on capturing the dynamic changes of surfaces and projecting the images within a large real environment using a system that includes a laser range finder and a projector, whose optical axes are integrated by mirrors. The proposed method offers two distinct advances: (1) robust 3-D viewing point detection from consecutive range images, and (2) fast view-driven image generation and presentation with view frustum clipping to measured surfaces. A prototype system is shown to confirm the feasibility of the method; it generates view-driven images to suit the user's viewing position that are then projected within dynamic real environment, in real-time.","PeriodicalId":292207,"journal":{"name":"MULTIMEDIA '04","volume":"204 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134308462","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper, we report on an empirical exploration of digital ink and speech usage in lecture presentation. We studied the video archives of five Master's level Computer Science courses to understand how instructors use ink and speech together while lecturing, and to evaluate techniques for analyzing digital ink. Our interest in understanding how ink and speech are used together is to inform the development of future tools for supporting classroom presentation, distance education, and viewing of archived lectures. We want to make it easier to interact with electronic materials and to extract information from them. We want to provide an empirical basis for addressing challenging problems such as automatically generating full text transcripts of lectures, matching speaker audio with slide content, and recognizing the meaning of the instructor's ink. Our results include an evaluation of handwritten word recognition in the lecture domain, an approach for associating attentional marks with content, an analysis of linkage between speech and ink, and an application of recognition techniques to infer speaker actions.
{"title":"Speech, ink, and slides: the interaction of content channels","authors":"Richard J. Anderson, C. Hoyer, Craig Prince, Jonathan Su, F. Videon, S. Wolfman","doi":"10.1145/1027527.1027713","DOIUrl":"https://doi.org/10.1145/1027527.1027713","url":null,"abstract":"In this paper, we report on an empirical exploration of digital ink and speech usage in lecture presentation. We studied the video archives of five Master's level Computer Science courses to understand how instructors use ink and speech together while lecturing, and to evaluate techniques for analyzing digital ink. Our interest in understanding how ink and speech are used together is to inform the development of future tools for supporting classroom presentation, distance education, and viewing of archived lectures. We want to make it easier to interact with electronic materials and to extract information from them. We want to provide an empirical basis for addressing challenging problems such as automatically generating full text transcripts of lectures, matching speaker audio with slide content, and recognizing the meaning of the instructor's ink. Our results include an evaluation of handwritten word recognition in the lecture domain, an approach for associating attentional marks with content, an analysis of linkage between speech and ink, and an application of recognition techniques to infer speaker actions.","PeriodicalId":292207,"journal":{"name":"MULTIMEDIA '04","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114259673","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
EXT3NS is a scalable file system designed to handle video streaming workloads in large-scale on-demand streaming services. It is based on a special hardware device, the Network-Storage card (NS card), which accelerates streaming by shortening the data path from storage device to network interface. The design objective of EXT3NS is to minimize the delay and delay variance of I/O requests under sequential workloads on the NS card. Metadata structure, file organization, unit of storage, and so on are carefully tailored to this objective. Further, EXT3NS provides standard APIs to read and write files on the storage unit of the NS card. The streaming server uses these to gain high disk I/O bandwidth, to avoid unnecessary memory copies on the data path from disk to network, and to relieve the CPU by offloading parts of network protocol processing. EXT3NS is a fully functional file system based on the popular EXT3. Performance measurements on our prototype video server show clear improvements: we obtain better results on a file system benchmark, and the gains in disk read and network transmission lead to higher overall streaming performance. In particular, the streaming server shows much lower CPU utilization and less fluctuation in client bit rate, enabling a more reliable streaming service.
{"title":"Implementation and evaluation of EXT3NS multimedia file system","authors":"B. Ahn, Sung-Hoon Sohn, Chei-Yol Kim, Gyuil Cha, Y. Baek, Sung-In Jung, Myungjoon Kim","doi":"10.1145/1027527.1027668","DOIUrl":"https://doi.org/10.1145/1027527.1027668","url":null,"abstract":"The EXT3NS is a scalable file system designed to handle video streaming workload in large-scale on-demand streaming services. It is based on a special H/W device, called Network-Storage card (NS card), which aims at accelerating streaming operation by shortening the data path from storage device to network interface. The design objective of EXT3NS is to minimize the delay and the delay variance of I/O request in the sequential workload on NS card. Metadata structure, file organization, metadata structure, unit of storage, etc. are elaborately tailored to achieve this objective. Further, EXT3NS provides the standard API's to read and write files in storage unit of NS card. The streaming server utilizes it to gain high disk I/O bandwidth, to avoid unnecessary memory copies on the data path from disk to network, and to alleviates CPU's burden by offloading parts of network protocol processing, The EXT3NS is a full functional file system based on the popular EXT3. The performance measurements on our prototype video server show obvious performance improvements. Specifically, we obtain better results from file system benchmark program, and obtain performance improvements in disk read and network transmission, which leads to overall streaming performance increase. Especially, the streaming server shows much less server's CPU utilization and less fluctuation of client bit rate, hence more reliable streaming service is possible.","PeriodicalId":292207,"journal":{"name":"MULTIMEDIA '04","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124445983","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Peer-to-peer (P2P) media streaming has emerged as a promising solution for media streaming in large distributed systems such as the Internet. Several P2P media streaming solutions have been proposed; however, they all implicitly assume that peers are collaborative, and thus they suffer from selfish peers that are not willing to collaborate. In this paper we introduce an incentive mechanism that encourages selfish peers to behave collaboratively. It combines the traditional reputation-based approach with an online streaming behavior monitoring scheme. Our preliminary results show that the overall performance achieved by collaborative peers does not suffer from the presence of non-collaborative peers. The incentive mechanism is orthogonal to existing media streaming solutions and can be integrated into them.
{"title":"Collaboration-aware peer-to-peer media streaming","authors":"S. Ye, F. Makedon","doi":"10.1145/1027527.1027625","DOIUrl":"https://doi.org/10.1145/1027527.1027625","url":null,"abstract":"Peer-to-Peer(P2P) media streaming has emerged as a promising solution to media streaming in large distributed systems such as the Internet. Several P2P media streaming solutions have been proposed by researchers, however they all implicitly assume peers are collaborative, thus they suffer from the selfish peers that are not willing to collaborate. In this paper we introduce an incentive mechanism to urge selfish peers to behave collaboratively. It combines the traditional reputation-based approach and an online streaming behavior monitoring scheme. Our preliminary results show that the overall performance achieved by collaborative peers do not suffer from the existence of non-collaborative peers. The incentive mechanism is orthogonal to the existing media streaming solutions and can be integrated into them.","PeriodicalId":292207,"journal":{"name":"MULTIMEDIA '04","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124193142","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Most current image retrieval systems and commercial search engines rely mainly on text annotations to index and retrieve WWW images. This research explores the use of machine learning approaches to automatically annotate WWW images with a predefined list of concepts by fusing evidence from image content and the associated HTML text. One major practical limitation of supervised machine learning approaches is that effective learning requires a large set of labeled training samples. Producing such labels is tedious and severely impedes the practical development of effective search techniques for WWW images, which are dynamic and fast-changing. Because web images possess both intrinsic visual content and text annotations, they provide a strong basis for bootstrapping the learning process with a co-training approach involving classifiers based on two orthogonal sets of features -- visual and text. The idea of co-training is to start from a small set of labeled training samples and successively annotate a larger set of unlabeled samples using the two orthogonal classifiers. We carry out experiments on a set of over 5,000 images acquired from the Web and explore different combinations of HTML text and visual representations. We find that our bootstrapping approach achieves performance comparable to that of the supervised learning approach, with an F1 measure of over 54%, while offering the added advantage of requiring only a small initial set of training samples.
{"title":"A bootstrapping framework for annotating and retrieving WWW images","authors":"Huamin Feng, Rui Shi, Tat-Seng Chua","doi":"10.1145/1027527.1027748","DOIUrl":"https://doi.org/10.1145/1027527.1027748","url":null,"abstract":"Most current image retrieval systems and commercial search engines use mainly text annotations to index and retrieve WWW images. This research explores the use of machine learning approaches to automatically annotate WWW images based on a predefined list of concepts by fusing evidences from image contents and their associated HTML text. One major practical limitation of employing supervised machine learning approaches is that for effective learning, a large set of labeled training samples is needed. This is tedious and severely impedes the practical development of effective search techniques for WWW images, which are dynamic and fast-changing. As web-based images possess both intrinsic visual contents and text annotations, they provide a strong basis to bootstrap the learning process by adopting a co-training approach involving classifiers based on two orthogonal set of features -- visual and text. The idea of co-training is to start from a small set of labeled training samples, and successively annotate a larger set of unlabeled samples using the two orthogonal classifiers. We carry out experiments using a set of over 5,000 images acquired from the Web. We explore the use of different combinations of HTML text and visual representations. We find that our bootstrapping approach can achieve a performance comparable to that of the supervised learning approach with an F1 measure of over 54%. At the same time, it offers the added advantage of requiring only a small initial set of training samples.","PeriodicalId":292207,"journal":{"name":"MULTIMEDIA '04","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126185351","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The Internet is composed of many kinds of networks, and those networks are composed of network nodes such as routers. Routers consume processor power to forward each packet, whatever its size, so the node processor can become a throughput bottleneck when there are too many packets to forward. The authors therefore propose a packet assembly method. It aims to reduce the number of packets, and thus the processor load, based on the fact that backbone traffic contains many packets much smaller than the maximum transmission unit. To examine packet assembly, the authors conducted two experiments. The first applies the packet assembly method to digital video traffic; it compares the digital video image forwarded via routers with and without packet assembly and tracks the change in edge router and core router load. The second applies the packet assembly method to VoIP traffic and investigates the influence on PSQM score, latency, and jitter.
{"title":"Application of packet assembly technology to digital video and VoIP","authors":"T. Kanda, K. Shimamura","doi":"10.1145/1027527.1027620","DOIUrl":"https://doi.org/10.1145/1027527.1027620","url":null,"abstract":"The Internet is composed of many kinds of networks and the networks are composed of network nodes such as routers. Routers use processor power for forwarding each packet with any size. At that time, node processor would be a bottleneck in respect to the high throughput if there would be too many packets to forward. Then, authors propose the packet assembly method. This aims to decrease the number of packets for the reduction of processor load, based on the fact that there are many packets much smaller than maximum transferable unit in backbone network.\u0000 For the examination of the packet assembly, authors conducted two experiments. One is the experiment that conducts the packet assembly method for the traffic of digital video, and it provides the comparison of the image of digital video forwarded via routers without packet assembly with the one with packet assembly, and transition of edge router load and core router load. The other is the experiment that conducts the packet assembly method for the traffic of VoIP, and investigated about the influence on PSQM score, latency, and jitter.","PeriodicalId":292207,"journal":{"name":"MULTIMEDIA '04","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129328962","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}