Visibility-based beam tracing for soundfield rendering
Pub Date: 2010-12-10 | DOI: 10.1109/MMSP.2010.5661991
Dejan Markovic, A. Canclini, F. Antonacci, A. Sarti, S. Tubaro
In this paper we present a visibility-based beam tracing solution for the simulation of the acoustics of an environment that makes use of a projective geometry representation. More specifically, projective geometry turns out to be useful for the pre-computation of the visibility among all the reflectors in the environment. The simulation engine has a straightforward application in the rendering of the acoustics of virtual environments using loudspeaker arrays: the acoustic wavefield is conceived as a superposition of acoustic beams, whose parameters (i.e. origin, orientation and aperture) are computed using the fast beam tracing methodology presented here. This information is processed by the rendering engine to compute spatial filters to be applied to the loudspeakers within the array. Simulation results show that an accurate simulation of the acoustic wavefield can be obtained using this approach.
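As a rough illustration of the beam parameterization mentioned above (origin, orientation, aperture), the sketch below tests whether a point, for instance a reflector endpoint, falls inside a 2D beam. It is only a toy geometric check, not the paper's projective-geometry visibility computation; all names are hypothetical.

```python
# Illustrative sketch: a 2D acoustic beam and an angular containment test.
import math
from dataclasses import dataclass

@dataclass
class Beam:
    origin: tuple        # (x, y) apex of the beam
    orientation: float   # central direction, radians
    aperture: float      # full angular width, radians

    def contains(self, point):
        """Return True if `point` lies within the beam's angular aperture."""
        dx = point[0] - self.origin[0]
        dy = point[1] - self.origin[1]
        angle = math.atan2(dy, dx)
        # wrap the angular difference into [-pi, pi] before comparing
        diff = (angle - self.orientation + math.pi) % (2 * math.pi) - math.pi
        return abs(diff) <= self.aperture / 2

# Example: a beam looking along +x with a 60 degree aperture
beam = Beam(origin=(0.0, 0.0), orientation=0.0, aperture=math.radians(60))
print(beam.contains((2.0, 0.5)))   # True: inside the aperture
print(beam.contains((0.5, 2.0)))   # False: outside the aperture
```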
{"title":"Visibility-based beam tracing for soundfield rendering","authors":"Dejan Markovic, A. Canclini, F. Antonacci, A. Sarti, S. Tubaro","doi":"10.1109/MMSP.2010.5661991","DOIUrl":"https://doi.org/10.1109/MMSP.2010.5661991","url":null,"abstract":"In this paper we present a visibility-based beam tracing solution for the simulation of the acoustics of environment that makes use of a projective geometry representation. More specifically, projective geometry turns out to be useful for the pre-computation of the visibility among all the reflectors in the environment. The simulation engine has a straightforward application in the rendering of the acoustics of virtual environments using loudspeaker arrays. More specifically, the acoustic wavefield is conceived as a superposition of acoustic beams, whose parameters (i.e. origin, orientation and aperture) are computed using the fast beam tracing methodology presented here. This information is processed by the rendering engine to compute spatial filters to be applied to the loudspeakers within the array. Simulative results show that an accurate simulation of the acoustic wavefield can be obtained using this approach.","PeriodicalId":105774,"journal":{"name":"2010 IEEE International Workshop on Multimedia Signal Processing","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121617715","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
4-D broadcasting with MPEG-V
Pub Date: 2010-12-10 | DOI: 10.1109/MMSP.2010.5662029
Kyoungro Yoon, Bumsuk Choi, Eun-Seo Lee, Tae-Beom Lim
Advances in media technologies have brought 3-D TV into the home and 4-D movies to the neighbourhood theatre. We present a framework for 4-D broadcasting, based on the MPEG-V standard, that brings 4-D entertainment into the home. A complete framework for 4-D entertainment, from the authoring of sensory effects to environment description and the commanding of rendering devices for those effects, can be supported by MPEG-V together with a few other standards. Part 2 of MPEG-V provides tools for describing the capabilities of sensory devices and sensors, Part 3 provides tools to describe sensory effects, and Part 5 provides tools to actually interact with the sensory devices and sensors.
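MPEG-V itself specifies XML-based description tools rather than a programming API, so the following sketch is only an analogy for the division of roles described above: Part 2 style device capabilities, Part 3 style effect descriptions, and Part 5 style device commands. All class and field names are invented for illustration.

```python
# Toy mapping of authored sensory effects onto capable rendering devices.
from dataclasses import dataclass

@dataclass
class DeviceCapability:          # Part 2 role: what a rendering device can do
    device_id: str
    effect_type: str             # e.g. "wind", "vibration", "light"
    max_intensity: float         # device-specific maximum

@dataclass
class SensoryEffect:             # Part 3 role: authored effect tied to media time
    effect_type: str
    intensity: float             # normalized 0..1
    start_ms: int

def build_commands(effects, capabilities):
    """Part 5 style interaction: map authored effects onto capable devices."""
    commands = []
    for effect in effects:
        for cap in capabilities:
            if cap.effect_type == effect.effect_type:
                commands.append({
                    "device_id": cap.device_id,
                    "start_ms": effect.start_ms,
                    # scale the normalized intensity to the device's range
                    "intensity": effect.intensity * cap.max_intensity,
                })
    return commands

caps = [DeviceCapability("fan-1", "wind", max_intensity=10.0)]
fx = [SensoryEffect("wind", intensity=0.5, start_ms=12000)]
print(build_commands(fx, caps))  # one command for fan-1 at intensity 5.0
```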
{"title":"4-D broadcasting with MPEG-V","authors":"Kyoungro Yoon, Bumsuk Choi, Eun-Seo Lee, Tae-Beom Lim","doi":"10.1109/MMSP.2010.5662029","DOIUrl":"https://doi.org/10.1109/MMSP.2010.5662029","url":null,"abstract":"Advances in media technologies brought 3-D TV home and 4-D movies to your neighbour. We present a framework for 4-D broadcasting to bring 4-D entertainment home based on MPEG-V standard. A complete framework for 4-D entertainment from authoring of sensory effects to environment description and commanding rendering devices for the sensory effects can supported by MPEG-V and couple of other standards. Part 2 of MPEG-V provides tools for describing capabilities of the sensory devices and sensors, part 3 provides tools to describe sensory effects, and part 5 provides tools to actually interact with the sensory devices and sensors.","PeriodicalId":105774,"journal":{"name":"2010 IEEE International Workshop on Multimedia Signal Processing","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115085833","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Enhancing loudspeaker-based 3D audio with room modeling
Pub Date: 2010-12-10 | DOI: 10.1109/MMSP.2010.5661990
Myung-Suk Song, Cha Zhang, D. Florêncio, Hong-Goo Kang
For many years, spatial (3D) sound using headphones has been widely used in a number of applications. A rich spatial sensation is obtained by using head related transfer functions (HRTF) and playing the appropriate sound through headphones. In theory, loudspeaker audio systems would be capable of rendering 3D sound fields almost as rich as headphones, as long as the room impulse responses (RIRs) between the loudspeakers and the ears are known. In practice, however, obtaining these RIRs is hard, and the performance of loudspeaker based systems is far from perfect. New hope has recently been raised by a system that tracks the user's head position and orientation, and incorporates them into the RIR estimates in real time. That system made two simplifying assumptions: it used generic HRTFs, and it ignored room reverberation. In this paper we tackle the second problem: we incorporate a room reverberation estimate into the RIRs. Note that this is a nontrivial task: RIRs vary significantly with the listener's position, and even if one could measure them at a few points, they are notoriously hard to interpolate. Instead, we take an indirect approach: we model the room, and from that model we obtain an estimate of the main reflections. The position and characteristics of the walls do not vary with the user's movement, yet they allow an estimate of the RIR to be computed quickly for each new user position. Of course, the key question is whether the estimates are good enough. We show an improvement in localization perception of up to 32% (i.e., reducing average error from 23.5° to 15.9°).
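To make the "model the room, estimate the main reflections" idea concrete, here is a minimal first-order image-source sketch for a shoebox room, assuming a single frequency-independent wall reflection coefficient. It illustrates the general technique, not the estimator used in the paper; all names and constants are assumptions.

```python
# First-order image-source model: mirror the source across each wall and derive
# a delay and gain for the direct path plus six early reflections.
import math

SPEED_OF_SOUND = 343.0  # m/s

def first_order_images(src, room):
    """Six first-order image sources of `src` in a box [0,Lx] x [0,Ly] x [0,Lz]."""
    x, y, z = src
    lx, ly, lz = room
    return [(-x, y, z), (2 * lx - x, y, z),
            (x, -y, z), (x, 2 * ly - y, z),
            (x, y, -z), (x, y, 2 * lz - z)]

def early_reflections(src, mic, room, wall_reflection=0.8):
    """Delay (s) and gain of the direct path plus six first-order reflections."""
    paths = [src] + first_order_images(src, room)
    taps = []
    for i, p in enumerate(paths):
        dist = math.dist(p, mic)
        gain = (1.0 / max(dist, 1e-6)) * (wall_reflection if i > 0 else 1.0)
        taps.append((dist / SPEED_OF_SOUND, gain))
    return taps

# Example: 5 x 4 x 3 m room, source at (1, 1, 1.5), listener at (3, 2, 1.5)
for delay, gain in early_reflections((1, 1, 1.5), (3, 2, 1.5), (5, 4, 3)):
    print(f"delay = {delay * 1000:.1f} ms, gain = {gain:.3f}")
```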
{"title":"Enhancing loudspeaker-based 3D audio with room modeling","authors":"Myung-Suk Song, Cha Zhang, D. Florêncio, Hong-Goo Kang","doi":"10.1109/MMSP.2010.5661990","DOIUrl":"https://doi.org/10.1109/MMSP.2010.5661990","url":null,"abstract":"For many years, spatial (3D) sound using headphones has been widely used in a number of applications. A rich spatial sensation is obtained by using head related transfer functions (HRTF) and playing the appropriate sound through headphones. In theory, loudspeaker audio systems would be capable of rendering 3D sound fields almost as rich as headphones, as long as the room impulse responses (RIRs) between the loudspeakers and the ears are known. In practice, however, obtaining these RIRs is hard, and the performance of loudspeaker based systems is far from perfect. New hope has been recently raised by a system that tracks the user's head position and orientation, and incorporates them into the RIRs estimates in real time. That system made two simplifying assumptions: it used generic HRTFs, and it ignored room reverberation. In this paper we tackle the second problem: we incorporate a room reverberation estimate into the RIRs. Note that this is a nontrivial task: RIRs vary significantly with the listener's positions, and even if one could measure them at a few points, they are notoriously hard to interpolate. Instead, we take an indirect approach: we model the room, and from that model we obtain an estimate of the main reflections. Position and characteristics of walls do not vary with the users' movement, yet they allow to quickly compute an estimate of the RIR for each new user position. Of course the key question is whether the estimates are good enough. We show an improvement in localization perception of up to 32% (i.e., reducing average error from 23.5° to 15.9°).","PeriodicalId":105774,"journal":{"name":"2010 IEEE International Workshop on Multimedia Signal Processing","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117171903","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
QoE based adaptation mechanism for media distribution in connected home
Pub Date: 2010-12-10 | DOI: 10.1109/MMSP.2010.5662057
Jianfeng Chen, Xiaojun Ma, Jun Yu Li
Rich media applications enable plenty of interactive, information-rich services that enhance the end user's viewing experience. In the current standards released for such services, only one rendering space is defined to handle the multiple types of content belonging to the same service. However, a new trend is to assign more than one terminal device to render a rich media application cooperatively inside a digitally connected home network. Conventional audio-visual synchronization mechanisms focus on packet-level QoS (quality of service) control with little consideration of the viewing experience, whereas the actual QoE (quality of experience) is the viewer's subjective perception of the displayed visual elements. In order to design a QoE-optimized media distribution system, this paper first introduces a subjective visual synchronization test for the same or tightly related content rendered on dual screens, in which the relationship between delay variation and the end user's evaluation is explored. Second, a QoE-based media distribution mechanism is proposed that dynamically adjusts the media flow transmission rate using delay variation reports from the terminals, while also considering the tradeoff between rate adaptation and buffer overload. Simulation results show that the proposed algorithm not only improves the overall QoE score under both discrete and continuous delay variations, but also outperforms delay guarantee solutions that do not consider QoE.
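The adaptation loop described above can be pictured as a simple feedback controller: back off the transmission rate when terminals report large delay variation, probe upward otherwise, and cap the rate when the buffer approaches overload. The sketch below is a hypothetical illustration of that idea; the thresholds, gains and function names are assumptions, not the paper's algorithm.

```python
# Toy QoE-driven rate controller fed by delay-variation reports from terminals.
def adapt_rate(current_rate, delay_variation_ms, buffer_level, buffer_capacity,
               jitter_threshold_ms=30.0, step_down=0.85, step_up=1.05,
               min_rate=0.5e6, max_rate=8e6):
    """Return the next transmission rate in bits per second."""
    if delay_variation_ms > jitter_threshold_ms:
        rate = current_rate * step_down          # back off when QoE suffers
    else:
        rate = current_rate * step_up            # probe upward cautiously
    if buffer_level > 0.9 * buffer_capacity:
        rate = min(rate, current_rate)           # avoid overloading the buffer
    return max(min_rate, min(rate, max_rate))

rate = 4e6
for jitter in [10, 15, 45, 50, 20]:              # delay-variation reports (ms)
    rate = adapt_rate(rate, jitter, buffer_level=0.4, buffer_capacity=1.0)
    print(f"report {jitter} ms -> rate {rate / 1e6:.2f} Mbps")
```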
{"title":"QoE based adaptation mechanism for media distribution in connected home","authors":"Jianfeng Chen, Xiaojun Ma, Jun Yu Li","doi":"10.1109/MMSP.2010.5662057","DOIUrl":"https://doi.org/10.1109/MMSP.2010.5662057","url":null,"abstract":"Rich media application enables plenty of interactive, information-rich services to enhance end user's viewing experience. In current standards released for such service, only one rendering space is defined to handle multiple types of content belonging to the same service. However, a new trend is to assign more than one terminal device to render rich media application in a cooperative way inside a digital connected home network. The conventional audio-visual synchronization mechanism is focused on the packet level QoS (quality of service) control with less consideration of viewing experience. However, the actual QoE (quality of experience) is the final viewer's subjective perception for the displaying visual element. In order to design an optimized media distribution system based on QoE, this paper firstly introduces a subjective visual synchronization test for the same or tight relating contents rendering in dual screens, where the relationship between delay variation and the end user's evaluation is explored. Secondly, a QoE based media distribution mechanism is proposed to dynamically adjust the media flow transmission rate by using delay variation reports from terminals; at the same time, the tradeoff between rate adaptation and buffer overload is also considered. Simulation results show the proposed algorithm can not only improve the overall QoE score under either discrete or continuous delay variations; but also outperform the delay guarantee solutions without consideration of QoE.","PeriodicalId":105774,"journal":{"name":"2010 IEEE International Workshop on Multimedia Signal Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126381792","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Spatial synchronization of audiovisual objects by 3D audio object coding
Pub Date: 2010-12-10 | DOI: 10.1109/MMSP.2010.5662065
B. Gunel, E. Ekmekcioglu, A. Kondoz
Free viewpoint video enables the visualisation of a scene from arbitrary viewpoints and directions. However, this flexibility in video rendering poses a challenge in 3D media for achieving spatial synchronicity between the audio and video objects. When the viewpoint is changed, its effect on the perceived audio scene should be considered to avoid mismatches in the perceived positions of audiovisual objects. Spatial audio coding with such flexibility requires decomposing the sound scene into audio objects first, and then synthesizing the new scene according to the geometric relations between the A/V capture setup, the selected viewpoint and the rendering system. This paper proposes a free viewpoint audio coding framework for 3D media systems utilising multiview cameras and a microphone array. A real-time source separation technique is used for object decomposition, followed by spatial audio coding. Binaural, multichannel sound systems and wave field synthesis systems are addressed. Subjective test results show that the method consistently achieves spatial synchronicity for various viewpoints, which is not possible with conventional recording techniques.
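As a toy illustration of the geometric remapping implied above, the sketch below re-expresses an audio object's direction relative to a selected viewpoint in 2D, which is the quantity a binaural or multichannel renderer would need. It is not the paper's object coding framework; all names are assumptions.

```python
# Recompute an audio object's azimuth relative to the chosen viewpoint.
import math

def object_azimuth(obj_pos, view_pos, view_heading_deg):
    """Azimuth of an audio object (degrees, 0 = straight ahead, + = left)."""
    dx = obj_pos[0] - view_pos[0]
    dy = obj_pos[1] - view_pos[1]
    world_angle = math.degrees(math.atan2(dy, dx))
    azimuth = world_angle - view_heading_deg
    return (azimuth + 180.0) % 360.0 - 180.0     # wrap to (-180, 180]

# The same object seen from two viewpoints ends up at different azimuths
obj = (2.0, 3.0)
print(object_azimuth(obj, view_pos=(0.0, 0.0), view_heading_deg=90.0))  # to the right
print(object_azimuth(obj, view_pos=(4.0, 0.0), view_heading_deg=90.0))  # to the left
```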
{"title":"Spatial synchronization of audiovisual objects by 3D audio object coding","authors":"B. Gunel, E. Ekmekcioglu, A. Kondoz","doi":"10.1109/MMSP.2010.5662065","DOIUrl":"https://doi.org/10.1109/MMSP.2010.5662065","url":null,"abstract":"Free viewpoint video enables the visualisation of a scene from arbitrary viewpoints and directions. However, this flexibility in video rendering provides a challenge in 3D media for achieving spatial synchronicity between the audio and video objects. When the viewpoint is changed, its effect on the perceived audio scene should be considered to avoid mismatches in the perceived positions of audiovisual objects. Spatial audio coding with such flexibility requires decomposing the sound scene into audio objects initially, and then synthesizing the new scene according to the geometric relations between the A/V capturing setup, selected viewpoint and the rendering system. This paper proposes a free viewpoint audio coding framework for 3D media systems utilising multiview cameras and a microphone array. A real-time source separation technique is used for object decomposition followed by spatial audio coding. Binaural, multichannel sound systems and wave field synthesis systems are addressed. Subjective test results shows that the method achieves spatial synchronicity for various viewpoints consistently, which is not possible by conventional recording techniques.","PeriodicalId":105774,"journal":{"name":"2010 IEEE International Workshop on Multimedia Signal Processing","volume":"156 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124366614","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Real-time video enhancement for high quality videoconferencing
Pub Date: 2010-12-10 | DOI: 10.1109/MMSP.2010.5662064
P. Kisilev, Sagi Schein
In this paper we present a novel method for high quality real-time video enhancement; it improves the sharpness and the contrast of video streams, and simultaneously suppresses noise. The method consists of three main modules: (1) noise analysis, (2) spatial processing, based on a new multi-scale pseudo-bilateral filter, and (3) temporal processing, which includes robust motion detection and recursive temporal noise filtering. To achieve video frame rates for the HD signals used in high-end telepresence systems such as the HP Halo room, we employ the computational capacity of modern graphics cards (GPUs), and distribute the tasks of analysis and processing between the CPU and the GPU. The proposed scheme achieves video quality comparable to that of high-end camera systems, while using much lower-cost cameras and reducing channel bandwidth requirements.
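The temporal module (motion detection plus recursive filtering) can be sketched in a few lines of numpy: blend the current frame with the previous output only where the frame difference indicates a static region. This is an illustrative simplification; the threshold and blend factor are assumptions, and the multi-scale pseudo-bilateral spatial stage is omitted.

```python
# Recursive temporal denoising gated by a per-pixel motion mask.
import numpy as np

def temporal_denoise(frame, previous_output, motion_threshold=12.0, alpha=0.6):
    """Blend the new frame with the previous output where no motion is detected."""
    frame = frame.astype(np.float32)
    diff = np.abs(frame - previous_output)
    static = diff < motion_threshold                 # per-pixel motion mask
    blended = alpha * previous_output + (1.0 - alpha) * frame
    return np.where(static, blended, frame)

# Toy usage on a noisy static scene
rng = np.random.default_rng(0)
clean = np.full((4, 4), 100.0, dtype=np.float32)
output = clean + rng.normal(0, 5, clean.shape)
for _ in range(10):
    noisy_frame = clean + rng.normal(0, 5, clean.shape)
    output = temporal_denoise(noisy_frame, output)
print(np.round(output, 1))   # values settle close to 100 as the noise averages out
```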
{"title":"Real-time video enhancement for high quality videoconferencing","authors":"P. Kisilev, Sagi Schein","doi":"10.1109/MMSP.2010.5662064","DOIUrl":"https://doi.org/10.1109/MMSP.2010.5662064","url":null,"abstract":"In this paper we present a novel method for high quality real-time video enhancement; it improves the sharpness and the contrast of video streams, and simultaneously suppresses noise. The method is comprised of three main modules: (1) noise analysis, (2) spatial processing, based on a new multi-scale pseudo-bilateral filter, and (3) temporal processing that includes robust motion detection and recursive temporal noise filtering. To achieve video frame rates for HD signals used in high-end telepresence systems such as the HP Halo room, we employ the computational capacity of modern graphics cards (GPUs), and distribute the tasks of analysis and of processing between CPU and GPU. The proposed scheme allows achieving video quality which is comparable with high-end camera systems, while using much lower cost cameras and reducing channel bandwidth requirements.","PeriodicalId":105774,"journal":{"name":"2010 IEEE International Workshop on Multimedia Signal Processing","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123373837","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Depth camera based system for auto-stereoscopic displays
Pub Date: 2010-12-10 | DOI: 10.1109/MMSP.2010.5662047
François de Sorbier, Yuko Uematsu, H. Saito
Stereoscopic displays are becoming very popular as more and more content becomes available. As an extension, auto-stereoscopic screens allow several users to watch stereoscopic images without wearing any glasses. For the moment, synthesized content is the easiest way to provide, in real time, all the multiple input images required by this kind of technology. Live video, however, is very important in fields such as augmented reality, yet remains difficult to apply to auto-stereoscopic displays. In this paper, we present a system in which a depth camera and a color camera are combined to produce the multiple input images in real time. The result of this approach can easily be used with any kind of auto-stereoscopic screen.
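The core step such a system relies on, synthesizing virtual views by warping the color image according to depth, can be illustrated with a minimal depth-image-based rendering sketch: shift each pixel horizontally by a disparity inversely proportional to its depth. Hole filling, camera calibration and depth-map upsampling, which a practical system needs, are omitted, and the disparity scale is an assumption.

```python
# Forward-warp a color image using a depth map to synthesize one virtual view.
import numpy as np

def synthesize_view(color, depth, baseline_px=8.0):
    """Warp `color` (H x W x 3) using `depth` (H x W, in meters)."""
    h, w = depth.shape
    out = np.zeros_like(color)
    disparity = baseline_px / np.maximum(depth, 1e-3)    # nearer -> larger shift
    for y in range(h):
        for x in range(w):
            xs = int(round(x + disparity[y, x]))
            if 0 <= xs < w:
                out[y, xs] = color[y, x]                  # unfilled holes stay black
    return out

# Toy usage: a flat scene at 2 m shifts uniformly by 4 pixels
color = np.random.randint(0, 255, (120, 160, 3), dtype=np.uint8)
depth = np.full((120, 160), 2.0)
virtual = synthesize_view(color, depth)
```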
{"title":"Depth camera based system for auto-stereoscopic displays","authors":"François de Sorbier, Yuko Uematsu, H. Saito","doi":"10.1109/MMSP.2010.5662047","DOIUrl":"https://doi.org/10.1109/MMSP.2010.5662047","url":null,"abstract":"Stereoscopic displays are becoming very popular since more and more contents are now available. As an extension, auto-stereoscopic screens allow several users to watch stereoscopic images without wearing any glasses. For the moment, synthetized content are the easiest solutions to provide, in realtime, all the multiple input images required by such kind of technology. However, live videos are a very important issue in some fields like augmented reality applications, but remain difficult to be applied on auto-stereoscopic displays. In this paper, we present a system based on a depth camera and a color camera that are combined to produce the multiple input images in realtime. The result of this approach can be easily used with any kind of auto-stereoscopic screen.","PeriodicalId":105774,"journal":{"name":"2010 IEEE International Workshop on Multimedia Signal Processing","volume":"283 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121447375","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Geometric calibration of distributed microphone arrays from acoustic source correspondences
Pub Date: 2010-12-10 | DOI: 10.1109/MMSP.2010.5661986
S. Valente, M. Tagliasacchi, F. Antonacci, Paolo Bestagini, A. Sarti, S. Tubaro
This paper proposes a method that solves the problem of geometric calibration of microphone arrays. We consider a distributed system, in which each array is controlled by separate acquisition devices that do not share a common synchronization clock. Given a set of probing sources, e.g. loudspeakers, each array computes an estimate of the source locations using a conventional TDOA-based algorithm. These observations are fused together by the proposed method, in order to estimate the position and pose of one array with respect to the other. Unlike previous approaches, we explicitly consider the anisotropic distribution of localization errors. As such, the proposed method is able to address the problem of geometric calibration when the probing sources are located both in the near- and far-field of the microphone arrays. Experimental results demonstrate that the improvement in terms of calibration accuracy with respect to state-of-the-art algorithms can be substantial, especially in the far-field.
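The fusion step amounts to estimating a rigid transform between the two arrays' coordinate frames from corresponding source-location estimates. The sketch below does this with a plain least-squares (Kabsch) alignment; the paper's contribution of weighting the fit by the anisotropic localization-error covariances is deliberately omitted here, and all names are assumptions.

```python
# Least-squares rigid alignment of corresponding source-position estimates.
import numpy as np

def align_frames(points_a, points_b):
    """Return R, t such that R @ points_a[i] + t approximates points_b[i]."""
    ca, cb = points_a.mean(axis=0), points_b.mean(axis=0)
    H = (points_a - ca).T @ (points_b - cb)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                      # keep a proper rotation
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = cb - R @ ca
    return R, t

# Toy check: recover a known 30 degree rotation and a translation
theta = np.deg2rad(30)
R_true = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
src = np.random.rand(5, 2) * 4.0                  # sources as seen by array A
obs = src @ R_true.T + np.array([1.0, -2.0])      # same sources as seen by array B
R_est, t_est = align_frames(src, obs)
print(np.allclose(R_est, R_true), np.round(t_est, 3))
```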
{"title":"Geometric calibration of distributed microphone arrays from acoustic source correspondences","authors":"S. Valente, M. Tagliasacchi, F. Antonacci, Paolo Bestagini, A. Sarti, S. Tubaro","doi":"10.1109/MMSP.2010.5661986","DOIUrl":"https://doi.org/10.1109/MMSP.2010.5661986","url":null,"abstract":"This paper proposes a method that solves the problem of geometric calibration of microphone arrays. We consider a distributed system, in which each array is controlled by separate acquisition devices that do not share a common synchronization clock. Given a set of probing sources, e.g. loudspeakers, each array computes an estimate of the source locations using a conventional TDOA-based algorithm. These observations are fused together by the proposed method, in order to estimate the position and pose of one array with respect to the other. Unlike previous approaches, we explicitly consider the anisotropic distribution of localization errors. As such, the proposed method is able to address the problem of geometric calibration when the probing sources are located both in the near- and far-field of the microphone arrays. Experimental results demonstrate that the improvement in terms of calibration accuracy with respect to state-of-the-art algorithms can be substantial, especially in the far-field.","PeriodicalId":105774,"journal":{"name":"2010 IEEE International Workshop on Multimedia Signal Processing","volume":"77 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125015390","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Common Spatial Pattern revisited by Riemannian geometry
Pub Date: 2010-12-10 | DOI: 10.1109/MMSP.2010.5662067
A. Barachant, S. Bonnet, M. Congedo, C. Jutten
This paper presents a link between the well-known Common Spatial Pattern (CSP) algorithm and Riemannian geometry in the context of Brain-Computer Interfaces (BCI). It is shown that CSP spatial filtering followed by log-variance feature extraction can be recast as the computation of a Riemannian distance in the space of covariance matrices. This observation highlights several approximations that CSP makes with respect to the topology of that space. Based on these conclusions, we propose an improvement of the classical CSP method.
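The central geometric quantity here is the affine-invariant Riemannian distance between covariance matrices, d(A, B) = ||log(A^{-1/2} B A^{-1/2})||_F, which can be computed from the generalized eigenvalues of (B, A). The sketch below illustrates that computation on random covariance matrices; it is not the paper's CSP reformulation itself.

```python
# Affine-invariant Riemannian distance between symmetric positive-definite matrices.
import numpy as np
from scipy.linalg import eigvalsh

def riemann_distance(A, B):
    """d(A, B) = sqrt(sum_i log(lambda_i)^2) with lambda_i the gen. eigenvalues of (B, A)."""
    eigenvalues = eigvalsh(B, A)            # solves B v = lambda A v
    return np.sqrt(np.sum(np.log(eigenvalues) ** 2))

# Toy covariance matrices from two 8-channel trials (random data for illustration)
rng = np.random.default_rng(1)
X1, X2 = rng.normal(size=(8, 200)), rng.normal(size=(8, 200))
C1, C2 = np.cov(X1), np.cov(X2)
print(riemann_distance(C1, C2))             # 0 if and only if C1 == C2
```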
{"title":"Common Spatial Pattern revisited by Riemannian geometry","authors":"A. Barachant, S. Bonnet, M. Congedo, C. Jutten","doi":"10.1109/MMSP.2010.5662067","DOIUrl":"https://doi.org/10.1109/MMSP.2010.5662067","url":null,"abstract":"This paper presents a link between the well known Common Spatial Pattern (CSP) algorithm and Riemannian geometry in the context of Brain Computer Interface (BCI). It will be shown that CSP spatial filtering and Log variance features extraction can be resumed as a computation of a Riemann distance in the space of covariances matrices. This fact yields to highlight several approximations with respect to the space topology. According to these conclusions, we propose an improvement of classical CSP method.","PeriodicalId":105774,"journal":{"name":"2010 IEEE International Workshop on Multimedia Signal Processing","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128839819","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Clickable augmented documents
Pub Date: 2010-12-10 | DOI: 10.1109/MMSP.2010.5662012
Sandy Martedi, Hideaki Uchiyama, H. Saito
This paper presents an Augmented Reality (AR) system for physical text documents that enables users to click on a document. The system tracks the relative pose between the camera and the document in order to continuously overlay virtual contents on the document. In addition, it computes the trajectory of a fingertip, based on skin color detection, to support clicking interaction. By combining document tracking with this interaction technique, we have developed a novel tangible document system. As an application, we develop an AR dictionary that overlays the meaning and explanation of words clicked on a document. In the experiments, we evaluate the accuracy of the clicking interaction and the robustness of our document tracking method against occlusion.
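A plausible skin-color based fingertip detector, in the spirit of the interaction step described above, is sketched below with OpenCV: threshold the frame in HSV, keep the largest skin-colored contour, and take its topmost point as the fingertip. The HSV bounds and the topmost-point heuristic are assumptions rather than the paper's exact procedure.

```python
# Skin-color thresholding plus a topmost-point heuristic for fingertip detection.
import cv2
import numpy as np

def detect_fingertip(frame_bgr):
    """Return (x, y) of the candidate fingertip, or None if no skin region is found."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    lower, upper = np.array([0, 40, 60]), np.array([25, 255, 255])
    mask = cv2.inRange(hsv, lower, upper)                 # skin-colored pixels
    # OpenCV 4.x return signature: (contours, hierarchy)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    hand = max(contours, key=cv2.contourArea)             # largest skin blob
    top_idx = hand[:, 0, 1].argmin()                      # smallest y = topmost point
    x, y = hand[top_idx, 0]
    return int(x), int(y)

# Usage idea: feed webcam frames and track the returned points over time, e.g.
# cap = cv2.VideoCapture(0); ok, frame = cap.read(); print(detect_fingertip(frame))
```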
{"title":"Clickable augmented documents","authors":"Sandy Martedi, Hideaki Uchiyama, H. Saito","doi":"10.1109/MMSP.2010.5662012","DOIUrl":"https://doi.org/10.1109/MMSP.2010.5662012","url":null,"abstract":"This paper presents an Augmented Reality (AR) system for physical text documents that enable users to click a document. In the system, we track the relative pose between a camera and a document to overlay some virtual contents on the document continuously. In addition, we compute the trajectory of a fingertip based on skin color detection for clicking interaction. By merging a document tracking and an interaction technique, we have developed a novel tangible document system. As an application, we develop an AR dictionary system that overlays the meaning and explanation of words by clicking on a document. In the experiment part, we present the accuracy of the clicking interaction and the robustness of our document tracking method against the occlusion.","PeriodicalId":105774,"journal":{"name":"2010 IEEE International Workshop on Multimedia Signal Processing","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121688605","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}