Pub Date: 1999-09-20 | DOI: 10.1109/PEOPLE.1999.798344
A. Hilton
This paper introduces a model-based approach to capturing a person's shape, appearance and movement. A 3D animated model of a clothed person's whole-body shape and appearance is automatically constructed from a set of orthogonal-view colour images. The reconstructed model of the person is then used together with the least-squares inverse-kinematics framework of Bregler and Malik (1998) to capture simple 3D movements from a video image sequence.
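The least-squares inverse-kinematics step borrowed from Bregler and Malik can be illustrated with a toy example. The sketch below is not the paper's implementation: the two-link planar arm, the damping term and the half-step size are all illustrative assumptions; it shows only the core idea of linearising forward kinematics and solving a damped least-squares update toward an observed target.

```python
import math

def fk(theta1, theta2, l1=1.0, l2=1.0):
    """Forward kinematics of a planar two-link arm: joint angles -> end-effector (x, y)."""
    x = l1 * math.cos(theta1) + l2 * math.cos(theta1 + theta2)
    y = l1 * math.sin(theta1) + l2 * math.sin(theta1 + theta2)
    return x, y

def jacobian(theta1, theta2, l1=1.0, l2=1.0):
    """Analytic Jacobian d(x, y)/d(theta1, theta2)."""
    s1, c1 = math.sin(theta1), math.cos(theta1)
    s12, c12 = math.sin(theta1 + theta2), math.cos(theta1 + theta2)
    return [[-l1 * s1 - l2 * s12, -l2 * s12],
            [l1 * c1 + l2 * c12, l2 * c12]]

def ik_step(theta, target, damping=1e-6, step=0.5):
    """One damped least-squares update minimising |J*dtheta - (target - fk(theta))|^2."""
    x, y = fk(*theta)
    ex, ey = target[0] - x, target[1] - y
    J = jacobian(*theta)
    # Normal equations (J^T J + damping*I) dtheta = J^T e, solved in closed form (2x2).
    a = J[0][0] ** 2 + J[1][0] ** 2 + damping
    b = J[0][0] * J[0][1] + J[1][0] * J[1][1]
    d = J[0][1] ** 2 + J[1][1] ** 2 + damping
    g0 = J[0][0] * ex + J[1][0] * ey
    g1 = J[0][1] * ex + J[1][1] * ey
    det = a * d - b * b
    dt0 = (d * g0 - b * g1) / det
    dt1 = (a * g1 - b * g0) / det
    # Take a partial step for stability far from the solution.
    return theta[0] + step * dt0, theta[1] + step * dt1

theta = (0.3, 0.4)    # initial joint angles (radians)
target = (1.2, 0.9)   # desired end-effector position
for _ in range(60):
    theta = ik_step(theta, target)
```

In the full framework the same least-squares machinery is driven by image measurements rather than a known target point.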
Title: "Towards model-based capture of a person's shape, appearance and motion"
Pub Date: 1999-09-20 | DOI: 10.1109/PEOPLE.1999.798346
Jacob Ström, T. Jebara, S. Basu, A. Pentland
A real-time system for tracking and modeling of faces using an analysis-by-synthesis approach is presented. A 3D face model is texture-mapped with a head-on view of the face. Feature points in the face texture are then selected based on image Hessians. The selected points of the rendered image are tracked in the incoming video using normalized correlation. The result is fed into an extended Kalman filter to recover camera geometry, head pose, and structure from motion. This information is used to rigidly move the face model to render the next image needed for tracking. Every point is tracked from the Kalman filter's estimated position. The variance of each measurement is estimated using a number of factors, including the residual error and the angle between the surface normal and the camera. The estimated head pose can be used to warp the face in the incoming video back to a frontal position, and parts of the image can then be subjected to eigenspace coding for efficient transmission. The mouth texture is transmitted in this way using 50 bits per frame, plus overhead from the person-specific eigenspace. The face tracking system runs at 30 Hz; coding the mouth texture slows it down to 12 Hz.
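The normalized-correlation matching used to track the selected feature points can be sketched in a few lines. The exhaustive search and the tiny synthetic image below are illustrative assumptions, not the authors' real-time implementation:

```python
def ncc(a, b):
    """Normalized cross-correlation between two equal-length patches (flat lists)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    da = sum((x - ma) ** 2 for x in a) ** 0.5
    db = sum((y - mb) ** 2 for y in b) ** 0.5
    if da == 0.0 or db == 0.0:
        return 0.0   # flat patch: no texture to correlate
    return num / (da * db)

def track(template, image, w, h, tw, th):
    """Exhaustive search: the (x, y) offset in `image` whose patch best matches `template`."""
    best, best_xy = -2.0, (0, 0)
    for y in range(h - th + 1):
        for x in range(w - tw + 1):
            patch = [image[(y + j) * w + (x + i)] for j in range(th) for i in range(tw)]
            score = ncc(template, patch)
            if score > best:
                best, best_xy = score, (x, y)
    return best_xy

# Tiny synthetic frame: a 3x3 textured blob at offset (4, 2) in an 8x8 image.
w, h = 8, 8
image = [0.0] * (w * h)
for j in range(3):
    for i in range(3):
        image[(2 + j) * w + (4 + i)] = float(1 + j * 3 + i)
template = [image[(2 + j) * w + (4 + i)] for j in range(3) for i in range(3)]
found = track(template, image, w, h, 3, 3)
```

In the system described, the search would be seeded from the Kalman filter's predicted position rather than scanning the whole frame.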
Title: "Real time tracking and modeling of faces: an EKF-based analysis by synthesis approach"
Pub Date: 1999-09-20 | DOI: 10.1109/PEOPLE.1999.798342
C. Wren, Alex Pentland
Human motion can be understood on several levels. The most basic level is the notion that humans are collections of things that have predictable visual appearance. Next is the notion that humans exist in a physical universe; as a consequence, a large part of human motion can be modeled and predicted with the laws of physics. Finally, there is the notion that humans utilize muscles to actively shape purposeful motion. We employ a recursive framework for real-time, 3-D tracking of human motion that enables pixel-level, probabilistic processes to take advantage of the contextual knowledge encoded in the higher-level models, including models of dynamic constraints on human motion. We show that models of purposeful action arise naturally from this framework and, further, that those models can be used to improve the perception of human motion. Results are shown that demonstrate both qualitative and quantitative gains in tracking performance.
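The idea that much of human motion can be predicted with simple physics is exactly what a recursive Kalman-style estimator exploits. The following is not the authors' framework; it is a minimal one-dimensional constant-velocity Kalman filter, with illustrative noise parameters, showing the predict/update recursion that such trackers build on:

```python
def kf_step(x, v, P, z, dt=1.0, q=1e-3, r=0.25):
    """One predict/update cycle of a constant-velocity Kalman filter.
    State is (position x, velocity v); P = [[pxx, pxv], [pxv, pvv]]; z observes x."""
    # Predict: physics says position advances by velocity.
    x = x + dt * v
    pxx = P[0][0] + 2.0 * dt * P[0][1] + dt * dt * P[1][1] + q
    pxv = P[0][1] + dt * P[1][1]
    pvv = P[1][1] + q
    # Update with the scalar measurement z = x + noise.
    s = pxx + r
    kx, kv = pxx / s, pxv / s
    innov = z - x
    x, v = x + kx * innov, v + kv * innov
    P = [[(1.0 - kx) * pxx, (1.0 - kx) * pxv],
         [pxv - kv * pxx, pvv - kv * pxv]]
    return x, v, P

# Track a noise-free target moving at unit speed; the filter infers the velocity.
x, v, P = 0.0, 0.0, [[1.0, 0.0], [0.0, 1.0]]
for t in range(1, 21):
    x, v, P = kf_step(x, v, P, float(t))
```

The paper's contribution is to replace such generic dynamics with models of dynamic constraints and purposeful action, which shape the prediction step.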
Title: "Understanding purposeful human motion"
Pub Date: 1999-09-20 | DOI: 10.1109/PEOPLE.1999.798351
Aphrodite Galata, Neil Johnson, D. Hogg
In recent years there has been an increased interest in the modelling and recognition of human activities involving highly structured and semantically rich behaviour such as dance, aerobics, and sign language. A novel approach is presented for automatically acquiring stochastic models of the high-level structure of an activity without the assumption of any prior knowledge. The process involves temporal segmentation into plausible atomic behaviour components and the use of variable length Markov models for the efficient representation of behaviours. Experimental results are presented which demonstrate the generation of realistic sample behaviours and evaluate the performance of models for long-term temporal prediction.
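A variable-length Markov model can be sketched as a table of next-symbol counts for contexts of every length up to a maximum, with prediction escaping to shorter contexts when a long one was never seen in training. The atomic-behaviour symbols and toy sequence below are illustrative assumptions, not the paper's data:

```python
from collections import defaultdict

def train_vlmm(seq, max_order=3):
    """Count next-symbol frequencies for every context of length 0..max_order."""
    counts = defaultdict(lambda: defaultdict(int))
    for i, sym in enumerate(seq):
        for k in range(min(i, max_order) + 1):
            counts[tuple(seq[i - k:i])][sym] += 1
    return counts

def predict(counts, history, max_order=3):
    """Predict the next symbol from the longest training context matching the history."""
    for k in range(min(len(history), max_order), -1, -1):
        ctx = tuple(history[len(history) - k:])
        if ctx in counts:
            nxt = counts[ctx]
            return max(nxt, key=nxt.get)
    return None

# Toy "behaviour" stream: what follows 'b' depends on the symbol before it,
# so a first-order model is ambiguous but a length-2 context is not.
seq = list("abxcby" * 10)
counts = train_vlmm(seq)
```

Sampling repeatedly from the predicted distributions instead of taking the argmax is what generates the "realistic sample behaviours" the abstract mentions.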
Title: "Learning structured behaviour models using variable length Markov models"
Pub Date: 1999-09-20 | DOI: 10.1109/PEOPLE.1999.798341
Eng-Jon Ong, S. Gong
A novel framework is proposed under which robust matching and tracking of a 3D skeleton model of a human body from multiple views can be performed. We propose a method for measuring the ambiguity of the 2D measurements provided by each view. The ambiguity measurement is then used for selecting the best view for the most accurate matching and tracking. A hybrid 2D-3D representation is chosen for modelling human body poses. The hybrid model is learnt using hierarchical principal component analysis. The CONDENSATION algorithm is used to robustly track and match 3D skeleton models in individual views.
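The CONDENSATION algorithm is factored sampling: resample particles in proportion to their weights, push them through the dynamics with noise, then reweight by an observation likelihood. A minimal one-dimensional sketch follows; the Gaussian dynamics and likelihood are illustrative assumptions, not the skeleton-tracking setup of the paper:

```python
import math
import random

def condensation_step(particles, z, motion_std=0.1, obs_std=0.2):
    """One CONDENSATION cycle on a 1-D state: resample by weight, drift, reweight."""
    states = [s for s, w in particles]
    weights = [w for s, w in particles]
    # 1. Factored sampling: draw states in proportion to their weights.
    sampled = random.choices(states, weights=weights, k=len(states))
    # 2. Predict: trivial dynamics plus process noise.
    moved = [s + random.gauss(0.0, motion_std) for s in sampled]
    # 3. Measure: Gaussian observation likelihood, then normalise.
    new_w = [math.exp(-0.5 * ((s - z) / obs_std) ** 2) for s in moved]
    total = sum(new_w) or 1.0
    return [(s, w / total) for s, w in zip(moved, new_w)]

random.seed(0)
particles = [(random.uniform(-5.0, 5.0), 1.0 / 500) for _ in range(500)]
for _ in range(30):
    particles = condensation_step(particles, 2.0 + random.gauss(0.0, 0.05))
estimate = sum(s * w for s, w in particles)  # weighted-mean state estimate
```

In the paper the state is a body pose and the likelihood compares the projected skeleton against each camera view.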
Title: "Tracking hybrid 2D-3D human models from multiple views"
Pub Date: 1999-09-20 | DOI: 10.1109/PEOPLE.1999.798343
I. Douros, L. Dekker, B. Buxton
An increasing number of applications require the construction of computerised human body models. The work presented here follows up on a previously presented surface reconstruction algorithm, which has been greatly improved by adopting a local approach. The advantages of the method are presented, along with an explanation of why its results are better than those of the previous algorithm. The result is a compound, multi-segment, and yet entirely smooth and watertight surface. Such a surface has strong potential for applications of a mainly medical nature, such as calculation of surface area and physical surface reconstruction for prosthetics manufacturing.
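The local B-spline patches of the title rest on the standard Cox-de Boor recursion. As a rough illustration (a 2D curve rather than a surface patch, with a clamped uniform knot vector as an assumed setup), the following evaluates a B-spline from its control points:

```python
def bspline_basis(i, k, t, knots):
    """Cox-de Boor recursion: i-th B-spline basis function of order k at parameter t."""
    if k == 1:
        return 1.0 if knots[i] <= t < knots[i + 1] else 0.0
    left = right = 0.0
    if knots[i + k - 1] != knots[i]:
        left = ((t - knots[i]) / (knots[i + k - 1] - knots[i])
                * bspline_basis(i, k - 1, t, knots))
    if knots[i + k] != knots[i + 1]:
        right = ((knots[i + k] - t) / (knots[i + k] - knots[i + 1])
                 * bspline_basis(i + 1, k - 1, t, knots))
    return left + right

def eval_curve(ctrl, t, k=4):
    """Point on a clamped B-spline curve of order k (cubic by default), t in [0, 1)."""
    n = len(ctrl)
    # Clamped knot vector: k repeated knots at each end, uniform interior knots.
    knots = [0.0] * k + [j / (n - k + 1) for j in range(1, n - k + 1)] + [1.0] * k
    x = sum(bspline_basis(i, k, t, knots) * ctrl[i][0] for i in range(n))
    y = sum(bspline_basis(i, k, t, knots) * ctrl[i][1] for i in range(n))
    return x, y
```

A surface patch extends this by taking a tensor product of two such bases over a grid of control points, fitted locally to the scanner data.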
Title: "An improved algorithm for reconstruction of the surface of the human body from 3D scanner data using local B-spline patches"
Pub Date: 1999-09-20 | DOI: 10.1109/PEOPLE.1999.798345
Synthetic modeling of human bodies and the simulation of motion are long-standing problems in animation, and much work is required before a near-realistic performance can be achieved. At present, it takes an experienced designer a very long time to build a complete and realistic model that closely resembles a specific person. Our ultimate goal is to automate the process and to produce realistic animation models given a set of video sequences. In this paper we show that, given video sequences of a person moving in front of the camera, we can recover shape information and joint locations. Both are essential to instantiate a complete and realistic model that closely resembles a specific person; without knowledge of the positions of the articulations, a character cannot be animated. This is achieved with minimal human intervention. The recovered shape and motion parameters can be used to reconstruct the original movement or to allow other animation models to mimic the subject's actions.
Ralf Plänkers, Pascal Fua
Title: "Automated body modeling from video sequences"
Pub Date: 1999-09-20 | DOI: 10.1109/PEOPLE.1999.798349
R. Stiefelhagen, Jie Yang, A. Waibel
In this paper, we present an approach to modeling the focus of attention of participants in a meeting via hidden Markov models (HMMs). We employ HMMs to encode and track focus of attention, based on the participants' gaze information and knowledge of their positions. The positions of the participants are detected by face tracking in the view of a panoramic camera mounted on the meeting table. We use neural networks to estimate the participants' gaze from camera images. We discuss the implementation of the approach in detail, including system architecture, data collection, and evaluation. The system has achieved an accuracy of up to 93% in detecting focus of attention on test sequences taken from meetings. We have used focus of attention as an index in a multimedia meeting browser.
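Decoding a focus-of-attention sequence from noisy gaze estimates with an HMM is typically done with the Viterbi algorithm. The sketch below is illustrative, not the authors' system: the two attention targets, the gaze observation symbols, and all probabilities are assumptions:

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely hidden-state sequence (here: focus-of-attention targets)."""
    # V[t][s] = (best probability of any path ending in s at time t, predecessor state)
    V = [{s: (start_p[s] * emit_p[s][obs[0]], None) for s in states}]
    for t in range(1, len(obs)):
        V.append({})
        for s in states:
            prob, prev = max((V[t - 1][p][0] * trans_p[p][s] * emit_p[s][obs[t]], p)
                             for p in states)
            V[t][s] = (prob, prev)
    # Backtrack from the best final state.
    path = [max(states, key=lambda s: V[-1][s][0])]
    for t in range(len(obs) - 1, 0, -1):
        path.append(V[t][path[-1]][1])
    return path[::-1]

states = ("speaker", "screen")
start_p = {"speaker": 0.5, "screen": 0.5}
trans_p = {"speaker": {"speaker": 0.8, "screen": 0.2},
           "screen": {"speaker": 0.2, "screen": 0.8}}
emit_p = {"speaker": {"gaze_left": 0.7, "gaze_right": 0.3},
          "screen": {"gaze_left": 0.3, "gaze_right": 0.7}}
obs = ["gaze_left", "gaze_left", "gaze_left",
       "gaze_right", "gaze_right", "gaze_right"]
path = viterbi(obs, states, start_p, trans_p, emit_p)
```

The sticky transition probabilities smooth over momentary glances, which is what makes the HMM more robust than classifying each frame's gaze independently.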
Title: "Modeling people's focus of attention"
Pub Date: 1999-09-20 | DOI: 10.1109/PEOPLE.1999.798348
J. Amat, A. Casals, M. Frigola
Human body detection and tracking in a scene constitutes a very active field of work due to its applicability to many areas, especially as a means of man-machine interface (MMI). The system presented aims to improve the reliability and efficiency of teleoperation. It is applicable to teleoperated manipulation in civil settings such as large robots in shipyards, mines and public works, or even cranes. Image segmentation is performed from movement detection. The recognition of moving bodies is verified by means of a simplified articulated cylindrical model, which allows the system to operate at a low computational cost.
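Segmentation from movement detection can be sketched as frame differencing followed by a bounding box over the changed pixels. The tiny frames and threshold below are illustrative assumptions, not the stereoscopic system described:

```python
def moving_mask(prev, curr, thresh=10):
    """Per-pixel motion mask from frame differencing: 1 where the change exceeds thresh."""
    return [1 if abs(a - b) > thresh else 0 for a, b in zip(prev, curr)]

def bounding_box(mask, w):
    """Bounding box (x0, y0, x1, y1) of the moving pixels, or None if nothing moved."""
    pts = [(i % w, i // w) for i, m in enumerate(mask) if m]
    if not pts:
        return None
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    return min(xs), min(ys), max(xs), max(ys)

# Two tiny 5x5 grayscale frames, stored row-major: a bright region appears
# at columns 1-2, rows 2-3 in the second frame.
w = 5
prev = [0] * 25
curr = [0] * 25
for y in (2, 3):
    for x in (1, 2):
        curr[y * w + x] = 100
box = bounding_box(moving_mask(prev, curr), w)
```

In the full system, regions segmented this way would then be checked against the articulated cylindrical body model before being accepted as a person.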
Title: "Stereoscopic system for human body tracking in natural scenes"
Pub Date: 1999-09-20 | DOI: 10.1109/PEOPLE.1999.798347
Won-Sook Lee, Nadia Magnenat-Thalmann
This paper describes a simple and robust method for generating a population of photo-realistic animated faces in a virtual world. First, we create a small set of 3D virtual faces using only photo data, with a method called virtual cloning. Then we use a very intuitive 3D-morphing system to generate a new population, which benefits from the 3D structure of the existing virtual faces. The virtual cloning method uses a set of orthogonal pictures of a person. This efficient method for reconstructing 3D heads suitable for animation starts with the extraction of feature points from the orthogonal picture sets. A previously constructed, animation-ready generic model is transformed into each individualized head based on the features extracted from the orthogonal pictures. Using projections of the 3D head, a 2D texture image is obtained for the individual reconstructed from pictures and then fitted to the clone, a fully automated procedure resulting in 360-degree seamless texture mapping. We also introduce an extremely fast dynamic system for 3D morphing, with 3D spatial interpolation and powerful 2D texture-image metamorphosis based on a triangulation inherited from the 3D structure of the virtual faces. The interface allows real-time inspection and control of the morphing process.
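The 3D spatial interpolation at the heart of such a morphing system amounts to blending corresponding vertices of two head meshes. A minimal sketch (assuming the meshes already share vertex correspondence, which fitting the same generic model to each person provides):

```python
def morph(face_a, face_b, alpha):
    """Linear 3D morph: blend corresponding vertices of two head meshes (alpha in [0, 1])."""
    return [tuple((1.0 - alpha) * a + alpha * b for a, b in zip(va, vb))
            for va, vb in zip(face_a, face_b)]

# Two toy "meshes" of two (x, y, z) vertices each; alpha = 0.5 gives the halfway head.
half = morph([(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
             [(2.0, 2.0, 2.0), (4.0, 2.0, 0.0)], 0.5)
```

Sweeping alpha, and blending the texture images in step with the geometry, yields the new in-between faces that populate the virtual world.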
Title: "Generating a population of animated faces from pictures"