A head-eye coordination model for animating gaze shifts of virtual characters. Sean Andrist, T. Pejsa, Bilge Mutlu, Michael Gleicher. Gaze-In '12 (2012). doi:10.1145/2401836.2401840
We present a parametric, computational model of head-eye coordination that can be used in the animation of directed gaze shifts for virtual characters. The model is based on research in human neurophysiology. It incorporates control parameters that allow gaze shifts to be adapted to the characteristics of the environment, the gaze targets, and the idiosyncratic behavioral attributes of the virtual character. A user study confirms that the model communicates gaze targets as effectively as real humans do, while being subjectively preferred over state-of-the-art models.
{"title":"A head-eye coordination model for animating gaze shifts of virtual characters","authors":"Sean Andrist, T. Pejsa, Bilge Mutlu, Michael Gleicher","doi":"10.1145/2401836.2401840","DOIUrl":"https://doi.org/10.1145/2401836.2401840","url":null,"abstract":"We present a parametric, computational model of head-eye coordination that can be used in the animation of directed gaze shifts for virtual characters. The model is based on research in human neurophysiology. It incorporates control parameters that allow for adapting gaze shifts to the characteristics of the environment, the gaze targets, and the idiosyncratic behavioral attributes of the virtual character. A user study confirms that the model communicates gaze targets as effectively as real humans do, while being preferred subjectively to state-of-the-art models.","PeriodicalId":272657,"journal":{"name":"Gaze-In '12","volume":"73 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124284185","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Perception of gaze direction for situated interaction. Samer Al Moubayed, Gabriel Skantze. Gaze-In '12 (2012). doi:10.1145/2401836.2401839
Accurate human perception of a robot's gaze direction is crucial for designing natural and fluent situated multimodal face-to-face interaction between humans and machines. In this paper, we present an experiment with 18 test subjects that quantifies the effects of different gaze cues, synthesized using the Furhat back-projected robot head, on the accuracy with which humans perceive the spatial direction of gaze. The study first quantifies the accuracy of perceived gaze direction in a human-human setup and compares it to synthesized gaze movements in different conditions: viewing the robot's eyes frontally or from a 45-degree side view. We also study the effect of 3D gaze, in which both eyes are controlled to indicate the depth of the focal point (vergence), the use of gaze versus head pose, and the use of static versus dynamic eyelids. The findings of the study are highly relevant to the design and control of robots and animated agents in situated face-to-face interaction.
{"title":"Perception of gaze direction for situated interaction","authors":"Samer Al Moubayed, Gabriel Skantze","doi":"10.1145/2401836.2401839","DOIUrl":"https://doi.org/10.1145/2401836.2401839","url":null,"abstract":"Accurate human perception of robots' gaze direction is crucial for the design of a natural and fluent situated multimodal face-to-face interaction between humans and machines. In this paper, we present an experiment targeted at quantifying the effects of different gaze cues synthesized using the Furhat back-projected robot head, on the accuracy of perceived spatial direction of gaze by humans using 18 test subjects. The study first quantifies the accuracy of the perceived gaze direction in a human-human setup, and compares that to the use of synthesized gaze movements in different conditions: viewing the robot eyes frontal or at a 45 degrees angle side view. We also study the effect of 3D gaze by controlling both eyes to indicate the depth of the focal point (vergence), the use of gaze or head pose, and the use of static or dynamic eyelids. The findings of the study are highly relevant to the design and control of robots and animated agents in situated face-to-face interaction.","PeriodicalId":272657,"journal":{"name":"Gaze-In '12","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128248474","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Simple multi-party video conversation system focused on participant eye gaze: "Ptolemaeus" provides participants with smooth turn-taking. Saori Yamamoto, Nazomu Teraya, Yumika Nakamura, N. Watanabe, Yande Lin, M. Bono, Yugo Takeuchi. Gaze-In '12 (2012). doi:10.1145/2401836.2401851
This paper presents a prototype system that provides a natural multi-party conversation environment among participants in different places. Eye gaze is an important feature for maintaining smooth multi-party conversations because it indicates whom the speech is addressing or nominates the next speaker. Nevertheless, the most popular video conversation systems, such as Skype or FaceTime, do not support eye gaze interaction, which causes serious confusion in multi-party video conversations: who is the addressee of the speech? Who is the next speaker? We propose a simple multi-party video conversation environment called Ptolemaeus that realizes eye gaze interaction among three or more participants without any special equipment. The system provides natural turn-taking in face-to-face video conversations and can be implemented more easily than previous schemes for eye gaze interaction.
{"title":"Simple multi-party video conversation system focused on participant eye gaze: \"Ptolemaeus\" provides participants with smooth turn-taking","authors":"Saori Yamamoto, Nazomu Teraya, Yumika Nakamura, N. Watanabe, Yande Lin, M. Bono, Yugo Takeuchi","doi":"10.1145/2401836.2401851","DOIUrl":"https://doi.org/10.1145/2401836.2401851","url":null,"abstract":"This paper shows a prototype system that provides a natural multi-party conversation environment among participants in different places. Eye gaze is an important feature for maintaining smooth multi-party conversations because it indicates whom the speech is addressing or nominates the next speaker. Nevertheless, most popular video conversation systems, such as Skype or FaceTime, do not support the interaction of eye gaze. Serious confusion is caused in multi-party video conversation systems that have no eye gaze support. For example, who is the addressee of the speech? Who is the next speaker? We propose a simple multi-party video conversation environment called Ptolemaeus that realizes eye gaze interaction among more than three participants without any special equipment. This system provides natural turn-taking in face-to-face video conversations and can be implemented more easily than former schemes concerned with eye gaze interaction.","PeriodicalId":272657,"journal":{"name":"Gaze-In '12","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129275768","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Analysis on learners' gaze patterns and the instructor's reactions in ballroom dance tutoring. Kosuke Kimura, Hung-Hsuan Huang, K. Kawagoe. Gaze-In '12 (2012). doi:10.1145/2401836.2401844
Virtual conversational agents are expected to play a role in the tutoring of physical skills such as sports or dance. This paper describes an ongoing project aimed at realizing a virtual instructor for ballroom dance. First, a human-human experiment was conducted to collect an interaction corpus between a professional instructor and six learners. The verbal and non-verbal behaviors of the instructor were analyzed and served as the basis of a state transition model for ballroom dance tutoring. To achieve intuitive and efficient instruction during the multi-modal interaction between the virtual instructor and the learner, the eye gaze patterns of the learners and the reactions of the instructor were analyzed. The analysis showed that the learners' attitude (confidence and concentration) could be approximated from their gaze patterns, and the instructor's tutoring strategy supported this observation.
{"title":"Analysis on learners' gaze patterns and the instructor's reactions in ballroom dance tutoring","authors":"Kosuke Kimura, Hung-Hsuan Huang, K. Kawagoe","doi":"10.1145/2401836.2401844","DOIUrl":"https://doi.org/10.1145/2401836.2401844","url":null,"abstract":"The use of virtual conversational agents is awaited in the tutoring of physical skills such as sports or dances. This paper describes about an ongoing project aiming to realize a virtual instructor for ballroom dance. First, a human-human experiment is conducted to collect the interaction corpus between a professional instructor and six learners. The verbal and non-verbal behaviors of the instructor is analyzed and served as the base of a state transition model for ballroom dance tutoring. In order to achieve intuitive and efficient instruction during the multi-modal interaction between the virtual instructor and the learner, the eye gaze patterns of the learner and the reaction from the instructor were analyzed. From the analysis results, it was found that the learner's attitude (confidence and concentration) could be approximated by their gaze patterns, and the instructor's tutoring strategy supported this as well.","PeriodicalId":272657,"journal":{"name":"Gaze-In '12","volume":"117 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128721979","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Addressee identification for human-human-agent multiparty conversations in different proxemics. N. Baba, Hung-Hsuan Huang, Y. Nakano. Gaze-In '12 (2012). doi:10.1145/2401836.2401842
This paper proposes a method for identifying the addressee based on speech and gaze information, and shows that the method is applicable to human-human-agent multiparty conversations in different proxemics. First, we collected human-human-agent interactions in different proxemics and, by analyzing the data, found that people spoke with a higher tone of voice, more loudly, and more slowly when they talked to the agent. We also confirmed that this speech style was consistent regardless of the proxemics. Then, using an SVM, we built a general addressee estimation model that can be used in different proxemics; the model achieved over 80% accuracy in 10-fold cross-validation.
{"title":"Addressee identification for human-human-agent multiparty conversations in different proxemics","authors":"N. Baba, Hung-Hsuan Huang, Y. Nakano","doi":"10.1145/2401836.2401842","DOIUrl":"https://doi.org/10.1145/2401836.2401842","url":null,"abstract":"This paper proposes a method for identifying the addressee based on speech and gaze information, and shows that the proposed method can be applicable to human-human-agent multiparty conversations in different proxemics. First, we collected human-human-agent interaction in different proxemics, and by analyzing the data, we found that people spoke with a higher tone of voice and more loudly and slowly when they talked to the agent. We also confirmed that this speech style was consistent regardless of the proxemics. Then, by employing SVM, we proposed a general addressee estimation model that can be used in different proxemics, and the model achieved over 80% accuracy in 10-fold cross-validation.","PeriodicalId":272657,"journal":{"name":"Gaze-In '12","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129120290","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Visual interaction and conversational activity. Andres Levitski, J. Radun, Kristiina Jokinen. Gaze-In '12 (2012). doi:10.1145/2401836.2401847
In addition to the content of their speech, people engaged in a conversation express themselves in many nonverbal ways. This means that people interact and are attended to even when they are not speaking. In this pilot study, we created an experimental setup for a three-party interactive situation in which one of the participants remained silent throughout the session and the gaze of one of the active subjects was tracked. The eye-tracked subject was unaware of the setup. The pilot study used only two test subjects, but the results provide some clues toward estimating how the behavior and activity of the non-speaking participant might affect the other participants' conversational activity and the situation itself. We also found that the speaker's gaze activity differs between the beginning and the end of an utterance, indicating that the speaker's focus of attention toward the partner depends on the turn-taking situation. Using the experience gained in this trial, we point out several considerations that might help avoid pitfalls when designing a more extensive study of the subject.
{"title":"Visual interaction and conversational activity","authors":"Andres Levitski, J. Radun, Kristiina Jokinen","doi":"10.1145/2401836.2401847","DOIUrl":"https://doi.org/10.1145/2401836.2401847","url":null,"abstract":"In addition to the contents of their speech, people who are engaged in a conversation express themselves in many nonverbal ways. This means that people interact and are attended to even when they are not speaking. In this pilot study, we created an experimental setup for a three-party interactive situation where one of the participants remained silent throughout the session, and the gaze of one of the active subjects was tracked. The eye-tracked subject was unaware of the setup. The pilot study used only two test subjects, but the results provide some clues towards estimating how the behavior and activity of the non-speaking participant might affect other participants' conversational activity and the situation itself. We also found that the speaker's gaze activity is different in the beginning of the utterance than at the end of the utterance, indicating that the speaker's focus of attention towards the partner differs depending on the turn taking situation. Using the experience gained in this trial, we point out several things to consider that might help to avoid pitfalls when designing a more extensive study into the subject.","PeriodicalId":272657,"journal":{"name":"Gaze-In '12","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114257658","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Move it there, or not?: the design of voice commands for gaze with speech. Monika Elepfandt, Martin Grund. Gaze-In '12 (2012). doi:10.1145/2401836.2401848
This paper presents an experiment conducted to investigate gaze combined with voice commands. There has been very little research on the design of voice commands for this kind of input, and it is not yet known whether users prefer longer sentences, as in natural dialogue, or short commands. In the experiment, three different voice commands were compared during a simple task in which participants had to drag & drop, rotate, and resize objects. It turned out that the shortness of a voice command, in terms of the number of words, is more important than it being absolutely natural. Participants preferred the voice command with the fewest words and the fewest syllables. For the voice commands with the same number of syllables, users also preferred the one with the fewest words, even though there were no large differences in time and errors.
{"title":"Move it there, or not?: the design of voice commands for gaze with speech","authors":"Monika Elepfandt, Martin Grund","doi":"10.1145/2401836.2401848","DOIUrl":"https://doi.org/10.1145/2401836.2401848","url":null,"abstract":"This paper presents an experiment that was conducted to investigate gaze combined with voice commands. There has been very little research about the design of voice commands for this kind of input. It is not known yet if users prefer longer sentences like in natural dialogues or short commands. In the experiment three different voice commands are compared during a simple task in which participants had to drag & drop, rotate, and resize objects. It turned out that the shortness of a voice command -- in terms of number of words -- is more important than it being absolutely natural. Participants preferred the voice command with the fewest words and the fewest syllables. For the voice commands which had the same number of syllables, the users also preferred the one with the fewest words, even though there were no big differences in time and errors.","PeriodicalId":272657,"journal":{"name":"Gaze-In '12","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133543592","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A communication support interface based on learning awareness for collaborative learning. Yuki Hayashi, T. Kojiri, Toyohide Watanabe. Gaze-In '12 (2012). doi:10.1145/2401836.2401854
The development of information and communication technologies allows learners to study together with others through networks. To realize successful collaborative learning in such distributed environments, supporting communication is important because participants acquire knowledge through exchanging utterances. To address this issue, this paper proposes a communication support interface for network-based remote collaborative learning. To make the most of communication opportunities, participants should be able to stay aware of the information in the collaborative learning environment and feel a sense of togetherness with others. Our interface therefore provides three types of awareness: awareness of participants, awareness of utterances, and awareness of contributions to the discussion. We believe the system facilitates communication among participants in a CSCL environment.
{"title":"A communication support interface based on learning awareness for collaborative learning","authors":"Yuki Hayashi, T. Kojiri, Toyohide Watanabe","doi":"10.1145/2401836.2401854","DOIUrl":"https://doi.org/10.1145/2401836.2401854","url":null,"abstract":"The development of information communication technologies allows learners to study together with others through networks. To realize successful collaborative learning in such distributed environments, supporting their communication is important because participants acquire their knowledge through exchanging utterances. To address this issue, this paper proposes a communication supporting interface for network-based remote collaborative learning. In order to utilize communication opportunities, it is desirable that participants can be aware of the information in collaborative learning environments, and feels the sense of togetherness with others. Our system facilitates three types of awareness for the interface: awareness of participants, that of utterances, and contribution to discussion. We believe our system facilitates communication among the participants in CSCL environment.","PeriodicalId":272657,"journal":{"name":"Gaze-In '12","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128799193","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Semantic interpretation of eye movements using designed structures of displayed contents. Erina Ishikawa, Ryo Yonetani, H. Kawashima, Takatsugu Hirayama, T. Matsuyama. Gaze-In '12 (2012). doi:10.1145/2401836.2401853
This paper presents a novel framework for interpreting eye movements using the semantic relations and spatial layouts of displayed contents, i.e., the designed structure. We represent eye movements in a multi-scale, interval-based manner and associate them with various semantic relations derived from the designed structure. In preliminary experiments, we apply the proposed framework to eye movements recorded while browsing catalog contents and confirm the effectiveness of the framework via user-state estimation.
{"title":"Semantic interpretation of eye movements using designed structures of displayed contents","authors":"Erina Ishikawa, Ryo Yonetani, H. Kawashima, Takatsugu Hirayama, T. Matsuyama","doi":"10.1145/2401836.2401853","DOIUrl":"https://doi.org/10.1145/2401836.2401853","url":null,"abstract":"This paper presents a novel framework to interpret eye movements using semantic relations and spatial layouts of displayed contents, i.e., the designed structure. We represent eye movements in a multi-scale, interval-based manner and associate them with various semantic relations derived from the designed structure. In preliminary experiments, we apply the proposed framework to the eye movements when browsing catalog contents, and confirm the effectiveness of the framework via user-state estimation.","PeriodicalId":272657,"journal":{"name":"Gaze-In '12","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131168621","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hard lessons learned: mobile eye-tracking in cockpits. Hana Vrzakova, R. Bednarik. Gaze-In '12 (2012). doi:10.1145/2401836.2401843
Eye-tracking is an attractive tool for testing design alternatives at all stages of interface evaluation, since access to the operator's visual attention behavior provides information that supports design decisions. While mobile eye-tracking increases ecological validity, it also brings numerous constraints. In this work, we discuss mobile eye-tracking issues in the complex environment of a business jet flight simulator in an industrial research setting. The cockpit and its low illumination directly limited the setup of the eye-tracker and the quality of recordings and evaluations. We present lessons learned and best practices for setting up an eye-tracker under such challenging simulation conditions.
{"title":"Hard lessons learned: mobile eye-tracking in cockpits","authors":"Hana Vrzakova, R. Bednarik","doi":"10.1145/2401836.2401843","DOIUrl":"https://doi.org/10.1145/2401836.2401843","url":null,"abstract":"Eye-tracking presents an attractive tool in testing of design alternatives in all stages of interface evaluation. Access to the operator's visual attention behaviors provides information supporting design decisions. While mobile eye-tracking increases ecological validity it also brings about numerous constraints. In this work, we discuss mobile eye-tracking issues in the complex environment of a business jet flight simulator in industrial research settings. The cockpit and low illumination directly limited the setup of the eye-tracker and quality of recordings and evaluations. Here we present lessons learned and the best practices in setting up the eye-tracker under challenging simulation conditions.","PeriodicalId":272657,"journal":{"name":"Gaze-In '12","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129323964","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}