Recent research has reopened debates about (neo)Whorfian claims that the language one speaks has an impact on how one thinks---long discounted by mainstream linguistics and anthropology alike. Some of the most striking evidence for such possible impact derives, not surprisingly, from understudied "exotic" languages and, somewhat more surprisingly, from multimodal and notably gestural practices in the communities that speak them. In particular, some of my own work on Guugu Yimithirr, a Paman language spoken by Aboriginal people in northeastern Australia, and on Tzotzil, a language spoken by Mayan peasants in southeastern Mexico, suggests strong connections between linguistic expressions of spatial relations, gestural practices in talking about location and motion, and cognitive representations of space---what have come to be called spatial "Frames of Reference." In this talk, I will present some of the evidence for such connections, and add to the mix evidence from an emerging, first-generation sign language developed spontaneously in a single family by deaf siblings who have had contact with neither other deaf people nor any other sign language.
{"title":"Language and thought: talking, gesturing (and signing) about space","authors":"J. Haviland","doi":"10.1145/1891903.1891905","DOIUrl":"https://doi.org/10.1145/1891903.1891905","url":null,"abstract":"Recent research has reopened debates about (neo)Whorfian claims that the language one speaks has an impact on how one thinks---long discounted by mainstream linguistics and anthropology alike. Some of the most striking evidence for such possible impact derives, not surprisingly, from understudied \"exotic\" languages and, somewhat more surprisingly, from multimodal and notably gestural practices in communities which speak them. In particular, some of my own work on GuuguYimithirr, a Paman language spoken by Aboriginal people in northeastern Australia, and on Tzotzil, a language spoken by Mayan peasants in southeastern Mexico, suggests strong connections between linguistic expressions of spatial relations, gestural practices in talking about location and motion, and cognitive representations of space---what have come to be called spatial \"Frames of Reference.\" In this talk, I will present some of the evidence for such connections, and add to the mix evidence from an emerging, first generation sign language developed spontaneously in a single family by deaf siblings who have had contact with neither other deaf people nor any other sign language.","PeriodicalId":181145,"journal":{"name":"ICMI-MLMI '10","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132868408","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Francisco Oliveira, H. Cowan, Bing Fang, Francis K. H. Quek
This paper presents research showing that a high degree of skilled performance is required to support multimodal discourse. We discuss how students who are blind or visually impaired (SBVI) were able to understand the instructor's pointing gestures during planar geometry and trigonometry classes. For that, the SBVI must attend to the instructor's speech while having simultaneous access to the instructional graphic material and to where the instructor is pointing. We developed the Haptic Deictic System (HDS), capable of tracking the instructor's pointing and informing the SBVI, through a haptic glove, where she needs to move her hand to understand the instructor's illustration-augmented discourse. Several challenges had to be overcome before the SBVI were able to engage in fluid multimodal discourse with the help of the HDS. We discuss how such challenges were addressed with respect to perception and discourse (especially mathematics instruction).
{"title":"Enabling multimodal discourse for the blind","authors":"Francisco Oliveira, H. Cowan, Bing Fang, Francis K. H. Quek","doi":"10.1145/1891903.1891927","DOIUrl":"https://doi.org/10.1145/1891903.1891927","url":null,"abstract":"This paper presents research that shows that a high degree of skilled performance is required for multimodal discourse support. We discuss how students who are blind or visually impaired (SBVI) were able to understand the instructor's pointing gestures during planar geometry and trigonometry classes. For that, the SBVI must attend to the instructor's speech and have simultaneous access to the instructional graphic material, and to the where the instructor is pointing. We developed the Haptic Deictic System - HDS, capable of tracking the instructor's pointing and informing the SBVI, through a haptic glove, where she needs to move her hand understand the instructor's illustration-augmented discourse. Several challenges had to be overcome before the SBVI were able to engage in fluid multimodal discourse with the help of the HDS. We discuss how such challenges were addressed with respect to perception and discourse (especially to mathematics instruction).","PeriodicalId":181145,"journal":{"name":"ICMI-MLMI '10","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130937112","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Vicente Alabau, Daniel Ortiz-Martínez, A. Sanchís, F. Casacuberta
Interactive machine translation (IMT) [1] is an alternative approach to machine translation that integrates human expertise into the automatic translation process. In this framework, a human iteratively interacts with the system until the desired output is completely generated. Traditionally, interaction has been performed with a keyboard and mouse, but touchscreens have recently become popular: many touchscreen devices are already on the market, including mobile phones, laptops and tablet computers such as the iPad. In this work, we propose a new interaction modality to take advantage of such devices, for which online handwritten text seems a very natural form of input. Multimodality is formulated as an extension of the traditional IMT protocol in which the user can amend errors by writing text with an electronic pen or a stylus on a touchscreen. Different approaches to modality fusion have been studied and assessed on the Xerox task. Finally, a thorough study of the errors committed by the online handwriting system points to directions for future work.
{"title":"Multimodal interactive machine translation","authors":"Vicente Alabau, Daniel Ortiz-Martínez, A. Sanchís, F. Casacuberta","doi":"10.1145/1891903.1891960","DOIUrl":"https://doi.org/10.1145/1891903.1891960","url":null,"abstract":"Interactive machine translation (IMT) [1] is an alternative approach to machine translation, integrating human expertise into the automatic translation process. In this framework, a human iteratively interacts with a system until the output desired by the human is completely generated. Traditionally, interaction has been performed using a keyboard and a mouse. However, the use of touchscreens has been popularised recently. Many touchscreen devices already exist in the market, namely mobile phones, laptops and tablet computers like the iPad. In this work, we propose a new interaction modality to take advantage of such devices, for which online handwritten text seems a very natural way of input. Multimodality is formulated as an extension to the traditional IMT protocol where the user can amend errors by writing text with an electronic pen or a stylus on a touchscreen. Different approaches to modality fusion have been studied. In addition, these approaches have been assessed on the Xerox task. Finally, a thorough study of the errors committed by the online handwritten system will show future work directions.","PeriodicalId":181145,"journal":{"name":"ICMI-MLMI '10","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116356177","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dynamic audio feedback enriches interaction with a mobile device, and novel sensor technologies and audio synthesis tools offer a vast space of possibilities for designing the mapping between sensory input and audio output. This paper presents a study in which vocal sketching was used as a prototyping method to capture ideas and expectations in the early stages of designing multimodal interaction. We introduce an experiment in which participants were given a graspable mobile device and urged to vocally sketch the sounds it should produce in communication and musical-expression scenarios. The sensory input methods were limited to gestures such as touch, squeeze and movement. Vocal sketching let us examine more closely how gesture and sound could be coupled in the use of our prototype device, for example by raising pitch as the device moves upward. The results reported in this paper have already informed our expectations for the actual design phase of the audio modality.
{"title":"Vocal sketching: a prototype tool for designing multimodal interaction","authors":"Koray Tahiroglu, T. Ahmaniemi","doi":"10.1145/1891903.1891956","DOIUrl":"https://doi.org/10.1145/1891903.1891956","url":null,"abstract":"Dynamic audio feedback enriches the interaction with a mobile device. Novel sensor technologies and audio synthesis tools provide infinite number of possibilities to design the interaction between the sensory input and audio output. This paper presents a study where vocal sketching was used as prototype method to grasp ideas and expectations in early stages of designing multimodal interaction. We introduce an experiment where a graspable mobile device was given to the participants and urged to sketch vocally the sounds to be produced when using the device in a communication and musical expression scenarios. The sensory input methods were limited to gestures such as touch, squeeze and movements. Vocal sketching let us to examine closer how gesture and sound could be coupled in the use of our prototype device, such as moving the device upwards with elevating pitch. The results reported in this paper have already informed our opinions and expectations towards the actual design phase of the audio modality.","PeriodicalId":181145,"journal":{"name":"ICMI-MLMI '10","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134190887","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jean-Yves Lionel Lawson, Mathieu Coterot, C. Carincotte, B. Macq
To support interactive high-fidelity prototyping of post-WIMP user interactions, we propose a multi-fidelity design method based on a unifying component-based model and supported by an advanced tool suite, the OpenInterface Platform Workbench. Our approach strives to support a collaborative (programmer-designer) and user-centered design activity. The workbench architecture allows exploration of novel interaction techniques through seamless integration and adaptation of heterogeneous components, high-fidelity rapid prototyping, and runtime evaluation and fine-tuning of the designed systems. Through the iterative construction of a running example, this paper illustrates how OpenInterface lets designers leverage existing resources and fosters the creation of non-conventional interaction techniques.
{"title":"Component-based high fidelity interactive prototyping of post-WIMP interactions","authors":"Jean-Yves Lionel Lawson, Mathieu Coterot, C. Carincotte, B. Macq","doi":"10.1145/1891903.1891961","DOIUrl":"https://doi.org/10.1145/1891903.1891961","url":null,"abstract":"In order to support interactive high-fidelity prototyping of post-WIMP user interactions, we propose a multi-fidelity design method based on a unifying component-based model and supported by an advanced tool suite, the OpenInterface Platform Workbench. Our approach strives for supporting a collaborative (programmer-designer) and user-centered design activity. The workbench architecture allows exploration of novel interaction techniques through seamless integration and adaptation of heterogeneous components, high-fidelity rapid prototyping, runtime evaluation and fine-tuning of designed systems. This paper illustrates through the iterative construction of a running example how OpenInterface allows the leverage of existing resources and fosters the creation of non-conventional interaction techniques.","PeriodicalId":181145,"journal":{"name":"ICMI-MLMI '10","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130127209","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper presents the IMADE (Interaction Measurement, Analysis, and Design Environment) project, which builds an environment for recording and analyzing human conversational interactions. The IMADE room is designed to record audio/visual, human-motion, and eye-gaze data for building an interaction corpus, focusing mainly on the understanding of human nonverbal behaviors. We describe the notion of an interaction corpus and iCorpusStudio, a software environment for browsing and analyzing it. We also present a preliminary experiment on multiparty conversations.
{"title":"Analysis environment of conversational structure with nonverbal multimodal data","authors":"Y. Sumi, M. Yano, T. Nishida","doi":"10.1145/1891903.1891958","DOIUrl":"https://doi.org/10.1145/1891903.1891958","url":null,"abstract":"This paper shows the IMADE (Interaction Measurement, Analysis, and Design Environment) project to build a recording and anlyzing environment of human conversational interactions. The IMADE room is designed to record audio/visual, human-motion, eye gazing data for building interaction corpus mainly focusing on understanding of human nonverbal behaviors. In this paper, we show the notion of interaction corpus and iCorpusStudio, software environment for browsing and analyzing the interaction corpus. We also present a preliminary experiment on multiparty conversations.","PeriodicalId":181145,"journal":{"name":"ICMI-MLMI '10","volume":"117 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117275371","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Quantifying the relationship between group dynamics and group performance is a key issue in improving group performance. In this paper, we discuss how group performance relates to several heuristics about group dynamics across a number of typical tasks. We also present a novel stochastic modeling approach for learning the structure of group dynamics. Our performance estimators account for between 40 and 60% of the variance across a range of group problem-solving tasks.
{"title":"Quantifying group problem solving with stochastic analysis","authors":"Wen Dong, A. Pentland","doi":"10.1145/1891903.1891954","DOIUrl":"https://doi.org/10.1145/1891903.1891954","url":null,"abstract":"Quantifying the relationship between group dynamics and group performance is a key issue of increasing group performance. In this paper, we will discuss how group performance is related to several heuristics about group dynamics in performing several typical tasks. We will also give our novel stochastic modeling in learning the structure of group dynamics. Our performance estimators account for between 40 and 60% of the variance across range of group problem solving tasks.","PeriodicalId":181145,"journal":{"name":"ICMI-MLMI '10","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121007646","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Natural head motion is an indispensable part of realistic facial animation. This paper presents a novel approach to synthesizing natural head motion automatically from grammatical and prosodic features, which are extracted by the text-analysis component of a Chinese Text-to-Speech (TTS) system. A two-layer clustering method is proposed to determine elementary head motion patterns from a multimodal database covering six emotional states. The mapping between textual information and elementary head motion patterns is modeled with Classification and Regression Trees (CART). Given the emotional state specified by the user, the text-analysis results drive the corresponding CART model to create an emotional head motion sequence. The generated sequence is then interpolated by spline and used to drive a Chinese text-driven avatar. A comparison experiment indicates that this approach produces better head motion and a more engaging human-computer interaction than random or no head motion.
{"title":"Mood avatar: automatic text-driven head motion synthesis","authors":"Kaihui Mu, J. Tao, Jianfeng Che, Minghao Yang","doi":"10.1145/1891903.1891951","DOIUrl":"https://doi.org/10.1145/1891903.1891951","url":null,"abstract":"Natural head motion is an indispensable part of realistic facial animation. This paper presents a novel approach to synthesize natural head motion automatically based on grammatical and prosodic features, which are extracted by the text analysis part of a Chinese Text-to-Speech (TTS) system. A two-layer clustering method is proposed to determine elementary head motion patterns from a multimodal database which covers six emotional states. The mapping problem between textual information and elementary head motion patterns is modeled by Classification and Regression Trees (CART). With the emotional state specified by users, results from text analysis are utilized to drive corresponding CART model to create emotional head motion sequence. Then, the generated sequence is interpolated by spineand us ed to drive a Chinese text-driven avatar. The comparison experiment indicates that this approach provides a better head motion and an engaging human-computer comparing to random or none head motion.","PeriodicalId":181145,"journal":{"name":"ICMI-MLMI '10","volume":"511 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131573412","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Interaction techniques that require users to adopt new behaviors mean that designers must take social acceptability and user experience into account; otherwise the techniques may be rejected by users as too embarrassing to perform in public. This research uses a set of low-cost prototypes to study social acceptability and user perceptions of multimodal mobile interaction techniques early in the design process. We describe 4 prototypes that were used with 8 focus groups to evaluate user perceptions of novel multimodal interactions using gesture, speech and non-speech sounds, and to gain feedback about the usefulness of the prototypes for studying social acceptability. The results describe user perceptions of social acceptability and the realities of using multimodal interaction techniques in daily life, as well as key differences between young users (18-29) and older users (70-95) in how they evaluate and approach these interaction techniques.
{"title":"Gesture and voice prototyping for early evaluations of social acceptability in multimodal interfaces","authors":"J. Williamson, S. Brewster","doi":"10.1145/1891903.1891925","DOIUrl":"https://doi.org/10.1145/1891903.1891925","url":null,"abstract":"Interaction techniques that require users to adopt new behaviors mean that designers must take into account social acceptability and user experience otherwise the techniques may be rejected by users as they are too embarrassing to do in public. This research uses a set of low cost prototypes to study social acceptability and user perceptions of multimodal mobile interaction techniques early on in the design process. We describe 4 prototypes that were used with 8 focus groups to evaluate user perceptions of novel multimodal interactions using gesture, speech and nonspeech sounds, and gain feedback about the usefulness of the prototypes for studying social acceptability. The results of this research describe user perceptions of social acceptability and the realities of using multimodal interaction techniques in daily life. The results also describe key differences between young users (18-29) and older users (70-95) with respect to evaluation and approach to understanding these interaction techniques.","PeriodicalId":181145,"journal":{"name":"ICMI-MLMI '10","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122183384","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We propose a novel approach to discovering latent structures in multimodal time series. We view a time series as observed data from an underlying dynamical system, so that analyzing multimodal time series can be viewed as finding latent structures of dynamical systems. In light of this, our approach is based on the concept of a generating partition, which is the theoretically optimal symbolization of a time series, maximizing the information retained about the underlying continuous dynamical system. However, a generating partition is difficult to obtain for time series without explicit dynamical equations. Unlike most previous approaches, which attempt to approximate the generating partition through various deterministic symbolization processes, our algorithm maintains and estimates a probability distribution over a symbol set for each data point in a time series. To do so, we develop a Bayesian framework for probabilistic symbolization and demonstrate that the approach can be successfully applied both to simulated data and to empirical data from multimodal agent-agent interactions. We suggest that this unsupervised learning algorithm has the potential to be applied to various multimodal datasets as a first step toward identifying underlying structures among temporal variables.
{"title":"Analyzing multimodal time series as dynamical systems","authors":"S. Hidaka, Chen Yu","doi":"10.1145/1891903.1891968","DOIUrl":"https://doi.org/10.1145/1891903.1891968","url":null,"abstract":"We propose a novel approach to discovering latent structures from multimodal time series. We view a time series as observed data from an underlying dynamical system. In this way, analyzing multimodal time series can be viewed as finding latent structures from dynamical systems. In light this, our approach is based on the concept of generating partition which is the theoretically best symbolization of time series maximizing the information of the underlying original continuous dynamical system. However, generating partition is difficult to achieve for time series without explicit dynamical equations. Different from most previous approaches that attempt to approximate generating partition through various deterministic symbolization processes, our algorithm maintains and estimates a probabilistic distribution over a symbol set for each data point in a time series. To do so, we develop a Bayesian framework for probabilistic symbolization and demonstrate that the approach can be successfully applied to both simulated data and empirical data from multimodal agent-agent interactions. We suggest this unsupervised learning algorithm has a potential to be used in various multimodal datasets as first steps to identify underlying structures between temporal variables.","PeriodicalId":181145,"journal":{"name":"ICMI-MLMI '10","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116874304","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}