Stuttering is a common speech disfluency that may persist into adulthood if not treated in its early stages. Techniques from spoken language understanding may be applied to provide automated diagnoses of stuttering from voice recordings; however, there are several difficulties, including the lack of training data involving young children and the high dimensionality of these data. This study investigates how automatic speech recognition (ASR) could help clinicians by providing a tool that automatically recognises stuttering events and provides a useful written transcription of what was said. In addition, to enhance the performance of ASR and to alleviate the lack of stuttering data, this study examines the effect of augmenting the language model with artificially generated data. The performance of the ASR tool with and without language model augmentation is compared. Following language model augmentation, the ASR tool's recall improved from 38% to 62.2% and its precision from 56.58% to 71%. When mis-recognised events are more coarsely classified as stuttering/non-stuttering events, performance improves to 73% recall and 84% precision. Although the obtained results are not perfect, they map to fairly robust stutter/non-stutter decision boundaries.
"Automatic recognition of children's read speech for stuttering application." Sadeen Alharbi, A. Simons, S. Brumfitt, P. Green. Workshop on Child, Computer and Interaction (WOCCI), 13 November 2017. doi:10.21437/WOCCI.2017-1
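The abstract above does not detail how the artificial language-model data were generated. As an illustration only, one plausible way to synthesize stuttered text is to inject whole-word and part-word repetitions into clean transcripts; the `stutterize` function and its event markers below are hypothetical, not the authors' actual scheme.

```python
import random

def stutterize(transcript, p_word_rep=0.1, p_part_rep=0.1, seed=0):
    """Inject artificial stuttering events into a clean transcript.

    Whole-word repetitions ("I I went") and part-word repetitions
    ("b- b- ball") are two common stuttering event types; the markers
    used here are illustrative, not a standard annotation scheme.
    """
    rng = random.Random(seed)  # seeded for reproducible corpora
    out = []
    for word in transcript.split():
        r = rng.random()
        if r < p_part_rep and len(word) > 2:
            # part-word (sound/syllable) repetition, e.g. "b- b- ball"
            onset = word[0] + "-"
            out.extend([onset, onset, word])
        elif r < p_part_rep + p_word_rep:
            # whole-word repetition, e.g. "went went"
            out.extend([word, word])
        else:
            out.append(word)
    return " ".join(out)

# Augment an LM training corpus with synthetic disfluent variants
clean = "the boy went to the park with his ball"
augmented_corpus = [clean] + [stutterize(clean, seed=s) for s in range(3)]
```

The augmented sentences would then be added to the language-model training text alongside the clean transcripts.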
In research on children's language development, joint book reading is recognised as a situation beneficial for language learning. Motivated by this line of research, our aim was to explore whether the situation of joint book reading can be applied to child–robot interaction. Before investigating whether and how this situation, when applied in child–robot interaction, can be used as a language learning scenario, the interactional requirements for a successful dialogue have to be studied. Our main aim in this paper is to present a study design for a child–robot interaction in which a robot is introduced as a learner that acquires new color words, and the child is asked to teach the robot those words within the familiar interaction format of joint book reading. We then report the observations made in a single-case pilot study conducted with a 4;8-year-old child, and discuss how the robot's interactional behavior needs to be shaped for it to participate successfully in joint book reading.
"Can you teach me?: Children teaching new words to a robot in a book reading scenario." Angela Grimminger, K. Rohlfing. Workshop on Child, Computer and Interaction (WOCCI), 13 November 2017. doi:10.21437/WOCCI.2017-5
Maxime Portaz, Maxime Garcia, A. Barbulescu, A. Bégault, Laurence Boissieux, Marie-Paule Cani, Rémi Ronfard, D. Vaufreydaz
This paper presents Figurines, an offline framework for narrative creation with tangible objects, designed to record storytelling sessions with children, teenagers or adults. The framework uses tangible diegetic objects to record a free narrative from up to two storytellers and constructs a fully annotated representation of the story. This representation comprises the 3D positions and orientations of the figurines, the positions of decor elements, and an interpretation of the storytellers' actions (facial expressions, gestures and voice). While maintaining the playful dimension of the storytelling session, the system must tackle the challenge of recovering the free-form motion of the figurines and the storytellers in uncontrolled environments. To do so, we record the storytelling session using a hybrid setup with two RGB-D sensors and figurines augmented with IMU sensors. The first RGB-D sensor complements the IMU information to identify and track the figurines, as well as the decor elements; it also tracks the storytellers jointly with the second RGB-D sensor. The framework has been used to record preliminary experiments that validate the interest of our approach. These experiments evaluate figurine tracking and the combination of motion with the storytellers' voices, gestures and facial expressions. In a make-believe game, this story representation was retargeted onto virtual characters to produce an animated version of the story. The final goal of the Figurines framework is to enhance our understanding of the creative processes at work during immersive storytelling.
"Figurines, a multimodal framework for tangible storytelling." Maxime Portaz, Maxime Garcia, A. Barbulescu, A. Bégault, Laurence Boissieux, Marie-Paule Cani, Rémi Ronfard, D. Vaufreydaz. Workshop on Child, Computer and Interaction (WOCCI), 13 November 2017. doi:10.21437/WOCCI.2017-9
This paper reports first results from a study on orthography acquisition by children in German elementary school. One major aspect of German orthography concerns the marking of vowel duration, which occurs in every regular word in German. Mastery of this pattern therefore has a large impact on accurate spelling and, very likely, on reading ability. However, the data show that this sub-skill appears difficult for children to master, even beyond elementary school. In this paper, we explore a corpus of freely written text collected on a weekly basis across several third-grade classes in three different schools over a period of three months. Two of the schools participated in an intervention (an iPad phonics game for German called Phontasia). Studying the impact of the game on skill acquisition proved difficult because the ability to employ iPads in the classroom was correlated with the socio-economic status of the schools. Results from VERA, a German standardized test, were obtained for several classes, allowing us to add one control group to the study. Preliminary results indicate that the Phontasia intervention is worth pursuing in future studies to improve orthographic skills.
"Phontasia: A phonics game for German and its effect on orthographic skills-first corpus explorations." Kay Berkling. Workshop on Child, Computer and Interaction (WOCCI), 13 November 2017. doi:10.21437/WOCCI.2017-2
This paper presents Kin-LDD (Kinaesthetic Learning Difficulties Diagnosis), a tool that supports special educators during the assessment of children's learning difficulties. Instead of participating in a tedious and lengthy process, children using Kin-LDD play a game. The tool uses a natural user interface for child–computer interaction, combining gestures with typical mouse usage, and provides a set of activities that present material as text, images and sounds. Kin-LDD is also available to school teachers and parents for the early identification of learning disabilities before a special educator is engaged, but it is primarily a tool that lets special educators bring the 'fun' factor into the diagnostic process. The tool offers activities for spatial orientation, time orientation and storyboard sequencing, and reports a set of key performance indicators on each child's performance in these activities to help special educators reach a diagnosis.
"A natural user interface game for the evaluation of children with learning difficulties." E. Chatzidaki, M. Xenos, Charikleia Machaira. Workshop on Child, Computer and Interaction (WOCCI), 13 November 2017. doi:10.21437/WOCCI.2017-3
Motivated by theories of early language development in children, we investigate the contribution of affective features to the early acquisition of lexical semantics. For the task of semantic similarity between words, semantic and affective spaces are modeled using network-based distributed semantic models. We propose a method for constructing semantic activations from a combination of lexical and affective relations, and show that affective information plays a prominent role in our lexical development model.
"Lexical and affective models in early acquisition of semantics." Athanasia Kolovou, Elias Iosif, A. Potamianos. Workshop on Child, Computer and Interaction (WOCCI), 13 November 2017. doi:10.21437/WOCCI.2017-7
By combining visual feedback and motivational elements, a computer-based speech therapy system can offer new approaches with various advantages over traditional speech therapy techniques. Through visual feedback and the adaptation of traditional speech sound exercises, it is possible to create an engaging environment with motivation-focused elements. These elements can be used in an interactive environment that motivates the therapy attendee toward better performance. We present an interactive, gamified environment for speech therapy that combines visual feedback and motivational components. The results of a survey and a usability study suggest that children show more interest in speech therapy sessions when the proposed environment is used.
"Visual-feedback in an interactive environment for speech-language therapy." André Grossinho, João Magalhães, S. Cavaco. Workshop on Child, Computer and Interaction (WOCCI), 13 November 2017. doi:10.21437/WOCCI.2017-6
Speaker recognition is a well-established research area, but it focuses mainly on adult speech. Recent work on children's speech shows that not all findings from speaker recognition on adult speech are directly applicable to children's speech. There is a variety of applications for speaker recognition from children's speech: for example, it could be used as a safeguard for a child during his or her interactions on social media networking websites, or as one of the main building blocks in automatic tutoring systems for educational purposes at schools. In this research we evaluate two scoring methods for speaker recognition within the i-vector framework in two simulated environments: a classroom (30 students) and a school (288 students). The first method is based on PLDA scoring and the second on the cosine similarity measure. Results show that the first method outperforms the second in the simulated school, whereas the second scoring method performs better for recognising a child in a classroom. The evaluation covered both speaker identification and verification in text-independent mode.
"Comparison of two scoring method within i-vector framework for speaker recognition from children's speech." Saeid Safavi, L. Meng. Workshop on Child, Computer and Interaction (WOCCI), 13 November 2017. doi:10.21437/WOCCI.2017-10
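The cosine-similarity scoring mentioned as the second method in the abstract above can be sketched as follows. The i-vectors here are random stand-ins; in a real system they would come from a trained total-variability model, usually with channel compensation (e.g. LDA/WCCN) applied before scoring.

```python
import numpy as np

def cosine_score(enrol_ivec, test_ivec):
    """Cosine similarity between an enrollment and a test i-vector."""
    a = enrol_ivec / np.linalg.norm(enrol_ivec)
    b = test_ivec / np.linalg.norm(test_ivec)
    return float(np.dot(a, b))

def identify(enrol_ivecs, test_ivec):
    """Closed-set identification: return the speaker id with the best score."""
    scores = {spk: cosine_score(iv, test_ivec)
              for spk, iv in enrol_ivecs.items()}
    return max(scores, key=scores.get)

# Toy example: three "speakers" with random 400-dimensional i-vectors
rng = np.random.default_rng(0)
enrolled = {f"child_{i}": rng.standard_normal(400) for i in range(3)}
test = enrolled["child_1"] + 0.1 * rng.standard_normal(400)  # noisy session
best = identify(enrolled, test)  # should recover "child_1"
```

For verification rather than identification, the same `cosine_score` would simply be compared against a decision threshold.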
Erika Godde, G. Bailly, D. Escudero, Marie-Line Bosse, Estelle Gillet-Perret
We analyze readings of the same reference text by 116 children. We show that several factors strongly impact the subjective rating of fluency, notably the number of correct words, repetitions, errors, and syllables spelled per minute. We succeeded in predicting four subjective scores, rated between 1 and 4 by human raters, from such objective measurements with rather high precision (R > .8 for 3 out of 4 scores). This opens the way for automatic multidimensional assessment of reading fluency using calibrated texts.
"Evaluation of reading performance of primary school children: Objective measurements vs. subjective ratings." Erika Godde, G. Bailly, D. Escudero, Marie-Line Bosse, Estelle Gillet-Perret. Workshop on Child, Computer and Interaction (WOCCI), 13 November 2017. doi:10.21437/WOCCI.2017-4
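The mapping from objective measurements to subjective ratings described above can be illustrated with a simple least-squares regression; the measurements and ratings below are invented stand-ins, not the study's data, and the paper does not state which predictive model was used.

```python
import numpy as np

# Hypothetical per-reader objective measurements:
# [correct words, repetitions, errors, syllables per minute]
X = np.array([
    [95, 1,  2, 180.0],
    [80, 4,  8, 130.0],
    [60, 9, 15,  90.0],
    [90, 2,  4, 160.0],
    [70, 6, 11, 110.0],
    [85, 3,  6, 150.0],
])
# Hypothetical subjective fluency ratings on the 1-4 scale
y = np.array([4.0, 2.5, 1.0, 3.5, 2.0, 3.0])

# Ordinary least squares with an intercept column
A = np.column_stack([X, np.ones(len(X))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
pred = A @ coef

# Correlation between predicted and observed ratings (cf. R > .8 above)
r = np.corrcoef(pred, y)[0, 1]
```

In practice each of the four subjective scores would get its own regression, fitted and evaluated with cross-validation on held-out readers.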
Anastassia Loukina, Beata Beigman Klebanov, P. Lange, Binod Gyawali, Yao Qian
We present a preliminary report on developing technology for an application that supports shared book reading. We discuss how speech processing technology can be used to automate different components of the system for oral reading fluency evaluation during shared book reading, and the challenges posed by this new context in comparison to other automated reading tutor systems. We also present a performance evaluation of the baseline system on a corpus of read speech.
"Developing speech processing technologies for shared book reading with a computer." Anastassia Loukina, Beata Beigman Klebanov, P. Lange, Binod Gyawali, Yao Qian. Workshop on Child, Computer and Interaction (WOCCI), 13 November 2017. doi:10.21437/WOCCI.2017-8
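As an illustration of one component such an oral reading fluency system might automate, the sketch below scores a child's recognized words against the book text. The alignment and the words-correct-per-minute measure shown are generic textbook choices, not necessarily the authors' baseline.

```python
from difflib import SequenceMatcher

def reading_accuracy(reference, hypothesis):
    """Fraction of reference words matched in order by the ASR hypothesis.

    Uses a longest-common-subsequence-style alignment; production systems
    typically use a Levenshtein alignment of the ASR output instead.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    matcher = SequenceMatcher(None, ref, hyp)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(ref)

def words_correct_per_minute(reference, hypothesis, duration_s):
    """A common oral reading fluency measure: correct words per minute."""
    ref_len = len(reference.split())
    return reading_accuracy(reference, hypothesis) * ref_len * 60.0 / duration_s

# Toy example: one page of text vs. the ASR transcript of the child's reading
page = "the little bear walked into the dark forest"
asr_output = "the little bear walk into the forest"
acc = reading_accuracy(page, asr_output)  # 6 of 8 words matched -> 0.75
```

A shared-reading application would additionally need to detect which passage the reader is on and separate the adult's and child's turns before scoring.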