Mohammad Samin Yasar, Md Mofijul Islam, Tariq Iqbal
Robots are moving from working in isolation to working with humans as a part of human-robot teams. In such situations, they are expected to work with multiple humans and need to understand and predict the team members’ actions. To address this challenge, in this work, we introduce IMPRINT, a multi-agent motion prediction framework that models the interactional dynamics and incorporates the multimodal context (e.g., data from RGB and depth sensors and skeleton joint positions) to accurately predict the motion of all the agents in a team. In IMPRINT, we propose an Interaction module that can extract the intra-agent and inter-agent dynamics before fusing them to obtain the interactional dynamics. Furthermore, we propose a Multimodal Context module that incorporates multimodal context information to improve multi-agent motion prediction. We evaluated IMPRINT by comparing its performance on human-human and human-robot team scenarios against state-of-the-art methods. The results suggest that IMPRINT outperformed all other methods over all evaluated temporal horizons. Additionally, we provide an interpretation of how IMPRINT incorporates the multimodal context information from all the modalities during multi-agent motion prediction. The superior performance of IMPRINT provides a promising direction to integrate motion prediction with robot perception and enable safe and effective human-robot collaboration.
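The paper's Interaction module combines per-agent (intra-agent) and cross-agent (inter-agent) dynamics. A minimal sketch of that idea, assuming each agent's pose is a flat list of joint coordinates (the actual model is a learned neural architecture; the feature choices here are illustrative assumptions):

```python
# Illustrative sketch, not the authors' implementation: fuse intra-agent
# and inter-agent dynamics into one "interactional" feature per frame.

def intra_agent_features(pose_seq):
    """Per-agent dynamics: frame-to-frame joint displacements."""
    return [
        [b - a for a, b in zip(prev, cur)]
        for prev, cur in zip(pose_seq, pose_seq[1:])
    ]

def inter_agent_features(pose_seq_a, pose_seq_b):
    """Cross-agent dynamics: per-frame relative joint offsets."""
    return [
        [pb - pa for pa, pb in zip(fa, fb)]
        for fa, fb in zip(pose_seq_a, pose_seq_b)
    ]

def fuse(intra, inter):
    """Concatenate the two feature streams frame by frame."""
    return [i + j for i, j in zip(intra, inter)]

agent_a = [[0.0, 0.0], [1.0, 0.0], [2.0, 1.0]]  # 3 frames, 2 joint coords
agent_b = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]

intra_a = intra_agent_features(agent_a)           # displacements start at frame 1
inter_ab = inter_agent_features(agent_a, agent_b)
fused = fuse(intra_a, inter_ab[1:])               # align the two streams
```

A prediction head would then consume `fused` to forecast every agent's future motion; in IMPRINT this fusion is learned rather than hand-coded.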
"IMPRINT: Interactional Dynamics-aware Motion Prediction in Teams using Multimodal Context." ACM Transactions on Human-Robot Interaction. DOI: https://doi.org/10.1145/3626954. Published 2023-10-16.
Michael Suguitan, Nick DePalma, Guy Hoffman, Jessica Hodgins
In this work, we present a method for personalizing human-robot interaction by using emotive facial expressions to generate affective robot movements. Movement is an important medium for robots to communicate affective states, but the expertise and time required to craft new robot movements promotes a reliance on fixed preprogrammed behaviors. Enabling robots to respond to multimodal user input with newly generated movements could stave off staleness of interaction and convey a deeper degree of affective understanding than current retrieval-based methods. We use autoencoder neural networks to compress robot movement data and facial expression images into a shared latent embedding space. Then, we use a reconstruction loss to generate movements from these embeddings and triplet loss to align the embeddings by emotion classes rather than data modality. To subjectively evaluate our method, we conducted a user survey and found that generated happy and sad movements could be matched to their source face images. However, angry movements were most often mismatched to sad images. This multimodal data-driven generative method can expand an interactive agent’s behavior library and could be adopted for other multimodal affective applications.
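The two training signals described above can be sketched on toy embeddings: a reconstruction error, and a triplet loss that pulls same-emotion embeddings together regardless of whether they came from a face image or a movement. This is a framework-free sketch with made-up latent vectors and margin, not the paper's network:

```python
# Hedged sketch of the two losses; the vectors and margin are toy assumptions.

def mse(x, y):
    """Reconstruction error between decoder output and target."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)

def dist(x, y):
    """Euclidean distance in the shared latent space."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5

def triplet_loss(anchor, positive, negative, margin=1.0):
    # anchor and positive share an emotion class; negative does not
    return max(0.0, dist(anchor, positive) - dist(anchor, negative) + margin)

# toy latent vectors: a face embedding (anchor), a movement embedding of the
# same emotion (positive), and a movement of a different emotion (negative)
face_happy = [1.0, 0.0]
move_happy = [0.9, 0.1]
move_sad = [-1.0, 0.2]

recon = mse([0.5, 0.5], [0.4, 0.6])
trip = triplet_loss(face_happy, move_happy, move_sad)
total = recon + trip
```

Because the happy movement already sits much closer to the happy face than the sad movement does, the triplet term is zero here; during training, gradients from both terms jointly shape the shared space.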
"Face2Gesture: Translating Facial Expressions Into Robot Movements Through Shared Latent Space Neural Networks." ACM Transactions on Human-Robot Interaction. DOI: https://doi.org/10.1145/3623386. Published 2023-10-04.
Christopher Thierauf, Ravenna Thielstrom, Bradley Oosterveld, Will Becker, Matthias Scheutz
Natural language instructions are effective for tasking autonomous robots and for quickly teaching them new knowledge. Yet human instructors are not perfect: they are likely to make mistakes at times, and they will correct themselves when they notice errors in their own instructions. In this paper, we introduce a complete system that enables robot behaviors to handle such corrections, during both task instruction and action execution. We then demonstrate its operation in an integrated cognitive robotic architecture through spoken language in two tasks: a navigation-and-retrieval task and a meal-assembly task. Verbal corrections occur before, during, and after verbally taught sequences of tasks, demonstrating that the proposed methods enable fast corrections not only of the semantics generated from the instructions but also of overt robot behavior, in a manner shown to be reasonable when compared to human behavior and expectations.
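One assumed-semantics sketch of the correction behavior described above: a "do this instead" utterance replaces the most recently taught (or currently pending) action in the task script. This is a toy stand-in, not the paper's cognitive architecture:

```python
# Minimal sketch (assumed semantics): a correction overwrites the latest action.

class TaskScript:
    def __init__(self):
        self.actions = []

    def instruct(self, action):
        """Append a verbally taught action."""
        self.actions.append(action)

    def correct(self, new_action):
        """Handle "do this instead": replace the latest action."""
        if self.actions:
            self.actions[-1] = new_action
        else:
            self.actions.append(new_action)

script = TaskScript()
script.instruct("go to the kitchen")
script.instruct("pick up the mug")
script.correct("pick up the plate")   # "do this instead"
```

The real system must additionally decide whether a correction targets the semantics of the instruction or an action already being executed, which is the harder part of the problem.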
"“Do this instead” – Robots that Adequately Respond to Corrected Instructions." ACM Transactions on Human-Robot Interaction. DOI: https://doi.org/10.1145/3623385. Published 2023-09-22.
Humans can leverage physical interaction to teach robot arms. This physical interaction takes multiple forms depending on the task, the user, and what the robot has learned so far. State-of-the-art approaches focus on learning from a single modality, or combine some interaction types. Some methods do so by assuming that the robot has prior information about the features of the task and the reward structure. By contrast, in this paper we introduce an algorithmic formalism that unites learning from demonstrations, corrections, and preferences. Our approach makes no assumptions about the tasks the human wants to teach the robot; instead, we learn a reward model from scratch by comparing the human’s input to nearby alternatives, i.e., trajectories close to the human’s feedback. We first derive a loss function that trains an ensemble of reward models to match the human’s demonstrations, corrections, and preferences. The type and order of feedback is up to the human teacher: we enable the robot to collect this feedback passively or actively. We then apply constrained optimization to convert our learned reward into a desired robot trajectory. Through simulations and a user study we demonstrate that our proposed approach more accurately learns manipulation tasks from physical human interaction than existing baselines, particularly when the robot is faced with new or unexpected objectives. Videos of our user study are available at: https://youtu.be/FSUJsTYvEKU
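The core idea of scoring the human's input against nearby alternative trajectories can be sketched with a softmax (Bradley-Terry style) likelihood: a good reward model should rank the human's choice above the alternatives. The linear reward and trajectory features below are toy assumptions, not the paper's model:

```python
# Illustrative sketch: negative log-likelihood that the human's input is
# preferred over nearby alternative trajectories under a learned reward.
import math

def reward(weights, features):
    """Toy linear reward over trajectory features."""
    return sum(w * f for w, f in zip(weights, features))

def preference_loss(weights, chosen, alternatives):
    """-log P(chosen) under a softmax over the rewards of all candidates."""
    scores = [reward(weights, chosen)] + [reward(weights, a) for a in alternatives]
    m = max(scores)  # stabilize the log-sum-exp
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return -(scores[0] - log_z)

# toy trajectory features: [distance to goal, effort]
human_input = [0.1, 0.2]
nearby = [[0.5, 0.2], [0.9, 0.3]]

w_good = [-1.0, -1.0]   # penalizes distance and effort: ranks human input highest
w_bad = [1.0, 1.0]      # inverted preferences: ranks human input lowest

loss_good = preference_loss(w_good, human_input, nearby)
loss_bad = preference_loss(w_bad, human_input, nearby)
```

Training an ensemble of such models on demonstrations, corrections, and preferences alike, then optimizing a trajectory against the learned reward, mirrors the paper's pipeline at a very coarse level.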
"Unified Learning from Demonstrations, Corrections, and Preferences during Physical Human-Robot Interaction," by Shaunak A. Mehta and Dylan P. Losey. ACM Transactions on Human-Robot Interaction. DOI: https://doi.org/10.1145/3623384. Published 2023-09-22.
Collaborative human-robot task execution approaches require mutual adaptation, allowing both the human and robot partners to take active roles in action selection and role assignment to achieve a single shared goal. Prior works have utilized a leader-follower paradigm in which one agent must follow the actions specified by the other. We introduce the User-aware Hierarchical Task Planning (UHTP) framework, a communication-free human-robot collaborative approach for adaptive execution of multi-step tasks that moves beyond the leader-follower paradigm. Specifically, our approach enables the robot to observe the human, perform actions that support the human’s decisions, and actively select actions that maximize the expected efficiency of the collaborative task. In turn, the human chooses actions based on their observation of the task and the robot, without being dictated to by a scheduler or the robot. We evaluate UHTP both in simulation and in a human-subjects experiment on a collaborative drill assembly task. Our results show that UHTP achieves more efficient task plans and shorter task completion times than non-adaptive baselines across a wide range of human behaviors, that interacting with a UHTP-controlled robot reduces the human’s cognitive workload, and that humans prefer to work with our adaptive robot over a fixed-policy alternative.
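The adaptive selection described above can be illustrated with a toy scheduler: given the task the human is observed doing, the robot picks the remaining task whose assignment to itself minimizes the overall finish time. The task names, durations, and the sequential-leftover assumption are illustrative, not UHTP's actual hierarchical planner:

```python
# Toy sketch of efficiency-maximizing action selection (not the UHTP planner).

def pick_robot_action(remaining, human_task, durations):
    """Choose the robot's task so the two-agent makespan is smallest."""
    options = [t for t in remaining if t != human_task]
    best, best_makespan = None, float("inf")
    for choice in options:
        # toy assumption: leftover tasks are done sequentially afterwards
        leftover = sum(durations[t] for t in options if t != choice)
        makespan = max(durations[human_task], durations[choice]) + leftover
        if makespan < best_makespan:
            best, best_makespan = choice, makespan
    return best

durations = {"attach_handle": 4, "insert_battery": 2, "fasten_screws": 6}
robot_task = pick_robot_action(
    ["attach_handle", "insert_battery", "fasten_screws"],
    human_task="attach_handle",
    durations=durations,
)
```

Here the robot takes the long screw-fastening task in parallel with the human's work, leaving only the short battery insertion for later, which is the smaller makespan. UHTP makes this kind of choice over a hierarchical task network while continually re-observing the human.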
"UHTP: A User-Aware Hierarchical Task Planning Framework for Communication-Free, Mutually-Adaptive Human-Robot Collaboration," by Kartik Ramachandruni, Cassandra Kent, and Sonia Chernova. ACM Transactions on Human-Robot Interaction. DOI: https://doi.org/10.1145/3623387. Published 2023-09-22.
Shipeng Liu, Cristina G. Wilson, Bhaskar Krishnamachari, Feifei Qian
Truly collaborative scientific field data collection between human scientists and autonomous robot systems requires a shared understanding of the search objectives and the tradeoffs faced when making decisions. Developing intelligent robots to aid human experts therefore requires an understanding of how scientists make such decisions and how they adapt their data collection strategies when presented with new information in situ. In this study, we examined the dynamic data collection decisions of 108 expert geoscience researchers using a simulated field scenario. Human data collection behaviors suggested two distinct objectives: an information-based objective to maximize information coverage, and a discrepancy-based objective to maximize hypothesis verification. We developed a highly simplified quantitative decision model that allows the robot to predict potential human data collection locations based on the two observed human data collection objectives. Predictions from the simple model revealed a transition from the information-based to the discrepancy-based objective as the level of information increased.
"Understanding Human Dynamic Sampling Objectives to Enable Robot-assisted Scientific Decision Making." ACM Transactions on Human-Robot Interaction. DOI: https://doi.org/10.1145/3623383. Published 2023-09-13.
Maia Stiber, Yuxiang Gao, R. Taylor, Chien-Ming Huang
Productive human-robot partnerships are vital to the successful integration of assistive robots into everyday life. While prior research has explored techniques to facilitate collaboration during human-robot interaction, the work described here aims to forge productive partnerships prior to human-robot interaction, drawing upon the role that team-building activities play in establishing effective human teams. Through a 2 (group membership: ingroup and outgroup) × 3 (robot error: main task errors, side task errors, and no errors) online study (N = 62), we demonstrate that 1) a non-social pre-task exercise can help form ingroup relationships; 2) an ingroup robot is perceived as a better, more committed teammate than an outgroup robot (despite the two behaving identically); and 3) participants are more tolerant of negative outcomes when working with an ingroup robot. We discuss how pre-task exercises may serve as an active task failure mitigation strategy.
"Forging Productive Human-Robot Partnerships Through Task Training." ACM Transactions on Human-Robot Interaction. DOI: https://doi.org/10.1145/3611657. Published 2023-08-31.
Christopher M. Reardon, J. Gregory, Kerstin S Haring, Benjamin Dossett, Ori Miller, A. Inyang
Creating information-transparency solutions that enable humans to understand robot perception is a challenging requirement for autonomous and artificially intelligent robots to have impact across a multitude of domains. By taking advantage of comprehensive, high-volume data from robot teammates’ advanced perception and reasoning capabilities, humans will be able to make better decisions, with significant impacts on everything from safety to functionality. We present a solution to this challenge by coupling augmented reality (AR) with an intelligent mobile robot that autonomously detects novel changes in an environment. We show that the human teammate can understand and make decisions based on information shared via AR by the robot. Sharing of robot-perceived information is enabled by the robot’s online calculation of the human’s relative position, making the system robust to environments without external instrumentation such as GPS. Our robotic system performs change detection by comparing current metric sensor readings against a previous reading to identify differences. We experimentally explore the design of change-detection visualizations and the aggregation of information, the impact of instruction on communication understanding, the effects of visualization and alignment error, and the relationship between situated 3D visualization in AR and human movement in the operational environment on shared situational awareness in human-robot teams. We demonstrate this novel capability and assess the effectiveness of human-robot teaming in crowdsourced data-driven studies, as well as an in-person study where participants are equipped with a commercial off-the-shelf AR headset and teamed with a small ground robot that maneuvers through the environment. The mobile robot scans for changes, which are visualized via AR to the participant.
The effectiveness of this communication is evaluated through accuracy and subjective assessment metrics to provide insight into interpretation and experience.
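The change-detection step, comparing a current reading against a previous one, can be sketched with a toy occupancy-grid diff; the real system operates on metric sensor readings, so the grid representation here is an illustrative simplification:

```python
# Simple sketch of change detection: diff two occupancy grids and report
# the cells that changed, which would become AR highlight candidates.

def detect_changes(prev_grid, cur_grid):
    """Return (row, col) cells whose occupancy differs between scans."""
    return [
        (r, c)
        for r, row in enumerate(cur_grid)
        for c, val in enumerate(row)
        if val != prev_grid[r][c]
    ]

prev_scan = [[0, 0, 1],
             [0, 1, 0]]
cur_scan = [[0, 1, 1],
            [0, 1, 0]]

changes = detect_changes(prev_scan, cur_scan)
```

Each detected cell would then be projected into the human's view using the robot's online estimate of the human's relative position, which is what removes the need for external instrumentation such as GPS.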
"Augmented Reality Visualization of Autonomous Mobile Robot Change Detection in Uninstrumented Environments." ACM Transactions on Human-Robot Interaction. DOI: https://doi.org/10.1145/3611654. Published 2023-08-21.
Nicholas C Georgiou, Rebecca Ramnauth, Emmanuel Adéníran, Michael Lee, Lila Selin, B. Scassellati
Social robots in the home will need to solve audio identification problems to better interact with their users. This paper focuses on the classification between a) natural conversation that includes at least one co-located user and b) media that is playing from electronic sources and does not require a social response, such as television shows. This classification can help social robots detect a user’s social presence using sound. Social robots that are able to solve this problem can apply this information to assist them in making decisions, such as determining when and how to appropriately engage human users. We compiled a dataset from a variety of acoustic environments which contained either natural or media audio, including audio that we recorded in our own homes. Using this dataset, we performed an experimental evaluation on a range of traditional machine learning classifiers, and assessed the classifiers’ abilities to generalize to new recordings, acoustic conditions, and environments. We conclude that a C-Support Vector Classification (SVC) algorithm outperformed other classifiers. Finally, we present a classification pipeline that in-home robots can utilize, and discuss the timing and size of the trained classifiers, as well as privacy and ethics considerations.
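As a rough illustration of the winning classifier family, here is a linear support-vector classifier trained with the Pegasos subgradient method on toy two-dimensional "audio features." The features, data, and training scheme are stand-in assumptions; the study evaluated standard SVC implementations on real home recordings:

```python
# Sketch of a linear SVM (Pegasos subgradient training) standing in for the
# paper's C-Support Vector Classification on toy conversation/media features.
import random

def train_linear_svm(data, labels, lam=0.01, epochs=200, seed=0):
    """labels are +1 (co-located conversation) / -1 (media audio)."""
    rng = random.Random(seed)
    w = [0.0] * len(data[0])
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(len(data)), len(data)):
            t += 1
            eta = 1.0 / (lam * t)           # decaying step size
            x, y = data[i], labels[i]
            margin = y * sum(wj * xj for wj, xj in zip(w, x))
            if margin < 1:                   # hinge loss is active
                w = [(1 - eta * lam) * wj + eta * y * xj
                     for wj, xj in zip(w, x)]
            else:                            # only regularization shrinks w
                w = [(1 - eta * lam) * wj for wj in w]
    return w

def predict(w, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1

# toy, linearly separable "features" (e.g. energy variance, pause rate)
conversation = [[1.0, 0.9], [0.8, 1.1], [1.2, 1.0]]
media = [[-1.0, -0.8], [-0.9, -1.2], [-1.1, -1.0]]
X = conversation + media
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y)
```

In practice one would extract acoustic features from fixed-length audio windows and use a mature library implementation; the point of the sketch is only the decision-boundary idea behind the SVC result.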
"Is Someone There Or Is That The TV? Detecting Social Presence Using Sound." ACM Transactions on Human-Robot Interaction. DOI: https://doi.org/10.1145/3611658. Published 2023-08-18.
Bastian Orthmann, Iolanda Leite, R. Bresin, Ilaria Torre
Non-verbal communication is important in HRI, particularly when humans and robots do not need to actively engage in a task together but rather co-exist in a shared space. Robots might still need to communicate states such as urgency or availability, and where they intend to go, to avoid collisions and disruptions. Sounds could be used to communicate such states and intentions in an intuitive and non-disruptive way. Here, we propose a multi-layer classification system for displaying multiple kinds of robot information simultaneously via sound. We first conceptualise which robot features could be displayed (robot size, speed, availability for interaction, urgency, and directionality); we then map them to a set of audio parameters. The designed sounds were then evaluated in five online studies, where people listened to the sounds and were asked to identify the associated robot features. The sounds were generally understood as intended by participants, especially when they were evaluated one feature at a time, and partially when they were evaluated two features simultaneously.
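The feature-to-parameter mapping can be pictured as a small function in which each robot feature drives one audio layer, so several features can sound simultaneously. The parameter names, ranges, and directions of the mappings below are assumptions for illustration, not the authors' sound design:

```python
# Illustrative feature-to-audio mapping (assumed parameters, not the
# paper's design): each normalized robot feature controls one sound layer.

def map_to_audio(size, speed, urgency):
    """All inputs normalized to [0, 1]."""
    return {
        # assumed mapping: larger robots -> lower pitch
        "pitch_hz": 880.0 - 660.0 * size,
        # assumed mapping: faster robots -> faster pulse rate
        "pulse_hz": 1.0 + 7.0 * speed,
        # assumed mapping: more urgent -> louder
        "gain_db": -20.0 + 14.0 * urgency,
    }

params = map_to_audio(size=0.5, speed=0.25, urgency=1.0)
```

Layering independent parameters like this is what makes it possible to test features one at a time and in pairs, as in the online evaluations above.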
"Sounding Robots: Design and Evaluation of Auditory Displays for Unintentional Human-Robot Interaction." ACM Transactions on Human-Robot Interaction. DOI: https://doi.org/10.1145/3611655. Published 2023-08-17.