Xingyu Bruce Liu, Jiahao Nick Li, David Kim, Xiang 'Anthony' Chen, Ruofei Du
Situationally Induced Impairments and Disabilities (SIIDs) can significantly hinder user experience in contexts such as poor lighting, noise, and multitasking. While prior research has introduced algorithms and systems to address these impairments, they predominantly cater to specific tasks or environments and fail to accommodate the diverse and dynamic nature of SIIDs. We introduce Human I/O, a unified approach to detecting a wide range of SIIDs by gauging the availability of human input/output channels. Leveraging egocentric vision, multimodal sensing, and reasoning with large language models, Human I/O achieves a mean absolute error of 0.22 and an accuracy of 82% in availability prediction across 60 in-the-wild egocentric video recordings in 32 different scenarios. Furthermore, while the core focus of our work is on the detection of SIIDs rather than the creation of adaptive user interfaces, we showcase the efficacy of our prototype via a user study with 10 participants. Findings suggest that Human I/O significantly reduces effort and improves user experience in the presence of SIIDs, paving the way for more adaptive and accessible interactive systems in the future.
{"title":"Human I/O: Towards a Unified Approach to Detecting Situational Impairments","authors":"Xingyu Bruce Liu, Jiahao Nick Li, David Kim, Xiang 'Anthony' Chen, Ruofei Du","doi":"10.1145/3613904.3642065","DOIUrl":"https://doi.org/10.1145/3613904.3642065","PeriodicalId":513202,"journal":{"name":"ArXiv","volume":"1 9"},"publicationDate":"2024-03-06","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"140397334"}
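The two figures reported in the abstract above, mean absolute error and accuracy, can be illustrated with a minimal sketch. The integer availability levels (0 to 3) and the sample values below are assumptions made for illustration only; they are not taken from the paper's data or code.

```python
from statistics import mean


def mean_absolute_error(predicted, actual):
    """Average absolute gap between predicted and true availability levels."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return mean(abs(p - a) for p, a in zip(predicted, actual))


def accuracy(predicted, actual):
    """Fraction of predictions that match the true level exactly."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return mean(1.0 if p == a else 0.0 for p, a in zip(predicted, actual))


# Hypothetical levels for one channel across four clips
# (assumed scale: 0 = available ... 3 = unavailable).
predicted_levels = [0, 1, 3, 2]
actual_levels = [0, 2, 3, 2]

print(mean_absolute_error(predicted_levels, actual_levels))  # 0.25
print(accuracy(predicted_levels, actual_levels))             # 0.75
```

The two metrics answer different questions: accuracy counts only exact matches, while mean absolute error also reflects how far off a wrong prediction is on the ordinal scale.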
In this paper, we introduce I3DE (Inconsistency Inspecting IDE), an IDE plugin for inspecting inconsistencies in PL/SQL code. We first observe potential issues, e.g., misuses or bugs, that arise from the inconsistent understanding of PL/SQL semantics by PL/SQL programmers and DBMS developers, and propose a metamorphic testing-based approach for inspecting such inconsistencies in PL/SQL code. We design and implement our approach in I3DE, a widely usable plugin for the IntelliJ Platform. We conducted a comparative user study involving 16 participants, and the findings indicate that I3DE is consistently effective and efficient in helping programmers identify and avoid inconsistencies across different programming difficulties.
{"title":"I3DE: An IDE for Inspecting Inconsistencies in PL/SQL Code","authors":"Jiangshan Liu, Shuang Liu, Junjie Chen","doi":"10.1145/3643796.3648461","DOIUrl":"https://doi.org/10.1145/3643796.3648461","PeriodicalId":513202,"journal":{"name":"ArXiv","volume":"4 5"},"publicationDate":"2024-03-06","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"140397378"}
Juntong Chen, Haiwen Huang, Huayuan Ye, Zhong Peng, Chenhui Li, Changbo Wang
The voluminous nature of geospatial temporal data from physical monitors and simulation models poses challenges to efficient data access, often resulting in cumbersome temporal selection experiences in web-based data portals. Thus, selecting a subset of time steps for prioritized visualization and pre-loading is highly desirable. Addressing this issue, this paper establishes a multifaceted definition of salient time steps via extensive need-finding studies with domain experts to understand their workflows. Building on this, we propose a novel approach that leverages autoencoders and dynamic programming to facilitate user-driven temporal selections. Structural features, statistical variations, and distance penalties are incorporated to make more flexible selections. User-specified priorities, spatial regions, and aggregations are used to combine different perspectives. We design and implement a web-based interface to enable efficient and context-aware selection of time steps and evaluate its efficacy and usability through case studies, quantitative evaluations, and expert interviews.
{"title":"SalienTime: User-driven Selection of Salient Time Steps for Large-Scale Geospatial Data Visualization","authors":"Juntong Chen, Haiwen Huang, Huayuan Ye, Zhong Peng, Chenhui Li, Changbo Wang","doi":"10.1145/3613904.3642944","DOIUrl":"https://doi.org/10.1145/3613904.3642944","PeriodicalId":513202,"journal":{"name":"ArXiv","volume":"4 10"},"publicationDate":"2024-03-06","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"140397479"}
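The SalienTime abstract mentions dynamic programming with distance penalties for choosing time steps. The following is a simplified sketch of that general idea, not the paper's algorithm: it assumes each time step already has a scalar saliency score (in the paper such scores would come from learned features), and it assumes a hypothetical penalty of `closeness_penalty / gap` between consecutive selected steps. The function name and parameters are illustrative.

```python
def select_time_steps(scores, k, closeness_penalty=1.0):
    """Pick k indices maximizing total score minus penalties for close picks.

    Assumed objective (illustrative only):
        sum(scores[i] for chosen i) - sum(closeness_penalty / gap)
    where gap is the index distance between consecutive chosen steps.
    """
    n = len(scores)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and len(scores)")
    neg_inf = float("-inf")
    # best[j][i]: best objective with j + 1 steps chosen, the last one being i.
    best = [[neg_inf] * n for _ in range(k)]
    back = [[-1] * n for _ in range(k)]
    for i in range(n):
        best[0][i] = scores[i]
    for j in range(1, k):
        for i in range(j, n):
            for p in range(j - 1, i):
                candidate = best[j - 1][p] + scores[i] - closeness_penalty / (i - p)
                if candidate > best[j][i]:
                    best[j][i] = candidate
                    back[j][i] = p
    # Trace back from the best final step.
    i = max(range(k - 1, n), key=lambda idx: best[k - 1][idx])
    chosen = []
    for j in range(k - 1, -1, -1):
        chosen.append(i)
        i = back[j][i]
    return chosen[::-1]


saliency = [0.9, 0.8, 0.1, 0.1, 0.7]  # hypothetical per-step scores
print(select_time_steps(saliency, 2, closeness_penalty=1.0))  # [0, 4]
print(select_time_steps(saliency, 2, closeness_penalty=0.0))  # [0, 1]
```

With the penalty switched off the two highest-scoring neighbors win; with it on, the selection spreads out over the timeline. The triple loop costs O(n²k), which is acceptable for a sketch but would need pruning for long series.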
Pub Date: 2024-03-06. DOI: 10.1609/aaai.v38i8.28727
Ruoqi Liu, Lingfei Wu, Ping Zhang
Treatment effect estimation (TEE) is the task of determining the impact of various treatments on patient outcomes. Current TEE methods fall short due to reliance on limited labeled data and challenges posed by sparse and high-dimensional observational patient data. To address the challenges, we introduce a novel pre-training and fine-tuning framework, KG-TREAT, which synergizes large-scale observational patient data with biomedical knowledge graphs (KGs) to enhance TEE. Unlike previous approaches, KG-TREAT constructs dual-focus KGs and integrates a deep bi-level attention synergy method for in-depth information fusion, enabling distinct encoding of treatment-covariate and outcome-covariate relationships. KG-TREAT also incorporates two pre-training tasks to ensure a thorough grounding and contextualization of patient data and KGs. Evaluation on four downstream TEE tasks shows KG-TREAT’s superiority over existing methods, with an average improvement of 7% in Area under the ROC Curve (AUC) and 9% in Influence Function-based Precision of Estimating Heterogeneous Effects (IF-PEHE). The effectiveness of our estimated treatment effects is further affirmed by alignment with established randomized clinical trial findings.
{"title":"KG-TREAT: Pre-training for Treatment Effect Estimation by Synergizing Patient Data with Knowledge Graphs","authors":"Ruoqi Liu, Lingfei Wu, Ping Zhang","doi":"10.1609/aaai.v38i8.28727","DOIUrl":"https://doi.org/10.1609/aaai.v38i8.28727","PeriodicalId":513202,"journal":{"name":"ArXiv","volume":"5 13"},"publicationDate":"2024-03-06","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"140397427"}
Yoonjoo Lee, Hyeonsu B Kang, Matt Latzke, Juho Kim, Jonathan Bragg, Joseph Chee Chang, Pao Siangliulue
With the rapid growth of scholarly archives, researchers subscribe to "paper alert" systems that periodically provide them with recommendations of recently published papers that are similar to previously collected papers. However, researchers sometimes struggle to make sense of nuanced connections between recommended papers and their own research context, as existing systems only present paper titles and abstracts. To help researchers spot these connections, we present PaperWeaver, an enriched paper alerts system that provides contextualized text descriptions of recommended papers based on user-collected papers. PaperWeaver employs a computational method based on Large Language Models (LLMs) to infer users' research interests from their collected papers, extract context-specific aspects of papers, and compare recommended and collected papers on these aspects. Our user study (N=15) showed that participants using PaperWeaver were able to better understand the relevance of recommended papers and triage them more confidently when compared to a baseline that presented the related work sections from recommended papers.
{"title":"PaperWeaver: Enriching Topical Paper Alerts by Contextualizing Recommended Papers with User-collected Papers","authors":"Yoonjoo Lee, Hyeonsu B Kang, Matt Latzke, Juho Kim, Jonathan Bragg, Joseph Chee Chang, Pao Siangliulue","doi":"10.1145/3613904.3642196","DOIUrl":"https://doi.org/10.1145/3613904.3642196","PeriodicalId":513202,"journal":{"name":"ArXiv","volume":"332 5"},"publicationDate":"2024-03-05","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"140397821"}
Pub Date: 2024-03-05. DOI: 10.1609/aaai.v38i12.29202
Haneol Kang, Dong-Wan Choi
The stability-plasticity dilemma is a major challenge in continual learning, as it involves balancing the conflicting objectives of maintaining performance on previous tasks while learning new tasks. In this paper, we propose the recall-oriented continual learning framework to address this challenge. Inspired by the human brain's ability to separate the mechanisms responsible for stability and plasticity, our framework consists of a two-level architecture where an inference network effectively acquires new knowledge and a generative network recalls past knowledge when necessary. In particular, to maximize the stability of past knowledge, we investigate the complexity of knowledge depending on different representations, and thereby introduce a generative adversarial meta-model (GAMM) that incrementally learns task-specific parameters instead of input data samples of the task. Through our experiments, we show that our framework not only effectively learns new knowledge without any disruption but also achieves high stability of previous knowledge in both task-aware and task-agnostic learning scenarios. Our code is available at: https://github.com/bigdata-inha/recall-orientedcl-framework.
{"title":"Recall-Oriented Continual Learning with Generative Adversarial Meta-Model","authors":"Haneol Kang, Dong-Wan Choi","doi":"10.1609/aaai.v38i12.29202","DOIUrl":"https://doi.org/10.1609/aaai.v38i12.29202","PeriodicalId":513202,"journal":{"name":"ArXiv","volume":"329 2"},"publicationDate":"2024-03-05","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"140397850"}
JiWoong Jang, Sanika Moharana, Patrick Carrington, Andrew Begel
Autistic adults often experience stigma and discrimination at work, leading them to seek social communication support from coworkers, friends, and family despite emotional risks. Large language models (LLMs) are increasingly considered an alternative. In this work, we investigate the phenomenon of LLM use by autistic adults at work and explore opportunities and risks of LLMs as a source of social communication advice. We asked 11 autistic participants to present questions about their own workplace-related social difficulties to (1) a GPT-4-based chatbot and (2) a disguised human confederate. Our evaluation shows that participants strongly preferred LLM over confederate interactions. However, a coach specializing in supporting autistic job-seekers raised concerns that the LLM was dispensing questionable advice. We highlight how this divergence in participant and practitioner attitudes reflects existing schisms in HCI on the relative privileging of end-user wants versus normative good and propose design considerations for LLMs to center autistic experiences.
{"title":"\"It's the only thing I can trust\": Envisioning Large Language Model Use by Autistic Workers for Communication Assistance","authors":"JiWoong Jang, Sanika Moharana, Patrick Carrington, Andrew Begel","doi":"10.1145/3613904.3642894","DOIUrl":"https://doi.org/10.1145/3613904.3642894","PeriodicalId":513202,"journal":{"name":"ArXiv","volume":"361 14"},"publicationDate":"2024-03-05","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"140397800"}
Adolescent peer relationships, essential for their development, are increasingly mediated by digital technologies. As this trend continues, wearable devices, especially smartwatches tailored for adolescents, are reshaping their socialization. In China, smartwatches like XTC have gained wide popularity, introducing unique features such as "Bump-to-Connect" and exclusive social platforms. Nonetheless, how these devices influence adolescents' peer experience remains unknown. Addressing this, we interviewed 18 Chinese adolescents (ages 11-16), discovering a smartwatch-mediated social ecosystem. Our findings highlight the ice-breaking role of smartwatches in friendship initiation and their use for secret messaging with local peers. Within the online smartwatch community, peer status is determined by likes and visibility, leading to diverse pursuit activities (i.e., chu guanxi, jiazu, kuolie) and negative social dynamics. We discuss the core affordances of smartwatches and Chinese cultural factors that influence adolescent social behavior and offer implications for designing future wearables that responsibly and safely support adolescent socialization.
{"title":"Wrist-bound Guanxi, Jiazu, and Kuolie: Unpacking Chinese Adolescent Smartwatch-Mediated Socialization","authors":"Lanjing Liu, Chao Zhang, Zhicong Lu","doi":"10.1145/3613904.3642044","DOIUrl":"https://doi.org/10.1145/3613904.3642044","PeriodicalId":513202,"journal":{"name":"ArXiv","volume":"353 1"},"publicationDate":"2024-03-05","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"140397728"}
Recommender systems are vulnerable to injective attacks, which inject limited fake users into the platforms to manipulate the exposure of target items to all users. In this work, we identify that conventional injective attackers overlook the fact that each item has its unique potential audience, and meanwhile, the attack difficulty across different users varies. Blindly attacking all users will result in a waste of fake user budgets and inferior attack performance. To address these issues, we focus on an under-explored attack task called target user attacks, aiming at promoting target items to a particular user group. In addition, we formulate the varying attack difficulty as heterogeneous treatment effects through a causal lens and propose an Uplift-guided Budget Allocation (UBA) framework. UBA estimates the treatment effect on each target user and optimizes the allocation of fake user budgets to maximize the attack performance. Theoretical and empirical analysis demonstrates the rationality of treatment effect estimation methods of UBA. By instantiating UBA on multiple attackers, we conduct extensive experiments on three datasets under various settings with different target items, target users, fake user budgets, victim models, and defense models, validating the effectiveness and robustness of UBA.
{"title":"Uplift Modeling for Target User Attacks on Recommender Systems","authors":"Wenjie Wang, Changsheng Wang, Fuli Feng, Wentao Shi, Daizong Ding, Tat-seng Chua","doi":"10.1145/3589334.3645403","DOIUrl":"https://doi.org/10.1145/3589334.3645403","PeriodicalId":513202,"journal":{"name":"ArXiv","volume":"362 4"},"publicationDate":"2024-03-05","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"140397791"}
In this paper, we study how well human speech can automatically be filtered when it overlaps with the voice and fan noise of a social robot, Pepper. We ultimately aim for an HRI scenario where the microphone can remain open when the robot is speaking, enabling a more natural turn-taking scheme where the human can interrupt the robot. To respond appropriately, the robot would need to understand what the interlocutor said in the overlapping part of the speech, which can be accomplished by target speech extraction (TSE). To investigate how well TSE can be accomplished in the context of the popular social robot Pepper, we set out to construct a dataset composed of a mixture of recorded speech of Pepper itself, its fan noise (which is close to the microphones), and human speech as recorded by the Pepper microphone, under both low and high room reverberation. Comparing a signal processing approach, with and without post-filtering, and a convolutional recurrent neural network (CRNN) approach to a state-of-the-art speaker identification-based TSE model, we found that the signal processing approach without post-filtering yielded the best performance in terms of Word Error Rate on the overlapping speech signals with low reverberation, while the CRNN approach is more robust to reverberation. These results show that estimating the human voice in overlapping speech with a robot is possible in real-life applications, provided that the room reverberation is low and the human speech has a high volume or high pitch.
{"title":"Single-Channel Robot Ego-Speech Filtering during Human-Robot Interaction","authors":"Yue Li, Koen V. Hindriks, Florian Kunneman","doi":"10.1145/3648536.3648539","DOIUrl":"https://doi.org/10.1145/3648536.3648539","PeriodicalId":513202,"journal":{"name":"ArXiv","volume":"358 15"},"publicationDate":"2024-03-05","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"140397843"}
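The abstract above compares approaches by Word Error Rate (WER). As a minimal sketch of how that metric is commonly computed (word-level edit distance divided by the number of reference words), the function below uses plain whitespace tokenization, which is an assumption; the paper's own scoring pipeline and text normalization are not described here, and the sample sentences are invented.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    # Row-by-row Levenshtein distance over words.
    previous = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current = [i] + [0] * len(hyp_words)
        for j, hyp_word in enumerate(hyp_words, start=1):
            current[j] = min(
                previous[j] + 1,                           # deletion
                current[j - 1] + 1,                        # insertion
                previous[j - 1] + (ref_word != hyp_word),  # substitution or match
            )
        previous = current
    return previous[-1] / len(ref_words)


# Hypothetical transcript of a user interrupting the robot.
print(word_error_rate("please stop talking now", "please stop walking"))  # 0.5
```

Because insertions count as errors, WER can exceed 1.0 when the recognizer outputs many more words than the reference contains.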