
Latest ArXiv Publications

Say Anything with Any Style
Pub Date : 2024-03-11 DOI: 10.1609/aaai.v38i5.28314
Shuai Tan, Bin Ji, Yu Ding, Ye Pan
Generating stylized talking heads with diverse head motions is crucial for achieving natural-looking videos but remains challenging. Previous works either adopt a regressive method to capture the speaking style, resulting in a coarse style that is averaged across all training data, or employ a universal network to synthesize videos with different styles, which causes suboptimal performance. To address these issues, we propose a novel dynamic-weight method, namely Say Anything with Any Style (SAAS), which queries the discrete style representation via a generative model with a learned style codebook. Specifically, we develop a multi-task VQ-VAE that incorporates three closely related tasks to learn a style codebook as a prior for style extraction. This discrete prior, together with the generative model, enhances the precision and robustness of extracting the speaking styles of the given style clips. Using the extracted style, a residual architecture comprising a canonical branch and a style-specific branch is employed to predict the mouth shapes conditioned on any driving audio while transferring the speaking style from the source to any desired one. To adapt to different speaking styles, we steer clear of employing a universal network and instead explore an elaborate HyperStyle to produce style-specific weight offsets for the style branch. Furthermore, we construct a pose generator and a pose codebook to store the quantized pose representation, allowing us to sample diverse head motions aligned with the audio and the extracted style. Experiments demonstrate that our approach surpasses state-of-the-art methods in terms of both lip synchronization and stylized expression. In addition, we extend SAAS to the video-driven style editing field and achieve satisfactory performance as well.
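The central query step described above, snapping a continuous style feature to the nearest entry of a learned style codebook, follows the standard vector-quantization pattern. Below is a minimal sketch of that lookup only; the codebook size, feature dimension, and the placeholder encoder output are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

# Illustrative sizes; the paper does not fix these values here.
CODEBOOK_SIZE, STYLE_DIM = 256, 128
rng = np.random.default_rng(0)

# Learned style codebook (random placeholder standing in for trained codes).
codebook = rng.normal(size=(CODEBOOK_SIZE, STYLE_DIM))

def quantize_style(style_feature: np.ndarray) -> tuple[np.ndarray, int]:
    """Snap a continuous style feature to its nearest codebook entry."""
    dists = np.linalg.norm(codebook - style_feature, axis=1)  # L2 distance to each code
    idx = int(np.argmin(dists))                               # nearest-neighbour lookup
    return codebook[idx], idx

# Hypothetical usage: an encoder maps a style clip to a continuous feature
# (placeholder below), which is then quantized against the discrete prior.
clip_feature = rng.normal(size=STYLE_DIM)
style_vector, style_id = quantize_style(clip_feature)
print(f"selected style code: {style_id}")
```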
Citations: 2
Born to Run, Programmed to Play: Mapping the Extended Reality Exergames Landscape
Pub Date : 2024-03-11 DOI: 10.1145/3613904.3642124
Sukran Karaosmanoglu, S. Cmentowski, Lennart E. Nacke, Frank Steinicke
Many people struggle to exercise regularly, raising the risk of serious health-related issues. Extended reality (XR) exergames address these hurdles by combining physical exercises with enjoyable, immersive gameplay. While a growing body of research explores XR exergames, no previous review has structured this rapidly expanding research landscape. We conducted a scoping review of the current state of XR exergame research to (i) provide a structured overview, (ii) highlight trends, and (iii) uncover knowledge gaps. After identifying 1318 papers in human-computer interaction and medical databases, we ultimately included 186 papers in our analysis. We provide a quantitative and qualitative summary of XR exergame research, showing current trends and potential future considerations. Finally, we provide a taxonomy of XR exergames to help future design and methodological investigation and reporting.
Citations: 0
What Makes Quantization for Large Language Model Hard? An Empirical Study from the Lens of Perturbation
Pub Date : 2024-03-11 DOI: 10.1609/aaai.v38i16.29765
Zhuocheng Gong, Jiahao Liu, Jingang Wang, Xunliang Cai, Dongyan Zhao, Rui Yan
Quantization has emerged as a promising technique for improving the memory and computational efficiency of large language models (LLMs). Though the trade-off between performance and efficiency is well known, there is still much to be learned about the relationship between quantization and LLM performance. To shed light on this relationship, we propose a new perspective on quantization, viewing it as perturbations added to the weights and activations of LLMs. We call this approach "the lens of perturbation". Using this lens, we conduct experiments with various artificial perturbations to explore their impact on LLM performance. Our findings reveal several connections between the properties of perturbations and LLM performance, providing insights into the failure cases of uniform quantization and suggesting potential solutions to improve the robustness of LLM quantization. To demonstrate the significance of our findings, we implement a simple non-uniform quantization approach based on our insights. Our experiments show that this approach achieves minimal performance degradation on both 4-bit weight quantization and 8-bit quantization for weights and activations. These results validate the correctness of our approach and highlight its potential to improve the efficiency of LLMs without sacrificing performance.
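The "lens of perturbation" can be made concrete: uniformly quantizing a weight tensor is the same as adding the rounding error back onto the original weights. The sketch below illustrates this equivalence under assumed settings (symmetric per-tensor 4-bit quantization, random placeholder weights); it is not the paper's experimental setup.

```python
import numpy as np

def uniform_quantize(w: np.ndarray, n_bits: int = 4) -> np.ndarray:
    """Symmetric per-tensor uniform quantization with round-to-nearest."""
    q_max = 2 ** (n_bits - 1) - 1          # e.g. 7 levels on each side for 4 bits
    scale = np.abs(w).max() / q_max        # per-tensor scale factor
    return np.clip(np.round(w / scale), -q_max, q_max) * scale

rng = np.random.default_rng(0)
w = rng.normal(scale=0.02, size=4096)      # placeholder weights, not from an actual LLM

w_q = uniform_quantize(w, n_bits=4)
perturbation = w_q - w                     # quantization viewed as an additive perturbation
print("max |perturbation|:", np.abs(perturbation).max())
print("w + perturbation equals quantized weights:", np.allclose(w + perturbation, w_q))
```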
Citations: 0
Thought Graph: Generating Thought Process for Biological Reasoning
Pub Date : 2024-03-11 DOI: 10.1145/3589335.3651572
Chi-Yang Hsu, Kyle Cox, Jiawei Xu, Zhen Tan, Tianhua Zhai, Mengzhou Hu, Dexter Pratt, Tianlong Chen, Ziniu Hu, Ying Ding
We present the Thought Graph as a novel framework to support complex reasoning and use gene set analysis as an example to uncover semantic relationships between biological processes. Our framework stands out for its ability to provide a deeper understanding of gene sets, significantly surpassing GSEA by 40.28% and LLM baselines by 5.38% based on cosine similarity to human annotations. Our analysis further provides insights into future directions of biological processes naming, and implications for bioinformatics and precision medicine.
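The comparison against GSEA and LLM baselines is reported in terms of cosine similarity between generated descriptions and human annotations. A minimal sketch of that metric over embedding vectors follows; the embeddings are random placeholders, not produced by the paper's actual encoder.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Random placeholders for the embedding of a generated gene-set description
# and the embedding of the human annotation it is scored against.
rng = np.random.default_rng(0)
generated_emb = rng.normal(size=384)
human_emb = rng.normal(size=384)
print(f"cosine similarity: {cosine_similarity(generated_emb, human_emb):.3f}")
```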
Citations: 0
Deriving Dependently-Typed OOP from First Principles - Extended Version with Additional Appendices
Pub Date : 2024-03-11 DOI: 10.1145/3649846
David Binder, Ingo Skupin, Tim Süberkrüb, Klaus Ostermann
The expression problem describes how most types can easily be extended with new ways to produce the type or new ways to consume the type, but not both. When abstract syntax trees are defined as an algebraic data type, for example, they can easily be extended with new consumers, such as print or eval, but adding a new constructor requires the modification of all existing pattern matches. The expression problem is one way to elucidate the difference between functional or data-oriented programs (easily extendable by new consumers) and object-oriented programs (easily extendable by new producers). This difference between programs which are extensible by new producers or new consumers also exists for dependently typed programming, but with one core difference: Dependently-typed programming almost exclusively follows the functional programming model and not the object-oriented model, which leaves an interesting space in the programming language landscape unexplored. In this paper, we explore the field of dependently-typed object-oriented programming by deriving it from first principles using the principle of duality. That is, we do not extend an existing object-oriented formalism with dependent types in an ad-hoc fashion, but instead start from a familiar data-oriented language and derive its dual fragment by the systematic use of defunctionalization and refunctionalization. Our central contribution is a dependently typed calculus which contains two dual language fragments. We provide type- and semantics-preserving transformations between these two language fragments: defunctionalization and refunctionalization. We have implemented this language and these transformations and use this implementation to explain the various ways in which constructions in dependently typed programming can be explained as special instances of the phenomenon of duality.
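The expression problem the paper starts from is easy to reproduce in a few lines. In the data-oriented (functional) style sketched below, adding a new consumer of the syntax tree is just one more function, while adding a new constructor forces edits to every existing pattern match; the tiny arithmetic language is an illustrative assumption, not taken from the paper.

```python
from dataclasses import dataclass

# A tiny algebraic data type for arithmetic expressions (illustrative only).
@dataclass
class Lit:
    value: int

@dataclass
class Add:
    left: "Lit | Add"
    right: "Lit | Add"

# Adding a new *consumer* is easy: just write another function over the type.
def eval_expr(e) -> int:
    match e:
        case Lit(v):
            return v
        case Add(l, r):
            return eval_expr(l) + eval_expr(r)

def show_expr(e) -> str:
    match e:
        case Lit(v):
            return str(v)
        case Add(l, r):
            return f"({show_expr(l)} + {show_expr(r)})"

# Adding a new *constructor* (say, Mul) is hard: every match above must be
# revisited. The object-oriented dual flips the trade-off, which is the
# tension the paper studies for dependently typed programs.
e = Add(Lit(1), Add(Lit(2), Lit(3)))
print(show_expr(e), "=", eval_expr(e))
```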
Citations: 0
Data Cubes in Hand: A Design Space of Tangible Cubes for Visualizing 3D Spatio-Temporal Data in Mixed Reality
Pub Date : 2024-03-11 DOI: 10.1145/3613904.3642740
Shuqi He, Haonan Yao, Luyan Jiang, Kaiwen Li, Nan Xiang, Yue Li, Hai-Ning Liang, Lingyun Yu
Tangible interfaces in mixed reality (MR) environments allow for intuitive data interactions. Tangible cubes, with their rich interaction affordances, high maneuverability, and stable structure, are particularly well-suited for exploring multi-dimensional data types. However, the design potential of these cubes is underexplored. This study introduces a design space for tangible cubes in MR, focusing on interaction space, visualization space, sizes, and multiplicity. Using spatio-temporal data, we explored the interaction affordances of these cubes in a workshop (N=24). We identified unique interactions like rotating, tapping, and stacking, which are linked to augmented reality (AR) visualization commands. Integrating user-identified interactions, we created a design space for tangible-cube interactions and visualization. A prototype visualizing global health spending with small cubes was developed and evaluated, supporting both individual and combined cube manipulation. This research enhances our grasp of tangible interaction in MR, offering insights for future design and application in diverse data contexts.
Citations: 0
Transparent AI Disclosure Obligations: Who, What, When, Where, Why, How
Pub Date : 2024-03-11 DOI: 10.1145/3613905.3650750
Abdallah El Ali, Karthikeya Puttur Venkatraj, Sophie Morosoli, Laurens Naudts, Natali Helberger, Pablo César
Advances in Generative Artificial Intelligence (AI) are resulting in AI-generated media output that is (nearly) indistinguishable from human-created content. This can drastically impact users and the media sector, especially given global risks of misinformation. While the currently discussed European AI Act aims at addressing these risks through Article 52's AI transparency obligations, its interpretation and implications remain unclear. In this early work, we adopt a participatory AI approach to derive key questions based on Article 52's disclosure obligations. We ran two workshops with researchers, designers, and engineers across disciplines (N=16), where participants deconstructed Article 52's relevant clauses using the 5W1H framework. We contribute a set of 149 questions clustered into five themes and 18 sub-themes. We believe these can not only help inform future legal developments and interpretations of Article 52, but also provide a starting point for Human-Computer Interaction research to (re-)examine disclosure transparency from a human-centered AI lens.
Citations: 0
MAP-Elites with Transverse Assessment for Multimodal Problems in Creative Domains
Pub Date : 2024-03-11 DOI: 10.1007/978-3-031-56992-0_26
Marvin Zammit, Antonios Liapis, Georgios N. Yannakakis
{"title":"MAP-Elites with Transverse Assessment for Multimodal Problems in Creative Domains","authors":"Marvin Zammit, Antonios Liapis, Georgios N. Yannakakis","doi":"10.1007/978-3-031-56992-0_26","DOIUrl":"https://doi.org/10.1007/978-3-031-56992-0_26","url":null,"abstract":"","PeriodicalId":513202,"journal":{"name":"ArXiv","volume":"29 27","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-03-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140395873","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Chart4Blind: An Intelligent Interface for Chart Accessibility Conversion
Pub Date : 2024-03-11 DOI: 10.1145/3640543.3645175
Omar Moured, Morris Baumgarten-Egemole, Alina Roitberg, Karin Müller, Thorsten Schwarz, Rainer Stiefelhagen
In a world driven by data visualization, ensuring the inclusive accessibility of charts for Blind and Visually Impaired (BVI) individuals remains a significant challenge. Charts are usually presented as raster graphics without the textual and visual metadata needed for an equivalent exploration experience for BVI people. Additionally, converting these charts into accessible formats requires considerable effort from sighted individuals. Digitizing charts with metadata extraction is just one aspect of the issue; transforming them into accessible modalities, such as tactile graphics, presents another difficulty. To address these disparities, we propose Chart4Blind, an intelligent user interface that converts bitmap image representations of line charts into universally accessible formats. Chart4Blind achieves this transformation by generating Scalable Vector Graphics (SVG), Comma-Separated Values (CSV), and alternative text exports, all complying with established accessibility standards. Through interviews and a formal user study, we demonstrate that even inexperienced sighted users can make charts accessible in an average of 4 minutes using Chart4Blind, achieving a System Usability Scale rating of 90%. In comparison to existing approaches, Chart4Blind provides a comprehensive solution, generating end-to-end accessible SVGs suitable for assistive technologies such as embossed prints (paper and laser cut), 2D tactile displays, and screen readers. For additional information, including open-source code and demos, please visit our project page https://moured.github.io/chart4blind/.
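The two machine-readable export formats named in the abstract can be illustrated directly: once the data points of a line chart have been recovered, they map onto a CSV table and an SVG polyline. The sketch below uses invented data and omits the textual metadata (titles, axis labels, alternative text) that a real accessible export would carry; it is not the Chart4Blind code.

```python
import csv

# Invented data points recovered from a bitmap line chart: (year, value).
points = [(2015, 3.2), (2016, 3.5), (2017, 3.9), (2018, 4.4)]

# CSV export: a header row followed by one row per data point.
with open("chart.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["year", "value"])
    writer.writerows(points)

# SVG export: scale the points into a 400x200 viewport and emit a polyline.
def scale(v, lo, hi, out_lo, out_hi):
    return out_lo + (v - lo) / (hi - lo) * (out_hi - out_lo)

xs, ys = zip(*points)
svg_points = " ".join(
    f"{scale(x, min(xs), max(xs), 10, 390):.1f},"
    f"{scale(y, min(ys), max(ys), 190, 10):.1f}"   # y axis flipped so larger values sit higher
    for x, y in points
)
svg = (
    '<svg xmlns="http://www.w3.org/2000/svg" width="400" height="200">'
    f'<polyline points="{svg_points}" fill="none" stroke="black"/>'
    "</svg>"
)
with open("chart.svg", "w") as f:
    f.write(svg)
```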
Citations: 0
SoniWeight Shoes: Investigating Effects and Personalization of a Wearable Sound Device for Altering Body Perception and Behavior
Pub Date : 2024-03-11 DOI: 10.1145/3613904.3642651
A. D'Adamo, M. Roel-Lesur, L. Turmo-Vidal, Mohammad Mahdi Dehshibi, D. D. L. Prida, J. R. Díaz-Durán, L. A. Azpicueta-Ruiz, A. Valjamae
Changes in body perception influence behavior and emotion and can be induced through multisensory feedback. Auditory feedback to one's actions can trigger such alterations; however, it is unclear which individual factors modulate these effects. We employ and evaluate SoniWeight Shoes, a wearable device based on literature for altering one's weight perception through manipulated footstep sounds. In a healthy population sample across a spectrum of individuals (n=84) with varying degrees of eating disorder symptomatology, physical activity levels, body concerns, and mental imagery capacities, we explore the effects of three sound conditions (low-frequency, high-frequency and control) on extensive body perception measures (demographic, behavioral, physiological, psychological, and subjective). Analyses revealed an impact of individual differences in each of these dimensions. Besides replicating previous findings, we reveal and highlight the role of individual differences in body perception, offering avenues for personalized sonification strategies. Datasets, technical refinements, and novel body map quantification tools are provided.
Citations: 0