Latest ArXiv Papers

Reg-NF: Efficient Registration of Implicit Surfaces within Neural Fields
Pub Date : 2024-02-15 DOI: 10.48550/arXiv.2402.09722
Stephen Hausler, David Hall, Sutharsan Mahendren, Peyman Moghadam
Neural fields, coordinate-based neural networks, have recently gained popularity for implicitly representing a scene. In contrast to classical methods that are based on explicit representations such as point clouds, neural fields provide a continuous scene representation able to represent 3D geometry and appearance in a way which is compact and ideal for robotics applications. However, limited prior methods have investigated registering multiple neural fields by directly utilising these continuous implicit representations. In this paper, we present Reg-NF, a neural fields-based registration that optimises for the relative 6-DoF transformation between two arbitrary neural fields, even if those two fields have different scale factors. Key components of Reg-NF include a bidirectional registration loss, multi-view surface sampling, and utilisation of volumetric signed distance functions (SDFs). We showcase our approach on a new neural field dataset for evaluating registration problems. We provide an exhaustive set of experiments and ablation studies to identify the performance of our approach, while also discussing limitations to provide future direction to the research community on open challenges in utilizing neural fields in unconstrained environments.
Citations: 0
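The Reg-NF abstract above names a bidirectional registration loss over signed distance functions. The following is only a minimal sketch of that general idea, not the paper's actual loss: it assumes two analytic sphere SDFs stand in for trained neural fields, and the helper names (`sphere_sdf`, `apply`, `apply_inverse`, `bidirectional_loss`) are hypothetical. Surface points of field A are pushed through a candidate similarity transform and scored against field B, and vice versa.

```python
import math

def sphere_sdf(center, radius):
    """Signed distance to a sphere (negative inside); stand-in for a neural field."""
    def sdf(p):
        return math.dist(p, center) - radius
    return sdf

def apply(scale, rot, trans, p):
    """Similarity transform q = scale * rot @ p + trans."""
    return tuple(
        scale * sum(rot[i][j] * p[j] for j in range(3)) + trans[i]
        for i in range(3)
    )

def apply_inverse(scale, rot, trans, q):
    """Inverse transform p = rot^T @ (q - trans) / scale."""
    d = [q[i] - trans[i] for i in range(3)]
    return tuple(sum(rot[j][i] * d[j] for j in range(3)) / scale for i in range(3))

def bidirectional_loss(sdf_a, sdf_b, pts_a, pts_b, scale, rot, trans):
    """Mean |SDF| of A's surface points in B, plus B's surface points in A."""
    forward = sum(abs(sdf_b(apply(scale, rot, trans, p))) for p in pts_a) / len(pts_a)
    backward = sum(abs(sdf_a(apply_inverse(scale, rot, trans, q))) for q in pts_b) / len(pts_b)
    return forward + backward

# Illustrative data (an assumption of this sketch): a unit sphere at the
# origin and a sphere of radius 2 centred at (1, 0, 0).
IDENTITY = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
AXES = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
sdf_a = sphere_sdf((0, 0, 0), 1)
sdf_b = sphere_sdf((1, 0, 0), 2)
pts_a = AXES
pts_b = [(1 + 2 * x, 2 * y, 2 * z) for x, y, z in AXES]

good = bidirectional_loss(sdf_a, sdf_b, pts_a, pts_b, 2, IDENTITY, (1, 0, 0))
bad = bidirectional_loss(sdf_a, sdf_b, pts_a, pts_b, 1, IDENTITY, (0, 0, 0))
```

In this toy setup the correct transform (scale 2, translation (1, 0, 0)) drives both terms to zero, while the identity transform does not. An optimiser would search over the transform parameters to minimise such a loss.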
LLMs as Bridges: Reformulating Grounded Multimodal Named Entity Recognition
Pub Date : 2024-02-15 DOI: 10.48550/arXiv.2402.09989
Jinyuan Li, Han Li, Di Sun, Jiahao Wang, Wenkun Zhang, Zan Wang, Gang Pan
Grounded Multimodal Named Entity Recognition (GMNER) is a nascent multimodal task that aims to identify named entities, entity types and their corresponding visual regions. GMNER task exhibits two challenging properties: 1) The weak correlation between image-text pairs in social media results in a significant portion of named entities being ungroundable. 2) There exists a distinction between coarse-grained referring expressions commonly used in similar tasks (e.g., phrase localization, referring expression comprehension) and fine-grained named entities. In this paper, we propose RiVEG, a unified framework that reformulates GMNER into a joint MNER-VE-VG task by leveraging large language models (LLMs) as a connecting bridge. This reformulation brings two benefits: 1) It maintains the optimal MNER performance and eliminates the need for employing object detection methods to pre-extract regional features, thereby naturally addressing two major limitations of existing GMNER methods. 2) The introduction of entity expansion expression and Visual Entailment (VE) Module unifies Visual Grounding (VG) and Entity Grounding (EG). It enables RiVEG to effortlessly inherit the Visual Entailment and Visual Grounding capabilities of any current or prospective multimodal pretraining models. Extensive experiments demonstrate that RiVEG outperforms state-of-the-art methods on the existing GMNER dataset and achieves absolute leads of 10.65%, 6.21%, and 8.83% in all three subtasks.
Citations: 0
Symmetry-Breaking Augmentations for Ad Hoc Teamwork
Pub Date : 2024-02-15 DOI: 10.48550/arXiv.2402.09984
Ravi Hammond, Dustin Craggs, Mingyu Guo, Jakob Foerster, Ian Reid
In many collaborative settings, artificial intelligence (AI) agents must be able to adapt to new teammates that use unknown or previously unobserved strategies. While often simple for humans, this can be challenging for AI agents. For example, if an AI agent learns to drive alongside others (a training set) that only drive on one side of the road, it may struggle to adapt this experience to coordinate with drivers on the opposite side, even if their behaviours are simply flipped along the left-right symmetry. To address this we introduce symmetry-breaking augmentations (SBA), which increases diversity in the behaviour of training teammates by applying a symmetry-flipping operation. By learning a best-response to the augmented set of teammates, our agent is exposed to a wider range of behavioural conventions, improving performance when deployed with novel teammates. We demonstrate this experimentally in two settings, and show that our approach improves upon previous ad hoc teamwork results in the challenging card game Hanabi. We also propose a general metric for estimating symmetry-dependency amongst a given set of policies.
Citations: 0
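The symmetry-flipping operation in the abstract above can be illustrated with its own driving example. This is a toy sketch under stated assumptions, not the authors' implementation: observations and actions are just the strings "left" and "right", and `left_driver`, `flip_policy`, and `augment` are hypothetical names. A flipped teammate sees the mirrored observation and returns the mirrored action.

```python
# Left-right mirror symmetry; it is its own inverse.
FLIP = {"left": "right", "right": "left"}

def left_driver(oncoming_side):
    """Hypothetical training teammate: always keeps to the left."""
    return "left"

def flip_policy(policy):
    """Wrap a policy so it acts in the mirrored world."""
    def flipped(obs):
        return FLIP[policy(FLIP[obs])]
    return flipped

def augment(teammates):
    """Return the original teammates plus their symmetry-flipped copies."""
    return teammates + [flip_policy(p) for p in teammates]

team = augment([left_driver])
```

Training a best response against `team` instead of `[left_driver]` exposes the learner to both driving conventions. In a game such as Hanabi the symmetry would be a different permutation of observations and actions; that choice is left open here.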
Validation of homogenized finite element models of human metastatic vertebrae using digital volume correlation
Pub Date : 2024-02-15 DOI: 10.48550/arXiv.2402.09828
Chiara Garavelli, A. Aldieri, M. Palanca, Enrico Dall'Ara, M. Viceconti
The incidence of vertebral fragility fracture is increased by the presence of preexisting pathologies such as metastatic disease. Computational tools could support the fracture prediction and consequently the decision of the best medical treatment. Anyway, validation is required to use these tools in clinical practice. To address this necessity, in this study subject-specific homogenized finite element models of single vertebrae were generated from micro CT images for both healthy and metastatic vertebrae and validated against experimental data. More in detail, spine segments were tested under compression and imaged with micro CT. The displacements field could be extracted for each vertebra singularly using the digital volume correlation full-field technique. Homogenized finite element models of each vertebra could hence be built from the micro CT images, applying boundary conditions consistent with the experimental displacements at the endplates. Numerical and experimental displacements and strains fields were eventually compared. In addition, the outcomes of a micro CT based homogenized model were compared to the ones of a clinical-CT based model. Good agreement between experimental and computational displacement fields, both for healthy and metastatic vertebrae, was found. Comparison between micro CT based and clinical-CT based outcomes showed strong correlations. Furthermore, models were able to qualitatively identify the regions which experimentally showed the highest strain concentration. In conclusion, the combination of experimental full-field technique and the in-silico modelling allowed the development of a promising pipeline for validation of fracture risk predictors, although further improvements in both fields are needed to better analyse quantitatively the post-yield behaviour of the vertebra.
Citations: 0
Social Reward: Evaluating and Enhancing Generative AI through Million-User Feedback from an Online Creative Community
Pub Date : 2024-02-15 DOI: 10.48550/arXiv.2402.09872
Arman Isajanyan, Artur Shatveryan, David Kocharyan, Zhangyang Wang, Humphrey Shi
Social reward as a form of community recognition provides a strong source of motivation for users of online platforms to engage and contribute with content. The recent progress of text-conditioned image synthesis has ushered in a collaborative era where AI empowers users to craft original visual artworks seeking community validation. Nevertheless, assessing these models in the context of collective community preference introduces distinct challenges. Existing evaluation methods predominantly center on limited size user studies guided by image quality and prompt alignment. This work pioneers a paradigm shift, unveiling Social Reward - an innovative reward modeling framework that leverages implicit feedback from social network users engaged in creative editing of generated images. We embark on an extensive journey of dataset curation and refinement, drawing from Picsart: an online visual creation and editing platform, yielding a first million-user-scale dataset of implicit human preferences for user-generated visual art named Picsart Image-Social. Our analysis exposes the shortcomings of current metrics in modeling community creative preference of text-to-image models' outputs, compelling us to introduce a novel predictive model explicitly tailored to address these limitations. Rigorous quantitative experiments and user study show that our Social Reward model aligns better with social popularity than existing metrics. Furthermore, we utilize Social Reward to fine-tune text-to-image models, yielding images that are more favored by not only Social Reward, but also other established metrics. These findings highlight the relevance and effectiveness of Social Reward in assessing community appreciation for AI-generated artworks, establishing a closer alignment with users' creative goals: creating popular visual art. Codes can be accessed at https://github.com/Picsart-AI-Research/Social-Reward
Citations: 0
Exploring the Potential of Large Language Models in Artistic Creation: Collaboration and Reflection on Creative Programming
Pub Date : 2024-02-15 DOI: 10.48550/arXiv.2402.09750
Anqi Wang, Zhizhuo Yin, Yulu Hu, Yuanyuan Mao, Pan Hui
Recently, the potential of large language models (LLMs) has been widely used in assisting programming. However, current research does not explore the artist potential of LLMs in creative coding within artist and AI collaboration. Our work probes the reflection type of artists in the creation process with such collaboration. We compare two common collaboration approaches: invoking the entire program and multiple subtasks. Our findings exhibit artists' different stimulated reflections in two different methods. Our finding also shows the correlation of reflection type with user performance, user satisfaction, and subjective experience in two collaborations through conducting two methods, including experimental data and qualitative interviews. In this sense, our work reveals the artistic potential of LLM in creative coding. Meanwhile, we provide a critical lens of human-AI collaboration from the artists' perspective and expound design suggestions for future work of AI-assisted creative tasks.
Citations: 0
DE-COP: Detecting Copyrighted Content in Language Models Training Data
Pub Date : 2024-02-15 DOI: 10.48550/arXiv.2402.09910
André V. Duarte, Xuandong Zhao, Arlindo L. Oliveira, Lei Li
How can we detect if copyrighted content was used in the training process of a language model, considering that the training data is typically undisclosed? We are motivated by the premise that a language model is likely to identify verbatim excerpts from its training text. We propose DE-COP, a method to determine whether a piece of copyrighted content was included in training. DE-COP's core approach is to probe an LLM with multiple-choice questions, whose options include both verbatim text and their paraphrases. We construct BookTection, a benchmark with excerpts from 165 books published prior and subsequent to a model's training cutoff, along with their paraphrases. Our experiments show that DE-COP surpasses the prior best method by 9.6% in detection performance (AUC) on models with logits available. Moreover, DE-COP also achieves an average accuracy of 72% for detecting suspect books on fully black-box models where prior methods give $\approx$ 4% accuracy. Our code and datasets are available at https://github.com/avduarte333/DE-COP_Method
Citations: 0
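The multiple-choice probing described in the DE-COP abstract above can be sketched roughly as follows. This is an illustration under assumptions, not the released DE-COP code: the prompt wording, the limit of four options, and the names `build_question`, `verbatim_pick_rate`, and `ask_model` (any callable that takes a prompt and returns an option letter) are all hypothetical. The idea is to shuffle a verbatim excerpt among its paraphrases and measure how often the model picks the verbatim one.

```python
import random

def build_question(verbatim, paraphrases, rng):
    """Shuffle the verbatim excerpt among its paraphrases; return (prompt, answer)."""
    options = [verbatim] + list(paraphrases)
    rng.shuffle(options)
    labels = "ABCD"[:len(options)]  # this sketch assumes at most four options
    lines = [f"{label}. {text}" for label, text in zip(labels, options)]
    prompt = "Which passage is the verbatim excerpt?\n" + "\n".join(lines)
    answer = labels[options.index(verbatim)]
    return prompt, answer

def verbatim_pick_rate(ask_model, items, trials=20, seed=0):
    """Fraction of shuffled questions where the model selects the verbatim text."""
    rng = random.Random(seed)
    hits = 0
    total = 0
    for verbatim, paraphrases in items:
        for _ in range(trials):
            prompt, answer = build_question(verbatim, paraphrases, rng)
            hits += ask_model(prompt) == answer
            total += 1
    return hits / total
```

A pick rate well above chance (0.25 with four options) on a given book would be weak evidence that the model has seen the text. Repeating the question with reshuffled options reduces the influence of any positional preference the model may have.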
Examining Pathological Bias in a Generative Adversarial Network Discriminator: A Case Study on a StyleGAN3 Model
Pub Date : 2024-02-15 DOI: 10.48550/arXiv.2402.09786
Alvin Grissom II, Ryan F. Lei, Jeova Farias Sales Rocha Neto, Bailey Lin, Ryan Trotter
Generative adversarial networks generate photorealistic faces that are often indistinguishable by humans from real faces. We find that the discriminator in the pre-trained StyleGAN3 model, a popular GAN network, systematically stratifies scores by both image- and face-level qualities and that this disproportionately affects images across gender, race, and other categories. We examine the discriminator's bias for color and luminance across axes perceived race and gender; we then examine axes common in research on stereotyping in social psychology.
引用次数: 0
Effective and Scalable Math Support: Evidence on the Impact of an AI-Tutor on Math Achievement in Ghana
Pub Date : 2024-02-15 DOI: 10.48550/arXiv.2402.09809
Owen Henkel, Hannah Horne-Robinson, Nessie Kozhakhmetova, Amanda Lee
This study evaluates the impact of Rori, an AI-powered conversational math tutor accessible via WhatsApp, on the math performance of approximately 1,000 students in grades 3-9 across 11 schools in Ghana. Each school was assigned to a treatment group or control group; the students in the control group continued their regular math instruction, while students in the treatment group engaged with Rori for two 30-minute sessions per week over 8 months in addition to regular math instruction. We find that the math growth scores were substantially higher for the treatment group with an effect size of 0.37, and that the results were statistically significant (p<0.001). The fact that Rori works with basic mobile devices on low-bandwidth data networks gives the intervention strong potential to support personalized learning in other low- and middle-income countries (LMICs), where laptop ownership and high-speed internet, prerequisites for many video-centered learning platforms, remain extremely limited. While the results should be interpreted judiciously, as they only report on year 1 of the intervention, and future research is necessary to better understand which conditions are necessary for successful implementation, they do suggest that chat-based tutoring solutions leveraging artificial intelligence could offer a cost-effective approach to enhancing learning outcomes for millions of students globally.
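To make the reported effect size concrete, here is a minimal sketch of how a standardized mean difference can be computed. The abstract does not say which effect-size measure was used; Cohen's d with a pooled standard deviation is assumed here as a common choice. The function name `cohens_d` and the score lists are hypothetical, invented purely for illustration, and are not data from the study.

```python
import math
import statistics


def cohens_d(treatment, control):
    """Standardized mean difference using the pooled sample standard deviation."""
    n_t, n_c = len(treatment), len(control)
    var_t = statistics.variance(treatment)  # sample variance (n - 1 denominator)
    var_c = statistics.variance(control)
    pooled_sd = math.sqrt(
        ((n_t - 1) * var_t + (n_c - 1) * var_c) / (n_t + n_c - 2)
    )
    return (statistics.mean(treatment) - statistics.mean(control)) / pooled_sd


# Hypothetical growth scores, invented for illustration only (not study data).
treatment_growth = [6.0, 8.0, 10.0, 12.0, 14.0]
control_growth = [4.0, 6.0, 8.0, 10.0, 12.0]

d = cohens_d(treatment_growth, control_growth)
print(f"Cohen's d = {d:.2f}")
```

An effect size of this kind expresses the gap between group means in units of standard deviation, which is what allows a figure such as 0.37 to be compared across studies that use different tests.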
Partial synchrony for free? New bounds for Byzantine agreement via a generic transformation across network models
Pub Date : 2024-02-15 DOI: 10.48550/arXiv.2402.10059
P. Civit, M. A. Dzulfikar, S. Gilbert, R. Guerraoui, J. Komatovic, M. Vidigueira, I. Zablotchi
Byzantine consensus allows $n$ processes to decide on a common value, in spite of arbitrary failures. The seminal Dolev-Reischuk bound states that any deterministic solution to Byzantine consensus exchanges $\Omega(n^2)$ bits. In recent years, great advances have been made in deterministic Byzantine agreement for partially synchronous networks, with state-of-the-art cryptographic solutions achieving $O(n^2 \kappa)$ bits (where $\kappa$ is the security parameter) and nearly matching the lower bound. In contrast, for synchronous networks, optimal solutions with $O(n^2)$ bits, with no cryptography and the same failure tolerance, have been known for more than three decades. Can this gap in network models be closed? In this paper, we present Repeater, the first generic transformation of Byzantine agreement algorithms from synchrony to partial synchrony. Repeater is modular, relying on existing and novel algorithms for its sub-modules. With the right choice of modules, Repeater requires no additional cryptography, is optimally resilient ($n = 3t+1$, where $t$ is the maximum number of failures) and, for constant-size inputs, preserves the worst-case per-process bit complexity of the transformed synchronous algorithm. Leveraging Repeater, we present the first partially synchronous algorithm that (1) achieves optimal bit complexity ($O(n^2)$ bits), (2) resists a computationally unbounded adversary (no cryptography), and (3) is optimally-resilient ($n = 3t+1$), thus showing that the Dolev-Reischuk bound is tight in partial synchrony. Moreover, we adapt Repeater for long inputs, introducing several new algorithms with improved complexity and weaker (or completely absent) cryptographic assumptions.
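The resilience condition n = 3t+1 quoted in the abstract can be read in both directions, as the short sketch below illustrates. It only does the arithmetic behind the threshold and is not part of Repeater or any algorithm from the paper; the function names `max_faults` and `min_processes` are hypothetical.

```python
def max_faults(n: int) -> int:
    """Largest t satisfying n >= 3t + 1, i.e. t = floor((n - 1) / 3)."""
    if n < 1:
        raise ValueError("need at least one process")
    return (n - 1) // 3


def min_processes(t: int) -> int:
    """Smallest system size that tolerates t Byzantine failures: n = 3t + 1."""
    if t < 0:
        raise ValueError("number of failures cannot be negative")
    return 3 * t + 1


for n in (4, 7, 10, 100):
    t = max_faults(n)
    # n * n is only the order of growth of the quadratic bit bound, not an exact count.
    print(f"n={n}: tolerates t={t} failures; n^2 = {n * n}")
```

Note that adding processes does not always raise the tolerated number of failures: under this threshold, systems of 4, 5, and 6 processes all tolerate exactly one Byzantine failure.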