ArXiv

Pub Date : 2024-03-07 DOI: 10.1609/aaai.v38i2.27847

Qingyuan Cai, Xuecai Hu, Saihui Hou, Li Yao, Yongzhen Huang

Recently, diffusion-based methods for monocular 3D human pose estimation have achieved state-of-the-art (SOTA) performance by directly regressing the 3D joint coordinates from the 2D pose sequence. Although some methods decompose the task into bone length and bone direction prediction based on the human anatomical skeleton to explicitly incorporate more human body prior constraints, the performance of these methods is significantly lower than that of the SOTA diffusion-based methods. This can be attributed to the tree structure of the human skeleton: direct application of the disentangled method could amplify the accumulation of hierarchical errors, which propagate through each level of the hierarchy. Meanwhile, the hierarchical information has not been fully explored by the previous methods. To address these problems, a Disentangled Diffusion-based 3D human Pose Estimation method with Hierarchical Spatial and Temporal Denoiser is proposed, termed DDHPose. In our approach: (1) We disentangle the 3D pose and diffuse the bone length and bone direction during the forward process of the diffusion model to effectively model the human pose prior. A disentanglement loss is proposed to supervise diffusion model learning. (2) For the reverse process, we propose the Hierarchical Spatial and Temporal Denoiser (HSTDenoiser) to improve the hierarchical modeling of each joint. Our HSTDenoiser comprises two components: the Hierarchical-Related Spatial Transformer (HRST) and the Hierarchical-Related Temporal Transformer (HRTT). HRST exploits joint spatial information and the influence of the parent joint on each joint for spatial modeling, while HRTT utilizes information from both the joint and its hierarchically adjacent joints to explore the hierarchical temporal correlations among joints. Extensive experiments on the Human3.6M and MPI-INF-3DHP datasets show that our method outperforms the SOTA disentangled, non-disentangled, and probabilistic approaches by 10.0%, 2.0%, and 1.3%, respectively.
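The bone length and bone direction decomposition, and the way an error near the root carries over to descendant joints, can be illustrated with a small sketch. This is not the DDHPose implementation: the five-joint skeleton, the PARENTS table, and the function names decompose and compose are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical 5-joint toy skeleton: PARENTS[j] is the parent of joint j, -1 marks the root.
# Parents are listed before their children, which compose() relies on.
PARENTS = [-1, 0, 1, 0, 3]

def decompose(pose):
    """Split a pose (list of (x, y, z) joints) into bone lengths and unit directions.

    Assumes no bone has zero length.
    """
    lengths, directions = {}, {}
    for j, p in enumerate(PARENTS):
        if p < 0:
            continue  # the root has no incoming bone
        bone = [c - pc for c, pc in zip(pose[j], pose[p])]
        length = math.sqrt(sum(b * b for b in bone))
        lengths[j] = length
        directions[j] = [b / length for b in bone]
    return lengths, directions

def compose(root, lengths, directions):
    """Rebuild joint positions by walking the tree from the root outward."""
    pose = [None] * len(PARENTS)
    pose[0] = list(root)
    for j, p in enumerate(PARENTS):
        if p < 0:
            continue
        # pose[p] is already known because parents precede children.
        pose[j] = [pc + lengths[j] * d for pc, d in zip(pose[p], directions[j])]
    return pose

pose = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 2.0, 0.0), (1.0, 0.0, 0.0), (1.0, -1.0, 0.0)]
lengths, directions = decompose(pose)
rebuilt = compose(pose[0], lengths, directions)

# Perturb only the bone ending at joint 1; joint 2, its child, drifts by the same amount.
noisy = dict(lengths)
noisy[1] += 0.1
drifted = compose(pose[0], noisy, directions)
```

In this toy case the perturbed bone shifts joint 2 as well as joint 1, while the other branch (joints 3 and 4) is unaffected, which is the kind of hierarchical error accumulation the abstract refers to.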


Citations: 0

Towards Robustness Analysis of E-Commerce Ranking System

ArXiv

Pub Date : 2024-03-07 DOI: 10.1145/3589335.3648335

Ningfei Wang, Yupin Huang, Han Cheng, Jiri Gesi, Xiaojie Wang, Vivek Mittal

Information retrieval (IR) is a pivotal component in various applications. Recent advances in machine learning (ML) have enabled the integration of ML algorithms into IR, particularly in ranking systems. While there is a plethora of research on the robustness of ML-based ranking systems, these studies largely neglect commercial e-commerce systems and fail to establish a connection between real-world and manipulated query relevance. In this paper, we present the first systematic measurement study on the robustness of e-commerce ranking systems. We define robustness as the consistency of ranking outcomes for semantically identical queries. To quantitatively analyze robustness, we propose a novel metric that considers both ranking position and item-specific information that are absent in existing metrics. Our large-scale measurement study with real-world data from e-commerce retailers reveals an open opportunity to measure and improve robustness since semantically identical queries often yield inconsistent ranking results. Based on our observations, we propose several solution directions to enhance robustness, such as the use of Large Language Models. Note that the issue of robustness discussed herein does not constitute an error or oversight. Rather, in scenarios where there exists a vast array of choices, it is feasible to present a multitude of products in various permutations, all of which could be equally appealing. However, this extensive selection may lead to customer confusion. As e-commerce retailers use various techniques to improve the quality of search results, we hope that this research offers valuable guidance for measuring the robustness of the ranking systems.
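The abstract does not give the formula for its robustness metric. As a rough illustration of what "consistency of ranking outcomes" for two semantically identical queries could look like in code, the sketch below computes a position-discounted overlap between two result lists. The function name weighted_overlap, the cutoff k, and the discounting scheme are assumptions made for this example, and the sketch ignores the item-specific information that the paper's metric also considers. Items are assumed to be unique within each list.

```python
def weighted_overlap(ranking_a, ranking_b, k=10):
    """Position-discounted agreement between two top-k result lists, in [0, 1]."""
    top_a, top_b = ranking_a[:k], ranking_b[:k]
    pos_b = {item: i for i, item in enumerate(top_b)}
    depth = max(len(top_a), len(top_b))
    if depth == 0:
        return 1.0  # two empty lists agree trivially
    score = 0.0
    for i, item in enumerate(top_a):
        if item in pos_b:
            # A shared item is credited according to the worse of its two ranks.
            score += 1.0 / (1 + max(i, pos_b[item]))
    # Normalise by the score two identical lists of this depth would reach.
    best = sum(1.0 / (1 + i) for i in range(depth))
    return score / best

# Hypothetical results for two phrasings of the same query.
results_a = ["p1", "p2", "p3"]
results_b = ["p2", "p1", "p4"]
consistency = weighted_overlap(results_a, results_b)
```

Identical lists score 1.0 and disjoint lists score 0.0; disagreements near the top of the list cost more than disagreements further down.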


Citations: 0

iScore: Visual Analytics for Interpreting How Language Models Automatically Score Summaries

ArXiv

Pub Date : 2024-03-07 DOI: 10.1145/3640543.3645142

Adam Joseph Coscia, Langdon Holmes, Wesley Morris, Joon Suh Choi, Scott Crossley, A. Endert

The recent explosion in popularity of large language models (LLMs) has inspired learning engineers to incorporate them into adaptive educational tools that automatically score summary writing. Understanding and evaluating LLMs is vital before deploying them in critical learning environments, yet their unprecedented size and expanding number of parameters inhibit transparency and impede trust when they underperform. Through a collaborative user-centered design process with several learning engineers building and deploying summary scoring LLMs, we characterized fundamental design challenges and goals around interpreting their models, including aggregating large text inputs, tracking score provenance, and scaling LLM interpretability methods. To address their concerns, we developed iScore, an interactive visual analytics tool for learning engineers to upload, score, and compare multiple summaries simultaneously. Tightly integrated views allow users to iteratively revise the language in summaries, track changes in the resulting LLM scores, and visualize model weights at multiple levels of abstraction. To validate our approach, we deployed iScore with three learning engineers over the course of a month. We present a case study where interacting with iScore led a learning engineer to improve their LLM's score accuracy by three percentage points. Finally, we conducted qualitative interviews with the learning engineers that revealed how iScore enabled them to understand, evaluate, and build trust in their LLMs during deployment.


Citations: 0

DeepSee: Multidimensional Visualizations of Seabed Ecosystems

ArXiv

Pub Date : 2024-03-07 DOI: 10.1145/3613904.3642001

Adam Joseph Coscia, H. Sapers, Noah Deutsch, Malika Khurana, J. Magyar, Sergio A. Parra, Daniel R. Utter, R.L. Wipfler, D. Caress, Eric J. Martin, J. Paduan, M. Hendrie, S. Lombeyda, H. Mushkin, A. Endert, Scott Davidoff, V. Orphan

Scientists studying deep ocean microbial ecosystems use limited numbers of sediment samples collected from the seafloor to characterize important life-sustaining biogeochemical cycles in the environment. Yet conducting fieldwork to sample these extreme remote environments is both expensive and time-consuming, requiring tools that enable scientists to explore the sampling history of field sites and predict where taking new samples is likely to maximize scientific return. We conducted a collaborative, user-centered design study with a team of scientific researchers to develop DeepSee, an interactive data workspace that visualizes 2D and 3D interpolations of biogeochemical and microbial processes in context together with sediment sampling history overlaid on 2D seafloor maps. Based on a field deployment and qualitative interviews, we found that DeepSee increased the scientific return from limited sample sizes, catalyzed new research workflows, reduced long-term costs of sharing data, and supported teamwork and communication between team members with diverse research goals.


Citations: 0

Adaptive Discovering and Merging for Incremental Novel Class Discovery

ArXiv

Pub Date : 2024-03-06 DOI: 10.1609/aaai.v38i10.29006

Guangyao Chen, Peixi Peng, Yangru Huang, Mengyue Geng, Yonghong Tian

One important desideratum of lifelong learning is to discover novel classes from unlabelled data in a continuous manner. The central challenge is twofold: discovering and learning novel classes while mitigating catastrophic forgetting of established knowledge. To this end, we introduce a new paradigm called Adaptive Discovering and Merging (ADM) to discover novel categories adaptively in the incremental stage and integrate novel knowledge into the model without affecting the original knowledge. To discover novel classes adaptively, we decouple representation learning and novel class discovery, and use Triple Comparison (TC) and Probability Regularization (PR) to constrain the probability discrepancy and diversity for adaptive category assignment. To merge the learned novel knowledge adaptively, we propose a hybrid structure with base and novel branches named Adaptive Model Merging (AMM), which reduces the interference of the novel branch on the old classes to preserve the previous knowledge, and merges the novel branch into the base model without performance loss or parameter growth. Extensive experiments on several datasets show that ADM significantly outperforms existing class-incremental Novel Class Discovery (class-iNCD) approaches. Moreover, our AMM also benefits the class-incremental Learning (class-IL) task by alleviating the catastrophic forgetting problem. The source code is included in the supplementary materials.


Citations: 0

PromptCharm: Text-to-Image Generation through Multi-modal Prompting and Refinement

ArXiv

Pub Date : 2024-03-06 DOI: 10.1145/3613904.3642803

Zhijie Wang, Yuheng Huang, Da Song, Lei Ma, Tianyi Zhang

Recent advancements in Generative AI have significantly advanced the field of text-to-image generation. The state-of-the-art text-to-image model, Stable Diffusion, is now capable of synthesizing high-quality images with a strong sense of aesthetics. Crafting text prompts that align with the model's interpretation and the user's intent thus becomes crucial. However, prompting remains challenging for novice users due to the complexity of the Stable Diffusion model and the non-trivial effort required for iteratively editing and refining the text prompts. To address these challenges, we propose PromptCharm, a mixed-initiative system that facilitates text-to-image creation through multi-modal prompt engineering and refinement. To assist novice users in prompting, PromptCharm first automatically refines and optimizes the user's initial prompt. Furthermore, PromptCharm supports the user in exploring and selecting different image styles within a large database. To assist users in effectively refining their prompts and images, PromptCharm renders model explanations by visualizing the model's attention values. If the user notices any unsatisfactory areas in the generated images, they can further refine the images through model attention adjustment or image inpainting within the rich feedback loop of PromptCharm. To evaluate the effectiveness and usability of PromptCharm, we conducted a controlled user study with 12 participants and an exploratory user study with another 12 participants. These two studies show that participants using PromptCharm were able to create images of higher quality that were better aligned with their expectations than when using two variants of PromptCharm that lacked interaction or visualization support.


Citations: 0

Whodunit: Classifying Code as Human Authored or GPT-4 Generated - A case study on CodeChef problems

ArXiv

Pub Date : 2024-03-06 DOI: 10.1145/3643991.3644926

Oseremen Joy Idialu, N. Mathews, Rungroj Maipradit, J. Atlee, Mei Nagappan

Artificial intelligence (AI) assistants such as GitHub Copilot and ChatGPT, built on large language models like GPT-4, are revolutionizing how programming tasks are performed, raising questions about whether code is authored by generative AI models. Such questions are of particular interest to educators, who worry that these tools enable a new form of academic dishonesty, in which students submit AI-generated code as their own work. Our research explores the viability of using code stylometry and machine learning to distinguish between GPT-4-generated and human-authored code. Our dataset comprises human-authored solutions from CodeChef and AI-authored solutions generated by GPT-4. Our classifier outperforms baselines, with an F1-score and AUC-ROC score of 0.91. A variant of our classifier that excludes gameable features (e.g., empty lines, whitespace) still performs well, with an F1-score and AUC-ROC score of 0.89. We also evaluated our classifier with respect to the difficulty of the programming problem and found that there was almost no difference between easier and intermediate problems, and that the classifier performed only slightly worse on harder problems. Our study shows that code stylometry is a promising approach for distinguishing between GPT-4-generated code and human-authored code.
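To make the idea of "gameable" layout features concrete, the following sketch computes a few such statistics for one source file. It is an illustration only: the abstract does not list the paper's actual feature set or classifier, so the function name layout_features and the three features chosen here are assumptions.

```python
def layout_features(source):
    """Return a few layout statistics for one source file given as a string."""
    lines = source.splitlines()
    n_lines = max(len(lines), 1)   # guard against empty input
    n_chars = max(len(source), 1)
    blank = sum(1 for line in lines if not line.strip())
    return {
        # Share of lines that are empty or contain only whitespace.
        "blank_line_ratio": blank / n_lines,
        # Share of characters that are spaces, tabs, or newlines.
        "whitespace_ratio": sum(ch.isspace() for ch in source) / n_chars,
        # Average number of characters per line, excluding the newline.
        "mean_line_length": sum(len(line) for line in lines) / n_lines,
    }

sample = "int main() {\n\n    return 0;\n}\n"
features = layout_features(sample)
```

Features like these are easy for a submitter to alter by reformatting the code, which is presumably why the abstract also reports results for a classifier variant that excludes them.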


Citations: 0

Dcl-Net: Dual Contrastive Learning Network for Semi-Supervised Multi-Organ Segmentation

ArXiv

Pub Date : 2024-03-06 DOI: 10.1109/icassp48485.2024.10447495

L. Wen, Zheng-Kai Feng, Yun Hou, Peng Wang, Xi Wu, Jiliu Zhou, Yan Wang

Semi-supervised learning (SSL) is an effective way to relieve the strict demand for abundant annotated datasets, especially for challenging multi-organ segmentation (MoS). However, most existing SSL methods predict pixels in a single image independently, ignoring the relations among images and categories. In this paper, we propose a two-stage Dual Contrastive Learning Network for semi-supervised MoS, which utilizes global and local contrastive learning to strengthen the relations among images and classes. Concretely, in Stage 1, we develop a similarity-guided global contrastive learning to explore the implicit continuity and similarity among images and learn global context. Then, in Stage 2, we present an organ-aware local contrastive learning to further attract the class representations. To ease the computation burden, we introduce a mask center computation algorithm to compress the category representations for local contrastive learning. Experiments conducted on the public 2017 ACDC dataset and an in-house RC-OARs dataset have demonstrated the superior performance of our method.
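The abstract does not spell out the mask center computation algorithm. One plausible reading, shown here purely as an assumption, is masked average pooling: all pixel features that share a class label are averaged into a single center vector, so that contrastive comparisons run over a handful of centers rather than every pixel. The names `mask_centers` and `cosine_similarity` are hypothetical and not taken from the paper.

```python
import math


def mask_centers(features, mask):
    """Average the feature vectors of all pixels sharing a label.

    features: list of per-pixel feature vectors (flattened image).
    mask:     list of integer class labels, one per pixel.
    Returns a dict mapping each label to its mean feature vector.
    Assumption: this is masked average pooling, not necessarily
    the algorithm used in the paper.
    """
    sums, counts = {}, {}
    for vec, label in zip(features, mask):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {
        label: [s / counts[label] for s in acc]
        for label, acc in sums.items()
    }


def cosine_similarity(a, b):
    """Cosine similarity between two vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

With centers in hand, a local contrastive objective could pull a pixel's feature toward its own class center and push it away from the others, using a similarity such as the cosine above.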

Citations: 0

The Visual Debugger: Past, Present, and Future

ArXiv

Pub Date : 2024-03-06 DOI: 10.1145/3643796.3648443

Tim Kräuter, Patrick Stünkel, Adrian Rutle, Yngve Lamo

The Visual Debugger is an IntelliJ IDEA plugin that presents debug information as an object diagram to enhance program understanding. Reflecting on our past development, we detail the lessons learned and the roadblocks we experienced while implementing the Visual Debugger and integrating it into IntelliJ IDEA. Furthermore, we describe recent improvements that greatly enhance the plugin in its present form. Looking to the future, we propose solutions to the roadblocks encountered during development and outline further plans for the Visual Debugger.

Citations: 0

German also Hallucinates! Inconsistency Detection in News Summaries with the Absinth Dataset

ArXiv

Pub Date : 2024-03-06 DOI: 10.3929/ethz-b-000661775

Laura Mascarell, Ribin Chalumattu, Annette Rios

The advent of Large Language Models (LLMs) has led to remarkable progress on a wide range of natural language processing tasks. Despite these advances, such large models still hallucinate information in their output, which poses a major issue in automatic text summarization, as we must guarantee that the generated summary is consistent with the content of the source document. Previous research addresses the challenging task of detecting hallucinations in the output (i.e., inconsistency detection) in order to evaluate the faithfulness of the generated summaries. However, these works primarily focus on English, and recent multilingual approaches lack German data. This work presents absinth, a manually annotated dataset for hallucination detection in German news summarization, and explores the capabilities of novel open-source LLMs on this task in both fine-tuning and in-context learning settings. We open-source and release the absinth dataset to foster further research on hallucination detection in German.

Citations: 0
