Pub Date : 2024-03-07DOI: 10.1609/aaai.v38i2.27847
Qingyuan Cai, Xuecai Hu, Saihui Hou, Li Yao, Yongzhen Huang
Recently, diffusion-based methods for monocular 3D human pose estimation have achieved state-of-the-art (SOTA) performance by directly regressing the 3D joint coordinates from the 2D pose sequence. Although some methods decompose the task into bone length and bone direction prediction based on the human anatomical skeleton to explicitly incorporate more human body prior constraints, the performance of these methods is significantly lower than that of the SOTA diffusion-based methods. This can be attributed to the tree structure of the human skeleton. Direct application of the disentangled method could amplify the accumulation of hierarchical errors, propagating through each hierarchy. Meanwhile, the hierarchical information has not been fully explored by the previous methods. To address these problems, a Disentangled Diffusion-based 3D human Pose Estimation method with Hierarchical Spatial and Temporal Denoiser is proposed, termed DDHPose. In our approach: (1) We disentangle the 3d pose and diffuse the bone length and bone direction during the forward process of the diffusion model to effectively model the human pose prior. A disentanglement loss is proposed to supervise diffusion model learning. (2) For the reverse process, we propose Hierarchical Spatial and Temporal Denoiser (HSTDenoiser) to improve the hierarchical modelling of each joint. Our HSTDenoiser comprises two components: the Hierarchical-Related Spatial Transformer (HRST) and the Hierarchical-Related Temporal Transformer (HRTT). HRST exploits joint spatial information and the influence of the parent joint on each joint for spatial modeling, while HRTT utilizes information from both the joint and its hierarchical adjacent joints to explore the hierarchical temporal correlations among joints. Extensive experiments on the Human3.6M and MPI-INF-3DHP datasets show that our method outperforms the SOTA disentangled-based, non-disentangled based, and probabilistic approaches by 10.0%, 2.0%, and 1.3%, respectively.
最近,基于扩散的单目三维人体姿态估计方法通过直接回归二维姿态序列的三维关节坐标,实现了最先进的(SOTA)性能。虽然有些方法根据人体解剖骨架将任务分解为骨骼长度和骨骼方向预测,以明确纳入更多人体先验约束条件,但这些方法的性能明显低于基于 SOTA 扩散的方法。这可归因于人体骨骼的树状结构。直接应用分解方法可能会放大分层误差的积累,并通过每个层次传播。同时,以往的方法并没有充分挖掘层次信息。为了解决这些问题,我们提出了一种带有分层空间和时间去噪器的基于扩散的三维人体姿态估计方法,称为 DDHPose。在我们的方法中:(1) 在扩散模型的前向过程中,我们将三维姿势与骨骼长度和骨骼方向分离开来,从而有效地建立人体姿势先验模型。我们提出了一种不纠缠损失来监督扩散模型的学习。(2) 在反向过程中,我们提出了分层空间和时间去噪器(HSTDenoiser)来改进每个关节的分层建模。我们的 HSTDenoiser 由两部分组成:层次相关空间变换器(HRST)和层次相关时间变换器(HRTT)。HRST 利用关节空间信息和父关节对每个关节的影响来建立空间模型,而 HRTT 则利用关节及其分层相邻关节的信息来探索关节之间的分层时间相关性。在 Human3.6M 和 MPI-INF-3DHP 数据集上进行的大量实验表明,我们的方法比基于 SOTA 的非纠缠方法、基于非纠缠方法和概率方法分别优胜 10.0%、2.0% 和 1.3%。
{"title":"Disentangled Diffusion-Based 3D Human Pose Estimation with Hierarchical Spatial and Temporal Denoiser","authors":"Qingyuan Cai, Xuecai Hu, Saihui Hou, Li Yao, Yongzhen Huang","doi":"10.1609/aaai.v38i2.27847","DOIUrl":"https://doi.org/10.1609/aaai.v38i2.27847","url":null,"abstract":"Recently, diffusion-based methods for monocular 3D human pose estimation have achieved state-of-the-art (SOTA) performance by directly regressing the 3D joint coordinates from the 2D pose sequence. Although some methods decompose the task into bone length and bone direction prediction based on the human anatomical skeleton to explicitly incorporate more human body prior constraints, the performance of these methods is significantly lower than that of the SOTA diffusion-based methods. This can be attributed to the tree structure of the human skeleton. Direct application of the disentangled method could amplify the accumulation of hierarchical errors, propagating through each hierarchy. Meanwhile, the hierarchical information has not been fully explored by the previous methods. To address these problems, a Disentangled Diffusion-based 3D human Pose Estimation method with Hierarchical Spatial and Temporal Denoiser is proposed, termed DDHPose. In our approach: (1) We disentangle the 3d pose and diffuse the bone length and bone direction during the forward process of the diffusion model to effectively model the human pose prior. A disentanglement loss is proposed to supervise diffusion model learning. (2) For the reverse process, we propose Hierarchical Spatial and Temporal Denoiser (HSTDenoiser) to improve the hierarchical modelling of each joint. Our HSTDenoiser comprises two components: the Hierarchical-Related Spatial Transformer (HRST) and the Hierarchical-Related Temporal Transformer (HRTT). HRST exploits joint spatial information and the influence of the parent joint on each joint for spatial modeling, while HRTT utilizes information from both the joint and its hierarchical adjacent joints to explore the hierarchical temporal correlations among joints. Extensive experiments on the Human3.6M and MPI-INF-3DHP datasets show that our method outperforms the SOTA disentangled-based, non-disentangled based, and probabilistic approaches by 10.0%, 2.0%, and 1.3%, respectively.","PeriodicalId":513202,"journal":{"name":"ArXiv","volume":"18 18","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140397242","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Information retrieval (IR) is a pivotal component in various applications. Recent advances in machine learning (ML) have enabled the integration of ML algorithms into IR, particularly in ranking systems. While there is a plethora of research on the robustness of ML-based ranking systems, these studies largely neglect commercial e-commerce systems and fail to establish a connection between real-world and manipulated query relevance. In this paper, we present the first systematic measurement study on the robustness of e-commerce ranking systems. We define robustness as the consistency of ranking outcomes for semantically identical queries. To quantitatively analyze robustness, we propose a novel metric that considers both ranking position and item-specific information that are absent in existing metrics. Our large-scale measurement study with real-world data from e-commerce retailers reveals an open opportunity to measure and improve robustness since semantically identical queries often yield inconsistent ranking results. Based on our observations, we propose several solution directions to enhance robustness, such as the use of Large Language Models. Note that the issue of robustness discussed herein does not constitute an error or oversight. Rather, in scenarios where there exists a vast array of choices, it is feasible to present a multitude of products in various permutations, all of which could be equally appealing. However, this extensive selection may lead to customer confusion. As e-commerce retailers use various techniques to improve the quality of search results, we hope that this research offers valuable guidance for measuring the robustness of the ranking systems.
信息检索(IR)是各种应用中的关键组成部分。机器学习(ML)领域的最新进展使 ML 算法得以集成到 IR 中,特别是集成到排名系统中。虽然对基于 ML 的排名系统的鲁棒性进行了大量研究,但这些研究在很大程度上忽视了商业电子商务系统,也未能在真实世界和操纵查询相关性之间建立联系。在本文中,我们首次对电子商务排名系统的稳健性进行了系统的测量研究。我们将稳健性定义为语义相同的查询的排名结果的一致性。为了定量分析稳健性,我们提出了一种新的度量方法,该方法同时考虑了现有度量方法中缺乏的排名位置和特定项目信息。我们利用电子商务零售商的真实数据进行的大规模测量研究揭示了测量和改进稳健性的机会,因为语义相同的查询往往会产生不一致的排名结果。根据我们的观察,我们提出了几个增强稳健性的解决方案方向,例如使用大型语言模型。请注意,本文讨论的稳健性问题并不构成错误或疏忽。相反,在存在大量选择的情况下,以各种排列组合的方式呈现多种产品是可行的,所有这些产品都可能具有同样的吸引力。然而,这种广泛的选择可能会导致客户混淆。随着电子商务零售商使用各种技术来提高搜索结果的质量,我们希望这项研究能为衡量排名系统的稳健性提供有价值的指导。
{"title":"Towards Robustness Analysis of E-Commerce Ranking System","authors":"Ningfei Wang, Yupin Huang, Han Cheng, Jiri Gesi, Xiaojie Wang, Vivek Mittal","doi":"10.1145/3589335.3648335","DOIUrl":"https://doi.org/10.1145/3589335.3648335","url":null,"abstract":"Information retrieval (IR) is a pivotal component in various applications. Recent advances in machine learning (ML) have enabled the integration of ML algorithms into IR, particularly in ranking systems. While there is a plethora of research on the robustness of ML-based ranking systems, these studies largely neglect commercial e-commerce systems and fail to establish a connection between real-world and manipulated query relevance. In this paper, we present the first systematic measurement study on the robustness of e-commerce ranking systems. We define robustness as the consistency of ranking outcomes for semantically identical queries. To quantitatively analyze robustness, we propose a novel metric that considers both ranking position and item-specific information that are absent in existing metrics. Our large-scale measurement study with real-world data from e-commerce retailers reveals an open opportunity to measure and improve robustness since semantically identical queries often yield inconsistent ranking results. Based on our observations, we propose several solution directions to enhance robustness, such as the use of Large Language Models. Note that the issue of robustness discussed herein does not constitute an error or oversight. Rather, in scenarios where there exists a vast array of choices, it is feasible to present a multitude of products in various permutations, all of which could be equally appealing. However, this extensive selection may lead to customer confusion. As e-commerce retailers use various techniques to improve the quality of search results, we hope that this research offers valuable guidance for measuring the robustness of the ranking systems.","PeriodicalId":513202,"journal":{"name":"ArXiv","volume":"23 2","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140397431","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Adam Joseph Coscia, Langdon Holmes, Wesley Morris, Joon Suh Choi, Scott Crossley, A. Endert
The recent explosion in popularity of large language models (LLMs) has inspired learning engineers to incorporate them into adaptive educational tools that automatically score summary writing. Understanding and evaluating LLMs is vital before deploying them in critical learning environments, yet their unprecedented size and expanding number of parameters inhibits transparency and impedes trust when they underperform. Through a collaborative user-centered design process with several learning engineers building and deploying summary scoring LLMs, we characterized fundamental design challenges and goals around interpreting their models, including aggregating large text inputs, tracking score provenance, and scaling LLM interpretability methods. To address their concerns, we developed iScore, an interactive visual analytics tool for learning engineers to upload, score, and compare multiple summaries simultaneously. Tightly integrated views allow users to iteratively revise the language in summaries, track changes in the resulting LLM scores, and visualize model weights at multiple levels of abstraction. To validate our approach, we deployed iScore with three learning engineers over the course of a month. We present a case study where interacting with iScore led a learning engineer to improve their LLM's score accuracy by three percentage points. Finally, we conducted qualitative interviews with the learning engineers that revealed how iScore enabled them to understand, evaluate, and build trust in their LLMs during deployment.
{"title":"iScore: Visual Analytics for Interpreting How Language Models Automatically Score Summaries","authors":"Adam Joseph Coscia, Langdon Holmes, Wesley Morris, Joon Suh Choi, Scott Crossley, A. Endert","doi":"10.1145/3640543.3645142","DOIUrl":"https://doi.org/10.1145/3640543.3645142","url":null,"abstract":"The recent explosion in popularity of large language models (LLMs) has inspired learning engineers to incorporate them into adaptive educational tools that automatically score summary writing. Understanding and evaluating LLMs is vital before deploying them in critical learning environments, yet their unprecedented size and expanding number of parameters inhibits transparency and impedes trust when they underperform. Through a collaborative user-centered design process with several learning engineers building and deploying summary scoring LLMs, we characterized fundamental design challenges and goals around interpreting their models, including aggregating large text inputs, tracking score provenance, and scaling LLM interpretability methods. To address their concerns, we developed iScore, an interactive visual analytics tool for learning engineers to upload, score, and compare multiple summaries simultaneously. Tightly integrated views allow users to iteratively revise the language in summaries, track changes in the resulting LLM scores, and visualize model weights at multiple levels of abstraction. To validate our approach, we deployed iScore with three learning engineers over the course of a month. We present a case study where interacting with iScore led a learning engineer to improve their LLM's score accuracy by three percentage points. Finally, we conducted qualitative interviews with the learning engineers that revealed how iScore enabled them to understand, evaluate, and build trust in their LLMs during deployment.","PeriodicalId":513202,"journal":{"name":"ArXiv","volume":"22 40","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140397439","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Adam Joseph Coscia, H. Sapers, Noah Deutsch, Malika Khurana, J. Magyar, Sergio A. Parra, Daniel R. Utter, R.L. Wipfler, D. Caress, Eric J. Martin, J. Paduan, M. Hendrie, S. Lombeyda, H. Mushkin, A. Endert, Scott Davidoff, V. Orphan
Scientists studying deep ocean microbial ecosystems use limited numbers of sediment samples collected from the seafloor to characterize important life-sustaining biogeochemical cycles in the environment. Yet conducting fieldwork to sample these extreme remote environments is both expensive and time consuming, requiring tools that enable scientists to explore the sampling history of field sites and predict where taking new samples is likely to maximize scientific return. We conducted a collaborative, user-centered design study with a team of scientific researchers to develop DeepSee, an interactive data workspace that visualizes 2D and 3D interpolations of biogeochemical and microbial processes in context together with sediment sampling history overlaid on 2D seafloor maps. Based on a field deployment and qualitative interviews, we found that DeepSee increased the scientific return from limited sample sizes, catalyzed new research workflows, reduced long-term costs of sharing data, and supported teamwork and communication between team members with diverse research goals.
{"title":"DeepSee: Multidimensional Visualizations of Seabed Ecosystems","authors":"Adam Joseph Coscia, H. Sapers, Noah Deutsch, Malika Khurana, J. Magyar, Sergio A. Parra, Daniel R. Utter, R.L. Wipfler, D. Caress, Eric J. Martin, J. Paduan, M. Hendrie, S. Lombeyda, H. Mushkin, A. Endert, Scott Davidoff, V. Orphan","doi":"10.1145/3613904.3642001","DOIUrl":"https://doi.org/10.1145/3613904.3642001","url":null,"abstract":"Scientists studying deep ocean microbial ecosystems use limited numbers of sediment samples collected from the seafloor to characterize important life-sustaining biogeochemical cycles in the environment. Yet conducting fieldwork to sample these extreme remote environments is both expensive and time consuming, requiring tools that enable scientists to explore the sampling history of field sites and predict where taking new samples is likely to maximize scientific return. We conducted a collaborative, user-centered design study with a team of scientific researchers to develop DeepSee, an interactive data workspace that visualizes 2D and 3D interpolations of biogeochemical and microbial processes in context together with sediment sampling history overlaid on 2D seafloor maps. Based on a field deployment and qualitative interviews, we found that DeepSee increased the scientific return from limited sample sizes, catalyzed new research workflows, reduced long-term costs of sharing data, and supported teamwork and communication between team members with diverse research goals.","PeriodicalId":513202,"journal":{"name":"ArXiv","volume":"23 25","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140397066","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
One important desideratum of lifelong learning aims to discover novel classes from unlabelled data in a continuous manner. The central challenge is twofold: discovering and learning novel classes while mitigating the issue of catastrophic forgetting of established knowledge. To this end, we introduce a new paradigm called Adaptive Discovering and Merging (ADM) to discover novel categories adaptively in the incremental stage and integrate novel knowledge into the model without affecting the original knowledge. To discover novel classes adaptively, we decouple representation learning and novel class discovery, and use Triple Comparison (TC) and Probability Regularization (PR) to constrain the probability discrepancy and diversity for adaptive category assignment. To merge the learned novel knowledge adaptively, we propose a hybrid structure with base and novel branches named Adaptive Model Merging (AMM), which reduces the interference of the novel branch on the old classes to preserve the previous knowledge, and merges the novel branch to the base model without performance loss and parameter growth. Extensive experiments on several datasets show that ADM significantly outperforms existing class-incremental Novel Class Discovery (class-iNCD) approaches. Moreover, our AMM also benefits the class-incremental Learning (class-IL) task by alleviating the catastrophic forgetting problem. The source code is included in the supplementary materials.
终生学习的一个重要目的是从无标签数据中持续发现新的类别。核心挑战有两个方面:在发现和学习新类别的同时,减少对已有知识的灾难性遗忘。为此,我们引入了一种名为 "自适应发现与合并"(Adaptive Discovering and Merging,ADM)的新范式,以在增量阶段自适应地发现新类别,并在不影响原有知识的情况下将新知识整合到模型中。为了自适应地发现新类别,我们将表示学习和新类别发现解耦,并使用三重比较(TC)和概率正则化(PR)来限制自适应类别分配的概率差异和多样性。为了自适应地合并所学到的新知识,我们提出了一种具有基础分支和新分支的混合结构,命名为自适应模型合并(AMM),它可以减少新分支对旧类的干扰,从而保留以前的知识,并在不损失性能和不增加参数的情况下将新分支合并到基础模型中。在多个数据集上进行的大量实验表明,ADM 明显优于现有的类递增新类发现(class-iNCD)方法。此外,通过缓解灾难性遗忘问题,我们的AMM还有利于类递增学习(class-IL)任务。源代码包含在补充材料中。
{"title":"Adaptive Discovering and Merging for Incremental Novel Class Discovery","authors":"Guangyao Chen, Peixi Peng, Yangru Huang, Mengyue Geng, Yonghong Tian","doi":"10.1609/aaai.v38i10.29006","DOIUrl":"https://doi.org/10.1609/aaai.v38i10.29006","url":null,"abstract":"One important desideratum of lifelong learning aims to discover novel classes from unlabelled data in a continuous manner. The central challenge is twofold: discovering and learning novel classes while mitigating the issue of catastrophic forgetting of established knowledge. To this end, we introduce a new paradigm called Adaptive Discovering and Merging (ADM) to discover novel categories adaptively in the incremental stage and integrate novel knowledge into the model without affecting the original knowledge. To discover novel classes adaptively, we decouple representation learning and novel class discovery, and use Triple Comparison (TC) and Probability Regularization (PR) to constrain the probability discrepancy and diversity for adaptive category assignment. To merge the learned novel knowledge adaptively, we propose a hybrid structure with base and novel branches named Adaptive Model Merging (AMM), which reduces the interference of the novel branch on the old classes to preserve the previous knowledge, and merges the novel branch to the base model without performance loss and parameter growth. Extensive experiments on several datasets show that ADM significantly outperforms existing class-incremental Novel Class Discovery (class-iNCD) approaches. Moreover, our AMM also benefits the class-incremental Learning (class-IL) task by alleviating the catastrophic forgetting problem. The source code is included in the supplementary materials.","PeriodicalId":513202,"journal":{"name":"ArXiv","volume":"8 7","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-03-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140397312","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zhijie Wang, Yuheng Huang, Da Song, Lei Ma, Tianyi Zhang
The recent advancements in Generative AI have significantly advanced the field of text-to-image generation. The state-of-the-art text-to-image model, Stable Diffusion, is now capable of synthesizing high-quality images with a strong sense of aesthetics. Crafting text prompts that align with the model's interpretation and the user's intent thus becomes crucial. However, prompting remains challenging for novice users due to the complexity of the stable diffusion model and the non-trivial efforts required for iteratively editing and refining the text prompts. To address these challenges, we propose PromptCharm, a mixed-initiative system that facilitates text-to-image creation through multi-modal prompt engineering and refinement. To assist novice users in prompting, PromptCharm first automatically refines and optimizes the user's initial prompt. Furthermore, PromptCharm supports the user in exploring and selecting different image styles within a large database. To assist users in effectively refining their prompts and images, PromptCharm renders model explanations by visualizing the model's attention values. If the user notices any unsatisfactory areas in the generated images, they can further refine the images through model attention adjustment or image inpainting within the rich feedback loop of PromptCharm. To evaluate the effectiveness and usability of PromptCharm, we conducted a controlled user study with 12 participants and an exploratory user study with another 12 participants. These two studies show that participants using PromptCharm were able to create images with higher quality and better aligned with the user's expectations compared with using two variants of PromptCharm that lacked interaction or visualization support.
{"title":"PromptCharm: Text-to-Image Generation through Multi-modal Prompting and Refinement","authors":"Zhijie Wang, Yuheng Huang, Da Song, Lei Ma, Tianyi Zhang","doi":"10.1145/3613904.3642803","DOIUrl":"https://doi.org/10.1145/3613904.3642803","url":null,"abstract":"The recent advancements in Generative AI have significantly advanced the field of text-to-image generation. The state-of-the-art text-to-image model, Stable Diffusion, is now capable of synthesizing high-quality images with a strong sense of aesthetics. Crafting text prompts that align with the model's interpretation and the user's intent thus becomes crucial. However, prompting remains challenging for novice users due to the complexity of the stable diffusion model and the non-trivial efforts required for iteratively editing and refining the text prompts. To address these challenges, we propose PromptCharm, a mixed-initiative system that facilitates text-to-image creation through multi-modal prompt engineering and refinement. To assist novice users in prompting, PromptCharm first automatically refines and optimizes the user's initial prompt. Furthermore, PromptCharm supports the user in exploring and selecting different image styles within a large database. To assist users in effectively refining their prompts and images, PromptCharm renders model explanations by visualizing the model's attention values. If the user notices any unsatisfactory areas in the generated images, they can further refine the images through model attention adjustment or image inpainting within the rich feedback loop of PromptCharm. To evaluate the effectiveness and usability of PromptCharm, we conducted a controlled user study with 12 participants and an exploratory user study with another 12 participants. These two studies show that participants using PromptCharm were able to create images with higher quality and better aligned with the user's expectations compared with using two variants of PromptCharm that lacked interaction or visualization support.","PeriodicalId":513202,"journal":{"name":"ArXiv","volume":"3 4","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-03-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140397167","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Oseremen Joy Idialu, N. Mathews, Rungroj Maipradit, J. Atlee, Mei Nagappan
Artificial intelligence (AI) assistants such as GitHub Copilot and ChatGPT, built on large language models like GPT-4, are revolutionizing how programming tasks are performed, raising questions about whether code is authored by generative AI models. Such questions are of particular interest to educators, who worry that these tools enable a new form of academic dishonesty, in which students submit AI generated code as their own work. Our research explores the viability of using code stylometry and machine learning to distinguish between GPT-4 generated and human-authored code. Our dataset comprises human-authored solutions from CodeChef and AI-authored solutions generated by GPT-4. Our classifier outperforms baselines, with an F1-score and AUC-ROC score of 0.91. A variant of our classifier that excludes gameable features (e.g., empty lines, whitespace) still performs well with an F1-score and AUC-ROC score of 0.89. We also evaluated our classifier with respect to the difficulty of the programming problem and found that there was almost no difference between easier and intermediate problems, and the classifier performed only slightly worse on harder problems. Our study shows that code stylometry is a promising approach for distinguishing between GPT-4 generated code and human-authored code.
{"title":"Whodunit: Classifying Code as Human Authored or GPT-4 Generated - A case study on CodeChef problems","authors":"Oseremen Joy Idialu, N. Mathews, Rungroj Maipradit, J. Atlee, Mei Nagappan","doi":"10.1145/3643991.3644926","DOIUrl":"https://doi.org/10.1145/3643991.3644926","url":null,"abstract":"Artificial intelligence (AI) assistants such as GitHub Copilot and ChatGPT, built on large language models like GPT-4, are revolutionizing how programming tasks are performed, raising questions about whether code is authored by generative AI models. Such questions are of particular interest to educators, who worry that these tools enable a new form of academic dishonesty, in which students submit AI generated code as their own work. Our research explores the viability of using code stylometry and machine learning to distinguish between GPT-4 generated and human-authored code. Our dataset comprises human-authored solutions from CodeChef and AI-authored solutions generated by GPT-4. Our classifier outperforms baselines, with an F1-score and AUC-ROC score of 0.91. A variant of our classifier that excludes gameable features (e.g., empty lines, whitespace) still performs well with an F1-score and AUC-ROC score of 0.89. We also evaluated our classifier with respect to the difficulty of the programming problem and found that there was almost no difference between easier and intermediate problems, and the classifier performed only slightly worse on harder problems. Our study shows that code stylometry is a promising approach for distinguishing between GPT-4 generated code and human-authored code.","PeriodicalId":513202,"journal":{"name":"ArXiv","volume":"3 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-03-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140397321","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-03-06DOI: 10.1109/icassp48485.2024.10447495
L. Wen, Zheng-Kai Feng, Yun Hou, Peng Wang, Xi Wu, Jiliu Zhou, Yan Wang
Semi-supervised learning is a sound measure to relieve the strict demand of abundant annotated datasets, especially for challenging multi-organ segmentation . However, most existing SSL methods predict pixels in a single image independently, ignoring the relations among images and categories. In this paper, we propose a two-stage Dual Contrastive Learning Network for semi-supervised MoS, which utilizes global and local contrastive learning to strengthen the relations among images and classes. Concretely, in Stage 1, we develop a similarity-guided global contrastive learning to explore the implicit continuity and similarity among images and learn global context. Then, in Stage 2, we present an organ-aware local contrastive learning to further attract the class representations. To ease the computation burden, we introduce a mask center computation algorithm to compress the category representations for local contrastive learning. Experiments conducted on the public 2017 ACDC dataset and an in-house RC-OARs dataset has demonstrated the superior performance of our method.
半监督学习(SSL)是一种有效的措施,可以缓解对大量标注数据集的严格要求,尤其适用于具有挑战性的多器官分割。然而,大多数现有的半监督学习方法都是独立预测单幅图像中的像素,忽略了图像和类别之间的关系。在本文中,我们提出了一种用于半监督 MoS 的两阶段双对比学习网络,它利用全局和局部对比学习来加强图像和类别之间的关系。具体来说,在第一阶段,我们开发了一种相似性引导的全局对比学习,以探索图像之间隐含的连续性和相似性,并学习全局上下文。然后,在第二阶段,我们提出了器官感知局部对比学习,以进一步吸引类表征。为了减轻计算负担,我们引入了一种掩码中心计算算法来压缩局部对比学习的类别表征。在 2017 年公开的 ACDC 数据集和内部的 RC-OARs 数据集上进行的实验证明了我们的方法性能优越。
{"title":"Dcl-Net: Dual Contrastive Learning Network for Semi-Supervised Multi-Organ Segmentation","authors":"L. Wen, Zheng-Kai Feng, Yun Hou, Peng Wang, Xi Wu, Jiliu Zhou, Yan Wang","doi":"10.1109/icassp48485.2024.10447495","DOIUrl":"https://doi.org/10.1109/icassp48485.2024.10447495","url":null,"abstract":"Semi-supervised learning is a sound measure to relieve the strict demand of abundant annotated datasets, especially for challenging multi-organ segmentation . However, most existing SSL methods predict pixels in a single image independently, ignoring the relations among images and categories. In this paper, we propose a two-stage Dual Contrastive Learning Network for semi-supervised MoS, which utilizes global and local contrastive learning to strengthen the relations among images and classes. Concretely, in Stage 1, we develop a similarity-guided global contrastive learning to explore the implicit continuity and similarity among images and learn global context. Then, in Stage 2, we present an organ-aware local contrastive learning to further attract the class representations. To ease the computation burden, we introduce a mask center computation algorithm to compress the category representations for local contrastive learning. Experiments conducted on the public 2017 ACDC dataset and an in-house RC-OARs dataset has demonstrated the superior performance of our method.","PeriodicalId":513202,"journal":{"name":"ArXiv","volume":"6 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-03-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140397366","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tim Kräuter, Patrick Stünkel, Adrian Rutle, Yngve Lamo
The Visual Debugger is an IntelliJ IDEA plugin that presents debug information as an object diagram to enhance program understanding. Reflecting on our past development, we detail the lessons learned and roadblocks we have experienced while implementing and integrating the Visual Debugger into the IntelliJ IDEA. Furthermore, we describe recent improvements to the Visual Debugger, greatly enhancing the plugin in the present. Looking into the future, we propose solutions to overcome the roadblocks encountered while developing the plugin and further plans for the Visual Debugger.
可视化调试器是一个 IntelliJ IDEA 插件,它能以对象图的形式显示调试信息,以增强对程序的理解。回顾过去的开发历程,我们详细介绍了在将 Visual Debugger 实施和集成到 IntelliJ IDEA 的过程中吸取的经验教训和遇到的障碍。此外,我们还介绍了 Visual Debugger 的最新改进,这些改进极大地增强了该插件的功能。展望未来,我们提出了克服开发插件过程中遇到的障碍的解决方案,以及 Visual Debugger 的进一步计划。
{"title":"The Visual Debugger: Past, Present, and Future","authors":"Tim Kräuter, Patrick Stünkel, Adrian Rutle, Yngve Lamo","doi":"10.1145/3643796.3648443","DOIUrl":"https://doi.org/10.1145/3643796.3648443","url":null,"abstract":"The Visual Debugger is an IntelliJ IDEA plugin that presents debug information as an object diagram to enhance program understanding. Reflecting on our past development, we detail the lessons learned and roadblocks we have experienced while implementing and integrating the Visual Debugger into the IntelliJ IDEA. Furthermore, we describe recent improvements to the Visual Debugger, greatly enhancing the plugin in the present. Looking into the future, we propose solutions to overcome the roadblocks encountered while developing the plugin and further plans for the Visual Debugger.","PeriodicalId":513202,"journal":{"name":"ArXiv","volume":"7 18","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-03-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140397394","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-03-06DOI: 10.3929/ethz-b-000661775
Laura Mascarell, Ribin Chalumattu, Annette Rios
The advent of Large Language Models (LLMs) has led to remarkable progress on a wide range of natural language processing tasks. Despite the advances, these large-sized models still suffer from hallucinating information in their output, which poses a major issue in automatic text summarization, as we must guarantee that the generated summary is consistent with the content of the source document. Previous research addresses the challenging task of detecting hallucinations in the output (i.e. inconsistency detection) in order to evaluate the faithfulness of the generated summaries. However, these works primarily focus on English and recent multilingual approaches lack German data. This work presents absinth, a manually annotated dataset for hallucination detection in German news summarization and explores the capabilities of novel open-source LLMs on this task in both fine-tuning and in-context learning settings. We open-source and release the absinth dataset to foster further research on hallucination detection in German.
{"title":"German also Hallucinates! Inconsistency Detection in News Summaries with the Absinth Dataset","authors":"Laura Mascarell, Ribin Chalumattu, Annette Rios","doi":"10.3929/ethz-b-000661775","DOIUrl":"https://doi.org/10.3929/ethz-b-000661775","url":null,"abstract":"The advent of Large Language Models (LLMs) has led to remarkable progress on a wide range of natural language processing tasks. Despite the advances, these large-sized models still suffer from hallucinating information in their output, which poses a major issue in automatic text summarization, as we must guarantee that the generated summary is consistent with the content of the source document. Previous research addresses the challenging task of detecting hallucinations in the output (i.e. inconsistency detection) in order to evaluate the faithfulness of the generated summaries. However, these works primarily focus on English and recent multilingual approaches lack German data. This work presents absinth, a manually annotated dataset for hallucination detection in German news summarization and explores the capabilities of novel open-source LLMs on this task in both fine-tuning and in-context learning settings. We open-source and release the absinth dataset to foster further research on hallucination detection in German.","PeriodicalId":513202,"journal":{"name":"ArXiv","volume":"3 3","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-03-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140397680","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}