BoostER: Leveraging Large Language Models for Enhancing Entity Resolution
Huahang Li, Shuangyin Li, Fei Hao, C. Zhang, Yuanfeng Song, Lei Chen
Pub Date: 2024-03-11 · DOI: 10.1145/3589335.3651245
Entity resolution, which involves identifying and merging records that refer to the same real-world entity, is a crucial task in areas like Web data integration. Its importance is underscored by the abundance of duplicated and multi-version data resources on the Web. However, achieving high-quality entity resolution typically demands significant effort. Large Language Models (LLMs) like GPT-4 have demonstrated advanced linguistic capabilities that offer a new paradigm for this task. In this paper, we propose a demonstration system named BoostER that examines the possibility of leveraging LLMs in the entity resolution process, showing advantages in both ease of deployment and cost. Our approach selects an optimal set of matching questions, poses them to an LLM for verification, and then refines the distribution of entity resolution results with the LLM's responses. This offers promising prospects for achieving high-quality entity resolution in real-world applications, especially for individuals or small companies, without extensive model training or significant financial investment.
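To make the selection-and-verification loop concrete, here is a minimal Python sketch. The entropy-based question selection, the `ask_llm` placeholder, and the prior match probabilities are illustrative assumptions, not the paper's actual algorithm.

```python
import math

def entropy(p: float) -> float:
    """Binary entropy of a match probability; 1.0 at p=0.5, 0.0 when certain."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def ask_llm(record_a: str, record_b: str) -> bool:
    """Placeholder for one LLM verification call (e.g. a 'do these two
    records describe the same entity?' prompt); wire in a real client."""
    raise NotImplementedError

def resolve(candidates: dict, budget: int) -> dict:
    """candidates maps (record_a, record_b) pairs to prior match
    probabilities, e.g. from a cheap string-similarity model. Spend the
    LLM budget on the most uncertain (highest-entropy) pairs and snap
    their probabilities to the model's verdict."""
    ranked = sorted(candidates, key=lambda pair: entropy(candidates[pair]), reverse=True)
    for pair in ranked[:budget]:
        candidates[pair] = 1.0 if ask_llm(*pair) else 0.0
    return candidates
```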
{"title":"BoostER: Leveraging Large Language Models for Enhancing Entity Resolution","authors":"Huahang Li, Shuangyin Li, Fei Hao, C. Zhang, Yuanfeng Song, Lei Chen","doi":"10.1145/3589335.3651245 10.1145/3589335.3651245 10.1145/3589335.3651245","DOIUrl":"https://doi.org/10.1145/3589335.3651245 10.1145/3589335.3651245 10.1145/3589335.3651245","url":null,"abstract":"Entity resolution, which involves identifying and merging records that refer to the same real-world entity, is a crucial task in areas like Web data integration. This importance is underscored by the presence of numerous duplicated and multi-version data resources on the Web. However, achieving high-quality entity resolution typically demands significant effort. The advent of Large Language Models (LLMs) like GPT-4 has demonstrated advanced linguistic capabilities, which can be a new paradigm for this task. In this paper, we propose a demonstration system named BoostER that examines the possibility of leveraging LLMs in the entity resolution process, revealing advantages in both easy deployment and low cost. Our approach optimally selects a set of matching questions and poses them to LLMs for verification, then refines the distribution of entity resolution results with the response of LLMs. This offers promising prospects to achieve a high-quality entity resolution result for real-world applications, especially to individuals or small companies without the need for extensive model training or significant financial investment.","PeriodicalId":513202,"journal":{"name":"ArXiv","volume":"28 37","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-03-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140396220","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Exploring Large Language Models and Hierarchical Frameworks for Classification of Large Unstructured Legal Documents
Nishchal Prasad, M. Boughanem, T. Dkaki
Pub Date: 2024-03-11 · DOI: 10.1007/978-3-031-56060-6_15
{"title":"Exploring Large Language Models and Hierarchical Frameworks for Classification of Large Unstructured Legal Documents","authors":"Nishchal Prasad, M. Boughanem, T. Dkaki","doi":"10.1007/978-3-031-56060-6_15","DOIUrl":"https://doi.org/10.1007/978-3-031-56060-6_15","url":null,"abstract":"","PeriodicalId":513202,"journal":{"name":"ArXiv","volume":"27 4","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-03-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140396248","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
RecAI: Leveraging Large Language Models for Next-Generation Recommender Systems
Jianxun Lian, Yuxuan Lei, Xu Huang, Jing Yao, Wei Xu, Xing Xie
Pub Date: 2024-03-11 · DOI: 10.1145/3589335.3651242
This paper introduces RecAI, a practical toolkit designed to augment or even revolutionize recommender systems with the advanced capabilities of Large Language Models (LLMs). RecAI provides a suite of tools, including a Recommender AI Agent, Recommendation-oriented Language Models, a Knowledge Plugin, RecExplainer, and an Evaluator, to facilitate the integration of LLMs into recommender systems from multifaceted perspectives. The new generation of recommender systems, empowered by LLMs, is expected to be more versatile, explainable, conversational, and controllable, paving the way for more intelligent and user-centric recommendation experiences. We hope that open-sourcing RecAI helps accelerate the evolution of advanced recommender systems. The source code of RecAI is available at https://github.com/microsoft/RecAI.
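As a rough illustration of the "Recommender AI Agent" pattern, the sketch below wires a conventional recommender into an LLM as a tool. The interface and routing logic are assumptions for illustration only; RecAI's actual APIs live in the linked repository.

```python
from typing import Callable

def make_agent(llm: Callable[[str], str], recommend: Callable[[str, int], list]):
    """Build a toy conversational recommender: the LLM plans and explains,
    while a classic recommender (exposed as a tool) retrieves items."""
    def agent(user_query: str) -> str:
        # Ask the LLM to distill the conversational request into a retrieval query.
        retrieval_query = llm(f"Extract a short item-search query from: {user_query}")
        items = recommend(retrieval_query, 5)  # conventional recommender as a tool
        # Let the LLM present the list conversationally (the "explainable" part).
        return llm(f"Recommend these items to the user and explain why: {items}")
    return agent
```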
{"title":"RecAI: Leveraging Large Language Models for Next-Generation Recommender Systems","authors":"Jianxun Lian, Yuxuan Lei, Xu Huang, Jing Yao, Wei Xu, Xing Xie","doi":"10.1145/3589335.3651242","DOIUrl":"https://doi.org/10.1145/3589335.3651242","url":null,"abstract":"This paper introduces RecAI, a practical toolkit designed to augment or even revolutionize recommender systems with the advanced capabilities of Large Language Models (LLMs). RecAI provides a suite of tools, including Recommender AI Agent, Recommendation-oriented Language Models, Knowledge Plugin, RecExplainer, and Evaluator, to facilitate the integration of LLMs into recommender systems from multifaceted perspectives. The new generation of recommender systems, empowered by LLMs, are expected to be more versatile, explainable, conversational, and controllable, paving the way for more intelligent and user-centric recommendation experiences. We hope the open-source of RecAI can help accelerate evolution of new advanced recommender systems. The source code of RecAI is available at url{https://github.com/microsoft/RecAI}.","PeriodicalId":513202,"journal":{"name":"ArXiv","volume":"25 45","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-03-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140396416","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
AI as a Child of Mother Earth: Regrounding Human-AI Interaction in Ecological Thinking
Chunchen Xu, Xiao Ge
Pub Date: 2024-03-11 · DOI: 10.1145/3613905.3644065
The anthropocentric cultural idea that humans are active agents exerting control over their environments has been largely normalized and inscribed in practices, policies, and products of contemporary industrialized societies. This view underlies a human-ecology relationship based on resource and knowledge extraction. To create a more sustainable and equitable future, it is essential to consider alternative cultural ideas rooted in ecological thinking. This perspective underscores the interconnectedness between humans and more-than-human worlds. We propose a path to reshape the human-ecology relationship by advocating for alternative human-AI interactions. In this paper, we undertake a critical comparison between anthropocentrism and ecological thinking, using storytelling to illustrate various human-AI interactions that embody ecological thinking. We also delineate a set of design principles aimed at guiding AI developments toward fostering a more caring human-ecology relationship.
{"title":"AI as a Child of Mother Earth: Regrounding Human-AI Interaction in Ecological Thinking","authors":"Chunchen Xu, Xiao Ge","doi":"10.1145/3613905.3644065","DOIUrl":"https://doi.org/10.1145/3613905.3644065","url":null,"abstract":"The anthropocentric cultural idea that humans are active agents exerting control over their environments has been largely normalized and inscribed in practices, policies, and products of contemporary industrialized societies. This view underlies a human-ecology relationship based on resource and knowledge extraction. To create a more sustainable and equitable future, it is essential to consider alternative cultural ideas rooted in ecological thinking. This perspective underscores the interconnectedness between humans and more-than-human worlds. We propose a path to reshape the human-ecology relationship by advocating for alternative human-AI interactions. In this paper, we undertake a critical comparison between anthropocentrism and ecological thinking, using storytelling to illustrate various human-AI interactions that embody ecological thinking. We also delineate a set of design principles aimed at guiding AI developments toward fostering a more caring human-ecology relationship.","PeriodicalId":513202,"journal":{"name":"ArXiv","volume":"25 35","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-03-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140396502","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Knowledge-aware Alert Aggregation in Large-scale Cloud Systems: a Hybrid Approach
Jinxi Kuang, Jinyang Liu, Junjie Huang, Renyi Zhong, Jiazhen Gu, Lan Yu, Rui Tan, Zengyin Yang, Michael R. Lyu
Pub Date: 2024-03-11 · DOI: 10.1145/3639477.3639745
Due to the scale and complexity of cloud systems, a system failure can trigger an "alert storm", i.e., a flood of correlated alerts. Although these alerts can be traced back to a few root causes, their overwhelming number makes manual handling infeasible. Alert aggregation is thus critical to help engineers concentrate on the root cause and facilitate failure resolution. Existing methods typically aggregate alerts using semantic-similarity-based or statistical methods. However, semantic-similarity-based methods overlook the causal rationale of alerts, while statistical methods can hardly handle infrequent alerts. To tackle these limitations, we leverage external knowledge, namely the Standard Operation Procedures (SOPs) of alerts, as a supplement. We propose COLA, a novel hybrid approach based on correlation mining and LLM (Large Language Model) reasoning for online alert aggregation. The correlation mining module captures the temporal and spatial relations between alerts, measuring their correlations efficiently. Subsequently, only uncertain pairs with low confidence are forwarded to the LLM reasoning module for detailed analysis. This hybrid design harnesses both statistical evidence for frequent alerts and the reasoning capabilities of computationally intensive LLMs, ensuring the overall efficiency of COLA in handling large volumes of alerts in practical scenarios. We evaluate COLA on three datasets collected from the production environment of a large-scale cloud platform. The experimental results show that COLA achieves F1-scores from 0.901 to 0.930, outperforming state-of-the-art methods with comparable efficiency. We also share our experience in deploying COLA in our real-world cloud system, Cloud X.
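The hybrid routing can be pictured with the following minimal sketch; the confidence thresholds and the `llm_same_incident` placeholder are assumptions, not the paper's implementation.

```python
CONF_HI, CONF_LO = 0.9, 0.1  # assumed thresholds for the uncertain band

def llm_same_incident(alert_a: str, alert_b: str) -> bool:
    """Placeholder for the LLM reasoning module (e.g. a prompt that also
    cites the alerts' Standard Operation Procedures); wire in a real client."""
    raise NotImplementedError

def route(pairs: list) -> list:
    """Each element is (alert_a, alert_b, score), where score comes from the
    correlation mining module (temporal/spatial co-occurrence). Only the
    uncertain middle band pays for an LLM call, keeping the pipeline cheap
    at alert-storm scale."""
    decisions = []
    for a, b, score in pairs:
        if score >= CONF_HI:
            decisions.append((a, b, True))    # statistically correlated
        elif score <= CONF_LO:
            decisions.append((a, b, False))   # statistically unrelated
        else:
            decisions.append((a, b, llm_same_incident(a, b)))  # uncertain: ask the LLM
    return decisions
```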
{"title":"Knowledge-aware Alert Aggregation in Large-scale Cloud Systems: a Hybrid Approach","authors":"Jinxi Kuang, Jinyang Liu, Junjie Huang, Renyi Zhong, Jiazhen Gu, Lan Yu, Rui Tan, Zengyin Yang, Michael R. Lyu","doi":"10.1145/3639477.3639745","DOIUrl":"https://doi.org/10.1145/3639477.3639745","url":null,"abstract":"Due to the scale and complexity of cloud systems, a system failure would trigger an\"alert storm\", i.e., massive correlated alerts. Although these alerts can be traced back to a few root causes, the overwhelming number makes it infeasible for manual handling. Alert aggregation is thus critical to help engineers concentrate on the root cause and facilitate failure resolution. Existing methods typically utilize semantic similarity-based methods or statistical methods to aggregate alerts. However, semantic similarity-based methods overlook the causal rationale of alerts, while statistical methods can hardly handle infrequent alerts. To tackle these limitations, we introduce leveraging external knowledge, i.e., Standard Operation Procedure (SOP) of alerts as a supplement. We propose COLA, a novel hybrid approach based on correlation mining and LLM (Large Language Model) reasoning for online alert aggregation. The correlation mining module effectively captures the temporal and spatial relations between alerts, measuring their correlations in an efficient manner. Subsequently, only uncertain pairs with low confidence are forwarded to the LLM reasoning module for detailed analysis. This hybrid design harnesses both statistical evidence for frequent alerts and the reasoning capabilities of computationally intensive LLMs, ensuring the overall efficiency of COLA in handling large volumes of alerts in practical scenarios. We evaluate COLA on three datasets collected from the production environment of a large-scale cloud platform. The experimental results show COLA achieves F1-scores from 0.901 to 0.930, outperforming state-of-the-art methods and achieving comparable efficiency. We also share our experience in deploying COLA in our real-world cloud system, Cloud X.","PeriodicalId":513202,"journal":{"name":"ArXiv","volume":"30 22","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-03-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140396204","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Attacking Transformers with Feature Diversity Adversarial Perturbation
Chenxing Gao, Hang Zhou, Junqing Yu, Yuteng Ye, Jiale Cai, Junle Wang, Wei Yang
Pub Date: 2024-03-10 · DOI: 10.1609/aaai.v38i3.27947
Understanding the mechanisms behind Vision Transformers (ViTs), particularly their vulnerability to adversarial perturbations, is crucial for addressing challenges in their real-world applications. Existing adversarial attacks on ViTs rely on labels to calculate the gradient for perturbation and exhibit low transferability to other architectures and tasks. In this paper, we present a label-free white-box attack for ViT-based models that transfers strongly to various black-box models, including most ViT variants, CNNs, and MLPs, and even to models developed for other modalities. Our inspiration comes from the feature collapse phenomenon in ViTs, where the critical attention mechanism depends too heavily on the low-frequency component of features, causing the features in middle-to-late layers to become increasingly similar and eventually collapse. We propose a feature diversity attacker that accelerates this process, achieving remarkable performance and transferability.
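A hedged reading of the attack in code: perturb the input so that mid-layer token features collapse toward their mean, with no labels involved. The `forward_features` indexing and the PGD-style loop are assumptions about the setup, not the authors' released code.

```python
import torch

def feature_diversity(tokens: torch.Tensor) -> torch.Tensor:
    """Mean distance of token features to their per-image mean,
    for tokens of shape (batch, tokens, dim); low value = collapsed."""
    centered = tokens - tokens.mean(dim=1, keepdim=True)
    return centered.norm(dim=-1).mean()

def attack(model, x: torch.Tensor, layer: int,
           steps: int = 10, eps: float = 8 / 255, alpha: float = 2 / 255):
    """Label-free PGD-style loop: descend on mid-layer feature diversity
    so the features collapse. Assumes `model.forward_features` exposes a
    list of per-layer token tensors; clamping x+delta to the valid pixel
    range is omitted for brevity."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        tokens = model.forward_features(x + delta)[layer]  # assumed hook/API
        loss = feature_diversity(tokens)  # minimize -> accelerate collapse
        loss.backward()
        with torch.no_grad():
            delta -= alpha * delta.grad.sign()
            delta.clamp_(-eps, eps)
            delta.grad.zero_()
    return (x + delta).detach()
```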
{"title":"Attacking Transformers with Feature Diversity Adversarial Perturbation","authors":"Chenxing Gao, Hang Zhou, Junqing Yu, Yuteng Ye, Jiale Cai, Junle Wang, Wei Yang","doi":"10.1609/aaai.v38i3.27947","DOIUrl":"https://doi.org/10.1609/aaai.v38i3.27947","url":null,"abstract":"Understanding the mechanisms behind Vision Transformer (ViT), particularly its vulnerability to adversarial perturbations, is crucial for addressing challenges in its real-world applications. Existing ViT adversarial attackers rely on labels to calculate the gradient for perturbation, and exhibit low transferability to other structures and tasks. In this paper, we present a label-free white-box attack approach for ViT-based models that exhibits strong transferability to various black-box models, including most ViT variants, CNNs, and MLPs, even for models developed for other modalities. Our inspiration comes from the feature collapse phenomenon in ViTs, where the critical attention mechanism overly depends on the low-frequency component of features, causing the features in middle-to-end layers to become increasingly similar and eventually collapse. We propose the feature diversity attacker to naturally accelerate this process and achieve remarkable performance and transferability.","PeriodicalId":513202,"journal":{"name":"ArXiv","volume":"21 21","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140396662","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
FARPLS: A Feature-Augmented Robot Trajectory Preference Labeling System to Assist Human Labelers' Preference Elicitation
Hanfang Lyu, Yuanchen Bai, Xin Liang, Ujaan Das, Chuhan Shi, Leiliang Gong, Yingchi Li, Mingfei Sun, Ming Ge, Xiaojuan Ma
Pub Date: 2024-03-10 · DOI: 10.1145/3640543.3645145
Preference-based learning aims to align robot task objectives with human values. One of the most common ways to infer human preferences is through pairwise comparisons of robot task trajectories. Traditional comparison-based preference labeling systems seldom support labelers in digesting and identifying the critical differences between complex trajectories recorded in videos. Our formative study (N = 12) suggests that, because of partial observations, individuals may overlook non-salient task features and establish biased preference criteria during preference elicitation. In addition, they may experience mental fatigue when given many pairs to compare, causing their label quality to deteriorate. To mitigate these issues, we propose FARPLS, a Feature-Augmented Robot trajectory Preference Labeling System. FARPLS highlights potential outliers in a wide variety of task features that matter to humans and extracts the corresponding video keyframes for easy review and comparison. It also dynamically adjusts the labeling order according to users' familiarity, the difficulty of each trajectory pair, and the level of disagreement. At the same time, the system monitors labelers' consistency and provides feedback on labeling progress to keep labelers engaged. A between-subjects study (N = 42, 105 pairs of robot pick-and-place trajectories per person) shows that FARPLS helps users establish preference criteria more easily and notice more relevant details in the presented trajectories than a conventional interface. FARPLS also improves labeling consistency and engagement, mitigating challenges in preference elicitation without significantly raising cognitive load.
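One way the dynamic ordering could work is a simple priority score over the three signals the system tracks; the formula, the weights, and the toy data below are illustrative assumptions, not the published system's logic.

```python
def labeling_priority(familiarity: float, difficulty: float, disagreement: float,
                      weights=(0.5, 0.3, 0.2)) -> float:
    """Score a trajectory pair for presentation order: surface familiar,
    easier, less contentious pairs first so labelers can stabilize their
    criteria before facing the hard cases. All inputs assumed in [0, 1]."""
    w_f, w_d, w_g = weights
    return w_f * familiarity - w_d * difficulty - w_g * disagreement

# Toy queue: (pair_id, familiarity, difficulty, disagreement).
pairs = [("p1", 0.9, 0.2, 0.1), ("p2", 0.4, 0.8, 0.6), ("p3", 0.7, 0.5, 0.3)]
pairs.sort(key=lambda p: labeling_priority(*p[1:]), reverse=True)
print([p[0] for p in pairs])  # -> ['p1', 'p3', 'p2']
```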
{"title":"FARPLS: A Feature-Augmented Robot Trajectory Preference Labeling System to Assist Human Labelers' Preference Elicitation","authors":"Hanfang Lyu, Yuanchen Bai, Xin Liang, Ujaan Das, Chuhan Shi, Leiliang Gong, Yingchi Li, Mingfei Sun, Ming Ge, Xiaojuan Ma","doi":"10.1145/3640543.3645145","DOIUrl":"https://doi.org/10.1145/3640543.3645145","url":null,"abstract":"Preference-based learning aims to align robot task objectives with human values. One of the most common methods to infer human preferences is by pairwise comparisons of robot task trajectories. Traditional comparison-based preference labeling systems seldom support labelers to digest and identify critical differences between complex trajectories recorded in videos. Our formative study (N = 12) suggests that individuals may overlook non-salient task features and establish biased preference criteria during their preference elicitation process because of partial observations. In addition, they may experience mental fatigue when given many pairs to compare, causing their label quality to deteriorate. To mitigate these issues, we propose FARPLS, a Feature-Augmented Robot trajectory Preference Labeling System. FARPLS highlights potential outliers in a wide variety of task features that matter to humans and extracts the corresponding video keyframes for easy review and comparison. It also dynamically adjusts the labeling order according to users' familiarities, difficulties of the trajectory pair, and level of disagreements. At the same time, the system monitors labelers' consistency and provides feedback on labeling progress to keep labelers engaged. A between-subjects study (N = 42, 105 pairs of robot pick-and-place trajectories per person) shows that FARPLS can help users establish preference criteria more easily and notice more relevant details in the presented trajectories than the conventional interface. FARPLS also improves labeling consistency and engagement, mitigating challenges in preference elicitation without raising cognitive loads significantly","PeriodicalId":513202,"journal":{"name":"ArXiv","volume":"19 18","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140396440","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Decoupled Contrastive Learning for Long-Tailed Recognition
Shiyu Xuan, Shiliang Zhang
Pub Date: 2024-03-10 · DOI: 10.1609/aaai.v38i6.28459
Supervised Contrastive Loss (SCL) is popular in visual representation learning. Given an anchor image, SCL pulls two types of positive samples, i.e., its augmentation and other images from the same class, together, while pushing negative images apart to optimize the learned embedding. In long-tailed recognition, where the number of samples in each class is imbalanced, treating the two types of positive samples equally leads to biased optimization of the intra-category distance. In addition, the similarity relationships among negative samples, which SCL ignores, also carry meaningful semantic cues. To improve performance on long-tailed recognition, this paper addresses these two issues of SCL by decoupling the training objective. Specifically, it decouples the two types of positives in SCL and optimizes their relations toward different objectives to alleviate the influence of the imbalanced dataset. We further propose patch-based self-distillation to transfer knowledge from head to tail classes and relieve the under-representation of tail classes. It uses patch-based features to mine shared visual patterns among different instances and leverages a self-distillation procedure to transfer such knowledge. Experiments on different long-tailed classification benchmarks demonstrate the superiority of our method. For instance, it achieves 57.7% top-1 accuracy on the ImageNet-LT dataset. Combined with an ensemble-based method, the performance can be further boosted to 59.7%, substantially outperforming many recent works. Our code will be released.
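A sketch of the decoupling idea as described above: give the augmentation positive and the same-class positives separate loss terms and weights, rather than pooling them inside one supervised-contrastive softmax. The exact loss form and the weights here are assumptions, not the paper's released code.

```python
import torch
import torch.nn.functional as F

def decoupled_scl(z1, z2, labels, tau=0.1, w_aug=1.0, w_cls=0.5):
    """z1, z2: L2-normalized embeddings of two augmented views, shape (B, D).
    Term 1 handles the augmentation positive; term 2 handles same-class
    positives, so the two types no longer compete in a single objective."""
    B = z1.size(0)
    # Term 1: InfoNCE between the two views -- the augmentation positive.
    sim12 = z1 @ z2.T / tau
    aug_loss = F.cross_entropy(sim12, torch.arange(B, device=z1.device))
    # Term 2: same-class images within the first view as positives.
    sim11 = z1 @ z1.T / tau
    self_mask = torch.eye(B, dtype=torch.bool, device=z1.device)
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    log_prob = sim11 - torch.logsumexp(
        sim11.masked_fill(self_mask, float("-inf")), dim=1, keepdim=True)
    cls_loss = -(log_prob * pos_mask).sum(1) / pos_mask.sum(1).clamp(min=1)
    return w_aug * aug_loss + w_cls * cls_loss.mean()
```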
{"title":"Decoupled Contrastive Learning for Long-Tailed Recognition","authors":"Shiyu Xuan, Shiliang Zhang","doi":"10.1609/aaai.v38i6.28459","DOIUrl":"https://doi.org/10.1609/aaai.v38i6.28459","url":null,"abstract":"Supervised Contrastive Loss (SCL) is popular in visual representation learning.\u0000 Given an anchor image, SCL pulls two types of positive samples, i.e., its augmentation and other images from the same class together, while pushes negative images apart to optimize the learned embedding. In the scenario of long-tailed recognition, where the number of samples in each class is imbalanced, treating two types of positive samples equally leads to the biased optimization for intra-category distance. In addition, similarity relationship among negative samples, that are ignored by SCL, also presents meaningful semantic cues. To improve the performance on long-tailed recognition, this paper addresses those two issues of SCL by decoupling the training objective. Specifically, it decouples two types of positives in SCL and optimizes their relations toward different objectives to alleviate the influence of the imbalanced dataset. We further propose a patch-based self distillation to transfer knowledge from head to tail classes to relieve the under-representation of tail classes. It uses patch-based features to mine shared visual patterns among different instances and leverages a self distillation procedure to transfer such knowledge. Experiments on different long-tailed classification benchmarks demonstrate the superiority of our method. For instance, it achieves the 57.7% top-1 accuracy on the ImageNet-LT dataset. Combined with the ensemble-based method, the performance can be further boosted to 59.7%, which substantially outperforms many recent works. Our code will be released.","PeriodicalId":513202,"journal":{"name":"ArXiv","volume":"20 16","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140396634","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SCORE: Self-supervised Correspondence Fine-tuning for Improved Content Representations
Amit Meghanani, Thomas Hain
Pub Date: 2024-03-10 · DOI: 10.1109/icassp48485.2024.10448060
There is growing interest in cost-effective self-supervised fine-tuning (SSFT) of self-supervised learning (SSL)-based speech models to obtain task-specific representations, which yield robust performance on various downstream tasks after fine-tuning on labelled data. This work presents a cost-effective SSFT method named Self-supervised Correspondence (SCORE) fine-tuning that adapts SSL speech representations for content-related tasks. The proposed method uses a correspondence training strategy that aims to learn similar representations from perturbed and original speech, where the perturbed speech is obtained with data augmentation techniques commonly used for content-related tasks (ASR). SCORE fine-tuned HuBERT outperforms vanilla HuBERT on the SUPERB benchmark with only a few hours of fine-tuning (<5 hrs) on a single GPU, with relative improvements of 1.09%, 3.58%, and 12.65% on automatic speech recognition, phoneme recognition, and query-by-example tasks, respectively. SCORE is competitive with the recently proposed SSFT method SPIN while using only 1/3 of the processed speech.
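The correspondence objective can be sketched as a frame-level similarity loss between the original and perturbed utterance; the cosine form and the encoder interface below are assumptions, and the paper's exact loss and alignment handling may differ.

```python
import torch
import torch.nn.functional as F

def score_loss(encoder, wav: torch.Tensor, wav_perturbed: torch.Tensor) -> torch.Tensor:
    """Correspondence objective: pull frame-level representations of the
    original and perturbed utterance together. `encoder` is assumed to map
    (batch, samples) waveforms to (batch, frames, dim) features, as a
    HuBERT-style model does; the two inputs must align frame-for-frame."""
    h = encoder(wav)
    h_pert = encoder(wav_perturbed)
    return 1.0 - F.cosine_similarity(h, h_pert, dim=-1).mean()
```

Perturbations such as additive noise keep the frame alignment this sketch relies on; augmentations that change duration, such as speed perturbation, would require an explicit alignment step between the two feature sequences.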
{"title":"SCORE: Self-supervised Correspondence Fine-tuning for Improved Content Representations","authors":"Amit Meghanani, Thomas Hain","doi":"10.1109/icassp48485.2024.10448060","DOIUrl":"https://doi.org/10.1109/icassp48485.2024.10448060","url":null,"abstract":"There is a growing interest in cost-effective self-supervised fine-tuning (SSFT) of self-supervised learning (SSL)-based speech models to obtain task-specific representations. These task-specific representations are used for robust performance on various downstream tasks by fine-tuning on the labelled data. This work presents a cost-effective SSFT method named Self-supervised Correspondence (SCORE) fine-tuning to adapt the SSL speech representations for content-related tasks. The proposed method uses a correspondence training strategy, aiming to learn similar representations from perturbed speech and original speech. Commonly used data augmentation techniques for content-related tasks (ASR) are applied to obtain perturbed speech. SCORE fine-tuned HuBERT outperforms the vanilla HuBERT on SUPERB benchmark with only a few hours of fine-tuning (<5 hrs) on a single GPU for automatic speech recognition, phoneme recognition, and query-by-example tasks, with relative improvements of 1.09%, 3.58%, and 12.65%, respectively. SCORE provides competitive results with the recently proposed SSFT method SPIN, using only 1/3 of the processed speech compared to SPIN.","PeriodicalId":513202,"journal":{"name":"ArXiv","volume":"19 11","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140396304","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
LLMs Still Can't Avoid Instanceof: An Investigation Into GPT-3.5, GPT-4 and Bard's Capacity to Handle Object-Oriented Programming Assignments
Bruno Pereira Cipriano, P. Alves
Pub Date: 2024-03-10 · DOI: 10.1145/3639474.3640052
Large Language Models (LLMs) have emerged as promising tools to assist students with programming assignments. However, object-oriented programming (OOP), with its inherent complexity of identifying entities, relationships, and responsibilities, is not yet mastered by these tools. In contrast to introductory programming exercises, there is a research gap regarding the behavior of LLMs in OOP contexts. In this study, we experimented with three prominent LLMs (GPT-3.5, GPT-4, and Bard) to solve real-world OOP exercises used in educational settings, subsequently validating their solutions using an Automatic Assessment Tool (AAT). The findings revealed that while the models frequently produced mostly working solutions, they often overlooked OOP best practices. GPT-4 stood out as the most proficient, followed by GPT-3.5, with Bard trailing last. We advocate a renewed emphasis on code quality when employing these models and explore the potential of pairing LLMs with AATs in pedagogical settings. In conclusion, while GPT-4 shows promise, deploying these models in OOP education still requires supervision.
{"title":"LLMs Still Can't Avoid Instanceof: An Investigation Into GPT-3.5, GPT-4 and Bard's Capacity to Handle Object-Oriented Programming Assignments","authors":"Bruno Pereira Cipriano, P. Alves","doi":"10.1145/3639474.3640052","DOIUrl":"https://doi.org/10.1145/3639474.3640052","url":null,"abstract":"Large Language Models (LLMs) have emerged as promising tools to assist students while solving programming assignments. However, object-oriented programming (OOP), with its inherent complexity involving the identification of entities, relationships, and responsibilities, is not yet mastered by these tools. Contrary to introductory programming exercises, there exists a research gap with regard to the behavior of LLMs in OOP contexts. In this study, we experimented with three prominent LLMs - GPT-3.5, GPT-4, and Bard - to solve real-world OOP exercises used in educational settings, subsequently validating their solutions using an Automatic Assessment Tool (AAT). The findings revealed that while the models frequently achieved mostly working solutions to the exercises, they often overlooked the best practices of OOP. GPT-4 stood out as the most proficient, followed by GPT-3.5, with Bard trailing last. We advocate for a renewed emphasis on code quality when employing these models and explore the potential of pairing LLMs with AATs in pedagogical settings. In conclusion, while GPT-4 showcases promise, the deployment of these models in OOP education still mandates supervision.","PeriodicalId":513202,"journal":{"name":"ArXiv","volume":"20 4","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140396432","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}