
arXiv - CS - Computation and Language: Latest Publications

The CLC-UKET Dataset: Benchmarking Case Outcome Prediction for the UK Employment Tribunal
Pub Date : 2024-09-12 DOI: arxiv-2409.08098
Huiyuan Xie, Felix Steffek, Joana Ribeiro de Faria, Christine Carter, Jonathan Rutherford
This paper explores the intersection of technological innovation and access to justice by developing a benchmark for predicting case outcomes in the UK Employment Tribunal (UKET). To address the challenge of extensive manual annotation, the study employs a large language model (LLM) for automatic annotation, resulting in the creation of the CLC-UKET dataset. The dataset consists of approximately 19,000 UKET cases and their metadata. Comprehensive legal annotations cover facts, claims, precedent references, statutory references, case outcomes, reasons and jurisdiction codes. Facilitated by the CLC-UKET data, we examine a multi-class case outcome prediction task in the UKET. Human predictions are collected to establish a performance reference for model comparison. Empirical results from baseline models indicate that finetuned transformer models outperform zero-shot and few-shot LLMs on the UKET prediction task. The performance of zero-shot LLMs can be enhanced by integrating task-related information into few-shot examples. We hope that the CLC-UKET dataset, along with human annotations and empirical findings, can serve as a valuable benchmark for employment-related dispute resolution.
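To make the reported few-shot setup concrete, here is a minimal sketch of folding task-related metadata into few-shot demonstrations. The field names, prompt wording, and label handling are illustrative assumptions, not the authors' actual prompt format.

```python
def build_few_shot_prompt(examples, query_facts, outcome_labels):
    """Fold task-related metadata (claim type, statutory references)
    into each few-shot demonstration before the query case.
    Field names are hypothetical, not the paper's actual schema."""
    lines = ["Predict the tribunal outcome. Possible outcomes: "
             + ", ".join(outcome_labels), ""]
    for ex in examples:
        lines += [f"Facts: {ex['facts']}",
                  f"Claim type: {ex['claim']}",         # task-related info
                  f"Statutes cited: {ex['statutes']}",  # task-related info
                  f"Outcome: {ex['outcome']}", ""]
    lines += [f"Facts: {query_facts}", "Outcome:"]
    return "\n".join(lines)
```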
Citations: 0
TravelAgent: An AI Assistant for Personalized Travel Planning
Pub Date : 2024-09-12 DOI: arxiv-2409.08069
Aili Chen, Xuyang Ge, Ziquan Fu, Yanghua Xiao, Jiangjie Chen
As global tourism expands and artificial intelligence technology advances, intelligent travel planning services have emerged as a significant research focus. Within dynamic real-world travel scenarios with multi-dimensional constraints, services that support users in automatically creating practical and customized travel itineraries must address three key objectives: Rationality, Comprehensiveness, and Personalization. However, existing systems with rule-based combinations or LLM-based planning methods struggle to fully satisfy these criteria. To overcome the challenges, we introduce TravelAgent, a travel planning system powered by large language models (LLMs) designed to provide reasonable, comprehensive, and personalized travel itineraries grounded in dynamic scenarios. TravelAgent comprises four modules: Tool-usage, Recommendation, Planning, and Memory Module. We evaluate TravelAgent's performance with human and simulated users, demonstrating its overall effectiveness in three criteria and confirming the accuracy of personalized recommendations.
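A minimal sketch of how the four named modules might be wired together. Every class and method name here is hypothetical, since the abstract does not specify an interface.

```python
class TravelAgent:
    """Hypothetical wiring of the four modules named in the abstract."""

    def __init__(self, tools, recommender, planner, memory):
        self.tools = tools              # Tool-usage: live lookups (flights, hotels)
        self.recommender = recommender  # Recommendation: rank candidates
        self.planner = planner          # Planning: assemble the itinerary
        self.memory = memory            # Memory: persist user preferences

    def plan_trip(self, request):
        self.memory.store(request)               # remember stated constraints
        context = self.memory.retrieve(request)  # recall prior preferences
        candidates = self.tools.search(request)  # grounded, dynamic data
        ranked = self.recommender.rank(candidates, context)
        return self.planner.build_itinerary(ranked, context)
```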
Citations: 0
Source2Synth: Synthetic Data Generation and Curation Grounded in Real Data Sources
Pub Date : 2024-09-12 DOI: arxiv-2409.08239
Alisia Lupidi, Carlos Gemmell, Nicola Cancedda, Jane Dwivedi-Yu, Jason Weston, Jakob Foerster, Roberta Raileanu, Maria Lomeli
Large Language Models still struggle in challenging scenarios that leverage structured data, complex reasoning, or tool usage. In this paper, we propose Source2Synth: a new method that can be used for teaching LLMs new skills without relying on costly human annotations. Source2Synth takes as input a custom data source and produces synthetic data points with intermediate reasoning steps grounded in real-world sources. Source2Synth improves the dataset quality by discarding low-quality generations based on their answerability. We demonstrate the generality of this approach by applying it to two challenging domains: we test reasoning abilities in multi-hop question answering (MHQA), and tool usage in tabular question answering (TQA). Our method improves performance by 25.51% for TQA on WikiSQL and 22.57% for MHQA on HotPotQA compared to the fine-tuned baselines.
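A minimal sketch of answerability-based filtering as described above. The exact-match criterion and the `answer_fn` stand-in for an LLM call are assumptions; the paper's actual filter may score answerability differently.

```python
def filter_by_answerability(candidates, answer_fn):
    """candidates: dicts with 'question', 'source', 'answer' keys.
    answer_fn(question, source) -> the model's predicted answer string
    (a stand-in for an LLM call; interface is assumed)."""
    kept = []
    for c in candidates:
        pred = answer_fn(c["question"], c["source"])
        if pred.strip().lower() == c["answer"].strip().lower():
            kept.append(c)  # keep only generations the model can answer
    return kept
```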
Citations: 0
Supporting Online Discussions: Integrating AI Into the adhocracy+ Participation Platform To Enhance Deliberation
Pub Date : 2024-09-12 DOI: arxiv-2409.07780
Maike Behrendt, Stefan Sylvius Wagner, Stefan Harmeling
Online spaces allow people to discuss important issues and make joint decisions, regardless of their location or time zone. However, without proper support and thoughtful design, these discussions often lack structure and politeness during the exchanges of opinions. Artificial intelligence (AI) represents an opportunity to support both participants and organizers of large-scale online participation processes. In this paper, we present an extension of adhocracy+, a large-scale open source participation platform, that provides two additional debate modules that are supported by AI to enhance the discussion quality and participant interaction.
Citations: 0
An Unsupervised Dialogue Topic Segmentation Model Based on Utterance Rewriting
Pub Date : 2024-09-12 DOI: arxiv-2409.07672
Xia Hou, Qifeng Li, Tongliang Li
Dialogue topic segmentation plays a crucial role in various types of dialogue modeling tasks. The state-of-the-art unsupervised DTS methods learn topic-aware discourse representations from conversation data through adjacent discourse matching and pseudo segmentation to further mine useful clues in unlabeled conversational relations. However, in multi-round dialogs, discourses often have co-references or omissions, leading to the fact that direct use of these discourses for representation learning may negatively affect the semantic similarity computation in the neighboring discourse matching task. In order to fully utilize the useful cues in conversational relations, this study proposes a novel unsupervised dialog topic segmentation method that combines the Utterance Rewriting (UR) technique with an unsupervised learning algorithm to efficiently utilize the useful cues in unlabeled dialogs by rewriting the dialogs in order to recover the co-referents and omitted words. Compared with existing unsupervised models, the proposed Discourse Rewriting Topic Segmentation Model (UR-DTS) significantly improves the accuracy of topic segmentation. The main finding is that the performance on DialSeg711 improves by about 6% in terms of absolute error score and WD, achieving 11.42% in terms of absolute error score and 12.97% in terms of WD. On Doc2Dial, the absolute error score and WD improve by about 3% and 2%, respectively, with SOTA reaching 35.17% in terms of absolute error score and 38.49% in terms of WD. This shows that the model is very effective in capturing the nuances of conversational topics, as well as the usefulness and challenges of utilizing unlabeled conversations.
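A minimal sketch of the rewrite-then-match idea: rewrite each utterance given its history, then place topic boundaries where adjacent similarity drops. Here `rewrite_fn` and `embed` are stand-ins for an utterance-rewriting model and a sentence encoder, and the fixed threshold is an illustrative simplification of the paper's unsupervised procedure.

```python
import numpy as np

def topic_boundaries(utterances, rewrite_fn, embed, threshold=0.5):
    """rewrite_fn(history) -> last utterance rewritten with co-referents
    and omitted words restored; embed(text) -> 1-D vector."""
    rewritten = [rewrite_fn(utterances[: i + 1]) for i in range(len(utterances))]
    vecs = [np.asarray(embed(u)) for u in rewritten]
    boundaries = []
    for i in range(len(vecs) - 1):
        a, b = vecs[i], vecs[i + 1]
        sim = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
        if sim < threshold:  # weak adjacent coherence => topic boundary
            boundaries.append(i + 1)
    return boundaries
```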
Citations: 0
Multi-object event graph representation learning for Video Question Answering
Pub Date : 2024-09-12 DOI: arxiv-2409.07747
Yanan Wang, Shuichiro Haruta, Donghuo Zeng, Julio Vizcarra, Mori Kurokawa
Video question answering (VideoQA) is a task to predict the correct answer to questions posed about a given video. The system must comprehend spatial and temporal relationships among objects extracted from videos to perform causal and temporal reasoning. While prior works have focused on modeling individual object movements using transformer-based methods, they falter when capturing complex scenarios involving multiple objects (e.g., "a boy is throwing a ball in a hoop"). We propose a contrastive language event graph representation learning method called CLanG to address this limitation. Aiming to capture event representations associated with multiple objects, our method employs a multi-layer GNN-cluster module for adversarial graph representation learning, enabling contrastive learning between the question text and its relevant multi-object event graph. Our method outperforms a strong baseline, achieving up to 2.2% higher accuracy on two challenging VideoQA datasets, NExT-QA and TGIF-QA-R. In particular, it is 2.8% better than baselines in handling causal and temporal questions, highlighting its strength in reasoning multiple object-based events.
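The contrastive learning between question text and event graph could take the form of a standard InfoNCE-style objective, sketched below under that assumption; the paper's GNN-cluster module and adversarial component are not reproduced here.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(question_emb, graph_emb, temperature=0.07):
    """question_emb, graph_emb: (batch, dim) tensors; row i of each is
    a matched question / multi-object event graph pair."""
    q = F.normalize(question_emb, dim=-1)
    g = F.normalize(graph_emb, dim=-1)
    logits = q @ g.t() / temperature  # (batch, batch) cosine similarities
    targets = torch.arange(q.size(0), device=q.device)
    # Symmetric loss: align questions to graphs and graphs to questions.
    return (F.cross_entropy(logits, targets)
            + F.cross_entropy(logits.t(), targets)) / 2
```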
Citations: 0
Stable Language Model Pre-training by Reducing Embedding Variability
Pub Date : 2024-09-12 DOI: arxiv-2409.07787
Woojin Chung, Jiwoo Hong, Na Min An, James Thorne, Se-Young Yun
Stable pre-training is essential for achieving better-performing language models. However, tracking pre-training stability by calculating gradient variance at every step is impractical due to the significant computational costs. We explore Token Embedding Variability (TEV) as a simple and efficient proxy for assessing pre-training stability in language models with pre-layer normalization, given that shallower layers are more prone to gradient explosion (section 2.2). Moreover, we propose Multi-head Low-Rank Attention (MLRA) as an architecture to alleviate such instability by limiting the exponential growth of output embedding variance, thereby preventing the gradient explosion (section 3.2). Empirical results on GPT-2 with MLRA demonstrate increased stability and lower perplexity, particularly in deeper models.
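A minimal sketch of the two ingredients under stated assumptions: a cheap embedding-variability statistic as a stability proxy, and a rank-limited factorized projection. The paper's exact TEV definition and MLRA architecture live in its sections 2.2 and 3.2; what follows is only an assumed simplification.

```python
import torch

def embedding_variability(embedding_weight: torch.Tensor) -> float:
    """embedding_weight: (vocab_size, hidden_dim) embedding matrix.
    Per-dimension standard deviation across the vocabulary, averaged:
    an assumed cheap proxy for embedding spread during training."""
    return embedding_weight.std(dim=0).mean().item()

class LowRankLinear(torch.nn.Module):
    """Factorized projection W ~ B @ A with rank << dim, one way to
    constrain output-variance growth (illustrative, not the exact MLRA)."""
    def __init__(self, dim: int, rank: int):
        super().__init__()
        self.A = torch.nn.Linear(dim, rank, bias=False)
        self.B = torch.nn.Linear(rank, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.B(self.A(x))
```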
Citations: 0
On the Role of Context in Reading Time Prediction
Pub Date : 2024-09-12 DOI: arxiv-2409.08160
Andreas Opedal, Eleanor Chodroff, Ryan Cotterell, Ethan Gotlieb Wilcox
We present a new perspective on how readers integrate context during real-time language comprehension. Our proposals build on surprisal theory, which posits that the processing effort of a linguistic unit (e.g., a word) is an affine function of its in-context information content. We first observe that surprisal is only one out of many potential ways that a contextual predictor can be derived from a language model. Another one is the pointwise mutual information (PMI) between a unit and its context, which turns out to yield the same predictive power as surprisal when controlling for unigram frequency. Moreover, both PMI and surprisal are correlated with frequency. This means that neither PMI nor surprisal contains information about context alone. In response to this, we propose a technique where we project surprisal onto the orthogonal complement of frequency, yielding a new contextual predictor that is uncorrelated with frequency. Our experiments show that the proportion of variance in reading times explained by context is a lot smaller when context is represented by the orthogonalized predictor. From an interpretability standpoint, this indicates that previous studies may have overstated the role that context has in predicting reading times.
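For orientation, the quantities involved are surprisal s(w) = -log p(w | context), unigram surprisal u(w) = -log p(w), and PMI(w; context) = log p(w | context) - log p(w) = u(w) - s(w). A worked sketch of the orthogonalization step follows, assuming simple least-squares residualization against log-frequency; the paper's projection may differ in detail.

```python
import numpy as np

def orthogonalized_surprisal(surprisal: np.ndarray, log_freq: np.ndarray) -> np.ndarray:
    """Regress surprisal on log-frequency (with intercept) and return
    the residuals: the component of surprisal orthogonal to frequency,
    uncorrelated with log_freq by construction."""
    X = np.column_stack([np.ones_like(log_freq), log_freq])
    beta, *_ = np.linalg.lstsq(X, surprisal, rcond=None)
    return surprisal - X @ beta
```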
Citations: 0
Learning Rules from KGs Guided by Language Models
Pub Date : 2024-09-12 DOI: arxiv-2409.07869
Zihang Peng, Daria Stepanova, Vinh Thinh Ho, Heike Adel, Alessandra Russo, Simon Ott
Advances in information extraction have enabled the automatic construction of large knowledge graphs (e.g., Yago, Wikidata or Google KG), which are widely used in many applications like semantic search or data analytics. However, due to their semi-automatic construction, KGs are often incomplete. Rule learning methods, concerned with the extraction of frequent patterns from KGs and casting them into rules, can be applied to predict potentially missing facts. A crucial step in this process is rule ranking. Ranking of rules is especially challenging over highly incomplete or biased KGs (e.g., KGs predominantly storing facts about famous people), as in this case biased rules might fit the data best and be ranked at the top based on standard statistical metrics like rule confidence. To address this issue, prior works proposed to rank rules not only relying on the original KG but also facts predicted by a KG embedding model. At the same time, with the recent rise of Language Models (LMs), several works have claimed that LMs can be used as alternative means for KG completion. In this work, our goal is to verify to which extent the exploitation of LMs is helpful for improving the quality of rule learning systems.
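For reference, standard rule confidence is the fraction of body instantiations for which the head also holds in the KG. A minimal sketch follows, with `body_fn` and `head_fn` as illustrative stand-ins; the ranking studied in the paper additionally draws on KG-embedding predictions and LM guidance.

```python
def rule_confidence(kg, body_fn, head_fn, entities):
    """kg: set of (subject, relation, object) triples.
    body_fn(x, y, kg) / head_fn(x, y, kg): True if the rule's body/head
    holds for the entity pair (x, y). Interfaces are assumed."""
    body_count, head_count = 0, 0
    for x in entities:
        for y in entities:
            if body_fn(x, y, kg):
                body_count += 1
                if head_fn(x, y, kg):
                    head_count += 1
    return head_count / body_count if body_count else 0.0
```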
Citations: 0
WhisperNER: Unified Open Named Entity and Speech Recognition
Pub Date : 2024-09-12 DOI: arxiv-2409.08107
Gil Ayache, Menachem Pirchi, Aviv Navon, Aviv Shamsian, Gill Hetz, Joseph Keshet
Integrating named entity recognition (NER) with automatic speech recognition (ASR) can significantly enhance transcription accuracy and informativeness. In this paper, we introduce WhisperNER, a novel model that allows joint speech transcription and entity recognition. WhisperNER supports open-type NER, enabling recognition of diverse and evolving entities at inference. Building on recent advancements in open NER research, we augment a large synthetic dataset with synthetic speech samples. This allows us to train WhisperNER on a large number of examples with diverse NER tags. During training, the model is prompted with NER labels and optimized to output the transcribed utterance along with the corresponding tagged entities. To evaluate WhisperNER, we generate synthetic speech for commonly used NER benchmarks and annotate existing ASR datasets with open NER tags. Our experiments demonstrate that WhisperNER outperforms natural baselines on both out-of-domain open-type NER and supervised finetuning.
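One plausible way to realize joint transcription and tagging is inline entity markup in the decoded text. The sketch below parses such output; the `<type>...</type>` format is an assumption for illustration, not WhisperNER's documented output spec.

```python
import re

# Assumed inline markup: "<type>span</type>"; the named backreference
# (?P=type) requires matching opening and closing tags.
TAG = re.compile(r"<(?P<type>[\w\s]+?)>(?P<span>.+?)</(?P=type)>")

def parse_tagged_transcript(text):
    """Return (plain_transcript, [(entity_type, span), ...])."""
    entities = [(m.group("type"), m.group("span")) for m in TAG.finditer(text)]
    plain = TAG.sub(lambda m: m.group("span"), text)
    return plain, entities

# Example: parse_tagged_transcript("call <person>Ada</person> at noon")
# -> ("call Ada at noon", [("person", "Ada")])
```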
Citations: 0