Huiyuan Xie, Felix Steffek, Joana Ribeiro de Faria, Christine Carter, Jonathan Rutherford
This paper explores the intersection of technological innovation and access to justice by developing a benchmark for predicting case outcomes in the UK Employment Tribunal (UKET). To address the challenge of extensive manual annotation, the study employs a large language model (LLM) for automatic annotation, resulting in the creation of the CLC-UKET dataset. The dataset consists of approximately 19,000 UKET cases and their metadata. Comprehensive legal annotations cover facts, claims, precedent references, statutory references, case outcomes, reasons and jurisdiction codes. Facilitated by the CLC-UKET data, we examine a multi-class case outcome prediction task in the UKET. Human predictions are collected to establish a performance reference for model comparison. Empirical results from baseline models indicate that finetuned transformer models outperform zero-shot and few-shot LLMs on the UKET prediction task. The performance of zero-shot LLMs can be enhanced by integrating task-related information into few-shot examples. We hope that the CLC-UKET dataset, along with human annotations and empirical findings, can serve as a valuable benchmark for employment-related dispute resolution.
{"title":"The CLC-UKET Dataset: Benchmarking Case Outcome Prediction for the UK Employment Tribunal","authors":"Huiyuan Xie, Felix Steffek, Joana Ribeiro de Faria, Christine Carter, Jonathan Rutherford","doi":"arxiv-2409.08098","DOIUrl":"https://doi.org/arxiv-2409.08098","url":null,"abstract":"This paper explores the intersection of technological innovation and access\u0000to justice by developing a benchmark for predicting case outcomes in the UK\u0000Employment Tribunal (UKET). To address the challenge of extensive manual\u0000annotation, the study employs a large language model (LLM) for automatic\u0000annotation, resulting in the creation of the CLC-UKET dataset. The dataset\u0000consists of approximately 19,000 UKET cases and their metadata. Comprehensive\u0000legal annotations cover facts, claims, precedent references, statutory\u0000references, case outcomes, reasons and jurisdiction codes. Facilitated by the\u0000CLC-UKET data, we examine a multi-class case outcome prediction task in the\u0000UKET. Human predictions are collected to establish a performance reference for\u0000model comparison. Empirical results from baseline models indicate that\u0000finetuned transformer models outperform zero-shot and few-shot LLMs on the UKET\u0000prediction task. The performance of zero-shot LLMs can be enhanced by\u0000integrating task-related information into few-shot examples. We hope that the\u0000CLC-UKET dataset, along with human annotations and empirical findings, can\u0000serve as a valuable benchmark for employment-related dispute resolution.","PeriodicalId":501030,"journal":{"name":"arXiv - CS - Computation and Language","volume":"157 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142184413","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
As global tourism expands and artificial intelligence technology advances, intelligent travel planning services have emerged as a significant research focus. Within dynamic real-world travel scenarios with multi-dimensional constraints, services that support users in automatically creating practical and customized travel itineraries must address three key objectives: Rationality, Comprehensiveness, and Personalization. However, existing systems based on rule-based combinations or LLM-based planning struggle to fully satisfy these criteria. To overcome these challenges, we introduce TravelAgent, a travel planning system powered by large language models (LLMs) designed to provide reasonable, comprehensive, and personalized travel itineraries grounded in dynamic scenarios. TravelAgent comprises four modules: Tool Usage, Recommendation, Planning, and Memory. We evaluate TravelAgent's performance with human and simulated users, demonstrating its overall effectiveness across the three criteria and confirming the accuracy of its personalized recommendations.
{"title":"TravelAgent: An AI Assistant for Personalized Travel Planning","authors":"Aili Chen, Xuyang Ge, Ziquan Fu, Yanghua Xiao, Jiangjie Chen","doi":"arxiv-2409.08069","DOIUrl":"https://doi.org/arxiv-2409.08069","url":null,"abstract":"As global tourism expands and artificial intelligence technology advances,\u0000intelligent travel planning services have emerged as a significant research\u0000focus. Within dynamic real-world travel scenarios with multi-dimensional\u0000constraints, services that support users in automatically creating practical\u0000and customized travel itineraries must address three key objectives:\u0000Rationality, Comprehensiveness, and Personalization. However, existing systems\u0000with rule-based combinations or LLM-based planning methods struggle to fully\u0000satisfy these criteria. To overcome the challenges, we introduce TravelAgent, a\u0000travel planning system powered by large language models (LLMs) designed to\u0000provide reasonable, comprehensive, and personalized travel itineraries grounded\u0000in dynamic scenarios. TravelAgent comprises four modules: Tool-usage,\u0000Recommendation, Planning, and Memory Module. We evaluate TravelAgent's\u0000performance with human and simulated users, demonstrating its overall\u0000effectiveness in three criteria and confirming the accuracy of personalized\u0000recommendations.","PeriodicalId":501030,"journal":{"name":"arXiv - CS - Computation and Language","volume":"26 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142184532","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Alisia Lupidi, Carlos Gemmell, Nicola Cancedda, Jane Dwivedi-Yu, Jason Weston, Jakob Foerster, Roberta Raileanu, Maria Lomeli
Large Language Models still struggle in challenging scenarios that leverage structured data, complex reasoning, or tool usage. In this paper, we propose Source2Synth: a new method that can be used for teaching LLMs new skills without relying on costly human annotations. Source2Synth takes as input a custom data source and produces synthetic data points with intermediate reasoning steps grounded in real-world sources. Source2Synth improves the dataset quality by discarding low-quality generations based on their answerability. We demonstrate the generality of this approach by applying it to two challenging domains: we test reasoning abilities in multi-hop question answering (MHQA), and tool usage in tabular question answering (TQA). Our method improves performance by 25.51% for TQA on WikiSQL and 22.57% for MHQA on HotPotQA compared to the fine-tuned baselines.
{"title":"Source2Synth: Synthetic Data Generation and Curation Grounded in Real Data Sources","authors":"Alisia Lupidi, Carlos Gemmell, Nicola Cancedda, Jane Dwivedi-Yu, Jason Weston, Jakob Foerster, Roberta Raileanu, Maria Lomeli","doi":"arxiv-2409.08239","DOIUrl":"https://doi.org/arxiv-2409.08239","url":null,"abstract":"Large Language Models still struggle in challenging scenarios that leverage\u0000structured data, complex reasoning, or tool usage. In this paper, we propose\u0000Source2Synth: a new method that can be used for teaching LLMs new skills\u0000without relying on costly human annotations. Source2Synth takes as input a\u0000custom data source and produces synthetic data points with intermediate\u0000reasoning steps grounded in real-world sources. Source2Synth improves the\u0000dataset quality by discarding low-quality generations based on their\u0000answerability. We demonstrate the generality of this approach by applying it to\u0000two challenging domains: we test reasoning abilities in multi-hop question\u0000answering (MHQA), and tool usage in tabular question answering (TQA). Our\u0000method improves performance by 25.51% for TQA on WikiSQL and 22.57% for MHQA on\u0000HotPotQA compared to the fine-tuned baselines.","PeriodicalId":501030,"journal":{"name":"arXiv - CS - Computation and Language","volume":"57 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142184410","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Maike Behrendt, Stefan Sylvius Wagner, Stefan Harmeling
Online spaces allow people to discuss important issues and make joint decisions, regardless of their location or time zone. However, without proper support and thoughtful design, these discussions often lack structure and politeness during exchanges of opinions. Artificial intelligence (AI) represents an opportunity to support both participants and organizers of large-scale online participation processes. In this paper, we present an extension of adhocracy+, a large-scale open-source participation platform, that provides two additional AI-supported debate modules to enhance discussion quality and participant interaction.
{"title":"Supporting Online Discussions: Integrating AI Into the adhocracy+ Participation Platform To Enhance Deliberation","authors":"Maike Behrendt, Stefan Sylvius Wagner, Stefan Harmeling","doi":"arxiv-2409.07780","DOIUrl":"https://doi.org/arxiv-2409.07780","url":null,"abstract":"Online spaces allow people to discuss important issues and make joint\u0000decisions, regardless of their location or time zone. However, without proper\u0000support and thoughtful design, these discussions often lack structure and\u0000politeness during the exchanges of opinions. Artificial intelligence (AI)\u0000represents an opportunity to support both participants and organizers of\u0000large-scale online participation processes. In this paper, we present an\u0000extension of adhocracy+, a large-scale open source participation platform, that\u0000provides two additional debate modules that are supported by AI to enhance the\u0000discussion quality and participant interaction.","PeriodicalId":501030,"journal":{"name":"arXiv - CS - Computation and Language","volume":"27 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142184436","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dialogue topic segmentation (DTS) plays a crucial role in various types of dialogue modeling tasks. State-of-the-art unsupervised DTS methods learn topic-aware discourse representations from conversation data through adjacent discourse matching and pseudo segmentation, mining useful clues from unlabeled conversational relations. However, in multi-turn dialogues, utterances often contain co-references or omissions, so using them directly for representation learning can degrade the semantic similarity computation in the adjacent discourse matching task. To fully exploit the useful cues in conversational relations, this study proposes a novel unsupervised dialogue topic segmentation method that combines the Utterance Rewriting (UR) technique with an unsupervised learning algorithm, rewriting dialogues to recover co-referents and omitted words. Compared with existing unsupervised models, the proposed Utterance Rewriting Dialogue Topic Segmentation model (UR-DTS) significantly improves the accuracy of topic segmentation. Performance on DialSeg711 improves by about 6% in absolute error score and WD, reaching 11.42% in absolute error score and 12.97% in WD. On Doc2Dial, absolute error score and WD improve by about 3% and 2%, respectively, with the new state of the art reaching 35.17% in absolute error score and 38.49% in WD. This shows that the model is highly effective at capturing the nuances of conversational topics, and it highlights both the usefulness and the challenges of exploiting unlabeled conversations.
{"title":"An Unsupervised Dialogue Topic Segmentation Model Based on Utterance Rewriting","authors":"Xia Hou, Qifeng Li, Tongliang Li","doi":"arxiv-2409.07672","DOIUrl":"https://doi.org/arxiv-2409.07672","url":null,"abstract":"Dialogue topic segmentation plays a crucial role in various types of dialogue\u0000modeling tasks. The state-of-the-art unsupervised DTS methods learn topic-aware\u0000discourse representations from conversation data through adjacent discourse\u0000matching and pseudo segmentation to further mine useful clues in unlabeled\u0000conversational relations. However, in multi-round dialogs, discourses often\u0000have co-references or omissions, leading to the fact that direct use of these\u0000discourses for representation learning may negatively affect the semantic\u0000similarity computation in the neighboring discourse matching task. In order to\u0000fully utilize the useful cues in conversational relations, this study proposes\u0000a novel unsupervised dialog topic segmentation method that combines the\u0000Utterance Rewriting (UR) technique with an unsupervised learning algorithm to\u0000efficiently utilize the useful cues in unlabeled dialogs by rewriting the\u0000dialogs in order to recover the co-referents and omitted words. Compared with\u0000existing unsupervised models, the proposed Discourse Rewriting Topic\u0000Segmentation Model (UR-DTS) significantly improves the accuracy of topic\u0000segmentation. The main finding is that the performance on DialSeg711 improves\u0000by about 6% in terms of absolute error score and WD, achieving 11.42% in terms\u0000of absolute error score and 12.97% in terms of WD. on Doc2Dial the absolute\u0000error score and WD improves by about 3% and 2%, respectively, resulting in SOTA\u0000reaching 35.17% in terms of absolute error score and 38.49% in terms of WD.\u0000This shows that the model is very effective in capturing the nuances of\u0000conversational topics, as well as the usefulness and challenges of utilizing\u0000unlabeled conversations.","PeriodicalId":501030,"journal":{"name":"arXiv - CS - Computation and Language","volume":"38 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142184439","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Gil Ayache, Menachem Pirchi, Aviv Navon, Aviv Shamsian, Gill Hetz, Joseph Keshet
Integrating named entity recognition (NER) with automatic speech recognition (ASR) can significantly enhance transcription accuracy and informativeness. In this paper, we introduce WhisperNER, a novel model that allows joint speech transcription and entity recognition. WhisperNER supports open-type NER, enabling recognition of diverse and evolving entities at inference. Building on recent advancements in open NER research, we augment a large synthetic dataset with synthetic speech samples. This allows us to train WhisperNER on a large number of examples with diverse NER tags. During training, the model is prompted with NER labels and optimized to output the transcribed utterance along with the corresponding tagged entities. To evaluate WhisperNER, we generate synthetic speech for commonly used NER benchmarks and annotate existing ASR datasets with open NER tags. Our experiments demonstrate that WhisperNER outperforms natural baselines on both out-of-domain open-type NER and supervised finetuning.
{"title":"WhisperNER: Unified Open Named Entity and Speech Recognition","authors":"Gil Ayache, Menachem Pirchi, Aviv Navon, Aviv Shamsian, Gill Hetz, Joseph Keshet","doi":"arxiv-2409.08107","DOIUrl":"https://doi.org/arxiv-2409.08107","url":null,"abstract":"Integrating named entity recognition (NER) with automatic speech recognition\u0000(ASR) can significantly enhance transcription accuracy and informativeness. In\u0000this paper, we introduce WhisperNER, a novel model that allows joint speech\u0000transcription and entity recognition. WhisperNER supports open-type NER,\u0000enabling recognition of diverse and evolving entities at inference. Building on\u0000recent advancements in open NER research, we augment a large synthetic dataset\u0000with synthetic speech samples. This allows us to train WhisperNER on a large\u0000number of examples with diverse NER tags. During training, the model is\u0000prompted with NER labels and optimized to output the transcribed utterance\u0000along with the corresponding tagged entities. To evaluate WhisperNER, we\u0000generate synthetic speech for commonly used NER benchmarks and annotate\u0000existing ASR datasets with open NER tags. Our experiments demonstrate that\u0000WhisperNER outperforms natural baselines on both out-of-domain open type NER\u0000and supervised finetuning.","PeriodicalId":501030,"journal":{"name":"arXiv - CS - Computation and Language","volume":"24 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142184412","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yanan Wang, Shuichiro Haruta, Donghuo Zeng, Julio Vizcarra, Mori Kurokawa
Video question answering (VideoQA) is a task to predict the correct answer to questions posed about a given video. The system must comprehend spatial and temporal relationships among objects extracted from videos to perform causal and temporal reasoning. While prior works have focused on modeling individual object movements using transformer-based methods, they falter when capturing complex scenarios involving multiple objects (e.g., "a boy is throwing a ball in a hoop"). We propose a contrastive language event graph representation learning method called CLanG to address this limitation. Aiming to capture event representations associated with multiple objects, our method employs a multi-layer GNN-cluster module for adversarial graph representation learning, enabling contrastive learning between the question text and its relevant multi-object event graph. Our method outperforms a strong baseline, achieving up to 2.2% higher accuracy on two challenging VideoQA datasets, NExT-QA and TGIF-QA-R. In particular, it is 2.8% better than baselines in handling causal and temporal questions, highlighting its strength in reasoning about multi-object events.
{"title":"Multi-object event graph representation learning for Video Question Answering","authors":"Yanan Wang, Shuichiro Haruta, Donghuo Zeng, Julio Vizcarra, Mori Kurokawa","doi":"arxiv-2409.07747","DOIUrl":"https://doi.org/arxiv-2409.07747","url":null,"abstract":"Video question answering (VideoQA) is a task to predict the correct answer to\u0000questions posed about a given video. The system must comprehend spatial and\u0000temporal relationships among objects extracted from videos to perform causal\u0000and temporal reasoning. While prior works have focused on modeling individual\u0000object movements using transformer-based methods, they falter when capturing\u0000complex scenarios involving multiple objects (e.g., \"a boy is throwing a ball\u0000in a hoop\"). We propose a contrastive language event graph representation\u0000learning method called CLanG to address this limitation. Aiming to capture\u0000event representations associated with multiple objects, our method employs a\u0000multi-layer GNN-cluster module for adversarial graph representation learning,\u0000enabling contrastive learning between the question text and its relevant\u0000multi-object event graph. Our method outperforms a strong baseline, achieving\u0000up to 2.2% higher accuracy on two challenging VideoQA datasets, NExT-QA and\u0000TGIF-QA-R. In particular, it is 2.8% better than baselines in handling causal\u0000and temporal questions, highlighting its strength in reasoning multiple\u0000object-based events.","PeriodicalId":501030,"journal":{"name":"arXiv - CS - Computation and Language","volume":"24 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142184443","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Woojin Chung, Jiwoo Hong, Na Min An, James Thorne, Se-Young Yun
Stable pre-training is essential for achieving better-performing language models. However, tracking pre-training stability by calculating gradient variance at every step is impractical due to the significant computational costs. We explore Token Embedding Variability (TEV) as a simple and efficient proxy for assessing pre-training stability in language models with pre-layer normalization, given that shallower layers are more prone to gradient explosion (section 2.2). Moreover, we propose Multi-head Low-Rank Attention (MLRA) as an architecture to alleviate such instability by limiting the exponential growth of output embedding variance, thereby preventing the gradient explosion (section 3.2). Empirical results on GPT-2 with MLRA demonstrate increased stability and lower perplexity, particularly in deeper models.
{"title":"Stable Language Model Pre-training by Reducing Embedding Variability","authors":"Woojin Chung, Jiwoo Hong, Na Min An, James Thorne, Se-Young Yun","doi":"arxiv-2409.07787","DOIUrl":"https://doi.org/arxiv-2409.07787","url":null,"abstract":"Stable pre-training is essential for achieving better-performing language\u0000models. However, tracking pre-training stability by calculating gradient\u0000variance at every step is impractical due to the significant computational\u0000costs. We explore Token Embedding Variability (TEV) as a simple and efficient\u0000proxy for assessing pre-training stability in language models with pre-layer\u0000normalization, given that shallower layers are more prone to gradient explosion\u0000(section 2.2). Moreover, we propose Multi-head Low-Rank Attention (MLRA) as an\u0000architecture to alleviate such instability by limiting the exponential growth\u0000of output embedding variance, thereby preventing the gradient explosion\u0000(section 3.2). Empirical results on GPT-2 with MLRA demonstrate increased\u0000stability and lower perplexity, particularly in deeper models.","PeriodicalId":501030,"journal":{"name":"arXiv - CS - Computation and Language","volume":"49 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142184435","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Andreas Opedal, Eleanor Chodroff, Ryan Cotterell, Ethan Gotlieb Wilcox
We present a new perspective on how readers integrate context during real-time language comprehension. Our proposals build on surprisal theory, which posits that the processing effort of a linguistic unit (e.g., a word) is an affine function of its in-context information content. We first observe that surprisal is only one out of many potential ways that a contextual predictor can be derived from a language model. Another one is the pointwise mutual information (PMI) between a unit and its context, which turns out to yield the same predictive power as surprisal when controlling for unigram frequency. Moreover, both PMI and surprisal are correlated with frequency. This means that neither PMI nor surprisal contains information about context alone. In response to this, we propose a technique where we project surprisal onto the orthogonal complement of frequency, yielding a new contextual predictor that is uncorrelated with frequency. Our experiments show that the proportion of variance in reading times explained by context is much smaller when context is represented by the orthogonalized predictor. From an interpretability standpoint, this indicates that previous studies may have overstated the role that context has in predicting reading times.
{"title":"On the Role of Context in Reading Time Prediction","authors":"Andreas Opedal, Eleanor Chodroff, Ryan Cotterell, Ethan Gotlieb Wilcox","doi":"arxiv-2409.08160","DOIUrl":"https://doi.org/arxiv-2409.08160","url":null,"abstract":"We present a new perspective on how readers integrate context during\u0000real-time language comprehension. Our proposals build on surprisal theory,\u0000which posits that the processing effort of a linguistic unit (e.g., a word) is\u0000an affine function of its in-context information content. We first observe that\u0000surprisal is only one out of many potential ways that a contextual predictor\u0000can be derived from a language model. Another one is the pointwise mutual\u0000information (PMI) between a unit and its context, which turns out to yield the\u0000same predictive power as surprisal when controlling for unigram frequency.\u0000Moreover, both PMI and surprisal are correlated with frequency. This means that\u0000neither PMI nor surprisal contains information about context alone. In response\u0000to this, we propose a technique where we project surprisal onto the orthogonal\u0000complement of frequency, yielding a new contextual predictor that is\u0000uncorrelated with frequency. Our experiments show that the proportion of\u0000variance in reading times explained by context is a lot smaller when context is\u0000represented by the orthogonalized predictor. From an interpretability\u0000standpoint, this indicates that previous studies may have overstated the role\u0000that context has in predicting reading times.","PeriodicalId":501030,"journal":{"name":"arXiv - CS - Computation and Language","volume":"34 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142184441","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zihang Peng, Daria Stepanova, Vinh Thinh Ho, Heike Adel, Alessandra Russo, Simon Ott
Advances in information extraction have enabled the automatic construction of large knowledge graphs (KGs), such as Yago, Wikidata or the Google KG, which are widely used in many applications like semantic search or data analytics. However, due to their semi-automatic construction, KGs are often incomplete. Rule learning methods, which extract frequent patterns from KGs and cast them into rules, can be applied to predict potentially missing facts. A crucial step in this process is rule ranking. Ranking rules is especially challenging over highly incomplete or biased KGs (e.g., KGs predominantly storing facts about famous people), since in this case biased rules might fit the data best and be ranked at the top by standard statistical metrics like rule confidence. To address this issue, prior works proposed to rank rules relying not only on the original KG but also on facts predicted by a KG embedding model. At the same time, with the recent rise of Language Models (LMs), several works have claimed that LMs can be used as an alternative means for KG completion. In this work, our goal is to verify to what extent exploiting LMs helps improve the quality of rule learning systems.
{"title":"Learning Rules from KGs Guided by Language Models","authors":"Zihang Peng, Daria Stepanova, Vinh Thinh Ho, Heike Adel, Alessandra Russo, Simon Ott","doi":"arxiv-2409.07869","DOIUrl":"https://doi.org/arxiv-2409.07869","url":null,"abstract":"Advances in information extraction have enabled the automatic construction of\u0000large knowledge graphs (e.g., Yago, Wikidata or Google KG), which are widely\u0000used in many applications like semantic search or data analytics. However, due\u0000to their semi-automatic construction, KGs are often incomplete. Rule learning\u0000methods, concerned with the extraction of frequent patterns from KGs and\u0000casting them into rules, can be applied to predict potentially missing facts. A\u0000crucial step in this process is rule ranking. Ranking of rules is especially\u0000challenging over highly incomplete or biased KGs (e.g., KGs predominantly\u0000storing facts about famous people), as in this case biased rules might fit the\u0000data best and be ranked at the top based on standard statistical metrics like\u0000rule confidence. To address this issue, prior works proposed to rank rules not\u0000only relying on the original KG but also facts predicted by a KG embedding\u0000model. At the same time, with the recent rise of Language Models (LMs), several\u0000works have claimed that LMs can be used as alternative means for KG completion.\u0000In this work, our goal is to verify to which extent the exploitation of LMs is\u0000helpful for improving the quality of rule learning systems.","PeriodicalId":501030,"journal":{"name":"arXiv - CS - Computation and Language","volume":"29 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142184433","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}