
Findings: Latest Publications

Super Speeders: Mining Speed-Camera Data to Analyze Extreme Recidivism in New York City
Pub Date : 2024-02-01 DOI: 10.32866/001c.92690
Marcel E. Moran
New York City maintains the country’s largest network of automated speed cameras, though fines are set lower than if written by police, and are omitted from license-suspension calculations, regardless of the number incurred. What does such a program design entail in terms of speeding recidivism, particularly at the extreme end? Mining a decade of publicly available speed-camera data finds that beginning in 2020, individual automobiles crossed the threshold of 100 or more camera-based speeding violations per year, 25 times the number that would prompt license suspension if manually written by police. Cross-referencing these ‘super speeders’ against traffic-violation data finds they average 35 non-speeding violations (including driving through red lights), as well as roughly $11,000 in unpaid fines each. The emergence and growth of this extreme recidivism indicates the need to evaluate and potentially modify the penalty design and enforcement of New York City’s camera-based speed program.
Citations: 0
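The analysis described above is, at its core, an aggregation over public violation records: count camera-issued speeding tickets per vehicle per year, flag vehicles at or above 100, then cross-reference their other violations and unpaid fines. A minimal sketch of that query in pandas, assuming a hypothetical CSV with plate_id, issue_date, violation_type, and fine_amount columns (the real NYC Open Data schema uses different field names):

import pandas as pd

# Hypothetical export of NYC violation records; column names are assumed.
df = pd.read_csv("violations.csv", parse_dates=["issue_date"])

speed = df[df["violation_type"] == "SPEED CAMERA"]
per_year = (
    speed.groupby(["plate_id", speed["issue_date"].dt.year])
         .size()
         .rename("camera_violations")
         .reset_index()
)

# "Super speeders": 100 or more camera-based speeding violations in a single year,
# i.e. 25x the manual-ticket count that would trigger license suspension.
super_speeders = per_year[per_year["camera_violations"] >= 100]

# Cross-reference the same plates against non-speeding violations and unpaid fines.
others = df[df["plate_id"].isin(super_speeders["plate_id"])
            & (df["violation_type"] != "SPEED CAMERA")]
print(others.groupby("plate_id")["fine_amount"].sum())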
Entity Linking in the Job Market Domain
Pub Date : 2024-01-31 DOI: 10.48550/arXiv.2401.17979
Mike Zhang, R. Goot, Barbara Plank
In Natural Language Processing, entity linking (EL) has centered around Wikipedia, yet it remains underexplored for the job market domain. Disambiguating skill mentions can help us gain insight into current labor market demands. In this work, we are the first to explore EL in this domain, specifically targeting the linkage of occupational skills to the ESCO taxonomy (le Vrang et al., 2014). Previous efforts linked coarse-grained (full) sentences to a corresponding ESCO skill. In this work, we link more fine-grained span-level mentions of skills. We tune two high-performing neural EL models, a bi-encoder (Wu et al., 2020) and an autoregressive model (Cao et al., 2021), on a synthetically generated mention–skill pair dataset and evaluate them on a human-annotated skill-linking benchmark. Our findings reveal that both models are capable of linking implicit mentions of skills to their correct taxonomy counterparts. Empirically, BLINK outperforms GENRE in strict evaluation, but GENRE performs better in loose evaluation (accuracy@k).
Citations: 0
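The bi-encoder approach described in the abstract scores a span-level skill mention against candidate skill labels in a shared embedding space and links to the nearest one. A minimal sketch of that retrieval step with an off-the-shelf sentence encoder (a stand-in for the authors' tuned models, with made-up candidate skills rather than the real ESCO taxonomy):

from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

mention = "built dashboards to track weekly KPIs"    # span-level skill mention (made up)
candidates = [                                       # toy stand-ins for ESCO skill labels
    "create reports and dashboards",
    "manage customer relationships",
    "perform data analysis",
]

mention_emb = encoder.encode(mention, convert_to_tensor=True)
cand_embs = encoder.encode(candidates, convert_to_tensor=True)

scores = util.cos_sim(mention_emb, cand_embs)[0]     # one cosine score per candidate
best = int(scores.argmax())
print(candidates[best], float(scores[best]))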
Morality is Non-Binary: Building a Pluralist Moral Sentence Embedding Space using Contrastive Learning
Pub Date : 2024-01-30 DOI: 10.48550/arXiv.2401.17228
Jeongwoo Park, Enrico Liscio, P. Murukannaiah
Recent advances in NLP show that language models retain a discernible level of knowledge in deontological ethics and moral norms. However, existing works often treat morality as binary, ranging from right to wrong. This simplistic view does not capture the nuances of moral judgment. Pluralist moral philosophers argue that human morality can be deconstructed into a finite number of elements, respecting individual differences in moral judgment. In line with this view, we build a pluralist moral sentence embedding space via a state-of-the-art contrastive learning approach. We systematically investigate the embedding space by studying the emergence of relationships among moral elements, both quantitatively and qualitatively. Our results show that a pluralist approach to morality can be captured in an embedding space. However, moral pluralism is challenging to deduce via self-supervision alone and requires a supervised approach with human labels.
Citations: 0
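The embedding space here is shaped by a contrastive objective: sentences annotated with the same moral element are pulled together and all others pushed apart. A minimal PyTorch sketch of an NT-Xent-style loss over paired embeddings, as a stand-in for the authors' specific contrastive setup (batch size, dimensionality, and the encoder itself are placeholders):

import torch
import torch.nn.functional as F

def nt_xent(z1, z2, temperature=0.1):
    """z1[i] and z2[i] embed two sentences sharing the same moral element."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    z = torch.cat([z1, z2], dim=0)                     # (2N, d)
    sim = z @ z.T / temperature                        # pairwise cosine similarities
    sim = sim.masked_fill(torch.eye(z.size(0), dtype=torch.bool), float("-inf"))
    n = z1.size(0)
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)               # each view must retrieve its pair

# Toy usage with random "sentence embeddings"; a real run would backpropagate
# through the sentence encoder that produced them.
z1 = torch.randn(8, 384, requires_grad=True)
z2 = torch.randn(8, 384, requires_grad=True)
loss = nt_xent(z1, z2)
loss.backward()
print(float(loss))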
Contextualization Distillation from Large Language Model for Knowledge Graph Completion
Pub Date : 2024-01-28 DOI: 10.48550/arXiv.2402.01729
Dawei Li, Zhen Tan, Tianlong Chen, Huan Liu
While textual information significantly enhances the performance of pre-trained language models (PLMs) in knowledge graph completion (KGC), the static and noisy nature of existing corpora collected from Wikipedia articles or synsets definitions often limits the potential of PLM-based KGC models. To surmount these challenges, we introduce the Contextualization Distillation strategy, a versatile plug-in-and-play approach compatible with both discriminative and generative KGC frameworks. Our method begins by instructing large language models (LLMs) to transform compact, structural triplets into context-rich segments. Subsequently, we introduce two tailored auxiliary tasks—reconstruction and contextualization—allowing smaller KGC models to assimilate insights from these enriched triplets. Comprehensive evaluations across diverse datasets and KGC techniques highlight the efficacy and adaptability of our approach, revealing consistent performance enhancements irrespective of underlying pipelines or architectures. Moreover, our analysis makes our method more explainable and provides insight into how to generate high-quality corpora for KGC, as well as the selection of suitable distillation tasks.
Citations: 0
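The first stage described above amounts to prompting a large language model to expand a bare (head, relation, tail) triplet into a context-rich passage, which then drives the reconstruction and contextualization auxiliary tasks. A minimal sketch of that prompt construction with a hypothetical triplet (the paper's actual prompt wording and auxiliary-task training are more involved):

def contextualization_prompt(head: str, relation: str, tail: str) -> str:
    """Ask an LLM to turn a compact KG triplet into a descriptive paragraph."""
    return (
        "Write a short, factual paragraph that describes the following "
        "knowledge-graph triplet in natural language, adding relevant context.\n"
        f"Head entity: {head}\n"
        f"Relation: {relation}\n"
        f"Tail entity: {tail}\n"
        "Paragraph:"
    )

# Hypothetical triplet; the generated paragraph would be fed to the smaller
# KGC model through the two auxiliary tasks.
print(contextualization_prompt("Marie Curie", "award received", "Nobel Prize in Physics"))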
MEDs for PETs: Multilingual Euphemism Disambiguation for Potentially Euphemistic Terms
Pub Date : 2024-01-25 DOI: 10.48550/arXiv.2401.14526
Patrick Lee, Alain Chirino Trujillo, Diana Cuevas Plancarte, O. E. Ojo, Xinyi Liu, Iyanuoluwa Shode, Yuan Zhao, Jing Peng, Anna Feldman
Euphemisms are found across the world’s languages, making them a universal linguistic phenomenon. As such, euphemistic data may have useful properties for computational tasks across languages. In this study, we explore this premise by training a multilingual transformer model (XLM-RoBERTa) to disambiguate potentially euphemistic terms (PETs) in multilingual and cross-lingual settings. In line with current trends, we demonstrate that zero-shot learning across languages takes place. We also show cases where multilingual models perform better on the task compared to monolingual models by a statistically significant margin, indicating that multilingual data presents additional opportunities for models to learn about cross-lingual, computational properties of euphemisms. In a follow-up analysis, we focus on universal euphemistic “categories” such as death and bodily functions among others. We test to see whether cross-lingual data of the same domain is more important than within-language data of other domains to further understand the nature of the cross-lingual transfer.
Citations: 0
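Disambiguating a potentially euphemistic term reduces to classifying the sentence it appears in. A minimal sketch of scoring one example with XLM-RoBERTa through Hugging Face Transformers; the xlm-roberta-base checkpoint, the two-label mapping, and the example sentence are assumptions, and the classification head here is untrained (the authors fine-tune on their multilingual PET data):

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "xlm-roberta-base", num_labels=2   # assumed mapping: 0 = literal, 1 = euphemistic
)

sentence = "After a long illness, she passed away last spring."
inputs = tokenizer(sentence, return_tensors="pt", truncation=True)

with torch.no_grad():
    logits = model(**inputs).logits
print(torch.softmax(logits, dim=-1))   # probabilities from an as-yet untrained head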
Assessing the Portability of Parameter Matrices Trained by Parameter-Efficient Finetuning Methods
Pub Date : 2024-01-25 DOI: 10.48550/arXiv.2401.14228
Mohammed Sabry, Anya Belz
As the cost of training ever larger language models has grown, so has the interest in reusing previously learnt knowledge. Transfer learning methods have shown how reusing non-task-specific knowledge can help in subsequent task-specific learning. In this paper, we investigate the inverse: porting whole functional modules that encode task-specific knowledge from one model to another. We designed a study comprising 1,440 training/testing runs to test the portability of modules trained by parameter-efficient finetuning (PEFT) techniques, using sentiment analysis as an example task. We test portability in a wide range of scenarios, involving different PEFT techniques and different pretrained host models, among other dimensions. We compare the performance of ported modules with that of equivalent modules trained (i) from scratch, and (ii) from parameters sampled from the same distribution as the ported module. We find that the ported modules far outperform the two alternatives tested, but that there are interesting differences between the four PEFT techniques tested. We conclude that task-specific knowledge in the form of structurally modular sets of parameters as produced by PEFT techniques is highly portable, but that degree of success depends on type of PEFT and on differences between originating and receiving pretrained models.
Citations: 0
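Operationally, porting a module means saving the PEFT parameters trained on one host model and loading them into another. A rough sketch with the Hugging Face peft library and LoRA (one of several PEFT types the study covers); bert-base-uncased and the hyperparameters are placeholders, and in practice the port only loads cleanly when module names and shapes line up between the two hosts:

from transformers import AutoModelForSequenceClassification
from peft import LoraConfig, PeftModel, get_peft_model

# Attach and (not shown) train a LoRA module on host model A for sentiment analysis.
base_a = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
lora_a = get_peft_model(base_a, LoraConfig(task_type="SEQ_CLS", r=8, lora_alpha=16))
lora_a.save_pretrained("sentiment_lora")        # writes only the adapter weights

# Port the trained module into a fresh copy of the host (or another compatible model).
base_b = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
ported = PeftModel.from_pretrained(base_b, "sentiment_lora")
ported.print_trainable_parameters()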
PRILoRA: Pruned and Rank-Increasing Low-Rank Adaptation
Pub Date : 2024-01-20 DOI: 10.48550/arXiv.2401.11316
Nadav Benedek, Lior Wolf
With the proliferation of large pre-trained language models (PLMs), fine-tuning all model parameters becomes increasingly inefficient, particularly when dealing with numerous downstream tasks that entail substantial training and storage costs. Several approaches aimed at achieving parameter-efficient fine-tuning (PEFT) have been proposed. Among them, Low-Rank Adaptation (LoRA) stands out as an archetypal method, incorporating trainable rank decomposition matrices into each target module. Nevertheless, LoRA does not consider the varying importance of each layer. To address these challenges, we introduce PRILoRA, which linearly allocates a different rank for each layer, in an increasing manner, and performs pruning throughout the training process, considering both the temporary magnitude of weights and the accumulated statistics of the input to any given layer. We validate the effectiveness of PRILoRA through extensive experiments on eight GLUE benchmarks, setting a new state of the art.
Citations: 1
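The "rank-increasing" part of PRILoRA is a per-layer schedule: lower layers get smaller LoRA ranks and higher layers get larger ones. A minimal sketch of a linear allocation of that kind, with assumed minimum and maximum ranks (the magnitude- and input-statistics-based pruning is not shown):

def linear_rank_schedule(num_layers: int, r_min: int = 4, r_max: int = 12) -> list[int]:
    """Assign each transformer layer a LoRA rank that increases linearly with depth."""
    if num_layers == 1:
        return [r_max]
    step = (r_max - r_min) / (num_layers - 1)
    return [round(r_min + i * step) for i in range(num_layers)]

# A 12-layer encoder: ranks grow from r_min at the bottom layer to r_max at the top.
print(linear_rank_schedule(12))   # [4, 5, 5, 6, 7, 8, 8, 9, 10, 11, 11, 12]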
Better Explain Transformers by Illuminating Important Information
Pub Date : 2024-01-18 DOI: 10.48550/arXiv.2401.09972
Linxin Song, Yan Cui, Ao Luo, Freddy Lecue, Irene Li
Transformer-based models excel in various natural language processing (NLP) tasks, attracting countless efforts to explain their inner workings. Prior methods explain Transformers by focusing on the raw gradient and attention as token attribution scores, where non-relevant information is often considered during explanation computation, resulting in confusing results. In this work, we propose highlighting the important information and eliminating irrelevant information by a refined information flow on top of the layer-wise relevance propagation (LRP) method. Specifically, we consider identifying syntactic and positional heads as important attention heads and focus on the relevance obtained from these important heads. Experimental results demonstrate that irrelevant information does distort output attribution scores and then should be masked during explanation computation. Compared to eight baselines on both classification and question-answering datasets, our method consistently outperforms with over 3% to 33% improvement on explanation metrics, providing superior explanation performance. Our anonymous code repository is available at: https://anonymous.4open.science/r/MLRP-E676/
Citations: 0
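The central idea above is to let relevance flow only through the important attention heads (e.g., syntactic and positional ones) and to mask the rest before aggregating token attributions. A toy illustration of that head masking on one layer's attention maps, using random tensors in place of real attentions and head scores; it is not the authors' LRP pipeline:

import torch

torch.manual_seed(0)
num_heads, seq_len = 12, 8
attn = torch.rand(num_heads, seq_len, seq_len)
attn = attn / attn.sum(dim=-1, keepdim=True)             # row-normalized attention per head

head_importance = torch.rand(num_heads)                   # stand-in for syntactic/positional head scores
keep = head_importance.topk(k=4).indices                  # keep only the most important heads

mask = torch.zeros(num_heads)
mask[keep] = 1.0
masked_attn = attn * mask[:, None, None]                  # zero out irrelevant heads

# Average over the kept heads to get token-to-token attribution scores.
attribution = masked_attn.sum(dim=0) / mask.sum()
print(attribution[0])                                     # relevance of every token to token 0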
Large Language Models for Scientific Information Extraction: An Empirical Study for Virology
Pub Date : 2024-01-18 DOI: 10.48550/arXiv.2401.10040
Mahsa Shamsabadi, Jennifer D'Souza, Sören Auer
In this paper, we champion the use of structured and semantic content representation of discourse-based scholarly communication, inspired by tools like Wikipedia infoboxes or structured Amazon product descriptions. These representations provide users with a concise overview, aiding scientists in navigating the dense academic landscape. Our novel automated approach leverages the robust text generation capabilities of LLMs to produce structured scholarly contribution summaries, offering both a practical solution and insights into LLMs’ emergent abilities. For LLMs, the prime focus is on improving their general intelligence as conversational agents. We argue that these models can also be applied effectively in information extraction (IE), specifically in complex IE tasks within terse domains like Science. This paradigm shift replaces the traditional modular, pipelined machine learning approach with a simpler objective expressed through instructions. Our results show that finetuned FLAN-T5 with 1000x fewer parameters than the state-of-the-art GPT-davinci is competitive for the task.
Citations: 0
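The extraction task above is framed as instruction following: the model is asked to fill an infobox-like template from a passage. A minimal sketch with an off-the-shelf FLAN-T5 checkpoint and a made-up instruction and abstract (checkpoint size, field names, and the example text are all assumptions; the paper fine-tunes FLAN-T5 on its own virology data):

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")

prompt = (
    "Extract a structured summary with the fields (research problem; method; result) "
    "from the following abstract:\n"
    "We evaluate a transformer classifier for detecting influenza subtypes from "
    "sequence fragments and report a 7-point F1 improvement over an SVM baseline."
)

inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))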