On the instability of further pre-training: Does a single sentence matter to BERT?

Natural Language Processing Journal, Volume 5, Article 100037. Pub Date: 2023-10-27. DOI: 10.1016/j.nlp.2023.100037

Luca Bacco, Gosse Minnema, Tommaso Caselli, Felice Dell’Orletta, Mario Merone, Malvina Nissim



Abstract

We observe a remarkable instability in BERT-like models: minimal changes in the internal representations of BERT, as induced by one-step further pre-training with even a single sentence, can noticeably change the behaviour of subsequently fine-tuned models. Although the pre-trained models appear essentially the same, including under established similarity assessment techniques, the tiny measurable changes appear to substantially affect the models’ tuning path, leading to significantly different fine-tuned systems and affecting downstream performance. After testing a very large number of combinations, which we briefly summarize, the experiments reported in this short paper focus on an intermediate phase consisting of a single-step, single-sentence masked language modeling stage and its impact on a sentiment analysis task. We discuss a series of unexpected findings that leave open questions about the nature and stability of further pre-training.
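To make the "single-sentence masked language modeling stage" concrete, the sketch below builds one masked training example from one sentence. It is an illustration only, not the authors' code: the abstract does not state the masking settings, so the sketch assumes the standard BERT scheme (select about 15% of positions; of those, 80% become `[MASK]`, 10% become a random token, 10% stay unchanged). The function name `mask_sentence`, the whitespace tokenization, the toy vocabulary, and the example sentence are all hypothetical.

```python
import random

MASK = "[MASK]"

def mask_sentence(tokens, vocab, mask_prob=0.15, seed=0):
    """Return (inputs, labels) for one masked-language-modeling example.

    labels[i] holds the original token at positions the model must predict
    and None everywhere else. Assumes the standard BERT 80/10/10 scheme;
    the paper's actual settings are not given in the abstract.
    """
    rng = random.Random(seed)
    inputs = list(tokens)
    labels = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue  # position not selected for prediction
        labels[i] = tok
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = MASK               # 80%: replace with the mask token
        elif roll < 0.9:
            inputs[i] = rng.choice(vocab)  # 10%: replace with a random token
        # remaining 10%: keep the original token as input
    return inputs, labels

# Hypothetical single sentence; a real setup would use the model's tokenizer.
sentence = "the film was surprisingly good".split()
vocab = sorted(set(sentence))
inputs, labels = mask_sentence(sentence, vocab, mask_prob=0.5, seed=1)
```

In the setting the abstract describes, a pair like `(inputs, labels)` would drive exactly one optimizer step on the pre-trained model before fine-tuning on the sentiment task begins.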
