{"title":"TOCOL: improving contextual representation of pre-trained language models via token-level contrastive learning","authors":"Keheng Wang, Chuantao Yin, Rumei Li, Sirui Wang, Yunsen Xian, Wenge Rong, Zhang Xiong","doi":"10.1007/s10994-023-06512-9","DOIUrl":null,"url":null,"abstract":"<p>Self-attention, which allows transformers to capture deep bidirectional contexts, plays a vital role in BERT-like pre-trained language models. However, the maximum likelihood pre-training objective of BERT may produce an anisotropic word embedding space, which leads to biased attention scores for high-frequency tokens, as they are very close to each other in representation space and thus have higher similarities. This bias may ultimately affect the encoding of global contextual information. To address this issue, we propose TOCOL, a <b>TO</b>ken-Level <b>CO</b>ntrastive <b>L</b>earning framework for improving the contextual representation of pre-trained language models, which integrates a novel self-supervised objective to the attention mechanism to reshape the word representation space and encourages PLM to capture the global semantics of sentences. Results on the GLUE Benchmark show that TOCOL brings considerable improvement over the original BERT. Furthermore, we conduct a detailed analysis and demonstrate the robustness of our approach for low-resource scenarios.</p>","PeriodicalId":49900,"journal":{"name":"Machine Learning","volume":null,"pages":null},"PeriodicalIF":4.3000,"publicationDate":"2024-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Machine Learning","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s10994-023-06512-9","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0
Abstract
Self-attention, which allows transformers to capture deep bidirectional contexts, plays a vital role in BERT-like pre-trained language models. However, the maximum likelihood pre-training objective of BERT may produce an anisotropic word embedding space, which leads to biased attention scores for high-frequency tokens: because these tokens lie very close to each other in representation space, they receive inflated similarity scores. This bias may ultimately affect the encoding of global contextual information. To address this issue, we propose TOCOL, a TOken-Level COntrastive Learning framework for improving the contextual representation of pre-trained language models, which integrates a novel self-supervised objective into the attention mechanism to reshape the word representation space and encourages the PLM to capture the global semantics of sentences. Results on the GLUE Benchmark show that TOCOL brings considerable improvement over the original BERT. Furthermore, we conduct a detailed analysis and demonstrate the robustness of our approach in low-resource scenarios.
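The abstract does not spell out TOCOL's exact objective, so the snippet below is only a minimal sketch of what a generic token-level contrastive (InfoNCE-style) loss looks like, of the kind used to spread out token representations and counteract anisotropy. The function name `token_contrastive_loss`, the `temperature` value, and the two-view setup (e.g., two dropout-perturbed forward passes of the same input) are all assumptions for illustration, not the paper's actual formulation.

```python
import torch
import torch.nn.functional as F


def token_contrastive_loss(h1: torch.Tensor, h2: torch.Tensor,
                           attention_mask: torch.Tensor,
                           temperature: float = 0.1) -> torch.Tensor:
    """Hypothetical InfoNCE-style loss over token representations.

    h1, h2: (batch, seq_len, hidden) token embeddings from two views of the
            same input (assumption: e.g., two dropout-perturbed passes).
    attention_mask: (batch, seq_len), 1 for real tokens, 0 for padding.
    """
    # Keep only real (non-padding) tokens and L2-normalize them.
    mask = attention_mask.bool().reshape(-1)
    z1 = F.normalize(h1.reshape(-1, h1.size(-1))[mask], dim=-1)
    z2 = F.normalize(h2.reshape(-1, h2.size(-1))[mask], dim=-1)

    # Cosine-similarity logits: each token's positive is the same token
    # position in the other view; all other tokens act as in-batch negatives,
    # which pushes representations apart and reduces anisotropy.
    logits = z1 @ z2.t() / temperature
    targets = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, targets)
```

In such a setup the contrastive term would typically be added to the standard masked language modeling loss with a weighting coefficient, e.g. `loss = mlm_loss + lam * token_contrastive_loss(h1, h2, attention_mask)`; how TOCOL actually combines its objective with the attention mechanism is described in the full paper.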
Journal description
Machine Learning serves as a global platform dedicated to computational approaches in learning. The journal reports substantial findings on diverse learning methods applied to various problems, offering support through empirical studies, theoretical analysis, or connections to psychological phenomena. It demonstrates the application of learning methods to solve significant problems and aims to enhance the conduct of machine learning research with a focus on verifiable and replicable evidence in published papers.