Unsupervised Domain Adaptation on Sentence Matching Through Self-Supervision

IF 1.2 3区计算机科学 Q4 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Journal of Computer Science and Technology Pub Date : 2023-11-30 DOI:10.1007/s11390-022-1479-0

Gui-Rong Bai, Qing-Bin Liu, Shi-Zhu He, Kang Liu, Jun Zhao

{"title":"Unsupervised Domain Adaptation on Sentence Matching Through Self-Supervision","authors":"Gui-Rong Bai, Qing-Bin Liu, Shi-Zhu He, Kang Liu, Jun Zhao","doi":"10.1007/s11390-022-1479-0","DOIUrl":null,"url":null,"abstract":"<p>Although neural approaches have yielded state-of-the-art results in the sentence matching task, their performance inevitably drops dramatically when applied to unseen domains. To tackle this cross-domain challenge, we address unsupervised domain adaptation on sentence matching, in which the goal is to have good performance on a target domain with only unlabeled target domain data as well as labeled source domain data. Specifically, we propose to perform self-supervised tasks to achieve it. Different from previous unsupervised domain adaptation methods, self-supervision can not only flexibly suit the characteristics of sentence matching with a special design, but also be much easier to optimize. When training, each self-supervised task is performed on both domains simultaneously in an easy-to-hard curriculum, which gradually brings the two domains closer together along the direction relevant to the task. As a result, the classifier trained on the source domain is able to generalize to the unlabeled target domain. In total, we present three types of self-supervised tasks and the results demonstrate their superiority. In addition, we further study the performance of different usages of self-supervised tasks, which would inspire how to effectively utilize self-supervision for cross-domain scenarios.</p>","PeriodicalId":50222,"journal":{"name":"Journal of Computer Science and Technology","volume":"15 1","pages":""},"PeriodicalIF":1.2000,"publicationDate":"2023-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Computer Science and Technology","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s11390-022-1479-0","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}

引用次数: 0

Abstract

Although neural approaches have yielded state-of-the-art results in the sentence matching task, their performance inevitably drops dramatically when applied to unseen domains. To tackle this cross-domain challenge, we address unsupervised domain adaptation on sentence matching, in which the goal is to have good performance on a target domain with only unlabeled target domain data as well as labeled source domain data. Specifically, we propose to perform self-supervised tasks to achieve it. Different from previous unsupervised domain adaptation methods, self-supervision can not only flexibly suit the characteristics of sentence matching with a special design, but also be much easier to optimize. When training, each self-supervised task is performed on both domains simultaneously in an easy-to-hard curriculum, which gradually brings the two domains closer together along the direction relevant to the task. As a result, the classifier trained on the source domain is able to generalize to the unlabeled target domain. In total, we present three types of self-supervised tasks and the results demonstrate their superiority. In addition, we further study the performance of different usages of self-supervised tasks, which would inspire how to effectively utilize self-supervision for cross-domain scenarios.

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

通过自我监督实现句子匹配的无监督领域自适应

尽管神经方法在句子匹配任务中取得了最先进的成果，但当它们应用于未知领域时，性能不可避免地会大幅下降。为了应对这一跨域挑战，我们研究了句子匹配的无监督域适应性，其目标是在目标域中仅使用未标注的目标域数据和标注的源域数据就能获得良好的性能。具体来说，我们建议执行自监督任务来实现这一目标。与以往的无监督域适应方法不同，自监督不仅可以通过特殊设计灵活地适应句子匹配的特点，而且更易于优化。在训练时，每个自监督任务都会在两个域上同时进行，按照从易到难的课程设置，沿着与任务相关的方向逐渐拉近两个域的距离。因此，在源领域训练的分类器能够泛化到未标记的目标领域。我们总共提出了三种自监督任务，结果证明了它们的优越性。此外，我们还进一步研究了自监督任务不同用途的性能，这将启发我们如何在跨领域场景中有效利用自监督。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

Journal of Computer Science and Technology 工程技术-计算机：软件工程

CiteScore

4.00

自引率

0.00%

发文量

2255

审稿时长

9.8 months

期刊介绍： Journal of Computer Science and Technology (JCST), the first English language journal in the computer field published in China, is an international forum for scientists and engineers involved in all aspects of computer science and technology to publish high quality and refereed papers. Papers reporting original research and innovative applications from all parts of the world are welcome. Papers for publication in the journal are selected through rigorous peer review, to ensure originality, timeliness, relevance, and readability. While the journal emphasizes the publication of previously unpublished materials, selected conference papers with exceptional merit that require wider exposure are, at the discretion of the editors, also published, provided they meet the journal''s peer review standards. The journal also seeks clearly written survey and review articles from experts in the field, to promote insightful understanding of the state-of-the-art and technology trends. Topics covered by Journal of Computer Science and Technology include but are not limited to: -Computer Architecture and Systems -Artificial Intelligence and Pattern Recognition -Computer Networks and Distributed Computing -Computer Graphics and Multimedia -Software Systems -Data Management and Data Mining -Theory and Algorithms -Emerging Areas