{"title":"一种用于远程监督关系提取的简单有效的蒸馏框架","authors":"Rui Li, Cheng Yang, Tingwei Li, Sen Su","doi":"10.1145/3503917","DOIUrl":null,"url":null,"abstract":"Relation extraction (RE), an important information extraction task, faced the great challenge brought by limited annotation data. To this end, distant supervision was proposed to automatically label RE data, and thus largely increased the number of annotated instances. Unfortunately, lots of noise relation annotations brought by automatic labeling become a new obstacle. Some recent studies have shown that the teacher-student framework of knowledge distillation can alleviate the interference of noise relation annotations via label softening. Nevertheless, we find that they still suffer from two problems: propagation of inaccurate dark knowledge and constraint of a unified distillation temperature. In this article, we propose a simple and effective Multi-instance Dynamic Temperature Distillation (MiDTD) framework, which is model-agnostic and mainly involves two modules: multi-instance target fusion (MiTF) and dynamic temperature regulation (DTR). MiTF combines the teacher’s predictions for multiple sentences with the same entity pair to amend the inaccurate dark knowledge in each student’s target. DTR allocates alterable distillation temperatures to different training instances to enable the softness of most student’s targets to be regulated to a moderate range. In experiments, we construct three concrete MiDTD instantiations with BERT, PCNN, and BiLSTM-based RE models, and the distilled students significantly outperform their teachers and the state-of-the-art (SOTA) methods.","PeriodicalId":6934,"journal":{"name":"ACM Transactions on Information Systems (TOIS)","volume":"8 1","pages":"1 - 32"},"PeriodicalIF":0.0000,"publicationDate":"2022-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"MiDTD: A Simple and Effective Distillation Framework for Distantly Supervised Relation Extraction\",\"authors\":\"Rui Li, Cheng Yang, Tingwei Li, Sen Su\",\"doi\":\"10.1145/3503917\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Relation extraction (RE), an important information extraction task, faced the great challenge brought by limited annotation data. To this end, distant supervision was proposed to automatically label RE data, and thus largely increased the number of annotated instances. Unfortunately, lots of noise relation annotations brought by automatic labeling become a new obstacle. Some recent studies have shown that the teacher-student framework of knowledge distillation can alleviate the interference of noise relation annotations via label softening. Nevertheless, we find that they still suffer from two problems: propagation of inaccurate dark knowledge and constraint of a unified distillation temperature. In this article, we propose a simple and effective Multi-instance Dynamic Temperature Distillation (MiDTD) framework, which is model-agnostic and mainly involves two modules: multi-instance target fusion (MiTF) and dynamic temperature regulation (DTR). MiTF combines the teacher’s predictions for multiple sentences with the same entity pair to amend the inaccurate dark knowledge in each student’s target. DTR allocates alterable distillation temperatures to different training instances to enable the softness of most student’s targets to be regulated to a moderate range. 
In experiments, we construct three concrete MiDTD instantiations with BERT, PCNN, and BiLSTM-based RE models, and the distilled students significantly outperform their teachers and the state-of-the-art (SOTA) methods.\",\"PeriodicalId\":6934,\"journal\":{\"name\":\"ACM Transactions on Information Systems (TOIS)\",\"volume\":\"8 1\",\"pages\":\"1 - 32\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-01-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM Transactions on Information Systems (TOIS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3503917\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Information Systems (TOIS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3503917","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
MiDTD: A Simple and Effective Distillation Framework for Distantly Supervised Relation Extraction
Relation extraction (RE), an important information extraction task, faces the great challenge posed by limited annotated data. To this end, distant supervision was proposed to label RE data automatically, largely increasing the number of annotated instances. Unfortunately, the many noisy relation annotations introduced by automatic labeling become a new obstacle. Some recent studies have shown that the teacher-student framework of knowledge distillation can alleviate the interference of noisy relation annotations via label softening. Nevertheless, we find that these methods still suffer from two problems: the propagation of inaccurate dark knowledge and the constraint of a unified distillation temperature. In this article, we propose a simple and effective Multi-instance Dynamic Temperature Distillation (MiDTD) framework, which is model-agnostic and mainly involves two modules: multi-instance target fusion (MiTF) and dynamic temperature regulation (DTR). MiTF combines the teacher's predictions for multiple sentences with the same entity pair to amend the inaccurate dark knowledge in each student target. DTR allocates alterable distillation temperatures to different training instances so that the softness of most student targets is regulated to a moderate range. In experiments, we construct three concrete MiDTD instantiations with BERT-, PCNN-, and BiLSTM-based RE models, and the distilled students significantly outperform their teachers and state-of-the-art (SOTA) methods.
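To make the two modules concrete, below is a minimal, hypothetical PyTorch sketch of how MiTF and DTR could fit together in a distillation objective. The function names, the plain mean as the fusion rule, and treating the per-instance temperatures as given inputs are all assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of MiDTD's two modules (names and fusion rule assumed;
# not the authors' code).
import torch
import torch.nn.functional as F

def fuse_and_soften(teacher_logits: torch.Tensor,
                    temperatures: torch.Tensor) -> torch.Tensor:
    """teacher_logits: (bag_size, num_relations) teacher predictions for all
    sentences sharing one entity pair; temperatures: (bag_size,) per-instance
    distillation temperatures assigned by DTR (taken as given here)."""
    # MiTF: fuse the teacher's predictions over the bag (a simple mean stands
    # in for the paper's fusion rule) so inaccurate dark knowledge in any
    # single sentence-level target is amended by the other instances.
    fused = teacher_logits.mean(dim=0, keepdim=True)             # (1, R)
    # DTR: soften each instance's target with its own temperature rather than
    # one unified temperature, keeping most targets moderately soft.
    return F.softmax(fused / temperatures.unsqueeze(1), dim=-1)  # (bag, R)

def distillation_loss(student_logits: torch.Tensor,
                      soft_targets: torch.Tensor,
                      temperatures: torch.Tensor) -> torch.Tensor:
    # Standard KD objective: KL divergence between the temperature-scaled
    # student distribution and the fused, softened teacher targets.
    log_student = F.log_softmax(student_logits / temperatures.unsqueeze(1),
                                dim=-1)
    return F.kl_div(log_student, soft_targets, reduction="batchmean")
```

In a full training loop this soft-target loss would presumably be combined with the usual cross-entropy on the (noisy) distant labels, which is the standard knowledge-distillation setup the abstract builds on.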