{"title":"Utber: Utilizing Fine-Grained Entity Types to Relation Extraction with Distant Supervision","authors":"Chengmin Wu, Lei Chen","doi":"10.1109/SMDS49396.2020.00015","journal":{"name":"2020 IEEE International Conference on Smart Data Services (SMDS)"},"publicationDate":"2020-10-01","citationCount":"1","platform":"Semanticscholar","ListUrlMain":"https://doi.org/10.1109/SMDS49396.2020.00015"}
Utber: Utilizing Fine-Grained Entity Types to Relation Extraction with Distant Supervision
Recently, much effort has been devoted to relation extraction during the construction of large ontological knowledge bases (KBs). However, most traditional relation extraction systems rely on human-annotated training data, which is expensive to produce. Distant supervision was therefore proposed to help create large amounts of labeled data: an existing KB is heuristically aligned to text, and the aligned data are treated as training data. Nevertheless, noise in this training data may cause two serious problems. First, the heuristic label alignment may fail, causing the wrong-label problem. Second, existing statistical models are applied to ad hoc features and hence perform poorly due to the dynamic features of noisy data. To address these two problems, we propose a novel framework for automatic relation extraction from unstructured text corpora. To solve the first problem, we propose a fine-grained entity typing technique that filters wrong data by choosing positive entity type pairs and conducts joint instance-type selection over bags of instances. To solve the second problem, instead of manually crafting features, we propose a deep neural architecture with an attention mechanism that automatically learns features of positive and negative instances. Extensive experiments on real-world datasets demonstrate that our method outperforms competitive state-of-the-art techniques in terms of effectiveness.
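The abstract gives no implementation details, so the following is only a minimal sketch of the general ideas it names: heuristic KB-to-text alignment, filtering bags by entity type pair, and attention weights over the instances in a bag. The toy KB, the type labels, the positive type pairs, and all function names are assumptions made for illustration; they are not the authors' model or data.

```python
import math
from collections import defaultdict

# Hypothetical toy knowledge base: (head, tail) -> relation.
KB = {
    ("Barack Obama", "Hawaii"): "born_in",
    ("Steve Jobs", "Apple"): "founder_of",
}

# Hypothetical fine-grained entity types; a real system would predict these.
ENTITY_TYPES = {
    "Barack Obama": "person/politician",
    "Hawaii": "location/state",
    "Steve Jobs": "person/businessman",
    "Apple": "organization/company",
}

# Assumed "positive" type pairs per relation (illustrative only).
POSITIVE_TYPE_PAIRS = {
    "born_in": {("person/politician", "location/state")},
    "founder_of": {("person/businessman", "organization/company")},
}


def align(sentences):
    """Heuristic alignment: a sentence mentioning both entities of a KB
    fact is labeled with that fact's relation and grouped into a bag."""
    bags = defaultdict(list)
    for sentence in sentences:
        for (head, tail), relation in KB.items():
            if head in sentence and tail in sentence:
                bags[(head, tail, relation)].append(sentence)
    return dict(bags)


def filter_by_type_pair(bags, types=ENTITY_TYPES):
    """Keep only bags whose (head type, tail type) pair is listed as
    positive for the bag's relation."""
    kept = {}
    for (head, tail, relation), instances in bags.items():
        pair = (types.get(head), types.get(tail))
        if pair in POSITIVE_TYPE_PAIRS.get(relation, set()):
            kept[(head, tail, relation)] = instances
    return kept


def attention_weights(scores):
    """Softmax over per-instance scores within one bag, so that
    low-scoring (likely noisy) instances contribute less."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


SENTENCES = [
    "Barack Obama was born in Hawaii.",
    "Barack Obama visited Hawaii last week.",  # aligned, but wrongly labeled born_in
    "Steve Jobs co-founded Apple in 1976.",
]
bags = align(SENTENCES)
```

In this sketch the second sentence ends up in the `born_in` bag even though it does not express that relation, which is the wrong-label problem the abstract describes. The type-pair filter cannot remove it, because both sentences share the same entity types; that is where per-instance weights such as those from `attention_weights` would come in, with the scores supplied by a learned model rather than set by hand.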