Semi Supervised Approach for Relation Extraction in Agriculture Documents
V. G, Deepa Gupta, Vani Kanjirangat
2022 OITS International Conference on Information Technology (OCIT), 2022-12-01
DOI: 10.1109/OCIT56763.2022.00046
In this work, we propose a semi-supervised bootstrapping approach for relation extraction in domain-specific texts, focusing on the agricultural domain. Our approach combines the BERT model with dependency parsing for relation extraction. The proposed model identifies five inter-subdomain relations: Soil_Location, Soil_Crop, Disease_Pathogen, Pathogen_Crop, and Chemical_Crop. We created a corpus of 30,000 sentences extracted from recognised agriculture sites to evaluate the model. The labeled relations were then manually checked to evaluate the prediction accuracy. For the evaluation, we used a test corpus of 700 sentences containing 3,500 triplets. The proposed approach achieves an average macro F-score of 86.4%, which is promising for semi-supervised domain-specific relation extraction systems. Experimental results show the efficacy of the proposed approach in classifying relational phrases in a semi-supervised set-up for the agricultural domain.
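
As an illustration of the reported metric, a macro-averaged F-score over the five relation types can be computed as the unweighted mean of the per-relation F-scores. The sketch below assumes the standard F1 definition (harmonic mean of precision and recall); the abstract does not state which F-score variant the authors used. The function names and the counts in the example are hypothetical and are not taken from the paper.

```python
from typing import Dict, Tuple

# The five relation types named in the abstract.
RELATIONS = [
    "Soil_Location",
    "Soil_Crop",
    "Disease_Pathogen",
    "Pathogen_Crop",
    "Chemical_Crop",
]


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 for one relation from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f1(counts: Dict[str, Tuple[int, int, int]]) -> float:
    """Unweighted mean of per-relation F1, so every relation counts equally."""
    return sum(f1(*c) for c in counts.values()) / len(counts)


# Hypothetical (tp, fp, fn) counts, for illustration only; not results from the paper.
example_counts = {name: (90, 10, 15) for name in RELATIONS}
print(f"macro F1 = {macro_f1(example_counts):.3f}")
```

Because the average is unweighted, a relation with few test triplets influences the score as much as a frequent one, which is the usual reason for reporting a macro rather than a micro average.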