Concept Word Extraction for Bilingual Ontology Construction in Unstructured Text Environment
Turdi Tohti, Le Chang, A. Hamdulla, Hankiz Yilahun
2021 IEEE 2nd International Conference on Pattern Recognition and Machine Learning (PRML), published 2021-07-16
DOI: 10.1109/PRML52754.2021.9520708
Citations: 1
Abstract
To address the unsatisfactory efficiency of extracting concept words from unstructured text for domain ontology construction, this work first uses a combined statistic to judge whether the concept word boundaries produced by word segmentation are correct, and corrects wrong segmentation positions, thereby strengthening the structural integrity of the segmented candidate concept words. On this basis, improved methods and various resource libraries are used to adjust the weights of concept words, mainly to strengthen the correlation between a concept word's weight and its domain attributes. Experiments and comparisons on an English-Chinese bilingual corpus show that both proposed methods (strengthening the structural integrity of concept words, and dynamically adjusting their weights) improve the efficiency of concept word extraction to some extent.
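The abstract does not say which statistics make up the "combined statistic" for boundary checking. A common pairing in term extraction is pointwise mutual information (PMI), which measures how cohesive two adjacent segments are, together with left/right branching entropy, which measures how freely the candidate's outer boundaries vary. The Python sketch below assumes that combination; the function names, the thresholds `pmi_min` and `ent_min`, and the toy corpus are hypothetical. It illustrates boundary correction by re-joining adjacent segments and is not the method used in the paper.

```python
import math
from collections import Counter

# Assumption: the "combined statistic" is PMI + branching entropy; thresholds are hypothetical.


def build_counts(segmented_docs):
    """Count unigrams, adjacent bigrams, and the outer neighbors of each bigram."""
    uni, bi = Counter(), Counter()
    left, right = {}, {}  # bigram -> Counter of tokens seen just before / just after it
    for tokens in segmented_docs:
        uni.update(tokens)
        for i in range(len(tokens) - 1):
            pair = (tokens[i], tokens[i + 1])
            bi[pair] += 1
            before = tokens[i - 1] if i > 0 else "<s>"
            after = tokens[i + 2] if i + 2 < len(tokens) else "</s>"
            left.setdefault(pair, Counter())[before] += 1
            right.setdefault(pair, Counter())[after] += 1
    return uni, bi, left, right


def entropy(counter):
    """Shannon entropy (natural log) of a neighbor distribution."""
    total = sum(counter.values())
    return -sum(c / total * math.log(c / total) for c in counter.values())


def should_merge(pair, stats, pmi_min=1.0, ent_min=0.5):
    """Merge only if the pair is cohesive (PMI) AND its outer boundaries are free (entropy)."""
    uni, bi, left, right = stats
    if bi[pair] == 0:
        return False  # never observed adjacent: no evidence for merging
    n = sum(uni.values())  # bigram probability approximated with the unigram total
    x, y = pair
    pmi = math.log((bi[pair] / n) / ((uni[x] / n) * (uni[y] / n)))
    boundary = min(entropy(left[pair]), entropy(right[pair]))
    return pmi >= pmi_min and boundary >= ent_min


def correct_segmentation(tokens, stats, joiner=" ", **thresholds):
    """Greedy left-to-right pass that re-joins adjacent segments judged to be one concept word."""
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and should_merge((tokens[i], tokens[i + 1]), stats, **thresholds):
            out.append(tokens[i] + joiner + tokens[i + 1])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


# Toy segmented corpus (hypothetical data, for illustration only).
docs = [
    ["domain", "ontology", "construction", "needs", "terms"],
    ["automatic", "ontology", "construction", "from", "text"],
    ["bilingual", "ontology", "construction", "is", "hard"],
    ["we", "study", "text", "mining"],
]
stats = build_counts(docs)
print(correct_segmentation(docs[0], stats))
# → ['domain', 'ontology construction', 'needs', 'terms']
```

Requiring both conditions is what makes the statistic "combined": "study text" has a high PMI in this toy corpus but is always preceded by "we", so its zero left entropy blocks the merge. For Chinese text one would pass `joiner=""`, and a real system would also need to handle candidates spanning more than two segments, which this greedy pairwise pass does not.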