Authors: A. Marginean, Kando Eniko
Published in: 2016 18th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC), 2016-09-01
DOI: 10.1109/SYNASC.2016.048 (https://doi.org/10.1109/SYNASC.2016.048)
Towards Lexicalization of DBpedia Ontology with Unsupervised Learning and Semantic Role Labeling
Bridging the gap between natural language expressions and ontology concepts or properties is a current trend in the Semantic Web. Ontology lexicalization adds a layer of lexical information for ontology properties and concepts. We propose a method based on unsupervised learning for extracting potential lexical expressions of DBpedia properties from the Wikipedia text corpus. It is a resource-driven approach comprising three main steps. The first step extracts the DBpedia triples for the target property and then the Wikipedia articles describing the resources in these triples. In the second step, the sentences most closely related to the property are extracted from the articles and analyzed with a Semantic Role Labeler, yielding a set of SRL-annotated trees. In the last step, clusters of expressions are built using spectral clustering based on the distances between the SRL trees. The clusters with the least variance are considered relevant for the lexical expressions of the property.
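The final selection step can be illustrated with a small sketch. It assumes that a pairwise distance matrix between SRL trees and a cluster label per tree are already available (for example from a spectral clustering step), and it uses the mean squared pairwise distance inside a cluster as a stand-in for "variance". The abstract does not state how variance is computed, so that definition, the function names `cluster_variances` and `tightest_clusters`, and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def cluster_variances(distances, labels):
    """Return {label: mean squared pairwise distance} for each cluster.

    distances: symmetric matrix (list of lists) of distances between SRL trees.
    labels: one cluster label per tree, e.g. from a spectral clustering step.
    The variance proxy used here is an assumption, not the paper's definition.
    """
    members = defaultdict(list)
    for index, label in enumerate(labels):
        members[label].append(index)

    variances = {}
    for label, indices in members.items():
        pairs = list(combinations(indices, 2))
        if not pairs:
            # A singleton cluster has no pairwise spread, so it is skipped.
            continue
        variances[label] = sum(distances[i][j] ** 2 for i, j in pairs) / len(pairs)
    return variances


def tightest_clusters(distances, labels, k=1):
    """Return the k cluster labels with the smallest variance proxy."""
    variances = cluster_variances(distances, labels)
    return sorted(variances, key=variances.get)[:k]


# Hypothetical toy data: trees 0-2 are close together, trees 3-4 are not.
toy_distances = [
    [0.0, 0.1, 0.2, 0.9, 0.8],
    [0.1, 0.0, 0.1, 0.9, 0.9],
    [0.2, 0.1, 0.0, 0.8, 0.9],
    [0.9, 0.9, 0.8, 0.0, 0.6],
    [0.8, 0.9, 0.9, 0.6, 0.0],
]
toy_labels = [0, 0, 0, 1, 1]

print(tightest_clusters(toy_distances, toy_labels))  # prints [0]
```

In this toy case cluster 0 is selected because its members lie much closer to each other than those of cluster 1. In a real pipeline the expressions in the selected clusters would then be inspected as candidate lexicalizations of the property.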