Rivindu Perera, P. Nand, Wen-Hsin Yang, Kohichi Toshioka
Lexicalizing linked data for a human friendly web
2017 7th International Conference on Cloud Computing, Data Science & Engineering - Confluence, pp. 30-35, 2017-01-01
DOI: 10.1109/CONFLUENCE.2017.7943119 (https://doi.org/10.1109/CONFLUENCE.2017.7943119)
The consumption of Linked Data has increased dramatically with the growing momentum of the Semantic Web. Linked Data is essentially a very simple format for representing knowledge: all knowledge is expressed as triples, which can be linked to one another through one or more of their components. To date, most efforts have gone into either creating Linked Data by mining the web or making it available to users as a knowledge base for knowledge engineering applications. Recently there has been a growing need for these applications to interact with users in natural language, which requires transforming Linked Data knowledge into natural language. The aim of the RealText project described in this paper is to build a scalable framework that transforms Linked Data into natural language by generating lexicalization patterns for triples. A lexicalization pattern is a syntactic pattern that transforms a given triple into a syntactically correct natural language sentence. Using DBpedia as the Linked Data resource, we generated 283 accurate lexicalization patterns for a sample set of 25 ontology classes. We performed a human evaluation on a test sub-sample, with inter-rater agreement of 0.86 for readability and 0.80 for accuracy. The results show that the lexicalization patterns generate language that is accurate and readable and that has the qualities of human-produced language.
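To make the idea of a lexicalization pattern concrete, here is a minimal Python sketch of applying a pattern to a triple. It is an illustration only, not the RealText implementation: the `Triple` class, the `PATTERNS` table, the `{subject}`/`{obj}` placeholder syntax, and the `lexicalize` function are all hypothetical names chosen for this example. The predicate names are written in the style of DBpedia ontology properties, and the pattern strings are hand-written here rather than generated by the framework.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Triple:
    """A Linked Data triple with human-readable labels (hypothetical type)."""
    subject: str
    predicate: str
    obj: str


# Hypothetical hand-written patterns, one per predicate. In the paper the
# patterns are generated automatically; these are only stand-ins.
PATTERNS: Dict[str, str] = {
    "birthPlace": "{subject} was born in {obj}.",
    "spouse": "{subject} is married to {obj}.",
}


def lexicalize(triple: Triple, patterns: Dict[str, str] = PATTERNS) -> Optional[str]:
    """Fill the pattern for the triple's predicate, or return None if no pattern exists."""
    pattern = patterns.get(triple.predicate)
    if pattern is None:
        return None
    return pattern.format(subject=triple.subject, obj=triple.obj)


if __name__ == "__main__":
    print(lexicalize(Triple("Steve Jobs", "birthPlace", "San Francisco")))
```

Keying the table by predicate is an assumption made for simplicity; a real system would likely also need to account for the ontology class of the subject and for predicates that have several valid phrasings.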