Hamid Mousavi, Deirdre Kerr, Markus R Iseli, C. Zaniolo
{"title":"Mining Semantic Structures from Syntactic Structures in Free Text Documents","authors":"Hamid Mousavi, Deirdre Kerr, Markus R Iseli, C. Zaniolo","doi":"10.1109/ICSC.2014.31","DOIUrl":null,"url":null,"abstract":"The Web has made possible many advanced text-mining applications, such as news summarization, essay grading, question answering, and semantic search. For many of such applications, statistical text-mining techniques are ineffective since they do not utilize the morphological structure of the text. Thus, many approaches use NLP-based techniques, that parse the text and use patterns to mine and analyze the parse trees which are often unnecessarily complex. Therefore, we propose a weighted-graph representation of text, called Text Graphs, which captures the grammatical and semantic relations between words and terms in the text. Text Graphs are generated using a new text mining framework which is the main focus of this paper. Our framework, SemScape, uses a statistical parser to generate few of the most probable parse trees for each sentence and employs a novel two-step pattern-based technique to extract from parse trees candidate terms and their grammatical relations. Moreover, SemScape resolves co references by a novel technique, generates domain-specific Text Graphs by consulting ontologies, and provides a SPARQL-like query language and an optimized engine for semantically querying and mining Text Graphs.","PeriodicalId":175352,"journal":{"name":"2014 IEEE International Conference on Semantic Computing","volume":"42 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"17","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 IEEE International Conference on Semantic Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICSC.2014.31","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 17
Abstract
The Web has made possible many advanced text-mining applications, such as news summarization, essay grading, question answering, and semantic search. For many of such applications, statistical text-mining techniques are ineffective since they do not utilize the morphological structure of the text. Thus, many approaches use NLP-based techniques, that parse the text and use patterns to mine and analyze the parse trees which are often unnecessarily complex. Therefore, we propose a weighted-graph representation of text, called Text Graphs, which captures the grammatical and semantic relations between words and terms in the text. Text Graphs are generated using a new text mining framework which is the main focus of this paper. Our framework, SemScape, uses a statistical parser to generate few of the most probable parse trees for each sentence and employs a novel two-step pattern-based technique to extract from parse trees candidate terms and their grammatical relations. Moreover, SemScape resolves co references by a novel technique, generates domain-specific Text Graphs by consulting ontologies, and provides a SPARQL-like query language and an optimized engine for semantically querying and mining Text Graphs.