从句法结构中挖掘自由文本文档中的语义结构

2014 IEEE International Conference on Semantic Computing Pub Date : 2014-06-16 DOI:10.1109/ICSC.2014.31

Hamid Mousavi, Deirdre Kerr, Markus R Iseli, C. Zaniolo

{"title":"从句法结构中挖掘自由文本文档中的语义结构","authors":"Hamid Mousavi, Deirdre Kerr, Markus R Iseli, C. Zaniolo","doi":"10.1109/ICSC.2014.31","DOIUrl":null,"url":null,"abstract":"The Web has made possible many advanced text-mining applications, such as news summarization, essay grading, question answering, and semantic search. For many of such applications, statistical text-mining techniques are ineffective since they do not utilize the morphological structure of the text. Thus, many approaches use NLP-based techniques, that parse the text and use patterns to mine and analyze the parse trees which are often unnecessarily complex. Therefore, we propose a weighted-graph representation of text, called Text Graphs, which captures the grammatical and semantic relations between words and terms in the text. Text Graphs are generated using a new text mining framework which is the main focus of this paper. Our framework, SemScape, uses a statistical parser to generate few of the most probable parse trees for each sentence and employs a novel two-step pattern-based technique to extract from parse trees candidate terms and their grammatical relations. Moreover, SemScape resolves co references by a novel technique, generates domain-specific Text Graphs by consulting ontologies, and provides a SPARQL-like query language and an optimized engine for semantically querying and mining Text Graphs.","PeriodicalId":175352,"journal":{"name":"2014 IEEE International Conference on Semantic Computing","volume":"42 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"17","resultStr":"{\"title\":\"Mining Semantic Structures from Syntactic Structures in Free Text Documents\",\"authors\":\"Hamid Mousavi, Deirdre Kerr, Markus R Iseli, C. Zaniolo\",\"doi\":\"10.1109/ICSC.2014.31\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The Web has made possible many advanced text-mining applications, such as news summarization, essay grading, question answering, and semantic search. For many of such applications, statistical text-mining techniques are ineffective since they do not utilize the morphological structure of the text. Thus, many approaches use NLP-based techniques, that parse the text and use patterns to mine and analyze the parse trees which are often unnecessarily complex. Therefore, we propose a weighted-graph representation of text, called Text Graphs, which captures the grammatical and semantic relations between words and terms in the text. Text Graphs are generated using a new text mining framework which is the main focus of this paper. Our framework, SemScape, uses a statistical parser to generate few of the most probable parse trees for each sentence and employs a novel two-step pattern-based technique to extract from parse trees candidate terms and their grammatical relations. Moreover, SemScape resolves co references by a novel technique, generates domain-specific Text Graphs by consulting ontologies, and provides a SPARQL-like query language and an optimized engine for semantically querying and mining Text Graphs.\",\"PeriodicalId\":175352,\"journal\":{\"name\":\"2014 IEEE International Conference on Semantic Computing\",\"volume\":\"42 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2014-06-16\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"17\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2014 IEEE International Conference on Semantic Computing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICSC.2014.31\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 IEEE International Conference on Semantic Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICSC.2014.31","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 17

摘要

Web使许多高级文本挖掘应用程序成为可能，例如新闻摘要、论文评分、问题回答和语义搜索。对于许多这样的应用程序，统计文本挖掘技术是无效的，因为它们不利用文本的形态结构。因此，许多方法使用基于nlp的技术来解析文本，并使用模式来挖掘和分析解析树，这通常是不必要的复杂。因此，我们提出了文本的加权图表示，称为文本图，它捕获了文本中单词和术语之间的语法和语义关系。文本图的生成使用了一种新的文本挖掘框架，这是本文的主要关注点。我们的框架SemScape使用统计解析器为每个句子生成几个最可能的解析树，并采用一种新颖的基于模式的两步技术从解析树中提取候选术语及其语法关系。此外，SemScape通过一种新技术解析co引用，通过咨询本体生成特定于领域的文本图，并提供类似sparql的查询语言和用于语义查询和挖掘文本图的优化引擎。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Mining Semantic Structures from Syntactic Structures in Free Text Documents

The Web has made possible many advanced text-mining applications, such as news summarization, essay grading, question answering, and semantic search. For many of such applications, statistical text-mining techniques are ineffective since they do not utilize the morphological structure of the text. Thus, many approaches use NLP-based techniques, that parse the text and use patterns to mine and analyze the parse trees which are often unnecessarily complex. Therefore, we propose a weighted-graph representation of text, called Text Graphs, which captures the grammatical and semantic relations between words and terms in the text. Text Graphs are generated using a new text mining framework which is the main focus of this paper. Our framework, SemScape, uses a statistical parser to generate few of the most probable parse trees for each sentence and employs a novel two-step pattern-based technique to extract from parse trees candidate terms and their grammatical relations. Moreover, SemScape resolves co references by a novel technique, generates domain-specific Text Graphs by consulting ontologies, and provides a SPARQL-like query language and an optimized engine for semantically querying and mining Text Graphs.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2014 IEEE International Conference on Semantic Computing

自引率

0.00%

发文量