G. Cleuziou, D. Buscaldi, Vincent Levorato, G. Dias
A pretopological framework for the automatic construction of lexical-semantic structures from texts
Proceedings of the ... ACM International Conference on Information and Knowledge Management, pp. 2453-2456
Published: 2011-10-24
DOI: 10.1145/2063576.2063990 (https://doi.org/10.1145/2063576.2063990)
Citations: 10
Abstract
We present a new approach for the automatic generation of lexical structures from texts. This tedious task rests on the strong hypothesis that simple statistical observations of textual usage can provide pieces of semantics about the lexicon. Using only such "naive" observations, we propose a (pre)topological framework to formalize and combine various hypotheses on textual data usage and then to derive a structure similar to common lexical knowledge bases such as WordNet. We also consider the problem of evaluating the resulting lexical structures: we propose a multi-level evaluation strategy that measures the fit between a given reference structure and automatically generated structures from different points of view: intrinsic/structural and application-based. The evaluation strategy is then used to quantify the contribution of the new structuring approach relative to the corresponding solution proposed by (Sanderson et al. 2000) on two case studies that differ in the domain and the size of the lexicon.