语义标注——揭示自由文本意义的主要成分

Proceedings Eighth Symposium on String Processing and Information Retrieval Pub Date : 1900-01-01 DOI:10.1109/SPIRE.2001.10027

Y. Zieman, R. Salas

{"title":"语义标注——揭示自由文本意义的主要成分","authors":"Y. Zieman, R. Salas","doi":"10.1109/SPIRE.2001.10027","DOIUrl":null,"url":null,"abstract":"An experimentally proven methodology for computing semantic labels for natural language and its use in semantic processing of text is described. A combinatorial model of the conceptual space is created where semantic labels result as combinations ofprimary or atomic concepts called Semantic Factors. The set of about 2,500 Semantic Factors is defined. The basic semantic element of a language is a morpheme-type element (s-morpheme), the minimalpart ofa language that bears its own meaning. All s-morphemes in the Knowledge Base (about 15,000 for English) are labeled. The label for a phrase (its ¿Concept Codel7 results as a combination of the labels for the smorphemes constituting it. Algorithms are designed to identify the s-morphemes in a phrase and to generate the phrase¿s Concept Code. The matching procedure compares Concept Codes and identifies conceptually close ones - those sharing a maximal number of Semantic Factors. Similarity is identified here as a match between the Concept Codes of two Text objects. Since a Concept Code is essentially language independent, this technology is appropriate for implementation in cross-language applications. An example is described of an application in the bio-medical domain, where documents of a database of more than 12 million titles are being successfully retrieved in about 50% of the queries normally rejected by traditional search methods.","PeriodicalId":107511,"journal":{"name":"Proceedings Eighth Symposium on String Processing and Information Retrieval","volume":"190 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Semantic labeling - unveiling the main components of meaning of free-text\",\"authors\":\"Y. Zieman, R. Salas\",\"doi\":\"10.1109/SPIRE.2001.10027\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"An experimentally proven methodology for computing semantic labels for natural language and its use in semantic processing of text is described. A combinatorial model of the conceptual space is created where semantic labels result as combinations ofprimary or atomic concepts called Semantic Factors. The set of about 2,500 Semantic Factors is defined. The basic semantic element of a language is a morpheme-type element (s-morpheme), the minimalpart ofa language that bears its own meaning. All s-morphemes in the Knowledge Base (about 15,000 for English) are labeled. The label for a phrase (its ¿Concept Codel7 results as a combination of the labels for the smorphemes constituting it. Algorithms are designed to identify the s-morphemes in a phrase and to generate the phrase¿s Concept Code. The matching procedure compares Concept Codes and identifies conceptually close ones - those sharing a maximal number of Semantic Factors. Similarity is identified here as a match between the Concept Codes of two Text objects. Since a Concept Code is essentially language independent, this technology is appropriate for implementation in cross-language applications. An example is described of an application in the bio-medical domain, where documents of a database of more than 12 million titles are being successfully retrieved in about 50% of the queries normally rejected by traditional search methods.\",\"PeriodicalId\":107511,\"journal\":{\"name\":\"Proceedings Eighth Symposium on String Processing and Information Retrieval\",\"volume\":\"190 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1900-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings Eighth Symposium on String Processing and Information Retrieval\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/SPIRE.2001.10027\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings Eighth Symposium on String Processing and Information Retrieval","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SPIRE.2001.10027","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 1

摘要

本文描述了一种经过实验验证的自然语言语义标签计算方法及其在文本语义处理中的应用。创建概念空间的组合模型，其中语义标签作为称为语义因素的主要或原子概念的组合。定义了大约2500个语义因子的集合。语言的基本语义元素是语素类型元素(s-morpheme)，它是语言中具有自身意义的最小部分。知识库中所有的s-语素(英语约15000个)都有标记。短语的标签(它的概念代码7)是组成短语的同义素标签的组合。设计算法来识别短语中的s-语素并生成短语的概念码。匹配过程比较概念代码并识别概念上接近的代码-那些共享最大数量语义因素的代码。相似性在这里被定义为两个文本对象的概念代码之间的匹配。由于概念代码本质上是独立于语言的，因此该技术适合在跨语言应用程序中实现。本文描述了生物医学领域的一个应用程序示例，其中在传统搜索方法通常拒绝的约50%的查询中，成功检索了数据库中超过1200万个标题的文档。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Semantic labeling - unveiling the main components of meaning of free-text

An experimentally proven methodology for computing semantic labels for natural language and its use in semantic processing of text is described. A combinatorial model of the conceptual space is created where semantic labels result as combinations ofprimary or atomic concepts called Semantic Factors. The set of about 2,500 Semantic Factors is defined. The basic semantic element of a language is a morpheme-type element (s-morpheme), the minimalpart ofa language that bears its own meaning. All s-morphemes in the Knowledge Base (about 15,000 for English) are labeled. The label for a phrase (its ¿Concept Codel7 results as a combination of the labels for the smorphemes constituting it. Algorithms are designed to identify the s-morphemes in a phrase and to generate the phrase¿s Concept Code. The matching procedure compares Concept Codes and identifies conceptually close ones - those sharing a maximal number of Semantic Factors. Similarity is identified here as a match between the Concept Codes of two Text objects. Since a Concept Code is essentially language independent, this technology is appropriate for implementation in cross-language applications. An example is described of an application in the bio-medical domain, where documents of a database of more than 12 million titles are being successfully retrieved in about 50% of the queries normally rejected by traditional search methods.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助