Semantic labeling - unveiling the main components of meaning of free-text
Y. Zieman, R. Salas
Proceedings Eighth Symposium on String Processing and Information Retrieval
DOI: 10.1109/SPIRE.2001.10027 (https://doi.org/10.1109/SPIRE.2001.10027)
An experimentally proven methodology for computing semantic labels for natural language and its use in semantic processing of text is described. A combinatorial model of the conceptual space is created where semantic labels result as combinations of primary or atomic concepts called Semantic Factors. The set of about 2,500 Semantic Factors is defined. The basic semantic element of a language is a morpheme-type element (s-morpheme), the minimal part of a language that bears its own meaning. All s-morphemes in the Knowledge Base (about 15,000 for English) are labeled. The label for a phrase (its "Concept Code") results as a combination of the labels for the s-morphemes constituting it. Algorithms are designed to identify the s-morphemes in a phrase and to generate the phrase's Concept Code. The matching procedure compares Concept Codes and identifies conceptually close ones - those sharing a maximal number of Semantic Factors. Similarity is identified here as a match between the Concept Codes of two Text objects. Since a Concept Code is essentially language independent, this technology is appropriate for implementation in cross-language applications. An example is described of an application in the bio-medical domain, where documents of a database of more than 12 million titles are being successfully retrieved in about 50% of the queries normally rejected by traditional search methods.
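The pipeline outlined in the abstract (identify s-morphemes, combine their labels into a Concept Code, match codes by shared Semantic Factors) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the toy `KNOWLEDGE_BASE`, the factor names, the greedy longest-match segmentation, and the function names are hypothetical and are not taken from the paper, which does not specify its algorithms at this level of detail.

```python
# Toy Knowledge Base: s-morpheme -> set of Semantic Factors.
# All entries and factor names are invented for illustration.
KNOWLEDGE_BASE = {
    "card": frozenset({"HEART"}),
    "heart": frozenset({"HEART"}),
    "hepat": frozenset({"LIVER"}),
    "liver": frozenset({"LIVER"}),
    "itis": frozenset({"INFLAMMATION"}),
    "inflam": frozenset({"INFLAMMATION"}),
}


def find_s_morphemes(phrase):
    """Segment a phrase by greedy longest match against the Knowledge Base.

    Greedy matching is an assumed strategy; characters that start no
    known s-morpheme are skipped.
    """
    text = phrase.lower()
    found = []
    i = 0
    while i < len(text):
        match = None
        # Try the longest candidate substring first.
        for j in range(len(text), i, -1):
            if text[i:j] in KNOWLEDGE_BASE:
                match = text[i:j]
                break
        if match:
            found.append(match)
            i += len(match)
        else:
            i += 1
    return found


def concept_code(phrase):
    """Combine the labels of a phrase's s-morphemes into one Concept Code."""
    code = set()
    for morpheme in find_s_morphemes(phrase):
        code |= KNOWLEDGE_BASE[morpheme]
    return frozenset(code)


def closest(query, candidates):
    """Return the candidate sharing the most Semantic Factors with the query."""
    query_code = concept_code(query)
    return max(candidates, key=lambda c: len(query_code & concept_code(c)))


if __name__ == "__main__":
    print(find_s_morphemes("carditis"))
    print(sorted(concept_code("heart inflammation")))
    print(closest("carditis", ["liver inflammation", "heart inflammation"]))
```

In this sketch "carditis" and "heart inflammation" share no surface string yet receive the same Concept Code, which is the kind of match a purely string-based search would miss. Representing a code as a set makes the "maximal number of shared Semantic Factors" criterion a plain set intersection; a real system would likely need weighting and tie-breaking that this toy version omits.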