M. A. Hadj Taieb, Mohamed Ben Aouicha, M. Tmar, Abdelmajid Ben Hamadou
{"title":"新的信息内容度量和名词化关系为一种新的基于wordnet的语义关联度量方法","authors":"M. A. Hadj Taieb, Mohamed Ben Aouicha, M. Tmar, Abdelmajid Ben Hamadou","doi":"10.1109/CIS.2011.6169134","DOIUrl":null,"url":null,"abstract":"Semantic similarity techniques are used to compute the semantic similarity (common shared information) between two concepts according to certain language or domain resources like ontologies, taxonomies, corpora, etc. Semantic similarity techniques constitute important components in most Information Retrieval (IR) and knowledge-based systems. Taking semantics into account passes by the use of external semantic resources coupled with the initial documentation on which it is necessary to have semantic similarity measurements to carry out comparisons between concepts. This paper presents a new approach for measuring semantic relatedness between words and concepts. It combines a new information content (IC) metric using the WordNet thesaurus and the nominalization relation provided by the Java WordNet Library (JWNL). Specifically, the proposed method offers a thorough use of the relation hypernym/hyponym (noun and verb “is a” taxonomy) without external corpus statistical information. Mainly, we use the subgraph formed by hypernyms of the concerned concept which inherits the whole features of its hypernyms and we quantify the contribution of each concept pertaining to this subgraph in its information content. When tested on a common data set of word pair similarity ratings, the proposed approach outperforms other computational models. It gives the highest correlation value 0.70 with a benchmark based on human similarity judgments and especially a large dataset composed of 260 Finkelstein word pairs (Appendix 1 and 2).","PeriodicalId":286889,"journal":{"name":"2011 IEEE 10th International Conference on Cybernetic Intelligent Systems (CIS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2011-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"15","resultStr":"{\"title\":\"New information content metric and nominalization relation for a new WordNet-based method to measure the semantic relatedness\",\"authors\":\"M. A. Hadj Taieb, Mohamed Ben Aouicha, M. Tmar, Abdelmajid Ben Hamadou\",\"doi\":\"10.1109/CIS.2011.6169134\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Semantic similarity techniques are used to compute the semantic similarity (common shared information) between two concepts according to certain language or domain resources like ontologies, taxonomies, corpora, etc. Semantic similarity techniques constitute important components in most Information Retrieval (IR) and knowledge-based systems. Taking semantics into account passes by the use of external semantic resources coupled with the initial documentation on which it is necessary to have semantic similarity measurements to carry out comparisons between concepts. This paper presents a new approach for measuring semantic relatedness between words and concepts. It combines a new information content (IC) metric using the WordNet thesaurus and the nominalization relation provided by the Java WordNet Library (JWNL). Specifically, the proposed method offers a thorough use of the relation hypernym/hyponym (noun and verb “is a” taxonomy) without external corpus statistical information. Mainly, we use the subgraph formed by hypernyms of the concerned concept which inherits the whole features of its hypernyms and we quantify the contribution of each concept pertaining to this subgraph in its information content. When tested on a common data set of word pair similarity ratings, the proposed approach outperforms other computational models. It gives the highest correlation value 0.70 with a benchmark based on human similarity judgments and especially a large dataset composed of 260 Finkelstein word pairs (Appendix 1 and 2).\",\"PeriodicalId\":286889,\"journal\":{\"name\":\"2011 IEEE 10th International Conference on Cybernetic Intelligent Systems (CIS)\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2011-09-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"15\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2011 IEEE 10th International Conference on Cybernetic Intelligent Systems (CIS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CIS.2011.6169134\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2011 IEEE 10th International Conference on Cybernetic Intelligent Systems (CIS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CIS.2011.6169134","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
New information content metric and nominalization relation for a new WordNet-based method to measure the semantic relatedness
Semantic similarity techniques are used to compute the semantic similarity (common shared information) between two concepts according to certain language or domain resources like ontologies, taxonomies, corpora, etc. Semantic similarity techniques constitute important components in most Information Retrieval (IR) and knowledge-based systems. Taking semantics into account passes by the use of external semantic resources coupled with the initial documentation on which it is necessary to have semantic similarity measurements to carry out comparisons between concepts. This paper presents a new approach for measuring semantic relatedness between words and concepts. It combines a new information content (IC) metric using the WordNet thesaurus and the nominalization relation provided by the Java WordNet Library (JWNL). Specifically, the proposed method offers a thorough use of the relation hypernym/hyponym (noun and verb “is a” taxonomy) without external corpus statistical information. Mainly, we use the subgraph formed by hypernyms of the concerned concept which inherits the whole features of its hypernyms and we quantify the contribution of each concept pertaining to this subgraph in its information content. When tested on a common data set of word pair similarity ratings, the proposed approach outperforms other computational models. It gives the highest correlation value 0.70 with a benchmark based on human similarity judgments and especially a large dataset composed of 260 Finkelstein word pairs (Appendix 1 and 2).