A Combination of Enhanced WordNet and BERT for Semantic Textual Similarity
Shruthi Srinarasi, Reshma Ram, S. Raghavendra, A. Patil, S. Rajarajeswari, Manjunath Belgod Lokanath, Rituraj Kabra, Abhishek Singh
Proceedings of the 2021 2nd International Conference on Control, Robotics and Intelligent System, published 2021-08-20
DOI: 10.1145/3483845.3483898 (https://doi.org/10.1145/3483845.3483898)
Citations: 1
Abstract
Measuring sentence similarity means computing the likeness between a pair of sentences, using Natural Language Processing distance measures (Euclidean, Jaccard, Manhattan, etc.) as well as embedding techniques (word2vec, GloVe, Flair, etc.). This paper proposes a novel ensemble learning approach that combines the WordNet corpus with Bidirectional Encoder Representations from Transformers (BERT), so that the context of words in a sentence is taken into account when computing similarity scores. The proposed model is evaluated by computing the Pearson and Spearman correlation coefficients on the sentence pairs of the Sentences Involving Compositional Knowledge (SICK) dataset. In these results, the proposed approach is observed to outperform existing state-of-the-art semantic textual similarity models, returning the highest correlation scores. The paper also introduces a possible machine learning approach for the same task and evaluates its scope and drawbacks.
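The evaluation protocol described in the abstract (scoring sentence pairs, then correlating the predictions with human judgments) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it uses a plain word-overlap Jaccard score as a stand-in for the WordNet and BERT ensemble, the function names are hypothetical, and the sentence pairs and gold scores are made-up toy values, not taken from SICK.

```python
from math import sqrt


def jaccard_similarity(s1, s2):
    """Word-overlap similarity in [0, 1]; a stand-in for a real STS model."""
    a, b = set(s1.lower().split()), set(s2.lower().split())
    return len(a & b) / len(a | b) if (a | b) else 1.0


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))


# Toy data (hypothetical, NOT from SICK): sentence pairs with invented
# human relatedness scores on a 1-5 scale.
pairs = [
    ("a man is playing a guitar", "a man is playing a guitar on stage"),
    ("a dog runs in the park", "a cat sleeps on the sofa"),
    ("two kids are playing outside", "two children are playing outdoors"),
    ("a woman is cooking", "a woman is cooking dinner"),
]
gold = [4.5, 1.2, 4.8, 4.0]

predicted = [jaccard_similarity(s1, s2) for s1, s2 in pairs]
print("Pearson: ", round(pearson(predicted, gold), 3))
print("Spearman:", round(spearman(predicted, gold), 3))
```

Pearson measures linear agreement between predicted and gold scores, while Spearman only compares their rankings, which is why similarity papers commonly report both. A word-overlap baseline like this one misses paraphrases such as "kids" versus "children", which is the kind of gap that lexical resources like WordNet and contextual models like BERT are meant to close.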