Context-based similar words detection and its application in specialized search engines
H. Al-Mubaid, Ping Chen
Proceedings of the 10th international conference on Intelligent user interfaces, 2005-01-10
DOI: 10.1145/1040830.1040890 (https://doi.org/10.1145/1040830.1040890)
Citations: 8
Abstract
This paper presents a new context-based method for automatically detecting and extracting similar and related words from text. Finding similar words is an important task for many NLP applications, including anaphora resolution, document retrieval, text segmentation, and text summarization. Here we use word similarity to improve search quality for search engines in both general and specific domains. Our method uses rules to extract the words in the neighborhood of a target word, then connects this context with the surroundings of other occurrences of the same word in the (training) text corpus. This is ongoing work and is still under extensive testing. The preliminary results, however, are promising and encourage further work in this direction.
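As an illustration of the general idea only, the following Python sketch collects the words around every occurrence of each word in a corpus and ranks other words by how much their contexts overlap with the target's. The abstract does not give the paper's extraction rules, so everything here is an assumption: the fixed-size window, the Jaccard overlap score, and the function names `context_profiles`, `jaccard`, and `similar_words` are hypothetical stand-ins, not the authors' method.

```python
from collections import Counter, defaultdict


def context_profiles(tokens, window=1):
    """Map each word to a Counter of the words seen within `window` positions of it.

    Assumption: a fixed symmetric window stands in for the paper's
    (unspecified) neighborhood-extraction rules.
    """
    profiles = defaultdict(Counter)
    for i, word in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                profiles[word][tokens[j]] += 1
    return profiles


def jaccard(a, b):
    """Jaccard overlap between the sets of context words of two profiles."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def similar_words(target, profiles, top_n=3):
    """Rank all other words by context overlap with `target` (highest first)."""
    target_ctx = profiles.get(target, Counter())
    scores = [
        (word, jaccard(target_ctx, ctx))
        for word, ctx in profiles.items()
        if word != target
    ]
    # Sort by descending score, then alphabetically to break ties deterministically.
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:top_n]


# Toy corpus: "doctor" and "physician" occur in identical contexts.
tokens = "the doctor treated the patient . the physician treated the patient .".split()
profiles = context_profiles(tokens, window=1)
print(similar_words("doctor", profiles, top_n=1))  # [('physician', 1.0)]
```

In a search setting, words ranked this way could be used to expand a query with related terms, though a realistic system would need a much larger corpus and some filtering of very frequent words.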