Pub Date: 2010-09-30 | DOI: 10.1109/NLPKE.2010.5587846
Haijun Zhang, Heyan Huang, Chao-Yong Zhu, Shumin Shi
This paper proposes a pragmatic model for repeat-based Chinese New Word Extraction (NWE). It contains two innovations. The first is a formal description of the NWE process, which provides theoretical guidance on feature selection; on this basis, the Conditional Random Fields (CRF) model is selected as the statistical framework that realizes the formal description. The second is an improved algorithm for left (right) entropy that raises the efficiency of NWE: compared with the baseline algorithm, the improved algorithm speeds up the entropy computation remarkably. Overall, experiments show that the proposed model is effective, with an F-score of 49.72% in the open test and 69.83% in word extraction, an evident improvement over previous similar work.
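The abstract does not spell out the improved entropy algorithm; as a reference point, a minimal baseline sketch of right (branching) entropy, the statistic the paper speeds up, might look like the following (left entropy is symmetric, over preceding characters):

```python
from collections import Counter
from math import log2

def right_entropy(corpus: str, candidate: str) -> float:
    """Entropy of the character distribution immediately to the right of
    `candidate`. High entropy suggests the candidate's right edge is a word
    boundary, a cue used in repeat-based new word extraction."""
    successors = Counter()
    start = corpus.find(candidate)
    while start != -1:
        nxt = start + len(candidate)
        if nxt < len(corpus):
            successors[corpus[nxt]] += 1
        start = corpus.find(candidate, start + 1)
    total = sum(successors.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * log2(c / total) for c in successors.values())

# Toy corpus: "自然语言" is followed by 处 twice and 学 once.
text = "自然语言处理很有趣，自然语言处理很难，自然语言学。"
print(round(right_entropy(text, "自然语言"), 3))  # → 0.918
```

The baseline above rescans the corpus for every candidate; the paper's contribution is an algorithm that avoids this repeated work.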
Title: "A pragmatic model for new Chinese word extraction"
Published in: Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering (NLPKE-2010)
Pub Date: 2010-09-30 | DOI: 10.1109/NLPKE.2010.5587864
N. Bourbakis
The efficient processing, association, and understanding of multimedia-based events or multi-modal information is an important research field with a great variety of applications, such as knowledge discovery, document understanding, and human-computer interaction. A good approach to this issue is the development of a common platform that converts different modalities (such as images and text) into the same medium and associates them for efficient processing and understanding. This talk presents the development of a methodology capable of automatically converting images into natural language (NL) text sentences, using image processing and analysis methods and attributed graphs for object recognition and image understanding; the graph representations are then converted into NL text sentences. Moreover, it presents a methodology for transforming NL sentences into graph representations and then into Stochastic Petri Net (SPN) descriptions, in order to offer a common model for representing multimodal information and, at the same time, a way of associating "activities or changes" across image frames for event representation and interpretation. The SPN graph model is selected for its capability to efficiently represent structural and functional knowledge where other models cannot. Simple illustrative examples are provided as proof of concept.
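The graph-to-text step can be illustrated with a toy verbalizer. The triple format and the single sentence template below are illustrative assumptions, far simpler than the attributed-graph machinery the talk describes:

```python
def graph_to_sentences(edges):
    """Verbalize an attributed object graph, given as (subject, relation,
    object) triples, into simple declarative NL sentences."""
    return [f"The {s} {rel} the {o}." for s, rel, o in edges]

# A two-edge scene graph for a single image frame.
print(graph_to_sentences([("ball", "is above", "table"),
                          ("cup", "is next to", "ball")]))
```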
Title: "Image understanding for converting images into natural language text sentences"
Published in: Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering (NLPKE-2010)
Pub Date: 2010-09-30 | DOI: 10.1109/NLPKE.2010.5587785
Ling Xia, F. Ren
This paper proposes a query expansion method for a cooking question answering system based on pragmatic analysis. In our approach, the results of question analysis are used: the original queries are generated from the question subject, and the query terms are then expanded based on pragmatic function. When the expanded queries are submitted to the Google search engine to retrieve related passages, we obtain an overall improvement of 36.2% in mean average precision.
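A sketch of the expansion step, assuming a hand-built pragmatic-function lexicon; the lexicon contents and term names here are illustrative, since the paper's actual expansion resources are not described in the abstract:

```python
# Hypothetical lexicon mapping a query term to terms serving the same
# pragmatic function in cooking questions.
PRAGMATIC_LEXICON = {
    "make": ["cook", "prepare", "recipe"],
    "dumplings": ["jiaozi"],
}

def expand_query(subject_terms):
    """Expand the original query terms (drawn from the question subject)
    with their pragmatic-function alternatives."""
    expanded = list(subject_terms)
    for term in subject_terms:
        expanded.extend(PRAGMATIC_LEXICON.get(term, []))
    return expanded

print(expand_query(["make", "dumplings"]))
# → ['make', 'dumplings', 'cook', 'prepare', 'recipe', 'jiaozi']
```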
Title: "Pragmatic analysis based query expansion for Chinese cuisine QA service system"
Published in: Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering (NLPKE-2010)
Pub Date: 2010-09-30 | DOI: 10.1109/NLPKE.2010.5587801
Hui-Ngo Goh, Ching Kiu
Ontology construction often requires a domain-specific corpus for conceptualizing the domain knowledge; specifically, an ontology is an association of terms, relations between terms, and related instances. Identifying a list of significant terms is a vital task in constructing a practical ontology. In this paper, we present a context-based term identification and extraction methodology for ontology construction from text documents. The methodology uses a taxonomy and Wikipedia to support automatic term identification and extraction from structured documents, under the assumption that candidate terms for a topic are often associated with its topic-specific keywords. A hierarchical relationship of super-topics and sub-topics is defined by the taxonomy, while Wikipedia provides context and background knowledge for the topics defined in the taxonomy to guide term identification and extraction. The experimental results show that the context-based term identification and extraction methodology is viable for defining topic concepts and their sub-concepts when constructing an ontology, and that it remains viable in a small-corpus / small-text-size environment.
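The core assumption, that candidate terms co-occur with topic-specific keywords, can be sketched as a simple overlap score. The keyword list below is a toy stand-in for text drawn from a topic's Wikipedia article; the scoring function is an assumption, not the paper's exact measure:

```python
def context_score(candidate_context, topic_keywords):
    """Fraction of a topic's context keywords that appear among the words
    surrounding a candidate term; higher overlap suggests the candidate
    belongs under that topic in the taxonomy."""
    context = {w.lower() for w in candidate_context}
    keywords = {w.lower() for w in topic_keywords}
    return len(context & keywords) / max(len(keywords), 1)

# Toy keywords a "machine learning" topic page might yield.
ml_keywords = ["model", "training", "data", "algorithm"]
print(context_score(["the", "model", "is", "trained", "on", "data"],
                    ml_keywords))  # → 0.5
```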
Title: "Context-based term identification and extraction for ontology construction"
Published in: Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering (NLPKE-2010)
Pub Date: 2010-09-30 | DOI: 10.1109/NLPKE.2010.5587804
Keliang Zhang, Qinlong Fei
Ontology-based knowledge bases play an increasingly important role in improving the precision and recall of retrieval systems. Based on Distributed Learning theory, a novel approach to the co-construction of an ontology-based knowledge base is explored. Using a platform set up for the co-construction and sharing of domain-specific knowledge through the Web, we constructed an ontology-based knowledge base for the airborne radar field. This study is expected to contribute to an effective improvement of the precision and recall of information retrieval in the airborne radar field. Hopefully, the mode we designed and adopted for the co-construction and sharing of a domain-specific knowledge base can inform other similar studies.
Title: "Co-construction of ontology-based knowledge base through the Web: Theory and practice"
Published in: Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering (NLPKE-2010)
Pub Date: 2010-09-30 | DOI: 10.1109/NLPKE.2010.5587848
Yuan Kuang, Yanquan Zhou, Huacan He
This paper addresses another aspect of sentiment analysis: identifying the opinion_holder in opinionated sentences. To extract the opinion_holder, we first explore a Conditional Random Field (CRF) based on six features (contextual features, opinionated_trigger words, POS tags, named entities, dependency, and a proposed sentence-structure feature), where the dependency feature is adjusted to better capture contextual dependency information. We then propose two novel syntactic rules using opinionated_trigger words to identify the opinion_holder directly from parse trees. The results show that the precision of the CRF is much higher than that of the syntactic rules, while its recall is lower. We therefore combine the CRF with the syntactic rules, whose outputs serve as three additional features (HolderNode, ChunkPosition, and Paths) for training the CRF model. The combined system achieves higher recall and higher F-measure at almost the same high precision.
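One way to realize the combination is to expose the rule outputs as extra token features for the CRF. The `holder_node` flag below mirrors the paper's HolderNode idea, but the exact feature definitions are assumptions for illustration:

```python
def token_features(tokens, idx, rule_holder_candidates):
    """Feature dict for one token of a CRF opinion-holder tagger.
    `rule_holder_candidates` is the set of tokens the syntactic rules
    propose as holders; the holder_node flag injects that prediction
    into the statistical model as a feature."""
    tok = tokens[idx]
    return {
        "word": tok.lower(),
        "is_capitalized": tok[0].isupper(),
        "holder_node": tok in rule_holder_candidates,
        "relative_position": idx / len(tokens),
    }

sent = ["John", "criticized", "the", "plan"]
print(token_features(sent, 0, {"John"}))
```

A CRF toolkit would consume one such dict per token per sentence at training and tagging time.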
Title: "A combination method of CRF with syntactic rules to identify opinion_holder"
Published in: Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering (NLPKE-2010)
Pub Date: 2010-09-30 | DOI: 10.1109/NLPKE.2010.5587770
Yu Zou, Jiyuan Wu, W. He, Min Hou, Yonglin Teng
The interrelation between prosody and syntax is becoming more and more important in speech processing. This paper analyzes the syntactic correlations of the prosodic phrase in broadcast news speech. The research yields the following results. First, the C-PP, which exhibits a stable prosodic pitch-contour pattern within its rhythmic chunking, has a flexible syntactic structure and stable semantic expression. Second, we find that the syntactic structure is more complex than the prosodic structure, and that some conjunctions and particles tend to attach to the end of the left structure or the beginning of the right one to form a prosodic word; if a chunk has just four lexical words including the conjunction or particle, they form a prosodic word by themselves. That is to say, conjunctions and particles show great flexibility in prosodic structure.
Title: "Syntactic correlations of prosodic phrase in broadcasting news speech"
Published in: Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering (NLPKE-2010)
Pub Date: 2010-09-30 | DOI: 10.1109/NLPKE.2010.5587784
Song Gao, Yiyi Zhao, Haitao Liu, Zhiwei Feng
Following Potential Ambiguity Theory, we analyze the "prep+n1+de+n2" phrase in this paper. We focus on how to make a computer automatically detect and process this kind of syntactically ambiguous structure. The purpose is to improve the accuracy of automatic identification and analysis of natural language. At the same time, we take this structure as an example to aid the study of other potentially ambiguous structures.
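Detection of the structure can be sketched as a window match over POS-tagged tokens. The toy tagset (p = preposition, n = noun, u = the particle 的) is an assumption; disambiguating the detected spans is the harder problem the paper addresses:

```python
def find_pattern(tagged):
    """Flag potentially ambiguous prep+n1+de+n2 spans in a POS-tagged
    sentence. `tagged` is a list of (word, tag) pairs."""
    hits = []
    for i in range(len(tagged) - 3):
        tags = [t for _, t in tagged[i:i + 4]]
        if tags == ["p", "n", "u", "n"]:
            hits.append([w for w, _ in tagged[i:i + 4]])
    return hits

# 对学生的帮助: "help for the students" or "the students' help"?
sent = [("对", "p"), ("学生", "n"), ("的", "u"), ("帮助", "n")]
print(find_pattern(sent))  # → [['对', '学生', '的', '帮助']]
```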
Title: "A study on disambiguation of structure "prep+n1+de+n2" for Chinese information processing"
Published in: Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering (NLPKE-2010)
Pub Date: 2010-09-30 | DOI: 10.1109/NLPKE.2010.5587777
Dipankar Das, Sivaji Bandyopadhyay
This paper describes an unsupervised hybrid approach to identifying emotion topic(s) in English blog sentences. The baseline system is based on object-related dependency relations from parsed constituents. The inclusion of the topic-related thematic roles present in verb-based syntactic argument structure improves the performance of the baseline system; the argument structures are extracted using VerbNet. The unsupervised hybrid approach consists of two phases. First, Rhetorical Structure (RS) information is extracted to identify the target span corresponding to the emotional expression in each sentence. Second, since an individual target span may contain one or more topics corresponding to an emotional expression, a Heuristic Classifier (HC) is designed to identify each of the topic spans associated with the target span. The classifier uses information on the Emotion Holder (EH), Named Entities (NE), and four types of similarity features to identify the phrase-level components of the topic spans. The system achieves an average recall, precision, and F-score of 60.37%, 57.49%, and 58.88%, respectively, across all emotion classes on 500 annotated sentences containing single or multiple emotion topics.
Title: "Identifying emotion topic — An unsupervised hybrid approach with Rhetorical Structure and Heuristic Classifier"
Published in: Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering (NLPKE-2010)
Pub Date: 2010-09-30 | DOI: 10.1109/NLPKE.2010.5587859
Xinsheng Li, Si Li, Weiran Xu, Guang Chen, Jun Guo
Relevance feedback, which traditionally uses terms from the relevant documents to enrich the user's initial query, is an effective method for improving retrieval performance. However, this approach has a problem: relevance feedback assumes that the most frequent terms in the feedback documents are useful for retrieval. In fact, experimental reports show that this assumption does not hold in practice; many expansion terms identified by traditional approaches are unrelated to the query and harmful to retrieval. In this paper, we propose to select better and more relevant documents with a clustering algorithm, and we then present an improved language model that helps identify the good terms in those relevant documents. Our experiments on the 2008 TREC collection show that retrieval effectiveness improves considerably when the improved language model is used.
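The term-selection idea can be illustrated with a generic language-model heuristic that ranks terms by how much more probable they are in the feedback documents than in the collection; this is a stand-in sketch, not necessarily the paper's exact formulation:

```python
from collections import Counter

def expansion_terms(feedback_docs, collection_docs, k=3):
    """Rank candidate expansion terms by the ratio of their probability in
    the feedback documents to their (add-one smoothed) probability in the
    whole collection, and return the top k."""
    fb = Counter(w for d in feedback_docs for w in d.split())
    coll = Counter(w for d in collection_docs for w in d.split())
    fb_total, coll_total = sum(fb.values()), sum(coll.values())
    scored = {
        w: (fb[w] / fb_total) / ((coll[w] + 1) / (coll_total + len(coll)))
        for w in fb
    }
    return [w for w, _ in sorted(scored.items(), key=lambda x: -x[1])[:k]]

feedback = ["solar panel efficiency", "solar cell efficiency"]
collection = feedback + ["the cat sat", "the dog ran"]
print(expansion_terms(feedback, collection, k=2))
```

Terms frequent in the feedback set but also common collection-wide are demoted, which is exactly the failure mode of frequency-only selection that the paper criticizes.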
Title: "Weakly supervised relevance feedback based on an improved language model"
Published in: Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering (NLPKE-2010)