
Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering (NLPKE-2010): Latest Publications

Are we waves or are we particles? A new insight into deep semantics in natural language processing
Svetlana Machova, J. Klecková
This paper brings a conceptually new, empirically based scientific approach to a deeper understanding of human cognition, language acquisition, the modularity of language, and the origin of language itself. The research presents an interactive multilingual associative experiment that attempts to map the Cognitive Semantic Space of the Essential Self (CSSES) and its basic frames in the Czech language, and collects and compares it with the CSSES of the conceptual language view in Czech, Russian, English, and potentially other languages. We attempt to merge cognitive metaphor theory with psycholinguistics and psychoanalysis by applying associative-experiment methodology to Essential Self metaphors. The research has two main goals: the first is to build an Essential Self multilingual WordNet, which serves as a basic lexical resource for Artificial Intelligence and describes the core of human nature; the second is to create a multilingual 3D semantic network.
Citations: 2
Shui nationality characters stroke shape input method
Hanyue Yang, Xiaorong Chen
The shapes of Shui nationality characters are similar to those of Oracle bone script and Jinwen (bronze inscriptions). To address the problem of encoding these hieroglyphic characters, a coding method based on stroke shape is proposed for Shui characters. The shapes of the 467 Shui characters in the Common Shui Script Dictionary are analyzed, and seven basic strokes that compose the main Shui characters are extracted. Through statistical comparison, the seven basic strokes are subdivided into 21 stroke shapes. A Shui character is then encoded as an ordered sequence of three strokes taken from the corners of the character according to the coding rules. As a result, users who cannot read Shui characters can input them easily and quickly.
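To make the coding scheme concrete, here is a minimal lookup-table sketch of a stroke-shape input method in this spirit; the stroke names, letter codes, and table entries are hypothetical placeholders, not the paper's actual 21-shape inventory.

```python
# Hypothetical stroke-shape codes; the paper subdivides 7 basic strokes
# into 21 shapes, of which we show only a few symbolic stand-ins.
STROKE_CODES = {"heng": "H", "shu": "S", "pie": "P", "na": "N", "dian": "D"}

# Hypothetical index: an ordered 3-stroke code (taken from the corners of
# the character, per the paper's coding rule) maps to candidate characters.
CODE_TABLE = {
    "HSP": ["shui_char_A"],
    "HSD": ["shui_char_B", "shui_char_C"],  # codes may collide: keep a list
}

def lookup(strokes):
    """Return candidate characters for an ordered sequence of 3 strokes."""
    code = "".join(STROKE_CODES[s] for s in strokes)
    return CODE_TABLE.get(code, [])

print(lookup(["heng", "shu", "dian"]))  # -> ['shui_char_B', 'shui_char_C']
```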
Citations: 0
Chinese patent retrieval based on the pragmatic information
Liping Wu, Song Liu, F. Ren
In this paper, we propose a novel information retrieval approach for Chinese patents based on pragmatic information. Patent retrieval is becoming increasingly important, not only because patents are an important resource in many fields, but also because patent retrieval saves corporations and researchers a great deal of time and money. However, with existing methods the precision of patent retrieval results is not very high. Moreover, through analyzing patent documents we found that, beyond their literal meanings, patents carry deeper meanings that can be inferred from them; we call these deeper meanings pragmatic information. We therefore built a patent retrieval system that integrates pragmatic information with classical information retrieval techniques to improve retrieval accuracy. Experiments using the proposed method show that the precision of patent retrieval based on pragmatic information is higher than that of retrieval without it.
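The abstract does not give the integration formula; one plausible reading, sketched below purely as an assumption, is a linear interpolation of a classical relevance score with a separately computed pragmatic score. The scores, documents, and the weight alpha are all fabricated for illustration.

```python
# A hedged sketch: combine a classical IR score with a "pragmatic" score.
# The formula and weight are assumptions, not the paper's published method.

def combined_score(classical_score, pragmatic_score, alpha=0.7):
    """Linear interpolation of a classical IR score and a pragmatic score."""
    return alpha * classical_score + (1 - alpha) * pragmatic_score

# Rank hypothetical patents by the combined score:
patents = [("patent_1", 0.82, 0.30), ("patent_2", 0.75, 0.90)]
ranked = sorted(patents, key=lambda p: combined_score(p[1], p[2]), reverse=True)
print([p[0] for p in ranked])  # -> ['patent_2', 'patent_1']
```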
Citations: 1
Part-of-speech tagging for Chinese unknown words in a domain-specific small corpus using morphological and contextual rules
Tao-Hsing Chang, Fu-Yuan Hsu, Chia-Hoang Lee, Hahn-Ming Lee
Many studies have tried to search for useful information on the Internet using meaningful terms or words. The performance of these approaches is often affected by the accuracy of unknown word extraction and POS tagging, while that accuracy is in turn affected by the size of the training corpora and the characteristics of the language. This work proposes and develops a method for tagging the POS of Chinese unknown words in a domain of interest, based on the integration of morphological rules, contextual rules, and a statistics-based method. Experimental results indicate that the proposed method can overcome the difficulties caused by small corpora in oriental languages, and can accurately tag unknown words with POS in domain-specific small corpora.
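As a rough illustration of the rules-plus-statistics integration, the sketch below tries morphological rules first, then contextual rules, then falls back to a statistical prior over tags. The rules, tagset, and probabilities are invented placeholders, not the paper's resources.

```python
# Hypothetical rule tables; a real system would derive these from the
# domain corpus and a reference tagset.
MORPH_RULES = [("器", "Na"), ("化", "VHC")]      # (word suffix, POS tag)
CONTEXT_RULES = [("DE", "Na")]                   # (previous tag, POS tag)
FALLBACK_DIST = {"Na": 0.6, "VC": 0.4}           # prior over unknown-word tags

def tag_unknown(word, prev_tag):
    for suffix, tag in MORPH_RULES:              # 1) morphological rules
        if word.endswith(suffix):
            return tag
    for ptag, tag in CONTEXT_RULES:              # 2) contextual rules
        if prev_tag == ptag:
            return tag
    return max(FALLBACK_DIST, key=FALLBACK_DIST.get)  # 3) statistical fallback

print(tag_unknown("传感器", prev_tag="DE"))       # -> 'Na' (rule 1 fires)
```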
Citations: 1
Statistical parsing based on Maximal Noun Phrase pre-processing
Qiaoli Zhou, Yue Gu, Xin Liu, Wenjing Lang, Dongfeng Cai
According to the characteristics of the Chinese language, this paper proposes a statistical parsing method based on Maximal Noun Phrase (MNP) pre-processing, in which MNP parsing is separated from parsing of the full sentence. First, the MNPs in a sentence are identified; next, each MNP is represented by its head word, and the sentence is parsed with the MNP replaced by that head. The original sentence is thus divided into two parts that can be parsed separately: the first is the parsing of the MNP itself; the second is the parsing of the sentence in which the MNPs are replaced by their head words. Finally, the paper uses Conditional Random Fields (CRFs) as the statistical recognition model at each level of the syntactic parsing process.
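The divide-and-conquer pipeline can be summarized in a few lines; the sketch below wires the steps together with toy stand-ins for the CRF-based MNP recognizer, head finder, and parser, which it does not reimplement.

```python
def parse_with_mnp_preprocessing(sentence, find_mnp, head_of, parse):
    """sentence: list of tokens; find_mnp returns a (start, end) span or None."""
    span = find_mnp(sentence)                # step 1: identify the MNP
    if span is None:
        return parse(sentence)
    start, end = span
    mnp = sentence[start:end]
    head = head_of(mnp)                      # step 2: represent MNP by its head
    reduced = sentence[:start] + [head] + sentence[end:]
    skeleton_tree = parse(reduced)           # step 3a: parse reduced sentence
    mnp_tree = parse(mnp)                    # step 3b: parse the MNP separately
    return ("S", skeleton_tree, ("MNP", mnp_tree))  # combine the two analyses

# Toy stand-ins so the sketch runs end to end:
toy = "the very tall man sleeps".split()
tree = parse_with_mnp_preprocessing(
    toy,
    find_mnp=lambda s: (0, 4),           # pretend a CRF found tokens 0..3
    head_of=lambda mnp: mnp[-1],         # pretend head = last token of the MNP
    parse=lambda s: tuple(s),            # pretend parser: flat tuple
)
print(tree)
```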
Citations: 3
Bagging to find better expansion words
Bingqing Wang, Yaqian Zhou, Xipeng Qiu, Qi Zhang, Xuanjing Huang
Supervised learning has been applied to query expansion techniques: a model is trained to predict the "goodness" or "utility" of an expansion term to the retrieval system. There are many features that measure the relatedness between an expansion word and the query, and these can be incorporated in supervised learning to select the expansion terms. The training data set is generated automatically by a heuristic method; however, this method can be affected in many ways. A severe problem is that the distribution of the features is query-dependent, which has not been discussed in previous work. With different feature distributions across queries, it is questionable to merge these training instances together and use the whole data set to train one single model. In this paper, we first investigate the statistical distribution of the auto-generated training data and show the problems in the training data set. Based on our analysis, we propose to use bagging to ensemble several regression models in order to obtain a better supervised model for predicting expansion terms. We conducted experiments on the TREC benchmark test collections. Our analysis of the training data reveals some interesting phenomena about query expansion techniques, and the experimental results show that the bagging approach can achieve state-of-the-art retrieval performance on the standard TREC data set.
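As a sketch of the bagging step under stated assumptions, the snippet below trains a scikit-learn BaggingRegressor on fabricated (term features, utility) pairs and uses the ensemble's averaged prediction to score candidate expansion terms; the feature set is invented for illustration.

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor

# Each row: hypothetical features of a candidate expansion term, e.g.
# (co-occurrence with the query, distributional similarity, term IDF).
X_train = np.array([[0.8, 0.6, 1.2], [0.1, 0.2, 0.4], [0.7, 0.5, 0.9]])
y_train = np.array([0.9, 0.1, 0.7])  # "goodness" of each training term

# Bagging averages many regressors trained on bootstrap resamples.
model = BaggingRegressor(n_estimators=25, random_state=0).fit(X_train, y_train)

candidates = {"term_a": [0.75, 0.55, 1.0], "term_b": [0.2, 0.1, 0.3]}
scores = {t: model.predict([f])[0] for t, f in candidates.items()}
print(sorted(scores, key=scores.get, reverse=True))  # expand with top terms
```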
Citations: 1
Affix-augmented stem-based language model for Persian
Heshaam Faili, H. Ravanbakhsh
Language modeling is used in many NLP applications such as machine translation, POS tagging, speech recognition, and information retrieval. A language model assigns a probability to a sequence of words, a task that becomes challenging for highly inflectional languages. In this paper we investigate standard statistical language models on Persian, an inflectional language. We propose two variations of morphological language models that rely on a morphological analyzer to manipulate the dataset before modeling. We then discuss the shortcomings of these models and introduce a novel approach that exploits the structure of the language and produces more accurate models. Experimental results are encouraging, especially when we use n-gram models with a small training dataset.
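A minimal sketch of the stem-plus-affix idea follows: each inflected word is split into a stem token plus an affix token before n-gram counting, so sparse surface forms share statistics through their stems. The toy analyzer uses English stand-ins, since we are not reproducing a Persian morphological analyzer here.

```python
from collections import Counter

def analyze(word):
    """Hypothetical analyzer: return (stem, affix) for a few toy words."""
    toy_lexicon = {"books": ("book", "+PL"), "walked": ("walk", "+PAST")}
    return toy_lexicon.get(word, (word, None))

def stem_affix_tokens(sentence):
    out = []
    for w in sentence.split():
        stem, affix = analyze(w)
        out.append(stem)
        if affix:
            out.append(affix)  # the affix becomes its own LM token
    return out

tokens = stem_affix_tokens("he walked books")
bigrams = Counter(zip(tokens, tokens[1:]))  # count n-grams over the new stream
print(tokens, dict(bigrams))
```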
Citations: 2
Sentiment word identification using the maximum entropy model
Xiaoxu Fei, Huizhen Wang, Jingbo Zhu
This paper addresses the issue of identifying sentiment words in an opinionated sentence, which is very important in sentiment analysis tasks. The most common way to tackle this problem is to use a readily available sentiment lexicon such as HowNet or SentiWordNet to determine whether a word is a sentiment word. In practice, however, words in the lexicon sometimes do not express a sentiment tendency in a given context, while words outside the lexicon do. To address this challenge, this paper presents an approach based on a maximum-entropy classification model to identify the sentiment words in an opinionated sentence. Experimental results show that our approach outperforms baseline lexicon-based methods.
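Multinomial logistic regression is the standard realization of a maximum-entropy classifier, so a minimal sketch can be built with scikit-learn; the feature templates and toy training pairs below are assumptions for illustration, not the paper's feature set.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

def features(word, sentence):
    """Hypothetical contextual features for a candidate word."""
    return {
        "word=" + word: 1,
        "in_lexicon": int(word in {"good", "bad", "great"}),  # toy lexicon
        "negation_nearby": int("not" in sentence.split()),
    }

# Toy training pairs: (candidate word, sentence, is_sentiment_word).
train = [("good", "the food is good", 1), ("table", "a table by the door", 0),
         ("bad", "not a bad idea", 1), ("door", "a table by the door", 0)]

vec = DictVectorizer()
X = vec.fit_transform(features(w, s) for w, s, _ in train)
y = [label for _, _, label in train]
clf = LogisticRegression().fit(X, y)  # logistic regression == maxent

x_new = vec.transform(features("great", "a great movie"))
print(clf.predict(x_new)[0])  # -> 1, i.e. predicted to be a sentiment word
```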
Citations: 16
A reranking method for syntactic parsing with heterogeneous treebanks
Haibo Ding, Muhua Zhu, Jingbo Zhu
In the field of natural language processing (NLP), there often exist multiple corpora with different annotation standards for the same task. In this paper, we take syntactic parsing as a case study and propose a reranking method that can make direct use of disparate treebanks simultaneously, without resorting to techniques such as treebank conversion. The method proceeds in three steps: 1) build parsers on the individual treebanks; 2) use the parsers independently to generate n-best lists for each sentence in the test set; 3) rerank the n-best lists that correspond to the same sentence by using consensus information exchanged among them. Experimental results on two open Chinese treebanks show that our method significantly outperforms the baseline systems by 0.84% and 0.53%, respectively.
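One simple way to use consensus information, sketched below under our own assumptions, is to represent each candidate parse as a set of labeled brackets and rescore every candidate in the merged pool by its average bracket F1 against the other candidates; the paper's exact consensus measure may differ.

```python
def f1(a, b):
    """Bracket F1 between two parses represented as sets of brackets."""
    if not a or not b:
        return 0.0
    overlap = len(a & b)
    p, r = overlap / len(b), overlap / len(a)
    return 2 * p * r / (p + r) if p + r else 0.0

def rerank(pool):
    """pool: parses (bracket sets) from all parsers for one sentence."""
    def consensus(c):
        others = [o for o in pool if o is not c]
        return sum(f1(c, o) for o in others) / max(len(others), 1)
    return max(pool, key=consensus)  # pick the candidate others agree with

# Toy brackets: (label, start, end) tuples from two hypothetical parsers.
p1 = {("NP", 0, 2), ("VP", 2, 5), ("S", 0, 5)}
p2 = {("NP", 0, 2), ("VP", 3, 5), ("S", 0, 5)}
p3 = {("NP", 0, 2), ("VP", 2, 5), ("S", 0, 5)}
print(rerank([p1, p2, p3]) == p1)  # the majority analysis wins -> True
```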
Citations: 0
Flexible English writing support based on negative-positive conversion method
Yasushi Katsura, Kazuyuki Matsumoto, F. Ren
With recent globalization, opportunities to communicate in English have increased in the business field; in particular, it is often necessary to write theses and formal documents in English. Because many Japanese speakers are not used to constructing English sentences, writing appropriate English without any support is a great burden. In this study we have developed an English composition support system. The system searches a database for an interlinear translation example to refer to, and generates a new sentence by replacing a noun in the example sentence. In this paper, based on the Super-Function technique, we propose a method to convert an affirmative sentence into a negative one and vice versa, in order to realize more flexible and extensive text conversion.
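As a toy illustration of the affirmative/negative conversion step (a real Super-Function implementation maps between structured sentence patterns rather than raw strings), the sketch below toggles auxiliary negation with a small rule table.

```python
import re

# Auxiliaries whose negation we can toggle; do-support and irregular cases
# are out of scope for this sketch.
AUX = ["is", "are", "was", "were", "can", "will", "should", "must"]

def toggle_negation(sentence):
    for aux in AUX:
        if re.search(rf"\b{aux} not\b", sentence):
            return re.sub(rf"\b{aux} not\b", aux, sentence)  # negative -> positive
        if re.search(rf"\b{aux}\b", sentence):
            return re.sub(rf"\b{aux}\b", f"{aux} not", sentence, count=1)  # pos -> neg
    return sentence  # no auxiliary found; a full system would handle more patterns

print(toggle_negation("This sentence is appropriate."))
# -> "This sentence is not appropriate."
```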
Citations: 1