
Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering (NLPKE-2010): Latest Publications

Chinese base phrases chunking based on latent semi-CRF model
Xiao Sun, Xiaoli Nan
In Chinese natural language processing, recognizing simple, non-recursive base phrases is an important task for applications such as information processing and machine translation. Instead of a rule-based model, we adopt a statistical machine learning method, the newly proposed Latent semi-CRF model, to solve the Chinese base phrase chunking problem. Chinese base phrase chunking can be treated as a sequence labeling problem, which involves predicting a class label for each frame in an unsegmented sequence. Chinese base phrases have sub-structures that cannot be observed in the training data. We propose a latent discriminative model called Latent semi-CRF (Latent Semi Conditional Random Fields), which incorporates the advantages of LDCRF (Latent Dynamic Conditional Random Fields) and semi-CRF, modeling the sub-structure of a class sequence and learning the dynamics between class labels, for detecting Chinese base phrases. Our results demonstrate that this latent dynamic discriminative model compares favorably to Support Vector Machines, the Maximum Entropy Model, and Conditional Random Fields (including LDCRF and semi-CRF) on Chinese base phrase chunking.
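As a minimal illustration of the sequence-labeling framing described above (not of the Latent semi-CRF model itself, whose latent sub-structure and semi-Markov segments are beyond a short sketch), the snippet below converts a segmented, POS-tagged Chinese sentence with gold base-phrase spans into BIO labels and extracts simple per-token features. The example sentence, tag set, and feature template are assumptions, not the paper's.

```python
# Sketch: Chinese base-phrase chunking framed as BIO sequence labeling.
# Sentence, POS tags, spans, and features below are illustrative assumptions.

tokens = ["中国", "经济", "发展", "迅速"]   # segmented words
pos    = ["NR",  "NN",  "VV",  "VA"]       # assumed POS tags
phrases = [(0, 2, "NP")]                    # gold base-phrase span: tokens[0:2] is an NP

def to_bio(n, spans):
    """Convert (start, end, type) spans into per-token BIO labels."""
    labels = ["O"] * n
    for start, end, typ in spans:
        labels[start] = "B-" + typ
        for i in range(start + 1, end):
            labels[i] = "I-" + typ
    return labels

def token_features(i):
    """Simple observation features for token i: word/POS unigrams and neighbor POS."""
    feats = {"word": tokens[i], "pos": pos[i]}
    feats["prev_pos"] = pos[i - 1] if i > 0 else "BOS"
    feats["next_pos"] = pos[i + 1] if i < len(tokens) - 1 else "EOS"
    return feats

y = to_bio(len(tokens), phrases)
X = [token_features(i) for i in range(len(tokens))]
for f, label in zip(X, y):
    print(label, f)   # e.g. B-NP {'word': '中国', 'pos': 'NR', ...}
```

Feature dictionaries of this form could feed a linear-chain CRF toolkit as a baseline; the Latent semi-CRF additionally scores whole segments and marginalizes over hidden sub-structure labels.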
{"title":"Chinese base phrases chunking based on latent semi-CRF model","authors":"Xiao Sun, Xiaoli Nan","doi":"10.1109/NLPKE.2010.5587802","DOIUrl":"https://doi.org/10.1109/NLPKE.2010.5587802","url":null,"abstract":"In the fields of Chinese natural language processing, recognizing simple and non-recursive base phrases is an important task for natural language processing applications, such as information processing and machine translation. Instead of rule-based model, we adopt the statistical machine learning method, newly proposed Latent semi-CRF model to solve the Chinese base phrase chunking problem. The Chinese base phrases could be treated as the sequence labeling problem, which involve the prediction of a class label for each frame in an unsegmented sequence. The Chinese base phrases have sub-structures which could not be observed in training data. We propose a latent discriminative model called Latent semi-CRF(Latent Semi Conditional Random Fields), which incorporates the advantages of LDCRF(Latent Dynamic Conditional Random Fields) and semi-CRF that model the sub-structure of a class sequence and learn dynamics between class labels, in detecting the Chinese base phrases. Our results demonstrate that the latent dynamic discriminative model compares favorably to Support Vector Machines, Maximum Entropy Model, and Conditional Random Fields(including LDCRF and semi-CRF) on Chinese base phrases chunking.","PeriodicalId":259975,"journal":{"name":"Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133124987","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 13
iTree - Automating the construction of the narration tree of Hadiths (Prophetic Traditions)
Aqil M. Azmi, Nawaf Bin Badia
The two fundamental sources of Islamic legislation are the Qur'an and the Hadith. The Hadiths, or Prophetic Traditions, are narrations originating from the sayings and conduct of Prophet Muhammad. Each Hadith starts with a list of the narrators involved in transmitting it, followed by the transmitted text. The Hadith corpus is extremely large and runs into hundreds of volumes. Due to its legislative importance, the Hadiths have been carefully scrutinized by Hadith scholars. One way a scholar may grade a Hadith is by its narration chain and the individual narrators in the chain. In this paper we report on a system that automatically generates the transmission chain of a Hadith and displays it graphically. Computationally, this is a challenging problem. The text of a Hadith is in Arabic, a morphologically rich language, and each Hadith has its own peculiar way of listing narrators. Our solution involves parsing and annotating the Hadith text and identifying the narrators' names. We use shallow parsing along with a domain-specific grammar to parse the Hadith content. Experiments on sample Hadiths show that our approach has a very good success rate.
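A rough approximation of the narrator-chain (isnad) extraction is to split the opening of a Hadith on common transmission markers; the sketch below does this with a small, assumed marker list and a toy isnad, whereas the actual system uses shallow parsing with a domain-specific Arabic grammar.

```python
import re

# Hypothetical, simplified transmission markers ("told us", "informed us",
# "on the authority of", "said"); the real system uses a richer domain grammar.
MARKERS = ["حدثنا", "حدثني", "أخبرنا", "أخبرني", "عن", "قال"]

def split_isnad(isnad_text):
    """Split an isnad on transmission markers and return candidate narrator names."""
    pattern = r"\s*(?:" + "|".join(MARKERS) + r")\s*"
    parts = re.split(pattern, isnad_text)
    return [p.strip() for p in parts if p.strip()]

# Toy isnad: "X told us: Malik informed us on the authority of Nafi' ..."
isnad = "حدثنا عبد الله بن يوسف قال أخبرنا مالك عن نافع عن عبد الله بن عمر"
for i, narrator in enumerate(split_isnad(isnad), 1):
    print(i, narrator)
```

The ordered list of narrators returned here would form one path of the narration tree the paper visualizes.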
{"title":"iTree - Automating the construction of the narration tree of Hadiths (Prophetic Traditions)","authors":"Aqil M. Azmi, Nawaf Bin Badia","doi":"10.1109/NLPKE.2010.5587810","DOIUrl":"https://doi.org/10.1109/NLPKE.2010.5587810","url":null,"abstract":"The two fundamental sources of Islamic legislation are Qur'an and the Hadith. The Hadiths, or Prophetic Traditions, are narrations originating from the sayings and conducts of Prophet Muhammad. Each Hadith starts with a list of narrators involved in transmitting it followed by the transmitted text. The Hadith corpus is extremely huge and runs into hundreds of volumes. Due to its legislative importance, Hadiths have been carefully scrutinized by hadith scholars. One way a scholar may grade a Hadith is by its narration chain and the individual narrators in the chain. In this paper we report on a system that automatically generates the transmission chains of a Hadith and graphically display it. Computationally, this is a challenging problem. The text of Hadith is in Arabic, a morphologically rich language; and each Hadith has its own peculiar way of listing narrators. Our solution involves parsing and annotating the Hadith text and identifying the narrators' names. We use shallow parsing along with a domain specific grammar to parse the Hadith content. Experiments on sample Hadiths show our approach to have a very good success rate.","PeriodicalId":259975,"journal":{"name":"Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010)","volume":"120 3‐4","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132908081","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 30
Anusaaraka: An expert system based machine translation system
Sriram Chaudhury, A. Rao, D. Sharma
Most research in machine translation is about having computers completely bear the load of translating one human language into another. This paper looks at the machine translation problem afresh and observes that there is a need to share the load between man and machine, distinguish reliable knowledge from heuristics, provide a spectrum of outputs to serve different strata of users, and finally make use of existing resources instead of reinventing the wheel. The paper describes a unique approach to developing a machine translation system based on insights into information dynamics from the Paninian Grammar Formalism. Anusaaraka is a Language Accessor cum Machine Translation system based on the fundamental premise of sharing the load and producing results that are good enough for the needs of the reader. The system promises a faithful representation of the translated text, no loss of information during translation, and graceful degradation (robustness) in case of failure. The layered output provides access to all stages of translation, making the whole process transparent. Thus, Anusaaraka differs from conventional machine translation systems in two respects: (1) its commitment to faithfulness, providing a layer of 100% faithful output so that a user with some training can “access the source text” faithfully; and (2) its design, which lets users contribute to the system and participate in improving its quality. Further, Anusaaraka provides an eclectic combination of the Apertium architecture with a forward-chaining expert system, allowing both deep-parser and shallow-parser outputs to be used to analyze the source-language (SL) text. Existing language resources (parsers, taggers, chunkers) available under the GPL are reused instead of being rewritten. Language data and linguistic rules are independent of the core program, making it easy for linguists to modify and experiment with different language phenomena to improve the system. Users can become contributors by adding new word sense disambiguation (WSD) rules for ambiguous words through a web interface available over the internet. The system uses the expert system's forward chaining to infer new language facts from existing language data. It helps tackle the complex behavior of language translation by applying specific knowledge rather than a specific technique, creating a vast language knowledge base in electronic form. In other words, the expert system facilitates the transformation of subject-matter expert (SME) knowledge held by humans into a computer-processable knowledge base.
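To make the forward-chaining idea concrete, here is a tiny, generic rule engine: it repeatedly fires rules whose premises are all present in the fact base, adding their conclusions, until no new fact can be derived. The facts and the WSD-style rule are invented for illustration; they are not Anusaaraka's actual rule base or inference engine.

```python
# Minimal forward-chaining sketch: rules fire whenever all premises are in the
# fact base, adding their conclusion, until a fixpoint is reached.
# Facts and rules below are illustrative assumptions only.

facts = {"word(bank)", "context(river)"}

rules = [
    ({"word(bank)", "context(river)"}, "sense(bank, river_bank)"),
    ({"sense(bank, river_bank)"},      "gloss(bank, riverbank)"),   # hypothetical target gloss
]

def forward_chain(facts, rules):
    """Derive new facts until no rule adds anything new."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

print(forward_chain(facts, rules))
```

A user-contributed WSD rule, as described in the abstract, would simply be another (premises, conclusion) pair added to such a rule base.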
{"title":"Anusaaraka: An expert system based machine translation system","authors":"Sriram Chaudhury, A. Rao, D. Sharma","doi":"10.1109/NLPKE.2010.5587789","DOIUrl":"https://doi.org/10.1109/NLPKE.2010.5587789","url":null,"abstract":"Most research in Machine translation is about having the computers completely bear the load of translating one human language into another. This paper looks at the machine translation problem afresh and observes that there is a need to share the load between man and machine, distinguish reliable knowledge from the heuristics, provide a spectrum of outputs to serve different strata of people, and finally make use of existing resources instead of reinventing the wheel. This paper describes a unique approach to develop machine translation system based on the insights of information dynamics from Paninian Grammar Formalism. Anusaaraka is a Language Accessor cum Machine Translation system based on the fundamental premise of sharing the load producing good enough results according to the needs of the reader. The system promises to give faithful representation of the translated text, no loss of information while translating and graceful degradation (robustness) in case of failure. The layered output provides an access to all the stages of translation making the whole process transparent. Thus, Anusaaraka differs from the Machine Translation systems in two respects: (1) its commitment to faithfulness and thereby providing a layer of 100% faithful output so that a user with some training can “access the source text” faithfully. (2) The system is so designed that a user can contribute to it and participate in improving its quality. Further Anusaaraka provides an eclectic combination of the Apertium architecture with the forward chaining expert system, allowing use of both the deep parser and shallow parser outputs to analyze the SL text. Existing language resources (parsers, taggers, chunkers) available under GPL are used instead of rewriting it again. Language data and linguistic rules are independent from the core programme, making it easy for linguists to modify and experiment with different language phenomena to improve the system. Users can become contributors by contributing new word sense disambiguation (WSD) rules of the ambiguous words through a web-interface available over internet. The system uses forward chaining of expert system to infer new language facts from the existing language data. It helps to solve the complex behavior of language translation by applying specific knowledge rather than specific technique creating a vast language knowledge base in electronic form. Or in other words, the expert system facilitates the transformation of subject matter expert's (SME) knowledge available with humans into a computer processable knowledge base.","PeriodicalId":259975,"journal":{"name":"Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129896254","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 30
Descriptive analysis of emotion and feeling in voice
M. Shimura, Fumiaki Monma, S. Mitsuyoshi, M. Shuzo, Taishi Yamamoto, I. Yamada
Recognition of human “emotions” or “feelings” from voice is important for research on human communication. Although there has been much research on emotions or feelings in voice, definitions of these terms have been inconsistent. We reviewed previous papers in linguistics, brain science, information science, and related fields, and developed specific definitions for these terms. In our paper, “emotion” is defined as an involuntary reaction in the human brain; it has two states: pleasure and displeasure. “Feeling” (e.g., anger, enjoyment, sadness, fear, and distress) is defined as a state voluntarily resulting from an emotion. Note that the pleasure-displeasure direction does not always correspond to the feeling. Our objective is therefore to obtain a sufficient amount of voice data and to analyze the relationship between emotions and feelings. In voice recording experiments, a voice database of about 100 participants with various natural feelings was constructed. Descriptive analysis showed that the pleasure-displeasure direction did not correspond to the feeling in 5% of the voice data. This result suggests that, if an experimental situation is constructed that tends to arouse various feelings, data with less variability can be obtained. Further analysis of the characteristics of the obtained data, to identify situations in which the pleasure-displeasure direction does not necessarily correspond to the basic feeling, should lead to improved accuracy of voice emotion recognition.
{"title":"Descriptive analysis of emotion and feeling in voice","authors":"M. Shimura, Fumiaki Monma, S. Mitsuyoshi, M. Shuzo, Taishi Yamamoto, I. Yamada","doi":"10.1109/NLPKE.2010.5587794","DOIUrl":"https://doi.org/10.1109/NLPKE.2010.5587794","url":null,"abstract":"Recognition of human “emotions” or “feelings” from voice is important to research on human communications. Although there has been much research on emotions or feelings in voice, definitions of these terms have been inconsistent. We reviewed previous papers in linguistics, brain science, information science, etc. and developed specific definitions for these term. In our paper, “emotion” is defined as an involuntary reaction in the human brain; it has two states: pleasure and displeasure. “Feeling” (e.g., anger, enjoyment, sadness, fear, and distress) is defined as a state voluntarily resulting from an emotion. Here, we should notice that the pleasure-displeasure direction does not always correspond to the feeling. So, our objective is to obtain sufficient amount of voice data and to analyze the relationship between emotions and feelings. In voice recording experiments, the voice database from about 100 participants with various natural feelings was constructed. A result of descriptive analysis showed that pleasure-displeasure direction did not correspond to the each feeling in 5% of voice data. This result suggested that, if an experimental situation is constructed that tends to arouse various feelings, data with less variability can be obtained. Further analysis of the characteristics of the data obtained to identify situations in which the pleasure-displeasure direction does not necessarily correspond to the basic feeling should lead to improved accuracy of voice emotion recognition.","PeriodicalId":259975,"journal":{"name":"Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124413153","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
Research on sentiment classification of Blog based on PMI-IR
Xiuting Duan, Tingting He, Le Song
The growth of blog text information on the internet has brought new challenges to Chinese text classification. Aiming to solve the semantic deficiency problem of traditional methods for Chinese text classification, this paper implements a text classification method that classifies a blog as joy, angry, sad, or fear using a simple unsupervised learning algorithm. The classification of a blog text is predicted by the maximum semantic orientation (SO) of the phrases in the blog text that contain adjectives or adverbs. In this paper, the SO of a phrase is calculated as the mutual information between the given phrase and the polar words. The SO of the given blog text is then determined by the maximum mutual information value. A blog text is classified as joy if the SO of its phrases is joy. Two different corpora are adopted to test our method: one is the blog corpus collected by the Monitor and Research Center for National Language Resource Network Multimedia Sub-branch Center, and the other is the Chinese dataset provided by the COAE2008 task. On both datasets, the method achieves a substantial improvement compared to the traditional methods.
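The PMI-IR style scoring described above can be sketched as follows: the semantic orientation of a phrase toward each feeling category is estimated from co-occurrence counts with a few seed words, and the blog is labeled with the category whose phrases score highest. The `cooccurrence_count` and `count` functions, the seed words, the corpus size, and the smoothing are assumptions standing in for real corpus or search-engine hit counts.

```python
import math

# Assumed seed words per feeling category (toy English stand-ins for the Chinese seeds).
SEEDS = {
    "joy":   ["happy", "wonderful"],
    "angry": ["furious", "hateful"],
    "sad":   ["miserable", "tearful"],
    "fear":  ["terrified", "scary"],
}
N = 1_000_000  # assumed corpus size

def cooccurrence_count(phrase, word):
    """Placeholder for hits(phrase NEAR word) from a corpus or search engine."""
    toy = {("very enjoyable", "happy"): 120, ("very enjoyable", "wonderful"): 80}
    return toy.get((phrase, word), 1)   # 1 = smoothing to avoid log(0)

def count(term):
    """Placeholder for hits(term) alone."""
    return 1000

def so(phrase, category):
    """Average pointwise mutual information between a phrase and a category's seeds."""
    pmis = [math.log2(cooccurrence_count(phrase, s) * N / (count(phrase) * count(s)))
            for s in SEEDS[category]]
    return sum(pmis) / len(pmis)

def classify(phrases):
    """Label a blog with the category whose best phrase SO is maximal."""
    return max(SEEDS, key=lambda c: max(so(p, c) for p in phrases))

print(classify(["very enjoyable"]))   # -> joy, given the toy counts above
```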
{"title":"Research on sentiment classification of Blog based on PMI-IR","authors":"Xiuting Duan, Tingting He, Le Song","doi":"10.1109/NLPKE.2010.5587849","DOIUrl":"https://doi.org/10.1109/NLPKE.2010.5587849","url":null,"abstract":"Development of Blog texts information on the internet has brought new challenge to Chinese text classification. Aim to solving the semantics deficiency problem in traditional methods for Chinese text classification, this paper implements a text classification method on classifying a blog as joy, angry, sad or fear using a simple unsupervised learning algorithm. The classification of a blog text is predicted by the max semantic orientation (SO) of the phrases in the blog text that contains adjectives or adverbs. In this paper, the SO of a phrase is calculated as the mutual information between the given phrase and the polar words. Then the SO of the given blog text is determined by the max mutual information value. A blog text is classified as joy if the SO of its phrases is joy. Two different corpora are adopted to test our method, one is the Blog corpus collected by Monitor and Research Center for National Language Resource Network Multimedia Sub-branch Center, and the other is Chinese dataset provided by COAE2008 task. Based on the two datasets, the method respectively achieves a high improvement compared to the traditional methods.","PeriodicalId":259975,"journal":{"name":"Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010)","volume":"96 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116011509","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Chinese semantic role labeling based on semantic knowledge
Yanqiu Shao, Zhifang Sui, Ning Mao
Most semantic role labeling systems use syntactic analysis results to predict semantic roles. However, some problems cannot be handled well by syntactic features alone. In this paper, lexical semantic features are extracted from semantic dictionaries. Two typical lexical semantic dictionaries are used, TongYiCi CiLin and CSD: CiLin is built on convergent relationships, while CSD is based on syntagmatic relationships. Based on these two dictionaries, two labeling models are set up, a CiLin model and a CSD model. In addition, a pure syntactic model and a mixed model are built, where the mixed model combines all of the syntactic and semantic features. The experimental results show that applying lexical semantic knowledge at different levels helps exploit some inherent attributes of the language and improves the performance of the system.
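As a rough sketch of how a lexical semantic feature from a dictionary such as CiLin might be combined with syntactic features for one candidate argument, consider the function below; the toy dictionary entries, feature names, and path string are illustrative assumptions, not the paper's feature set.

```python
# Toy CiLin-style lookup: head word -> top-level semantic class code (entries assumed).
CILIN = {"医生": "A", "手术": "H", "病人": "A"}

def srl_features(predicate, arg_head, syn_path):
    """Combine syntactic features with a lexical semantic-class feature
    for classifying one candidate argument of a predicate."""
    sem_class = CILIN.get(arg_head, "UNK")
    return {
        "pred": predicate,                           # predicate lemma
        "head": arg_head,                            # argument head word
        "path": syn_path,                            # syntactic path predicate -> argument
        "sem_class": sem_class,                      # semantic class of the head word
        "pred+sem": predicate + "|" + sem_class,     # conjoined feature
    }

print(srl_features("进行", "手术", "VP^IP_NP"))
```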
{"title":"Chinese semantic role labeling based on semantic knowledge","authors":"Yanqiu Shao, Zhifang Sui, Ning Mao","doi":"10.1109/NLPKE.2010.5587821","DOIUrl":"https://doi.org/10.1109/NLPKE.2010.5587821","url":null,"abstract":"Most of the semantic role labeling systems use syntactic analysis results to predict semantic roles. However, there are some problems that could not be well-done only by syntactic features. In this paper, lexical semantic features are extracted from some semantic dictionaries. Two typical lexical semantic dictionaries are used, TongYiCi CiLin and CSD. CiLin is built on convergent relationship and CSD is based on syntagmatic relationship. According to both of the dictionaries, two labeling models are set up, CiLin model and CSD model. Also, one pure syntactic model and one mixed model are built. The mixed model combines all of the syntactic and semantic features. The experimental results show that the application of different level of lexical semantic knowledge could help use some language inherent attributes and the knowledge could help to improve the performance of the system.","PeriodicalId":259975,"journal":{"name":"Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114382717","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Improving Chinese-English patent machine translation using sentence segmentation
Yaohong Jin, Zhiying Liu
This paper presents a method that uses sentence segmentation to improve the performance of Chinese-English patent machine translation. In this method, a long Chinese sentence is segmented into separate short sentences using features from the Hierarchical Network of Concepts theory (HNC theory). Several semantic features are introduced, including the main verb of a CSC (Eg), the main verb of a CSP (Egp), long NPs, and conjunctions. The main purpose of the segmentation algorithm is to detect whether a CSC can stand as a separate sentence. The segmentation method was integrated with a rule-based MT system. The order of the resulting short translations is adjusted, and the different modes of expression in Chinese and English are also taken into consideration. The experimental results show that the performance of Chinese-English patent translation is effectively improved. Our method has been integrated into an online patent MT system running at SIPO.
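A greatly simplified version of the segmentation step can be sketched as splitting a long Chinese sentence at commas where the following clause looks self-contained, here approximated by containing an assumed main verb or starting with a conjunction; the verb and conjunction lists and the example sentence are toy assumptions, and the paper's actual decision relies on HNC features (Eg, Egp, long NPs) within a rule-based system.

```python
# Simplified sketch: split a long Chinese sentence into candidate short sentences.

MAIN_VERBS   = ["包括", "具有", "提供", "采用"]   # toy verb list (stand-in for Eg/Egp detection)
CONJUNCTIONS = ["因此", "并且", "但是", "同时"]   # toy conjunction list

def can_stand_alone(clause):
    """Heuristic stand-in for 'this CSC can be a separate sentence'."""
    return (any(clause.startswith(c) for c in CONJUNCTIONS)
            or any(v in clause for v in MAIN_VERBS))

def segment(sentence):
    clauses, out, current = sentence.split("，"), [], ""
    for clause in clauses:
        if current and can_stand_alone(clause):
            out.append(current)                    # close the previous short sentence
            current = clause
        else:
            current = current + "，" + clause if current else clause
    if current:
        out.append(current)
    return out

long_sentence = "本发明涉及一种装置，该装置包括一个传感器，因此可以提高测量精度"
for s in segment(long_sentence):
    print(s)    # each printed segment would be translated separately, then reordered
```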
{"title":"Improving Chinese-English patent machine translation using sentence segmentation","authors":"Yaohong Jin, Zhiying Liu","doi":"10.1109/NLPKE.2010.5587855","DOIUrl":"https://doi.org/10.1109/NLPKE.2010.5587855","url":null,"abstract":"This paper presents a method using sentence segmentation to improve the performance of Chinese-English patent machine translation. In this method, long Chinese sentence was segmented into separated short sentences using some features from the Hierarchical Network of Concepts theory (HNC theory). Some semantic features are introduced, including main verb of CSC (Eg), main verb of CSP (Egp), long NPs and conjunctions. The main purpose of segmentation algorithm is to detect if one CSC can or cannot be a separate sentence. The segmentation method was integrated with a rule-base MT system. The sequence of these short translations was adjusted and the different ways of expressions in both Chinese and English languages also were in consideration. From the result of the experiments, we can see that the performance of the Chinese-English patent translation was improved effectively. Our method had been integrated into an online patent MT system running in SIPO.","PeriodicalId":259975,"journal":{"name":"Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010)","volume":"124 20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130009419","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 15
Texture image retrieval based on gray-primitive co-occurrence matrix
Wei Wang, Motoyuki Suzuki, F. Ren
Research on texture similarity is a very important component of content-based image retrieval systems. This paper first proves the rotation invariance of the gray-primitive co-occurrence matrix and then presents a new texture image retrieval technique based on it. Experimental results indicate that the proposed algorithm has low computational complexity and a certain degree of noise resistance.
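For reference, a plain gray-level co-occurrence matrix and two standard texture features can be computed as below; note this is the classical GLCM, not the paper's gray-primitive variant, and rotation invariance is only approximated here by averaging over four displacement directions.

```python
import numpy as np

def glcm(img, levels=8, offset=(0, 1)):
    """Normalized gray-level co-occurrence matrix for one pixel displacement."""
    q = (img.astype(float) / (img.max() + 1e-9) * (levels - 1)).astype(int)  # quantize
    dy, dx = offset
    m = np.zeros((levels, levels))
    h, w = q.shape
    for y in range(max(0, -dy), h - max(0, dy)):
        for x in range(max(0, -dx), w - max(0, dx)):
            m[q[y, x], q[y + dy, x + dx]] += 1
    m += m.T                          # symmetric counts
    return m / m.sum()                # joint probabilities

def texture_features(img, levels=8):
    """Contrast and energy averaged over 0/45/90/135 degrees (rough rotation invariance)."""
    i, j = np.meshgrid(np.arange(levels), np.arange(levels), indexing="ij")
    feats = []
    for off in [(0, 1), (1, 1), (1, 0), (1, -1)]:
        p = glcm(img, levels, off)
        feats.append([np.sum(p * (i - j) ** 2),    # contrast
                      np.sum(p ** 2)])             # energy
    return np.mean(feats, axis=0)

a = np.random.randint(0, 256, (32, 32))
b = np.random.randint(0, 256, (32, 32))
print(np.linalg.norm(texture_features(a) - texture_features(b)))  # simple retrieval distance
```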
{"title":"Texture image retrieval based on gray-primitive co-occurrence matrix","authors":"Wei Wang, Motoyuki Suzuki, F. Ren","doi":"10.1109/NLPKE.2010.5587830","DOIUrl":"https://doi.org/10.1109/NLPKE.2010.5587830","url":null,"abstract":"The research of texture similarity is very important component of content-based image retrieval system. Firstly the rotation invariance of gray-primitive co-occurrence matrix was proved in this paper, then a new texture image retrieval technique based on gray-primitive co-occurrence matrix was presented. The result of experiment indicates that the algorithm proposed has low computational complexity and certain noise resisting ability.","PeriodicalId":259975,"journal":{"name":"Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130347415","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Bottom up: Exploring word emotions for Chinese sentence chief sentiment classification
Xin Kang, F. Ren, Yunong Wu
In this paper we demonstrate the effectiveness of employing basic sentiment components for analyzing the chief sentiment of a Chinese sentence among nine categories of sentiment (including “No emotion”). Compared with traditional lexicon-based methods, our research explores the emotion intensities of words and phrases in an eight-dimensional sentiment space as features. An emotion matrix kernel is designed to evaluate the inner product of these sentiment features for SVM classification with O(n) time complexity. Experimental results show that our method significantly improves the performance of sentiment classification.
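One way to realize an O(n) sentence-level kernel over word emotion vectors, sketched under the assumption that each word carries an 8-dimensional intensity vector, is to pool word vectors into a sentence vector and take the inner product of the pooled vectors, fed to an SVM as a precomputed Gram matrix. The word intensities and labels below are invented, and the paper's actual emotion matrix kernel may be defined differently.

```python
import numpy as np
from sklearn.svm import SVC

DIM = 8  # eight-dimensional sentiment space

def sentence_matrix(words, lexicon):
    """Stack per-word emotion intensity vectors (n_words x 8); unknown words are zero."""
    return np.array([lexicon.get(w, np.zeros(DIM)) for w in words])

def emotion_kernel(A, B):
    """O(n) inner product of two sentences' emotion matrices via column-sum pooling."""
    return float(A.sum(axis=0) @ B.sum(axis=0))

# Toy lexicon and data (assumed values, not from the paper).
rng = np.random.default_rng(0)
lexicon = {w: rng.random(DIM) for w in ["高兴", "生气", "非常", "今天"]}
sentences = [["今天", "非常", "高兴"], ["今天", "非常", "生气"]] * 4
labels = [1, 0] * 4

mats = [sentence_matrix(s, lexicon) for s in sentences]
gram = np.array([[emotion_kernel(a, b) for b in mats] for a in mats])

clf = SVC(kernel="precomputed").fit(gram, labels)
print(clf.predict(gram[:2]))   # predict on the first two rows of the Gram matrix
```

Because the kernel is the inner product of summed word vectors, it is positive semi-definite and costs one pass over each sentence's words, which is where the O(n) complexity comes from.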
{"title":"Bottom up: Exploring word emotions for Chinese sentence chief sentiment classification","authors":"Xin Kang, F. Ren, Yunong Wu","doi":"10.1109/NLPKE.2010.5587793","DOIUrl":"https://doi.org/10.1109/NLPKE.2010.5587793","url":null,"abstract":"In this paper we demonstrate the effectiveness of employing basic sentiment components for analyzing the chief sentiment of Chinese sentence among nine categories of sentiments (including “No emotion”). Compared to traditional lexicon based methods, our research explores emotion intensities of words and phrases in an eight dimensional sentiment space as features. An emotion matrix kernel is designed to evaluate inner product of these sentiment features for SVM classification with O(n) time complexity. Experimental result shows our method significantly improves performance of sentiment classification.","PeriodicalId":259975,"journal":{"name":"Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010)","volume":"18 6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130283550","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 14
A new method for solving context ambiguities using field association knowledge
Li Wang, E. Atlam, M. Fuketa, K. Morita, J. Aoe
In computational linguistics, word sense disambiguation is an open problem and is important in various aspects of natural language processing. However, traditional methods using case frames and semantic primitives are not effective for solving context ambiguities that require information beyond the sentence. This paper presents a new method for solving context ambiguities using a field association scheme that can determine the specified fields by using field association (FA) terms. To solve context ambiguities, the formal disambiguation algorithm calculates the weight of the fields within a scope by controlling that scope over a variable number of sentences. The accuracy of disambiguating context ambiguities is improved by 65% by applying the proposed field association knowledge.
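The scope-controlled field weighting could be sketched as follows: FA terms in the sentences around an ambiguous word vote, with assumed weights, for their associated fields, and the field with the highest total decides the reading. The FA dictionary, weights, window size, and example are all illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical FA dictionary: term -> (field, weight).
FA_TERMS = {
    "deposit": ("finance", 2.0), "loan": ("finance", 2.0), "interest": ("finance", 1.0),
    "river": ("geography", 2.0), "water": ("geography", 1.0), "fishing": ("geography", 1.0),
}

def dominant_field(sentences, center, scope=1):
    """Sum FA-term weights per field within +/-scope sentences of the target sentence."""
    weights = defaultdict(float)
    lo, hi = max(0, center - scope), min(len(sentences), center + scope + 1)
    for sent in sentences[lo:hi]:
        for token in sent.lower().split():
            if token in FA_TERMS:
                field, w = FA_TERMS[token]
                weights[field] += w
    return max(weights, key=weights.get) if weights else None

text = ["He walked along the river at dawn .",
        "He sat down on the bank and started fishing .",   # 'bank' is ambiguous here
        "The water was calm ."]
print(dominant_field(text, center=1, scope=1))   # -> geography, so 'bank' = riverbank
```

Widening or narrowing `scope` corresponds to the paper's idea of controlling the scope over a variable number of sentences.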
{"title":"A new method for solving context ambiguities using field association knowledge","authors":"Li Wang, E. Atlam, M. Fuketa, K. Morita, J. Aoe","doi":"10.1109/NLPKE.2010.5587858","DOIUrl":"https://doi.org/10.1109/NLPKE.2010.5587858","url":null,"abstract":"In computational linguistics, word sense disambiguation is an open problem and is important in various aspects of natural language processing. However, the traditional methods using case frames and semantic primitives are not effective for solving context ambiguities that require information beyond sentences. This paper presents a new method of solving context ambiguities using a field association scheme that can determine the specified fields by using field association (FA) terms. In order to solve context ambiguities, the formal disambiguation algorithm is calculating the weight of fields in that scope by controlling the scope for a set of variable number of sentences. The accuracy of disambiguating the context ambiguities is improved 65% by applying the proposed field association knowledge.","PeriodicalId":259975,"journal":{"name":"Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122567155","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0