Pub Date: 2010-09-30 | DOI: 10.1109/NLPKE.2010.5587802
Xiao Sun, Xiaoli Nan
In Chinese natural language processing, recognizing simple, non-recursive base phrases is an important task for applications such as information processing and machine translation. Instead of a rule-based model, we adopt a statistical machine learning method, the newly proposed Latent semi-CRF model, to solve the Chinese base phrase chunking problem. Chinese base phrase chunking can be treated as a sequence labeling problem, which involves predicting a class label for each frame in an unsegmented sequence. Chinese base phrases have sub-structures that cannot be observed in the training data. We propose a latent discriminative model called the Latent semi-CRF (Latent Semi Conditional Random Fields), which combines the advantages of the LDCRF (Latent Dynamic Conditional Random Fields) and the semi-CRF: it models the sub-structure of a class sequence and learns the dynamics between class labels while detecting Chinese base phrases. Our results demonstrate that the latent dynamic discriminative model compares favorably to Support Vector Machines, the Maximum Entropy Model, and Conditional Random Fields (including the LDCRF and semi-CRF) on Chinese base phrase chunking.
Title: Chinese base phrases chunking based on latent semi-CRF model
Published in: Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering (NLPKE-2010)
Pub Date: 2010-09-30 | DOI: 10.1109/NLPKE.2010.5587810
Aqil M. Azmi, Nawaf Bin Badia
The two fundamental sources of Islamic legislation are the Qur'an and the Hadith. The Hadiths, or Prophetic Traditions, are narrations originating from the sayings and conduct of Prophet Muhammad. Each Hadith starts with a list of the narrators involved in transmitting it, followed by the transmitted text. The Hadith corpus is extremely large, running into hundreds of volumes. Due to its legislative importance, the Hadiths have been carefully scrutinized by Hadith scholars. One way a scholar may grade a Hadith is by its narration chain and the individual narrators in the chain. In this paper we report on a system that automatically generates the transmission chains of a Hadith and graphically displays them. Computationally, this is a challenging problem. The text of the Hadith is in Arabic, a morphologically rich language, and each Hadith has its own peculiar way of listing narrators. Our solution involves parsing and annotating the Hadith text and identifying the narrators' names. We use shallow parsing along with a domain-specific grammar to parse the Hadith content. Experiments on sample Hadiths show our approach to have a very good success rate.
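A heavily simplified sketch of the chain-extraction idea: split a narration chain on transmission markers to recover the narrator names. The paper's actual system shallow-parses Arabic text with a domain grammar; the transliterated markers and names below are illustrative assumptions only.

```python
import re

# Hypothetical isnad (narration chain) extraction: split a transliterated
# chain on common transmission markers. Markers and names are placeholders,
# not the paper's Arabic grammar rules.

MARKERS = r"\b(?:narrated to us|informed us|on the authority of|from)\b"

def extract_narrators(chain_text):
    parts = re.split(MARKERS, chain_text)
    return [p.strip(" ,.") for p in parts if p.strip(" ,.")]

narrators = extract_narrators(
    "Malik narrated to us Nafi on the authority of Ibn Umar"
)
```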
Title: iTree - Automating the construction of the narration tree of Hadiths (Prophetic Traditions)
Pub Date: 2010-09-30 | DOI: 10.1109/NLPKE.2010.5587789
Sriram Chaudhury, A. Rao, D. Sharma
Most research in machine translation is about having computers completely bear the load of translating one human language into another. This paper looks at the machine translation problem afresh and observes that there is a need to share the load between man and machine, to distinguish reliable knowledge from heuristics, to provide a spectrum of outputs to serve different strata of people, and finally to make use of existing resources instead of reinventing the wheel. This paper describes a unique approach to developing a machine translation system based on insights into information dynamics from the Paninian Grammar Formalism. Anusaaraka is a Language Accessor cum Machine Translation system based on the fundamental premise of sharing the load, producing good-enough results according to the needs of the reader. The system promises a faithful representation of the translated text, no loss of information while translating, and graceful degradation (robustness) in case of failure. The layered output provides access to all stages of translation, making the whole process transparent. Thus, Anusaaraka differs from other machine translation systems in two respects: (1) its commitment to faithfulness, providing a layer of 100% faithful output so that a user with some training can "access the source text" faithfully; and (2) a design that lets users contribute to the system and participate in improving its quality. Further, Anusaaraka provides an eclectic combination of the Apertium architecture with a forward-chaining expert system, allowing both deep-parser and shallow-parser outputs to be used in analyzing the SL text. Existing language resources (parsers, taggers, chunkers) available under the GPL are used instead of being rewritten. Language data and linguistic rules are independent of the core programme, making it easy for linguists to modify and experiment with different language phenomena to improve the system.
Users can become contributors by adding new word sense disambiguation (WSD) rules for ambiguous words through a web interface available over the internet. The system uses the forward chaining of an expert system to infer new language facts from existing language data. It helps address the complex behavior of language translation by applying specific knowledge rather than specific techniques, creating a vast language knowledge base in electronic form. In other words, the expert system facilitates the transformation of a subject matter expert's (SME) knowledge into a computer-processable knowledge base.
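The forward-chaining inference the abstract mentions can be sketched in a few lines: rules fire whenever their premises are all known, adding new facts until a fixpoint. The rules and facts below are invented placeholders, not Anusaaraka's actual rule base.

```python
# Minimal forward-chaining loop of the kind an expert system uses to derive
# new language facts from existing ones. Rules/facts are illustrative only.

def forward_chain(facts, rules):
    """rules: list of (premises frozenset, conclusion). Fire until fixpoint."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

rules = [
    (frozenset({"word:bank", "context:river"}), "sense:riverbank"),
    (frozenset({"sense:riverbank"}), "translate:kinara"),  # hypothetical rule
]
derived = forward_chain({"word:bank", "context:river"}, rules)
```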
Title: Anusaaraka: An expert system based machine translation system
Pub Date: 2010-09-30 | DOI: 10.1109/NLPKE.2010.5587794
M. Shimura, Fumiaki Monma, S. Mitsuyoshi, M. Shuzo, Taishi Yamamoto, I. Yamada
Recognition of human "emotions" or "feelings" from voice is important to research on human communication. Although there has been much research on emotions or feelings in voice, definitions of these terms have been inconsistent. We reviewed previous papers in linguistics, brain science, information science, and other fields and developed specific definitions for these terms. In our paper, "emotion" is defined as an involuntary reaction in the human brain; it has two states: pleasure and displeasure. "Feeling" (e.g., anger, enjoyment, sadness, fear, and distress) is defined as a state voluntarily resulting from an emotion. Note that the pleasure-displeasure direction does not always correspond to the feeling. Our objective is therefore to obtain a sufficient amount of voice data and to analyze the relationship between emotions and feelings. In voice recording experiments, a voice database of about 100 participants with various natural feelings was constructed. A descriptive analysis showed that the pleasure-displeasure direction did not correspond to the feeling in 5% of the voice data. This result suggested that, if an experimental situation is constructed that tends to arouse various feelings, data with less variability can be obtained. Further analysis of the characteristics of the data, to identify situations in which the pleasure-displeasure direction does not necessarily correspond to the basic feeling, should lead to improved accuracy in voice emotion recognition.
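The kind of descriptive check reported above (what fraction of samples have a pleasure/displeasure label that disagrees with the valence conventionally expected for the labelled feeling) reduces to a simple count. The feeling-to-valence mapping below is an illustrative assumption, not the paper's definition.

```python
# Sketch: fraction of samples whose observed valence disagrees with the
# valence conventionally expected for the labelled feeling.
# EXPECTED_VALENCE is an assumed mapping for illustration.

EXPECTED_VALENCE = {"enjoyment": "pleasure", "anger": "displeasure",
                    "sadness": "displeasure", "fear": "displeasure"}

def mismatch_rate(samples):
    """samples: list of (feeling, observed_valence) pairs."""
    mismatches = sum(1 for feeling, valence in samples
                     if EXPECTED_VALENCE.get(feeling) not in (None, valence))
    return mismatches / len(samples)

rate = mismatch_rate([("enjoyment", "pleasure"),
                      ("anger", "displeasure"),
                      ("anger", "pleasure"),      # valence mismatch
                      ("sadness", "displeasure")])
```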
Title: Descriptive analysis of emotion and feeling in voice
Pub Date: 2010-09-30 | DOI: 10.1109/NLPKE.2010.5587849
Xiuting Duan, Tingting He, Le Song
The growth of blog texts on the internet has brought new challenges to Chinese text classification. Aiming to solve the semantic deficiency problem of traditional methods for Chinese text classification, this paper implements a method for classifying a blog as joy, anger, sadness, or fear using a simple unsupervised learning algorithm. The class of a blog text is predicted by the maximum semantic orientation (SO) of the phrases in the text that contain adjectives or adverbs. In this paper, the SO of a phrase is calculated as the mutual information between the given phrase and the polar words; the SO of the blog text is then determined by the maximum mutual information value. A blog text is classified as joy, for example, if the SO of its phrases is joy. Two different corpora are adopted to test our method: one is the blog corpus collected by the Monitor and Research Center for National Language Resources, Network Multimedia Sub-branch Center, and the other is the Chinese dataset provided by the COAE2008 task. On both datasets, the method achieves a substantial improvement over traditional methods.
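The PMI-IR idea above can be sketched directly: the orientation of a phrase toward a feeling category is the pointwise mutual information between the phrase and that category's polar word, estimated from co-occurrence counts, and the predicted class is the argmax. The counts below are invented for illustration, not corpus statistics from the paper.

```python
import math

# Sketch of PMI-based semantic orientation. Counts are illustrative.

def pmi(count_xy, count_x, count_y, total):
    """Pointwise mutual information from raw co-occurrence counts."""
    return math.log2((count_xy * total) / (count_x * count_y))

def classify(phrase_counts, total):
    """phrase_counts: {category: (count_xy, count_x, count_y)}.
    Returns the category with maximal PMI (i.e., maximal SO)."""
    return max(phrase_counts,
               key=lambda c: pmi(*phrase_counts[c], total))

label = classify({
    "joy":  (40, 100, 200),   # phrase co-occurs often with joy polar words
    "fear": (5, 100, 200),
}, total=10000)
```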
Title: Research on sentiment classification of Blog based on PMI-IR
Pub Date: 2010-09-30 | DOI: 10.1109/NLPKE.2010.5587821
Yanqiu Shao, Zhifang Sui, Ning Mao
Most semantic role labeling systems use syntactic analysis results to predict semantic roles. However, some problems cannot be handled well by syntactic features alone. In this paper, lexical semantic features are extracted from semantic dictionaries. Two typical lexical semantic dictionaries are used: TongYiCi CiLin and CSD. CiLin is built on convergent relationships, while CSD is based on syntagmatic relationships. From the two dictionaries, two labeling models are set up: a CiLin model and a CSD model. In addition, a pure syntactic model and a mixed model are built; the mixed model combines all of the syntactic and semantic features. The experimental results show that applying different levels of lexical semantic knowledge helps exploit inherent attributes of the language and improves the performance of the system.
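One common way to inject dictionary knowledge of this kind into a labeling model is to map each word to a coarse thesaurus category code and append it to the word's surface features. This is a generic sketch of that feature-augmentation step; the tiny dictionary and codes stand in for TongYiCi CiLin and are not the paper's actual feature set.

```python
# Sketch: augmenting per-word features with a thesaurus category code.
# CILIN below is a toy placeholder for a TongYiCi CiLin lookup.

CILIN = {"eat": "Hg01", "apple": "Bh07", "boy": "Aa02"}  # illustrative codes

def features(words):
    """One feature dict per word: surface form plus semantic class code."""
    return [{"word": w, "sem_class": CILIN.get(w, "UNK")} for w in words]

feats = features(["boy", "eat", "pear"])
```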
Title: Chinese semantic role labeling based on semantic knowledge
Pub Date: 2010-09-30 | DOI: 10.1109/NLPKE.2010.5587855
Yaohong Jin, Zhiying Liu
This paper presents a method that uses sentence segmentation to improve the performance of Chinese-English patent machine translation. In this method, long Chinese sentences are segmented into separate short sentences using features from the Hierarchical Network of Concepts theory (HNC theory). Several semantic features are introduced, including the main verb of a CSC (Eg), the main verb of a CSP (Egp), long NPs, and conjunctions. The main purpose of the segmentation algorithm is to detect whether a CSC can be a separate sentence. The segmentation method was integrated with a rule-based MT system; the sequence of the short translations was adjusted, and the different modes of expression of Chinese and English were also taken into consideration. The experimental results show that the performance of Chinese-English patent translation was improved effectively. Our method has been integrated into an online patent MT system running at SIPO.
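A much-simplified stand-in for the segmentation idea: split a long sentence at commas, but accept a split only once the accumulated clause contains a main verb, so each piece can stand alone. The verb lexicon and the comma heuristic are illustrative assumptions; the paper's algorithm uses HNC features (Eg, Egp, long NPs, conjunctions) on Chinese text.

```python
import re

# Toy clause segmentation: cut at commas only when the buffered clause
# contains a (hypothetical) main verb. Verb list is a placeholder.

VERBS = {"comprises", "includes", "is"}

def segment(sentence):
    clauses, buf = [], []
    for part in re.split(r",\s*", sentence):
        buf.append(part)
        if any(v in part.split() for v in VERBS):
            clauses.append(", ".join(buf))  # clause has a verb: close it
            buf = []
    if buf:
        clauses.append(", ".join(buf))      # trailing fragment
    return clauses

clauses = segment("The device comprises a sensor, a processor, "
                  "the sensor is attached to the housing")
```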
Title: Improving Chinese-English patent machine translation using sentence segmentation
Pub Date: 2010-09-30 | DOI: 10.1109/NLPKE.2010.5587830
Wei Wang, Motoyuki Suzuki, F. Ren
Texture similarity is a very important component of content-based image retrieval systems. This paper first proves the rotation invariance of the gray-primitive co-occurrence matrix, then presents a new texture image retrieval technique based on it. The experimental results indicate that the proposed algorithm has low computational complexity and a certain ability to resist noise.
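The counting step underlying any co-occurrence-matrix texture feature can be shown in a few lines: tally how often gray level i is adjacent to gray level j under a fixed displacement. This is a sketch of a plain gray-level co-occurrence matrix; the paper's "gray-primitive" variant and its rotation-invariance proof go beyond it.

```python
import numpy as np

# Minimal gray-level co-occurrence matrix for one displacement (dx, dy).

def cooccurrence(img, levels, dx=1, dy=0):
    """Count pairs (img[y, x], img[y+dy, x+dx]) over the image."""
    m = np.zeros((levels, levels), dtype=int)
    h, w = img.shape
    for y in range(h - dy):
        for x in range(w - dx):
            m[img[y, x], img[y + dy, x + dx]] += 1
    return m

img = np.array([[0, 0, 1],
                [1, 2, 2],
                [0, 1, 1]])
glcm = cooccurrence(img, levels=3)
```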
Title: Texture image retrieval based on gray-primitive co-occurrence matrix
Pub Date: 2010-09-30 | DOI: 10.1109/NLPKE.2010.5587793
Xin Kang, F. Ren, Yunong Wu
In this paper we demonstrate the effectiveness of employing basic sentiment components for analyzing the chief sentiment of a Chinese sentence among nine categories of sentiment (including "No emotion"). In contrast to traditional lexicon-based methods, our approach uses the emotion intensities of words and phrases in an eight-dimensional sentiment space as features. An emotion matrix kernel is designed to evaluate the inner product of these sentiment features for SVM classification with O(n) time complexity. Experimental results show that our method significantly improves the performance of sentiment classification.
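One way an O(n) sentence-level inner product can arise: if a sentence's feature is the sum of its words' eight-dimensional emotion vectors, the kernel between two sentences reduces to a single dot product after two linear passes. This is a sketch of that general idea, with a toy lexicon; it is not the paper's exact emotion matrix kernel.

```python
import numpy as np

# Sketch: sentence feature = sum of word-level 8-dim emotion vectors,
# so the kernel is one dot product after O(n) accumulation per sentence.
# LEXICON is a toy stand-in for word emotion intensities.

LEXICON = {"happy": np.eye(8)[0], "sad": np.eye(8)[1]}

def sentence_vec(words):
    v = np.zeros(8)
    for w in words:                     # O(n) over the words
        v += LEXICON.get(w, np.zeros(8))
    return v

def kernel(s1, s2):
    return float(sentence_vec(s1) @ sentence_vec(s2))

k = kernel(["happy", "happy"], ["happy", "sad"])
```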
Title: Bottom up: Exploring word emotions for Chinese sentence chief sentiment classification
Pub Date: 2010-09-30 | DOI: 10.1109/NLPKE.2010.5587858
Li Wang, E. Atlam, M. Fuketa, K. Morita, J. Aoe
In computational linguistics, word sense disambiguation is an open problem that is important in various aspects of natural language processing. However, traditional methods using case frames and semantic primitives are not effective for resolving context ambiguities that require information beyond the sentence. This paper presents a new method for resolving context ambiguities using a field association scheme that can determine the specified fields by means of field association (FA) terms. To resolve context ambiguities, the disambiguation algorithm calculates the weight of the fields within a scope by varying the number of sentences the scope covers. The accuracy of disambiguating context ambiguities is improved by 65% through the proposed field association knowledge.
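The field-weighting step can be sketched simply: within a scope of sentences, each FA term votes for its field, and the scope's dominant field is the one with the highest weight. The terms, fields, and unit weights below are invented for illustration; the paper's scheme controls the scope width and weighting in more detail.

```python
# Sketch: weight fields over a scope of sentences using FA-term votes.
# FA_TERMS is a toy field-association dictionary.

FA_TERMS = {"goal": "sports", "striker": "sports",
            "ballot": "politics", "minister": "politics"}

def dominant_field(sentences, start, width):
    """Return the highest-weight field within sentences[start:start+width]."""
    weights = {}
    for sent in sentences[start:start + width]:
        for word in sent.split():
            field = FA_TERMS.get(word)
            if field:
                weights[field] = weights.get(field, 0) + 1
    return max(weights, key=weights.get) if weights else None

sents = ["the striker scored a goal",
         "the minister praised the team"]
field = dominant_field(sents, 0, 2)
```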
Title: A new method for solving context ambiguities using field association knowledge