The Linguistic Competency Model (LCM) models the structure of linguistic competency and is a new, extensive work of discourse analysis for an Intentional Intelligent Agent. The model explicitly explains the role of linguistic competency in a discourse and incorporates intention as an embodied entity into the cognitive process of discourse. It addresses hermeneutical process grammar as an instrument of linguistic competency skill in the socially interactive structure of discourse. The LCM establishes a discourse structure that can semantically interpret and hermeneutically analyze the psychological temporal semantic-pragmatic linkages and movement of discourse. The embedded structure of physical spatio-temporal contiguity of linkages between verbals or events can also be systematically associated with psychological temporal semantic-pragmatic movement in the LCM. The LCM therefore gives new insight into the composition and characteristics of a hermeneutic discourse, especially in defining the linguistic competency within it.
{"title":"Linguistic Competency Model for Intentional Agent","authors":"Sivakumar Ramakrishnan, V. Mohanan","doi":"10.1109/IALP.2011.71","DOIUrl":"https://doi.org/10.1109/IALP.2011.71","url":null,"abstract":"The linguistic competency structure which has been modeled as Linguistic Competency Model (LCM) is a new and an extensive work of discourse analysis of an Intentional Intelligent Agent. This model is able to explain explicitly the role of linguistic competency in a discourse and incorporated the intention as an embodied entity into the cognitive process of discourse. This model managed to address the hermeneutical process grammar as instrument of linguistic competency skill in the social interactive structure of discourse. This LCM has established a discourse structure which can semantically interpret and hermeneutically analyze the psychological temporal semantic- pragmatic linkages and movement of discourse. The embedded structure of physical spatio-temporal contiguity of verbals or events linkages also can be systematically associated with psychological temporal semantic- pragmatic movement in this LCM. Therefore this LCM model will give a new insight into an understanding of the composition and the characteristic of a hermeneutic discourse especially to define the linguistic competency in it.","PeriodicalId":297167,"journal":{"name":"2011 International Conference on Asian Language Processing","volume":"170 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115194019","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Clustering search results can help users find what they need in massive amounts of information, but traditional text clustering has been shown not to be effective enough for this task. The Lingo algorithm, which adopts LSI for clustering, first generates candidate labels, then assigns the documents to them, and finally forms the clusters. Building on the Lingo algorithm, this paper presents an improved Single-Pass method with linear weighting, which integrates HowNet semantic similarity and cosine similarity, fuses and rediscovers clusters, and extracts the cluster labels. Experiments show that our method achieves good clustering results in terms of purity and F-measure.
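The linear weighting and Single-Pass steps described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the weight `alpha`, the `threshold`, the whitespace tokenization, and the `jaccard_stand_in` function (a placeholder used here because HowNet is not bundled with Python) are all assumptions made for illustration.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def jaccard_stand_in(a, b):
    """Hypothetical placeholder for a HowNet-based semantic similarity."""
    union = set(a) | set(b)
    return len(set(a) & set(b)) / len(union) if union else 0.0


def combined_similarity(a, b, semantic_sim, alpha=0.5):
    """Linear weighting: alpha * semantic + (1 - alpha) * cosine."""
    return alpha * semantic_sim(a, b) + (1 - alpha) * cosine_similarity(a, b)


def single_pass(docs, semantic_sim=jaccard_stand_in, alpha=0.5, threshold=0.3):
    """Assign each document to the most similar existing cluster,
    or open a new cluster when no similarity reaches the threshold."""
    clusters, centroids = [], []
    for index, doc in enumerate(docs):
        vec = Counter(doc.split())
        best, best_sim = None, threshold
        for c, centroid in enumerate(centroids):
            sim = combined_similarity(vec, centroid, semantic_sim, alpha)
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append([index])
            centroids.append(Counter(vec))
        else:
            clusters[best].append(index)
            centroids[best].update(vec)  # centroid = summed term counts
    return clusters


snippets = ["apple fruit juice", "apple juice fruit", "car engine wheel"]
result = single_pass(snippets)
```

Because Single-Pass visits each document once, the outcome depends on input order; the cluster fusion and rediscovery step mentioned in the abstract is not covered by this sketch.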
{"title":"Search Results Clustering Based on a Linear Weighting Method of Similarity","authors":"Dequan Zheng, Haibo Liu, T. Zhao","doi":"10.1109/IALP.2011.72","DOIUrl":"https://doi.org/10.1109/IALP.2011.72","url":null,"abstract":"The cluster of search results can facilitate users in finding the needed from massive information. But the effect of the traditional text clustering has been verified not good enough. Lingo Algorithm, which adopts LSI for clustering, generates candidate labels first, then distributes the documents, and forms the clusters finally. On the basis of Lingo Algorithm, this paper presents a linear weighted method of Single-Pass improvement, which integrates HowNet semantic similarity and cosine similarity, fuses and rediscovers clusters, and extracting the cluster labels. The experiments have showed that our method it achieves a good results in clusters in the form of purity and F-measure.","PeriodicalId":297167,"journal":{"name":"2011 International Conference on Asian Language Processing","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126860780","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper describes a proposed ontological language processing system that integrates two semantic sets for a group of important verbs with the prefix ka- in Yami, an Austronesian language in Taiwan. The two semantic sets represent two different classification approaches: one follows the concepts and rules of WordNet, and the other uses the metaphors in Yami indigenous knowledge. The ontologies are used for classification and semantic integration, and the results of the implementation are used to build the Yami lexical database. This paper illustrates how the methodology and framework used in classifying Yami can be applied to Austronesian language processing.
{"title":"Two Ontological Approaches to Building an Intergrated Semantic Network for Yami ka-Verbs","authors":"Meng-Chien Yang, Si-Wei Huang, D. V. Rau","doi":"10.1109/IALP.2011.26","DOIUrl":"https://doi.org/10.1109/IALP.2011.26","url":null,"abstract":"This paper describes a proposed ontological language processing system for integrating two semantic sets for a group of important verbs with the prefix Kain Yami, an Austronesian language in Taiwan. The two semantic sets represent two different classification approaches. One approach follows the concepts and rules of WordNet and the other uses the metaphors in Yami indigenous knowledge. The ontologies are used for classification and semantic integration. The results of implementation are used for building the Yami lexical database. This paper illustrates how the methodology and framework used in classifying Yami can be applied to Austronesia language processing.","PeriodicalId":297167,"journal":{"name":"2011 International Conference on Asian Language Processing","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126378277","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Coreference resolution is an important subtask in natural language processing systems. Its goal is to determine whether two expressions in natural language refer to the same entity in the world. Machine learning approaches to this problem have been reasonably successful, operating primarily by recasting the problem as a classification task. A great deal of research has been done on this task in English, using approaches ranging from those based on linguistics to those based on machine learning. In Chinese, however, much less work has been done in this area. The lack of public resources is a major problem in Chinese NLP research. Another problem is that some features are harder to obtain for Chinese than for English. In this paper, we present a noun phrase coreference system that builds on the work of Soon et al. (2001). We also explore the impact of various features on our system's performance. Experiments on the Chinese portion of OntoNotes 3.0 show that the system achieves good performance.
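Recasting coreference as classification, as the abstract above describes, requires turning annotated mentions into labeled mention pairs. The sketch below shows one common scheme in the style of Soon et al. (2001): each anaphor is paired positively with its closest preceding coreferent mention and negatively with every mention in between. The abstract does not say how the authors generate instances, so this scheme, the function name, and the toy mentions are assumptions.

```python
def make_training_pairs(mentions):
    """Build (antecedent_index, anaphor_index, label) training instances.

    `mentions` is a list of (text, entity_id) tuples in document order.
    Label 1 marks a coreferent pair, label 0 a non-coreferent pair.
    """
    pairs = []
    for j, (_, entity) in enumerate(mentions):
        # Closest preceding mention that belongs to the same entity.
        antecedent = next(
            (i for i in range(j - 1, -1, -1) if mentions[i][1] == entity),
            None,
        )
        if antecedent is None:
            continue  # first mention of its entity: no instance generated
        pairs.append((antecedent, j, 1))
        # Mentions between the antecedent and the anaphor become negatives.
        pairs.extend((k, j, 0) for k in range(antecedent + 1, j))
    return pairs


toy_mentions = [
    ("Li Ming", "A"),
    ("the company", "B"),
    ("he", "A"),
    ("it", "B"),
]
training_pairs = make_training_pairs(toy_mentions)
```

Each pair would then be described by a feature vector (for example string match, distance, or agreement features) and passed to a classifier; feature extraction is deliberately left out because the abstract does not list the features used.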
{"title":"Research of Noun Phrase Coreference Resolution","authors":"Junwei Gao, Fang Kong, Peifeng Li, Qiaoming Zhu","doi":"10.1109/IALP.2011.32","DOIUrl":"https://doi.org/10.1109/IALP.2011.32","url":null,"abstract":"Coreference resolution is an important subtask in natural language processing systems. The process of it is to find whether two expressions in natural language refer to the same entity in the world. Machine learning approaches to this problem have been reasonably successful, operating primarily by recasting the problem as a classification task. A great deal of research has been done on this task in English, using approaches ranging from those based on linguistics to those based on machine learning. In Chinese, however, much less work has been done in this area. The lack of public resources is a big problem in the research of Chinese NLP. The other problem is that some features are more difficult to get than those features of English. In this paper, We present a noun phrase coreference system that refers to the work of Soon et al. (2001). We also explore the impact of various features on our system's performance. Experiments on the Chinese portion of OntoNotes 3.0 show that the platform achieves a good performance.","PeriodicalId":297167,"journal":{"name":"2011 International Conference on Asian Language Processing","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124626752","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Parallel corpora are important resources in data-driven natural language processing. Because of issues such as scale, comprehensiveness, and timeliness, the existing Chinese-Mongolian parallel corpora are significantly limited in practical use. After reviewing the traditional heuristic information used to identify parallel web pages for major languages, this paper explores new heuristic information to improve the identification of Chinese-Mongolian parallel pages. Based on these heuristics, a support vector machine is used to classify web pages as parallel or non-parallel. The experiment achieves a precision of 95% and a recall of 88%. This paper is a preliminary study of automatically constructing parallel corpora for minority languages from the web.
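For readers who want to relate the precision and recall figures in the abstract above to a single score, the following sketch computes precision, recall, and the balanced F-measure for a binary parallel/non-parallel decision. The label lists are invented toy data, not the paper's results, and the function names are assumptions.

```python
def f_measure(precision, recall):
    """Balanced F-measure (harmonic mean of precision and recall)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def precision_recall_f1(gold, predicted):
    """Score binary predictions, where 1 means 'parallel page pair'."""
    tp = sum(1 for g, p in zip(gold, predicted) if g and p)
    fp = sum(1 for g, p in zip(gold, predicted) if not g and p)
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f_measure(precision, recall)


# Toy labels for illustration only.
toy_gold = [1, 1, 1, 0, 0]
toy_predicted = [1, 1, 0, 1, 0]
toy_scores = precision_recall_f1(toy_gold, toy_predicted)

# F-measure implied by the reported precision (0.95) and recall (0.88).
implied_f1 = f_measure(0.95, 0.88)
```

Under this definition the reported precision and recall correspond to an F-measure of roughly 0.91; that value is derived here and is not stated in the abstract.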
{"title":"Automatic Construction of Chinese-Mongolian Parallel Corpora from the Web Based on the New Heuristic Information","authors":"Zede Zhu, Miao Li, Lei Chen, Shouguo Zheng","doi":"10.1109/IALP.2011.17","DOIUrl":"https://doi.org/10.1109/IALP.2011.17","url":null,"abstract":"Parallel corpora are important resources in data-driven natural language processing domain. Concerning the issues such as the scale, comprehensiveness and timeliness, the existing Chinese-Mongolian parallel corpora are significantly limited in practical use. Reviewing the traditional heuristic information used to identify major languages parallel web pages, this paper focuses on exploring new heuristic information to improve the performance of identifying Chinese-Mongolian parallel pages. Based on these heuristics, support vector machine is used to classify webs as parallel pages or non-parallel pages. Experiment gains a precision rate of 95% and a recall rate of 88%. This paper makes preliminary research in automatically constructing minority languages parallel corpora from the web.","PeriodicalId":297167,"journal":{"name":"2011 International Conference on Asian Language Processing","volume":" 10","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133121358","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper presents an approach to translating element sub-sentences, which are widespread in Chinese patent documents. An element sub-sentence is a kind of language chunk within a sentence in which one part of the sub-sentence is the headword and the others are attributives, or a modifier-head phrase with a VP+NP or NP+VP structure. In HNC theory, element sub-sentences can be divided into three types: EK (predicate) sub-sentences, GBK1 (subject) sub-sentences, and GBK2 (object) sub-sentences. In this paper, we give a method for detecting sub-sentence boundaries, analyze the structural and semantic characteristics of each type of element sub-sentence, and derive Chinese-English translation rules for each type. With these processing strategies, most Chinese-English translation problems involving element sub-sentences can be solved on an online patent MT system at SIPO.
{"title":"Research on Element Sub-sentence in Chinese-English Patent Machine Translation","authors":"Zhiying Liu, Yaohong Jin, Yu-huan Chi","doi":"10.1109/IALP.2011.29","DOIUrl":"https://doi.org/10.1109/IALP.2011.29","url":null,"abstract":"This paper presents an approach to translate element sub-sentences which widely exist in Chinese patent documents. Element sub-sentence is a kind of language chunk in a sentence in which one part of sub-sentence is the headword and others are attributives, or the modifier-head phrase of VP+NP or NP+VP structure. Element sub-sentence can be divided into three types in HNC theory: EK (predicate) sub-sentence, GBK1 (subject) sub-sentence and GBK2 (object) sub-sentence. In this paper, we give the method to detect the sub-sentence boundaries, analyze the characteristic of each type of element sub-sentences from the structure and semantics, and discover the Chinese-English translation rules of each type. By using the processing strategies most of the Chinese-English translation problem about the element sub-sentence can be perfectly solvable on an online patent MT system in SIPO.","PeriodicalId":297167,"journal":{"name":"2011 International Conference on Asian Language Processing","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132203824","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In natural language processing and related fields, classic text representation methods seldom consider the role of word order and long-distance dependency in the semantic representation of texts. In this paper, we discuss the current state and problems of statistical language models, especially the head-driven statistical language model and Head-driven Phrase Structure Grammar (HPSG). The development and implementation methods of long-distance dependency language models are then briefly introduced. Finally, a graph-based long-distance dependency language model is proposed.
{"title":"Graph-Based Language Model of Long-Distance Dependency","authors":"Faguo Zhou, Xingang Yu","doi":"10.1109/IALP.2011.49","DOIUrl":"https://doi.org/10.1109/IALP.2011.49","url":null,"abstract":"In the natural language processing and its related fields, the classic text representation methods seldom consider the role of the words order and long-distance dependency in the texts for the semantic representation. In this paper, we discussed current situation and problems of the statistical language models, especially for Head-driven statistical language model and Head-driven Phrase Structure Grammar (HPSG). And then the development and realization methods of the long-distance dependency language model simply introduced. At last graph-based long-distance dependency language model was proposed in the paper.","PeriodicalId":297167,"journal":{"name":"2011 International Conference on Asian Language Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128521045","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In reality, different people often share the same name. Person Cross-Document Co-reference Resolution is a task that requires all and only the textual mentions of an entity of type Person to be individuated in a collection of text documents. In this paper, we implement a Chinese Person Name Cross-Document Co-reference Resolution System. First, we use a name identification module to recognize all person names in the texts. Next, we preliminarily classify the documents that share a person name using rules. Finally, we compute the similarities between the resulting classes based on VSM, and the system uses these similarities to produce the final classification. We test the system on 30 common Chinese names from the corpus provided by CLP, and the average F-measure is 85.9%.
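The VSM-based similarity step in the abstract above can be sketched as follows: documents that mention the same name are represented as TF-IDF vectors, and documents whose cosine similarity reaches a threshold are grouped as referring to the same person. The smoothed IDF formula, the threshold of 0.5, the transitive grouping, and the whitespace tokenization (real Chinese text would first need word segmentation) are assumptions; the abstract does not specify these details, and the rule-based preliminary classification is omitted.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document (smoothed IDF)."""
    tokenized = [doc.split() for doc in docs]
    df = Counter(t for tokens in tokenized for t in set(tokens))
    n = len(docs)
    return [
        {t: count * (math.log((1 + n) / (1 + df[t])) + 1)
         for t, count in Counter(tokens).items()}
        for tokens in tokenized
    ]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def group_documents(docs, threshold=0.5):
    """Group document indices whose similarity reaches the threshold."""
    vectors = tfidf_vectors(docs)
    parent = list(range(len(docs)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(len(docs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


same_name_docs = [
    "wang wei professor physics university",
    "wang wei physics professor lecture",
    "wang wei football striker goal",
]
person_groups = group_documents(same_name_docs)
```

In this toy input the first two documents share context words about physics and are grouped together, while the third is kept apart as a different person with the same name.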
{"title":"Research on Cross-Document Coreference of Chinese Person Name","authors":"Ji Ni, Fang Kong, Peifeng Li, Qiaoming Zhu","doi":"10.1109/IALP.2011.30","DOIUrl":"https://doi.org/10.1109/IALP.2011.30","url":null,"abstract":"In reality, different persons often have the same person name. The Person Cross Document Co-reference Resolution is a task, which requires that all and only the textual mentions of an entity of type Person be individuated in a collection of text documents. In this paper, we implement a Chinese Person Name Cross Document Co-reference Resolution System. First, we utilize name identification module to recognize all person names of the texts, and then classify the document collection of same person name by rules preliminarily, and at last, we compute similarities of each classification based on VSM, according to the prior similarities, the system get the final classification results. We test the system on 30 usual Chinese names of the corpus provided by CLP, and average F measure is 85.9%.","PeriodicalId":297167,"journal":{"name":"2011 International Conference on Asian Language Processing","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115170721","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hossein Kamyar, M. Kahani, Mohsen Kamyar, Asef Poormasoomi
In this paper we propose a novel technique for summarizing a text based on the linguistic properties of text elements and the semantic chains among them. In most summarization approaches, the major consideration is the statistical properties of text elements, such as term frequency. Here we use centering theory, which helps us recognize semantic chains in a text, to propose a new automatic single-document summarization approach. To process a text with centering theory and extract a coherent summary, a processing pipeline must be constructed. This pipeline consists of several components, such as co-reference resolution, semantic role labeling, and part-of-speech (POS) tagging.
{"title":"An Automatic Linguistics Approach for Persian Document Summarization","authors":"Hossein Kamyar, M. Kahani, Mohsen Kamyar, Asef Poormasoomi","doi":"10.1109/IALP.2011.52","DOIUrl":"https://doi.org/10.1109/IALP.2011.52","url":null,"abstract":"In this paper we propose a novel technique for summarizing a text based on the linguistics properties of text elements and semantic chains among them. In most summarization approaches, the major consideration is the statistical properties of text elements such as term frequency. Here we use centering theory which helps us to recognize semantic chains in a text, for proposing a new automatic single document summarization approach. For processing a text by centering theory and extracting a coherent summery, a processing pipeline should be constructed. This pipeline consists of several components such as co-reference resolution, semantic role labeling and POS [Part of speech] tagging.","PeriodicalId":297167,"journal":{"name":"2011 International Conference on Asian Language Processing","volume":"7 10","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113933028","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Although "grammatical rule + dictionary" is the traditional pattern for natural language processing, it struggles to explain how words combine in a language. If all word combinations were entered into a database, both the grammar and the information system would be simplified. Building on a review of the "little grammar in the big word storehouse" approach, this paper discusses the necessity, methods, and principles of establishing a phrase information database for the Uyghur language.
{"title":"Research on the Uyghur Information Database for Information Processing","authors":"Yusup Ebeydulla, Hesenjan Abliz, Azragul Yusup","doi":"10.1109/IALP.2011.79","DOIUrl":"https://doi.org/10.1109/IALP.2011.79","url":null,"abstract":"Although the \"grammatical rule + dictionary\" is the traditional pattern for natural language processing, it can be hard to explain the combination of words in language. If all word combinations are entered into a database, the grammar and the information system would be simplified. The necessity, the methods and the principles of establishing the phrase information database of Uyghur language will be discussed in the paper on the basis of the review of the \"little grammar in the big word storehouse\".","PeriodicalId":297167,"journal":{"name":"2011 International Conference on Asian Language Processing","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114541176","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}