Characteristics of Independent Claim: A Corpus-Linguistic Approach to Contemporary English Patents
D. Lin, Shelley Ching-Yu Hsieh
Pub Date: 2011-12-01  DOI: 10.30019/IJCLCLP.201112.0005
This paper presents a corpus-driven linguistic approach to embodiment in modern patent language, in response to growing needs in intellectual property rights. While existing work appears to fill a niche in English for Specific Purposes (ESP), the present study suggests that a statistical retrieval approach is necessary for compiling a patent technical word list that expands learner vocabulary size. Since a significant percentage of technical vocabulary appears within the independent claim, this study examines its essential features and shows how they are characterized by the linguistic specificity of patent style. It further demonstrates that the proposed approach to the independent claim, as contained in the patent specification, is reliable for patent applications at the international level. For example, clausal types that specify how clauses are used in U.S. patent documents, identified through co-occurrence relations, are potentially useful for patent writing, while verb-noun collocations allow learners to grasp hidden semantic-prosodic associations. In short, the research content and statistical investigations of our approach highlight the pedagogical value of Patent English for ESP teachers, applied linguists, and the development of interdisciplinary research.
Some Chances and Challenges in Applying Language Technologies to Historical Studies in Chinese
Chao-Lin Liu, Guantao Jin, Qingfeng Liu, W. Chiu, Yih-Soong Yu
Pub Date: 2011-06-01  DOI: 10.30019/IJCLCLP.201106.0003
We report applications of language technology to analyzing historical documents in the Database for the Study of Modern Chinese Thoughts and Literature (DSMCTL). We studied two historical issues with the reported techniques: the conceptualization of "huaren" (Chinese people) and the attempt to institute constitutional monarchy in the late Qing dynasty. We also discuss research challenges in supporting the study of sophisticated issues, drawing on our experience with DSMCTL, the Database of Government Officials of the Republic of China, and the Dream of the Red Chamber. Advanced techniques and tools for lexical, syntactic, semantic, and pragmatic processing of language information, along with more thorough data collection, are needed to strengthen the collaboration between historians and computer scientists.
Performance Evaluation of Speaker-Identification Systems for Singing Voice Data
Wei-Ho Tsai, Hsin-Chieh Lee
Pub Date: 2011-06-01  DOI: 10.30019/IJCLCLP.201106.0001
Automatic speaker identification (SID) has long been an important research topic. It aims to identify who among a set of enrolled persons spoke a given utterance. This study extends the conventional SID problem by examining whether an SID system trained on speech data can identify the singing voices of the enrolled persons. Our experiments found that a standard SID system fails to identify most singing data, due to the significant differences between singing and speaking for a majority of people. In order for an SID system to handle both speech and singing data, we examine the feasibility of using a model-adaptation strategy to enhance the generalization of a standard SID system. Our experiments show that a majority of the singing clips can be correctly identified after adapting speech-derived voice models with some singing data.
Using Linguistic Features to Predict Readability of Short Essays for Senior High School Students in Taiwan
Wei-Ti Kuo, Chao-Shainn Huang, Chao-Lin Liu
Pub Date: 2010-09-01  DOI: 10.30019/IJCLCLP.201009.0003
We investigated the problem of classifying short essays used in comprehension tests for senior high school students in Taiwan. The tests were for first- and second-year students, so the answers included only four categories, one for each semester of the first two years. A random-guess approach would achieve only 25% accuracy on our problem. We analyzed three publicly available readability scores but did not find them directly applicable. By considering a wide array of features at the word, sentence, and essay levels, we gradually improved the F-measure achieved by our classifiers from 0.381 to 0.536.
Discovering Correction Rules for Auto Editing
Anta Huang, Tsung-Ting Kuo, Ying-Chun Lai, Shou-de Lin
Pub Date: 2010-09-01  DOI: 10.30019/IJCLCLP.201009.0004
This paper describes a framework that extracts effective correction rules from a sentence-aligned corpus and shows a practical application: auto-editing using the discovered rules. The framework exploits the Levenshtein distance between sentences to identify the key parts of the rules and uses the editing corpus to filter, condense, and refine them. We produce rule candidates of the form A → B, where A stands for the erroneous pattern and B for the correct pattern. The developed framework is language-independent and can therefore be applied to other languages. Evaluation of the discovered rules reveals that 67.2% of the top 1,500 ranked rules are annotated as correct or mostly correct by experts. Based on the rules, we have developed an online auto-editing system for demonstration at http://ppt.cc/02yY.
Word Sense Disambiguation Using Multiple Contextual Features
Liang-Chih Yu, Chung-Hsien Wu, Jui-Feng Yeh
Pub Date: 2010-09-01  DOI: 10.30019/IJCLCLP.201009.0002
Word sense disambiguation (WSD) is a technique used to identify the correct sense of polysemous words, and it is useful for many applications, such as machine translation (MT), lexical substitution, information retrieval (IR), and biomedical applications. In this paper, we propose the use of multiple contextual features, including the predicate-argument structure and named entities, to train two commonly used classifiers, Naive Bayes (NB) and Maximum Entropy (ME), for word sense disambiguation. Experiments are conducted to evaluate the classifiers' performance on the OntoNotes corpus and are compared with classifiers trained using a set of baseline features, such as the bag-of-words, n-grams, and part-of-speech (POS) tags. Experimental results show that incorporating both predicate-argument structure and named entities yields higher classification accuracy for both classifiers than does the use of the baseline features, resulting in accuracy as high as 81.6% and 87.4%, respectively, for NB and ME.
Improving the Template Generation for Chinese Character Error Detection with Confusion Sets
Yong-Zhi Chen, Shih-Hung Wu, Ping-Che Yang, Tsun Ku
Pub Date: 2010-06-01  DOI: 10.30019/IJCLCLP.201006.0003
In this paper, we propose a system that automatically generates templates for detecting Chinese character errors. We first collect the confusion set for each high-frequency Chinese character; error types include pronunciation-related and radical-related errors. With the help of the confusion sets, our system generates possible error patterns in context, which are used as detection templates. Combined with a word segmentation module, our system generates more accurate templates. Experimental results show that the precision approaches 95%. Such a system should not only help teachers grade and check student essays, but also effectively help students learn how to write.
A Posteriori Individual Word Language Models for Vietnamese Language
Le Quan Ha, Trần Thị Thanh Vân, Hoang Tien Long, N. H. Tinh, N. Tham, Le Trong Ngoc
Pub Date: 2010-06-01  DOI: 10.30019/IJCLCLP.201006.0002
We show that the enormous growth in disk storage capacity in recent years can be used to build individual word-domain statistical language models, one for each significant word of a language that contributes to the context of the text. Each word-domain language model is a precise domain model for its significant word; combined appropriately, they provide a highly specific domain language model for the text following a cache, even a short cache. Our individual word probability and frequency models were constructed and tested for Vietnamese and English. For English, we employed the Wall Street Journal corpus of 40 million word tokens; for Vietnamese, we used the QUB corpus of 6.5 million tokens. Our testing methods used both a priori and a posteriori approaches. We also explain the adjustment of a previously exaggerated prediction of the potential power of a posteriori models. In tests, 14 kinds of individual word language models improved perplexity (i) between 33.9% and 53.34% for Vietnamese and (ii) between 30.78% and 44.5% for English, over a baseline global trigram weighted-average model. For both languages, the best a posteriori model is the a posteriori weighted frequency model, with a 44.5% perplexity improvement for English and a 53.34% improvement for Vietnamese. In addition, five Vietnamese a posteriori models achieved word-error-rate (WER) reductions of 9.9% to 16.8% over a Katz trigram model with the same Vietnamese speech decoder.
A Discrete-cepstrum Based Spectrum-envelope Estimation Scheme and Its Example Application of Voice Transformation
H. Gu, Sung-Feng Tsai
Pub Date: 2009-12-01  DOI: 10.30019/IJCLCLP.200912.0002
Approximating a spectral envelope via regularized discrete cepstrum coefficients has been proposed by previous researchers. In this paper, we study two problems encountered in practice when adopting this approach: which spectral peaks should be selected, and which frequency-axis scaling function should be adopted. After experimentation, we propose feasible solutions to both problems and combine them with the methods for regularizing and computing discrete cepstrum coefficients to form a spectral-envelope estimation scheme. Measurements of spectral-envelope approximation error verify that this scheme is much better than the original one. Furthermore, we have applied the scheme to building a voice timbre transformation system, whose performance demonstrates the effectiveness of the proposed spectral-envelope estimation scheme.
Corpus, Lexicon, and Construction: A Quantitative Corpus Approach to Mandarin Possessive Construction
Cheng-Hsien Chen
Pub Date: 2009-09-01  DOI: 10.30019/IJCLCLP.200909.0004
Taking the Mandarin Possessive Construction (MPC) as an example, the present study investigates the relation between lexicon and constructional schemas using a quantitative corpus-linguistic approach. We argue that the wide use of raw frequency distributions in traditional corpus-linguistic studies may undermine the validity of the results and reduce the possibility for interdisciplinary communication. Several further methodological issues in traditional corpus linguistics are also discussed. To mitigate the impact of these issues, we utilize phylogenic hierarchical clustering to identify semantic classes of the possessor NPs, thereby reducing the subjectivity in categorization that most traditional corpus-linguistic studies suffer from. It is hoped that our rigorous methodological endeavor will have far-reaching implications for theory in usage-based approaches to language and cognition.