Pub Date: 2019-11-01 | DOI: 10.1109/IALP48816.2019.9037703
Gulnur Arkin, Gvljan Alijan, A. Hamdulla, Mijit Ablimit
In this paper, based on vowel and phonological pronunciation corpora from 20 Kazakh undergraduate Mandarin learners, 10 Uyghur learners, and 10 standard-pronunciation speakers, and within the framework of the phonetic learning model and comparative analysis, methods from experimental phonetics are applied to the Kazakh and Uyghur learners. Acoustic characteristics of the learners' and standard speakers' Mandarin vowels, such as formant frequency values, were analyzed, and vowel duration similarity and other prosodic parameters were compared against the standard speakers. These results can provide learners with effective teaching-related reference information, supply reliable parameters and pronunciation assessments for computer-assisted language learning (CALL) systems, and improve the accuracy of multi-ethnic Mandarin (Putonghua) speech recognition and ethnic identification.
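The core comparison (formant distance and duration similarity against a standard speaker) can be sketched as follows; the formant and duration values here are invented for illustration, not taken from the study's corpora:

```python
import math

# Hypothetical mean formant values (F1, F2) in Hz for one vowel; real values
# would come from the learners' and standard speakers' recordings.
standard_vowel = (850.0, 1220.0)
learner_vowel = (780.0, 1350.0)

def formant_distance(v1, v2):
    """Euclidean distance between two (F1, F2) formant pairs, in Hz."""
    return math.dist(v1, v2)

def duration_similarity(learner_ms, standard_ms):
    """Ratio of vowel durations; 1.0 means the durations match."""
    return learner_ms / standard_ms

print(round(formant_distance(learner_vowel, standard_vowel), 1))   # 147.6
print(duration_similarity(150.0, 120.0))                           # 1.25
```

A smaller formant distance and a duration ratio near 1.0 would indicate pronunciation closer to the standard speaker.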
Title: "A Comparative Analysis of Acoustic Characteristics between Kazak & Uyghur Mandarin Learners and Standard Mandarin Speakers" (2019 International Conference on Asian Language Processing (IALP))
Pub Date: 2019-11-01 | DOI: 10.1109/IALP48816.2019.9037654
Huibin Zhuang, Zhanting Bu
In Chinese, he 河 'river' can be used both as a proper name (for the Yellow River) and as the common word for rivers in North China. Based on linguistic data, ethnological evidence, and historical documents, this paper argues against the leading hypotheses about its origin and proposes that he originated in the Old Yi language, entered Chinese through language contact, and replaced shui, which was from Old Qiang, later becoming the only common noun for river in North China.
Title: "On the Etymology of he 'river' in Chinese" (2019 International Conference on Asian Language Processing (IALP))
Pub Date: 2019-11-01 | DOI: 10.1109/IALP48816.2019.9037663
Shichen Liang, Jianyu Zheng, Xuemei Tang, Renfen Hu, Zhiying Liu
In recent years, a large number of publications have used distributional methods to track temporal changes in lexical semantics. However, most current studies only state the simple fact that the meaning of a word has changed, and lack more detailed and in-depth analysis. We combine linguistic theory with word embedding models to study Chinese diachronic semantics. Specifically, two methods, word analogy and word similarity, are associated with diachronic synonymy and diachronic polysemy respectively, and aligned diachronic word embeddings are used to detect changes in the relationship between the forms and meanings of words. Through experiments and case studies, our method achieves the intended results. We also find that the evolution of Chinese vocabulary is closely related to social development, and that there is a certain correlation between the polysemy and synonymy of word meanings.
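A minimal sketch of the change-detection step: once embeddings from two periods have been aligned into one space, the cosine distance between a word's two vectors serves as a semantic-change score. The 2-D vectors below are invented toy values, not real diachronic embeddings:

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

# Invented 2-D vectors for one word in two aligned diachronic spaces.
emb_1950 = {"cool": (0.9, 0.1)}   # dominated by the temperature sense
emb_2000 = {"cool": (0.4, 0.8)}   # shifted toward the approval sense

# Semantic-change score: cosine distance between the two aligned vectors.
drift = 1.0 - cosine(emb_1950["cool"], emb_2000["cool"])
print(round(drift, 3))   # 0.457: a sizeable shift
```

A drift near 0 means the word kept its distributional meaning; larger values flag candidates for the closer linguistic analysis the paper advocates.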
Title: "Diachronic Synonymy and Polysemy: Exploring Dynamic Relation Between Forms and Meanings of Words Based on Word Embeddings" (2019 International Conference on Asian Language Processing (IALP))
Pub Date: 2019-11-01 | DOI: 10.1109/IALP48816.2019.9037694
Joseph Marvin Imperial, R. Roxas, Erica Mae Campos, Jemelee Oandasan, Reyniel Caraballo, Ferry Winsley Sabdani, Ani Rosa Almaroi
Reading is an essential part of children’s learning. Identifying the proper readability level of reading materials helps ensure effective comprehension. We present our efforts to develop a baseline model for automatically identifying the readability of children’s and young adults’ books written in Filipino using machine learning algorithms. For this study, we processed 258 picture books published by Adarna House Inc. In contrast to older readability formulas that rely on static attributes such as the number of words, sentences, and syllables, other textual features were explored. Count vectors, Term Frequency-Inverse Document Frequency (TF-IDF), word n-grams, and character-level n-grams were extracted to train models using three major machine learning algorithms: Multinomial Naïve Bayes, Random Forest, and K-Nearest Neighbors. A combination of K-Nearest Neighbors and Random Forest via a voting-based classification mechanism produced the best-performing model, with an average training accuracy of 0.822 and a validation accuracy of 0.74. Analysis of the top 10 most useful features for each algorithm shows that they share a common signal for identifying readability levels: the use of Filipino stop words. The performance of other classifiers and features was also explored.
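A minimal sketch of the TF-IDF feature-extraction step, in pure Python rather than any particular library; the tokenised Filipino snippets are invented stand-ins for text at two readability levels:

```python
import math
from collections import Counter

def tf_idf(docs):
    """TF-IDF vectors (as dicts) for a list of tokenised documents."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))                      # document frequency
    idf = {w: math.log(n / df[w]) + 1.0 for w in df}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({w: (tf[w] / len(doc)) * idf[w] for w in tf})
    return vectors

# Invented tokenised snippets standing in for two readability levels.
docs = [["si", "pagong", "at", "si", "matsing"],
        ["ang", "mga", "bata", "ay", "nagbabasa", "at", "natututo"]]
vecs = tf_idf(docs)
print(round(vecs[0]["si"], 3))   # a frequent, distinctive token scores highest
```

Vectors like these (alongside count vectors and n-grams) would then be fed to the classifiers; stop words such as "at" receive low weights, though the paper finds their mere presence is itself informative for readability.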
Title: "Developing a machine learning-based grade level classifier for Filipino children’s literature" (2019 International Conference on Asian Language Processing (IALP))
Pub Date: 2019-11-01 | DOI: 10.1109/IALP48816.2019.9037674
Haoyi Cheng, Peifeng Li, Qiaoming Zhu
Event coreference resolution is a challenging task. To address the influence of event-independent information in event mentions and the flexible, diverse sentence structure of Chinese, this paper introduces a GANN (Gated Attention Neural Network) model for document-level Chinese event coreference resolution. GANN uses a gated attention mechanism to select event-related information from event mentions and filter out noisy information. Moreover, GANN does not use only a single cosine distance to calculate the linear distance between two event mentions; it also introduces multiple mechanisms, i.e., bilinear distance and a single-layer network, to further calculate linear and nonlinear distances. Experimental results on the ACE 2005 Chinese corpus show that our GANN model outperforms the state-of-the-art baselines.
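The three distance measures named above (cosine, bilinear, single-layer network) can be sketched as follows; the parameter matrices here are fixed by hand for illustration, whereas the model learns them during training:

```python
import math

def cosine(u, v):
    """Cosine similarity: the single linear distance a baseline would use."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def bilinear(u, v, M):
    """Bilinear distance u^T M v; the matrix M is learned in the model."""
    return sum(u[i] * sum(M[i][j] * v[j] for j in range(len(v)))
               for i in range(len(u)))

def single_layer(u, v, W, b, w_out):
    """Single-layer network w_out . tanh(W [u; v] + b): a nonlinear distance."""
    uv = list(u) + list(v)
    hidden = [math.tanh(sum(W[i][j] * uv[j] for j in range(len(uv))) + b[i])
              for i in range(len(W))]
    return sum(wo * h for wo, h in zip(w_out, hidden))

# Two toy event-mention representations.
e1, e2 = (1.0, 2.0), (2.0, 4.0)
print(round(cosine(e1, e2), 3))                      # parallel vectors: 1.0
print(bilinear(e1, e2, [[1.0, 0.0], [0.0, 1.0]]))    # identity M: plain dot product
```

Combining a fixed linear measure with learnable linear and nonlinear ones lets the coreference scorer capture interactions a single cosine cannot.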
Title: "Employing Gated Attention and Multi-similarities to Resolve Document-level Chinese Event Coreference" (2019 International Conference on Asian Language Processing (IALP))
In this paper, we propose a neural network architecture based on a Time-Delay Neural Network (TDNN) and a Bidirectional Gated Recurrent Unit (BiGRU) for small-footprint keyword spotting. Our model consists of three parts: a TDNN, a BiGRU, and an attention mechanism. The TDNN models temporal information and the BiGRU extracts hidden-layer features from the audio. The attention mechanism generates a fixed-length vector from the hidden-layer features, and the system produces the final score through a linear transformation of that vector and a softmax function. We explored the step size and unit size of the TDNN as well as two attention mechanisms. Our model achieves a true positive rate of 99.63% at a 5% false positive rate.
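The attention step that turns the BiGRU's variable-length hidden states into a fixed-length vector can be sketched as soft attention pooling; the scoring vector `w` is a stand-in for parameters the real model learns:

```python
import math

def attention_pool(hidden_states, w):
    """Score each hidden state against a vector w, softmax-normalise the
    scores, and return (weights, fixed-length weighted-sum context vector)."""
    scores = [sum(hi * wi for hi, wi in zip(h, w)) for h in hidden_states]
    m = max(scores)                       # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    dim = len(hidden_states[0])
    context = [sum(a * h[d] for a, h in zip(alphas, hidden_states))
               for d in range(dim)]
    return alphas, context

# Three toy BiGRU hidden states of dimension 2 (one per audio frame).
states = [(0.5, 1.0), (2.0, 0.0), (1.0, 1.0)]
alphas, context = attention_pool(states, (1.0, 0.0))
print(round(sum(alphas), 6))   # attention weights always sum to 1.0
```

However many frames the utterance has, `context` always has the hidden dimension, which is what allows the final linear-plus-softmax scoring layer to be fixed-size.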
Title: "An End-to-End Model Based on TDNN-BiGRU for Keyword Spotting" by Shuzhou Chai, Zhenye Yang, Changsheng Lv, Weiqiang Zhang (2019 International Conference on Asian Language Processing (IALP); Pub Date: 2019-11-01; DOI: 10.1109/IALP48816.2019.9037714)
Pub Date: 2019-11-01 | DOI: 10.1109/IALP48816.2019.9037649
Yuting Song, Biligsaikhan Batjargal, Akira Maeda
Recently, cross-lingual word embeddings have attracted much attention because they capture the semantic meaning of words across languages and can be applied to cross-lingual tasks. Most methods learn a single mapping (e.g., a linear mapping) to transform the word embedding space of one language into that of another. In this paper, we propose a method for improving bilingual word embeddings by adding a language-specific mapping. We focus on learning a Japanese-English bilingual word embedding mapping that takes the specificity of the Japanese language into account. On a benchmark data set for Japanese-English bilingual lexicon induction, the proposed method achieved competitive performance compared with the single-mapping method, with better results on original Japanese words.
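The single linear mapping that the paper starts from can be sketched as a least-squares fit between aligned embedding pairs. The 2-D vectors below are toy values, and the paper's actual contribution (the added language-specific mapping) is not reproduced here:

```python
# Toy 2-D "Japanese" vectors X and their aligned "English" translations Y.
X = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
Y = [(2.0, 0.0), (0.0, 3.0), (2.0, 3.0)]

def solve_2x2(A, b):
    """Solve A x = b for a 2x2 system by Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(b[0] * A[1][1] - A[0][1] * b[1]) / det,
            (A[0][0] * b[1] - b[0] * A[1][0]) / det]

def fit_linear_map(X, Y):
    """Least-squares W minimising ||XW - Y||^2 via the normal equations
    (X^T X) W = X^T Y, written out for 2-D embeddings."""
    G = [[sum(x[i] * x[j] for x in X) for j in range(2)] for i in range(2)]
    C = [[sum(x[i] * y[j] for x, y in zip(X, Y)) for j in range(2)]
         for i in range(2)]
    cols = [solve_2x2(G, [C[0][j], C[1][j]]) for j in range(2)]
    return [[cols[j][i] for j in range(2)] for i in range(2)]

W = fit_linear_map(X, Y)
print(W)   # recovers the scaling map [[2.0, 0.0], [0.0, 3.0]]
```

Translation then maps a source vector through `W` and retrieves the nearest target-language neighbour; the paper's language-specific mapping would be applied on top of this shared transform.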
Title: "Improving Japanese-English Bilingual Mapping of Word Embeddings based on Language Specificity" (2019 International Conference on Asian Language Processing (IALP))
Pub Date: 2019-11-01 | DOI: 10.1109/IALP48816.2019.9037650
T. Maruyama, Kazuhide Yamamoto
Recent text simplification approaches regard the task as monolingual text-to-text generation inspired by machine translation, and transformer-based translation models in particular outperform previous methods. Although machine translation approaches need a large-scale parallel corpus, parallel corpora for text simplification are very small compared with those for machine translation. We therefore attempt a simple approach that fine-tunes a pre-trained language model for text simplification on a small parallel corpus. Specifically, we conduct experiments with two models: a transformer-based encoder-decoder model, and a language model that receives a joint input of the original and simplified sentences, called TransformerLM. We show that TransformerLM, a simple text generation model, substantially outperforms a strong baseline. In addition, we show that a fine-tuned TransformerLM with only 3,000 supervised examples can achieve performance comparable to a strong baseline trained on all the supervised data.
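A sketch of the joint-input format a decoder-only model like TransformerLM might consume; the separator token names are assumptions for illustration, not the paper's exact vocabulary:

```python
# Joint input for a decoder-only LM: original and simplified sentence in one
# sequence. Token names (<s>, <sep>, </s>) are assumed, not the paper's spec.

def make_training_example(original, simplified):
    """Training sequence: the LM learns to continue the original text with
    its simplification."""
    return f"<s> {original} <sep> {simplified} </s>"

def make_inference_prompt(original):
    """At inference time the model generates the text after <sep>."""
    return f"<s> {original} <sep>"

ex = make_training_example("The committee deliberated at considerable length.",
                           "The committee talked for a long time.")
print(ex)
```

Framing simplification as continuation lets a pre-trained language model be fine-tuned directly, which is what makes the approach viable with only a few thousand sentence pairs.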
Title: "Extremely Low Resource Text simplification with Pre-trained Transformer Language Model" (2019 International Conference on Asian Language Processing (IALP))
Pub Date: 2019-11-01 | DOI: 10.1109/IALP48816.2019.9037681
Lijie Wang, Mei Tu, Mengxia Zhai, Huadong Wang, Song Liu, Sang Ha Kim
Expression with honorifics is an important way of dressing up the language and showing politeness in Korean. For machine translation, generating honorifics is indispensable on formal occasions when the target language is Korean. However, current Neural Machine Translation (NMT) models ignore the generation of honorifics, which limits the application of MT on business occasions. To address this problem, this paper presents two strategies for improving the Korean honorific generation ratio: 1) we introduce an honorific fusion training (HFT) loss under the minimum risk training framework to guide the model to generate honorifics; and 2) we introduce a data labeling (DL) method that tags the training corpus with distinctive labels, without any modification to the model structure. Our experimental results show that the two proposed strategies significantly improve the honorific generation ratio, by 34.35% and 45.59% respectively.
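The data labeling (DL) strategy can be sketched as prepending a style tag to each source sentence and training an unmodified NMT model on the tagged corpus; the tag names here are assumptions for illustration:

```python
# DL sketch: tag each source sentence with the target style so an unchanged
# NMT model learns to condition on the tag. Tag names are illustrative.

def label_source(sentence, honorific):
    tag = "<hon>" if honorific else "<plain>"
    return f"{tag} {sentence}"

# A tiny parallel-corpus fragment: (source sentence, target uses honorifics?)
corpus = [("Please have a seat.", True),
          ("Sit down.", False)]
labeled = [label_source(src, hon) for src, hon in corpus]
print(labeled[0])   # prints "<hon> Please have a seat."
```

At translation time the desired tag is prepended to the input, steering the decoder toward honorific or plain output without touching the model architecture.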
Title: "Neural Machine Translation Strategies for Generating Honorific-style Korean" (2019 International Conference on Asian Language Processing (IALP))
Pub Date: 2019-11-01 | DOI: 10.1109/IALP48816.2019.9037695
Yusha Zhang, Nankai Lin, Sheng-yi Jiang
English is the most widely used language in the world. As the language has spread and evolved, English texts in different regions have come to differ in expression and reading difficulty. Because of differences in content and wording, English news in some countries is easier to understand than in others. An accurate and effective method for measuring text difficulty not only helps news writers produce easy-to-understand articles, but also helps readers choose articles they can understand. In this paper, we study differences in text readability between most ASEAN countries, England, and America. We compare the readability and syntactic complexity of English news texts from England, America, and eight ASEAN countries (Indonesia, Malaysia, the Philippines, Singapore, Brunei, Thailand, Vietnam, and Cambodia), taking the authoritative news media of each country as the research object. We measured readability with several indicators, including the Flesch-Kincaid Grade Level (FKG), Flesch Reading Ease Index (FRE), Gunning Fog Index (GF), Automated Readability Index (AR), Coleman-Liau Index (CL), and Linsear Write Index (LW), and then applied L2SCA to analyze the syntactic complexity of the news texts. Based on the analysis results, we used hierarchical clustering to group the English texts of the different countries into six levels. Moreover, we elucidate the reasons for these readability differences.
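Two of the readability indices listed above can be computed directly from raw counts using their standard published formulas; the sample counts below are invented:

```python
def flesch_reading_ease(words, sentences, syllables):
    """FRE: higher scores mean easier text (standard published formula)."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

def flesch_kincaid_grade(words, sentences, syllables):
    """FKG: approximate US school grade needed to read the text."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

# An invented 100-word news excerpt with 5 sentences and 140 syllables.
print(flesch_reading_ease(100, 5, 140))    # ≈ 68.1 (fairly easy)
print(flesch_kincaid_grade(100, 5, 140))   # ≈ 8.7 (around 9th grade)
```

Both indices depend only on sentence length and syllables per word, which is why the study supplements them with L2SCA's syntactic-complexity measures.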
Title: "A Study on Syntactic Complexity and Text Readability of ASEAN English News" (2019 International Conference on Asian Language Processing (IALP))