Feature Importance Ranking of Translationese Markers in L2 Writing: A Corpus-Based Statistical Analysis Across Disciplines
Authors: Younghee Cheri Lee, Soomin Jwa
Journal: English Teaching (South Korea)
Publication date: 2023-06-30
DOI: https://doi.org/10.15858/engtea.78.2.202206.55
In recent years, an array of studies has focused on ‘translationese’ (i.e., unique features that manifest in translated texts, causing second language (L2) writings to be similar to translated texts but different from native language (L1) writings). This intriguing linguistic pattern has motivated scholars to investigate potential markers for predicting the divergence of L1 and L2 texts. This study builds on this work, evaluating the feature importance ranking of specific translationese markers, including standardized type-token ratio (STTR), mean sentence length, bottom-frequency words, connectives, and n-grams. A random forest model was used to compare these markers in L1 and L2 academic journal article abstracts, providing a robust quantitative analysis. We further examined the consistency of these markers across different academic disciplines. Our results indicate that bottom-frequency words are the most reliable markers across disciplines, whereas connectives show the least consistency. Interestingly, we identified three-word lexical bundles as discipline-specific markers. These findings present several implications and open new avenues for future research into translationese in L2 writing.
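As a rough illustration of three of the surface markers named in the abstract (STTR, mean sentence length, and three-word bundles), the following Python sketch shows one common way such values can be computed. It is not the authors' pipeline: the regex tokenizer, the punctuation-based sentence splitter, the `chunk_size` default of 1000 tokens, and all function names are assumptions made for this example, and the random forest step is not shown.

```python
import re
from collections import Counter


def tokenize(text):
    # Lowercased word tokens from a deliberately simple regex (assumption:
    # the study's actual tokenization is not described in the abstract).
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def sttr(tokens, chunk_size=1000):
    # Standardized type-token ratio: the mean TTR over consecutive full
    # chunks of chunk_size tokens. A trailing partial chunk is ignored.
    # Falls back to the plain TTR when the text is shorter than one chunk.
    if not tokens:
        return 0.0
    chunks = [tokens[i:i + chunk_size]
              for i in range(0, len(tokens) - chunk_size + 1, chunk_size)]
    if not chunks:
        return len(set(tokens)) / len(tokens)
    return sum(len(set(c)) / len(c) for c in chunks) / len(chunks)


def mean_sentence_length(text):
    # Average number of word tokens per sentence. Splitting on ., ! and ?
    # is naive: abbreviations such as "i.e." produce spurious boundaries.
    sentences = [s for s in re.split(r"[.!?]+", text) if tokenize(s)]
    if not sentences:
        return 0.0
    return sum(len(tokenize(s)) for s in sentences) / len(sentences)


def ngram_counts(tokens, n=3):
    # Frequency of contiguous n-token sequences; n=3 gives the raw counts
    # from which three-word lexical bundles could be selected.
    return Counter(tuple(tokens[i:i + n])
                   for i in range(len(tokens) - n + 1))
```

Per-abstract values produced this way could serve as columns of a feature matrix, with the L1/L2 label as the target of a classifier. Standardizing TTR over fixed-size chunks matters because raw TTR falls as text length grows, which would otherwise confound comparisons between abstracts of different lengths.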