Solving a math word problem requires selecting the relevant quantities in it and performing appropriate arithmetic operations on them to obtain the answer. For deep learning-based methods, it is vital to obtain good quantity representations, i.e., to selectively and emphatically aggregate information from the context of each quantity. However, existing work has paid little attention to this aspect. Many approaches simply encode quantities as ordinary tokens, or use implicit or rule-based methods to select information from their context. This leads to poor results when dealing with linguistic variations and confounding quantities. This paper proposes a novel method that identifies question-related distinguishing features of quantities by contrasting their context with the question and with the context of other quantities, thereby enhancing the representation of quantities. Our method considers not only the contrastive relationship between quantities but also multiple relationships jointly. In addition, we propose two auxiliary tasks to further guide the representation learning of quantities: 1) predicting whether a quantity is used in the question; 2) predicting the relations (operators) between quantities given the question. Experimental results show that our method outperforms previous methods on SVAMP and ASDiv-A under similar settings, including some recently released strong baselines. Supplementary experiments further confirm that our method improves quantity selection by improving the representations of both quantities and questions.
{"title":"Towards Better Quantity Representations for Solving Math Word Problems","authors":"Runxin Sun, Shizhu He, Jun Zhao, Kang Liu","doi":"10.1145/3665644","DOIUrl":"https://doi.org/10.1145/3665644","url":null,"abstract":"<p>Solving a math word problem requires selecting quantities in it and performing appropriate arithmetic operations to obtain the answer. For deep learning-based methods, it is vital to obtain good quantity representations, i.e., to selectively and emphatically aggregate information in the context of quantities. However, existing works have not paid much attention to this aspect. Many works simply encode quantities as ordinary tokens, or use some implicit or rule-based methods to select information in their context. This leads to poor results when dealing with linguistic variations and confounding quantities. This paper proposes a novel method to identify question-related distinguishing features of quantities by contrasting their context with the question and the context of other quantities, thereby enhancing the representation of quantities. Our method not only considers the contrastive relationship between quantities, but also considers multiple relationships jointly. Besides, we propose two auxiliary tasks to further guide the representation learning of quantities: 1) predicting whether a quantity is used in the question; 2) predicting the relations (operators) between quantities given the question. Experimental results show that our method outperforms previous methods on SVAMP and ASDiv-A under similar settings, even some newly released strong baselines. 
Supplementary experiments further confirm that our method indeed improves the performance of quantity selection by improving the representation of both quantities and questions.</p>","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":"13 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-05-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141060662","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper describes work on automated abusive language detection in Khasi, a low-resource language spoken primarily in the state of Meghalaya, India. A dataset named the Khasi Abusive Language Dataset (KALD) was created, consisting of 4,573 human-annotated Khasi YouTube and Facebook comments. A corpus of Khasi text was built and used to create Khasi word2vec and fastText word embeddings. Deep learning, traditional machine learning, and ensemble models were used in the study. Experiments were performed using word2vec, fastText, and topic vectors obtained using LDA. Experiments were also performed to check whether the zero-shot cross-lingual capability of language models such as LaBSE and LASER can be utilized for abusive language detection in Khasi. The best F1 score of 0.90725 was obtained by an XGBoost classifier. After feature selection and rebalancing of the dataset, F1 scores of 0.91828 and 0.91945 were obtained by SVM-based classifiers.
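For readers unfamiliar with the metric reported above, the following is a minimal sketch of how an F1 score is derived from prediction counts. The function name and the counts in the usage example are hypothetical, and the abstract does not state which averaging scheme (binary, macro, or weighted) the authors used, so this shows only the basic single-class form.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Compute F1 for one class from true positives, false positives, false negatives.

    Returns 0.0 when precision and recall are both undefined or zero.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 8 abusive comments found, 2 false alarms, 2 missed.
score = f1_score(tp=8, fp=2, fn=2)
```

With these illustrative counts, precision and recall are both 0.8, so the harmonic mean is also 0.8.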
{"title":"Abusive Language Detection in Khasi Social Media Comments","authors":"Arup Baruah, Lakhamti Wahlang, Firstbornson Jyrwa, Floriginia Shadap, Ferdous Barbhuiya, Kuntal Dey","doi":"10.1145/3664285","DOIUrl":"https://doi.org/10.1145/3664285","url":null,"abstract":"<p>This paper describes the work performed for automated abusive language detection in the Khasi language, a low-resource language spoken primarily in the state of Meghalaya, India. A dataset named Khasi Abusive Language Dataset (KALD) was created which consists of 4,573 human-annotated Khasi YouTube and Facebook comments. A corpus of Khasi text was built and it was used to create Khasi word2vec and fastText word embeddings. Deep learning, traditional machine learning, and ensemble models were used in the study. Experiments were performed using word2vec, fastText, and topic vectors obtained using LDA. Experiments were also performed to check if zero-shot cross-lingual nature of language models such as LaBSE and LASER can be utilized for abusive language detection in the Khasi language. The best F1 score of 0.90725 was obtained by an XGBoost classifier. After feature selection and rebalancing of the dataset, F1 score of 0.91828 and 0.91945 were obtained by an SVM based classifiers.</p>","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":"1 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140928629","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Suvarna Rajesh. Bhagwat, R. P. Bhavsar, B. V. Pawar
Machine translation is a prominent field of research that has contributed significantly to improving human life. Sign language machine translation, a subfield, focuses on translating spoken language content into sign language and vice versa, thereby facilitating communication between the hearing and hard-of-hearing communities and promoting inclusivity.
This study presents the development of a ‘sign language machine translation system’ converting simple Marathi sentences into Indian Sign Language (ISL) glosses and animation. Given the low-resource nature of both languages, a phrase-level rule-based approach was employed for the translation. Initial encoding of translation rules relied on basic linguistic knowledge of Marathi and ISL, with subsequent incorporation of rules to address ‘simultaneous morphological’ features in ISL. These rules were applied during the ‘generation phase’ of translation to dynamically adjust phonological sign parameters, resulting in improved target sentence fluency.
The paper provides a detailed description of the system architecture, translation rules, and comprehensive experimentation. Rigorous evaluation efforts were undertaken, encompassing various linguistic features, and the findings are discussed herein.
The web-based version of the system serves as an interpreter for brief communications and can support the teaching and learning of sign language and its grammar in schools for hard-of-hearing students.
{"title":"Marathi to Indian Sign Language Machine Translation","authors":"Suvarna Rajesh. Bhagwat, R. P. Bhavsar, B. V. Pawar","doi":"10.1145/3664609","DOIUrl":"https://doi.org/10.1145/3664609","url":null,"abstract":"<p>Machine translation has been a prominent field of research, contributing significantly to human life enhancement. Sign language machine translation, a subfield, focuses on translating spoken language content into sign language and vice versa, thereby facilitating communication between the normal hearing and hard-of-hearing communities, promoting inclusivity.</p><p>This study presents the development of a ‘sign language machine translation system’ converting simple Marathi sentences into Indian Sign Language (ISL) glosses and animation. Given the low-resource nature of both languages, a phrase-level rule-based approach was employed for the translation. Initial encoding of translation rules relied on basic linguistic knowledge of Marathi and ISL, with subsequent incorporation of rules to address 'simultaneous morphological' features in ISL. These rules were applied during the ‘generation phase’ of translation to dynamically adjust phonological sign parameters, resulting in improved target sentence fluency.</p><p>The paper provides a detailed description of the system architecture, translation rules, and comprehensive experimentation. 
Rigorous evaluation efforts were undertaken, encompassing various linguistic features, and the findings are discussed herein.</p><p>The web-based version of the system serves as an interpreter for brief communications and can support the teaching and learning of sign language and its grammar in schools for hard-of-hearing students.</p>","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":"84 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140942285","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Shallow parsing is an important step for many Natural Language Processing tasks. Although shallow parsing has a rich history for resource-rich languages, this is not the case for most Indian languages. Shallow parsing consists of POS tagging and chunking. Our study focuses on developing shallow parsers for Indian languages. As part of shallow parsing, we also include morphological analysis.
For the study, we first consolidated the available shallow parsing corpora for 7 Indian languages (Hindi, Kannada, Bangla, Malayalam, Marathi, Urdu, Telugu) for which treebanks are publicly available. We then trained models to achieve state-of-the-art performance for shallow parsing in these languages across multiple domains. Since analyzing model predictions at the sentence level is more realistic, we report the performance of these shallow parsers not only at the token level but also at the sentence level. We also present machine learning techniques for multitask shallow parsing. Our experiments show that fine-tuned contextual embeddings with multi-task learning improve the performance of multiple as well as individual shallow parsing tasks across different domains. We demonstrate the transfer learning capability of these models by creating shallow parsers (with POS and chunk only) for Gujarati, Odia, and Punjabi, for which no treebanks are available.
As part of this work, we will release the Indian Languages Shallow Linguistic (ILSL) benchmarks for 10 Indian languages, covering both major language families, Indo-Aryan and Dravidian. These benchmarks are intended as common building blocks for evaluating and understanding the linguistic phenomena found in Indian languages and how well newer approaches can handle them.
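The distinction between token-level and sentence-level evaluation mentioned in this abstract can be sketched as follows. This is an illustrative reading, not the authors' evaluation script: the function name, the tag values, and the assumption that a sentence counts as correct only when every token tag matches are all hypothetical.

```python
from typing import List, Tuple


def token_and_sentence_accuracy(
    gold: List[List[str]], pred: List[List[str]]
) -> Tuple[float, float]:
    """Return (token-level accuracy, sentence-level accuracy) for tagged sentences.

    A sentence is counted as correct only if all of its tags match (assumption).
    """
    total_tokens = correct_tokens = correct_sentences = 0
    for gold_tags, pred_tags in zip(gold, pred):
        if len(gold_tags) != len(pred_tags):
            raise ValueError("gold and predicted sentences must have equal length")
        matches = sum(g == p for g, p in zip(gold_tags, pred_tags))
        total_tokens += len(gold_tags)
        correct_tokens += matches
        correct_sentences += int(matches == len(gold_tags))
    token_acc = correct_tokens / total_tokens if total_tokens else 0.0
    sentence_acc = correct_sentences / len(gold) if gold else 0.0
    return token_acc, sentence_acc


# Hypothetical POS tags for two short sentences; one tag in the second is wrong.
gold = [["NOUN", "VERB"], ["PRON", "NOUN", "VERB"]]
pred = [["NOUN", "VERB"], ["PRON", "ADJ", "VERB"]]
token_acc, sentence_acc = token_and_sentence_accuracy(gold, pred)
```

In this toy case four of five tokens are correct but only one of two sentences is fully correct, which shows why the sentence-level figure is the stricter of the two.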
{"title":"Multi Task Learning Based Shallow Parsing for Indian Languages","authors":"Pruthwik Mishra, Vandan Mujadia","doi":"10.1145/3664620","DOIUrl":"https://doi.org/10.1145/3664620","url":null,"abstract":"<p>Shallow Parsing is an important step for many Natural Language Processing tasks. Although shallow parsing has a rich history for resource rich languages, it is not the case for most Indian languages. Shallow Parsing consists of POS Tagging and Chunking. Our study focuses on developing shallow parsers for Indian languages. As part of shallow parsing we included morph analysis as well. </p><p>For the study, we first consolidated available shallow parsing corpora for <b>7 Indian Languages</b> (Hindi, Kannada, Bangla, Malayalam, Marathi, Urdu, Telugu) for which treebanks are publicly available. We then trained models to achieve state of the art performance for shallow parsing in these languages for multiple domains. Since analyzing the performance of model predictions at sentence level is more realistic, we report the performance of these shallow parsers not only at the token level, but also at the sentence level. We also present machine learning techniques for multitask shallow parsing. Our experiments show that fine-tuned contextual embedding with multi-task learning improves the performance of multiple as well as individual shallow parsing tasks across different domains. We show the transfer learning capability of these models by creating shallow parsers (only with POS and Chunk) for Gujarati, Odia, and Punjabi for which no treebanks are available. 
</p><p>As a part of this work, we will be releasing the Indian Languages Shallow Linguistic (ILSL) benchmarks for 10 Indian languages including both the major language families Indo-Aryan and Dravidian as common building blocks that can be used to evaluate and understand various linguistic phenomena found in Indian languages and how well newer approaches can tackle them.</p>","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":"155 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140928413","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fei Li, Kaifang Deng, Yiwen Mo, Yuanze Ji, Chong Teng, Donghong Ji
Dependency syntactic structure is widely used in event extraction. However, the dependency structure, which reflects syntactic features, is essentially different from the event structure, which reflects semantic features, and this mismatch leads to performance degradation. In this paper, we propose Event Trigger Structures for Event Extraction (ETSEE), which compensate for the inconsistency between the two structures. First, using the ACE2005 dataset as a case study, we annotate 3 kinds of event trigger structures (ETSs), i.e., “light verb + trigger”, “preposition structures” and “tense + trigger”. We then design a graph-based event extraction model that jointly identifies triggers and arguments, where the graph consists of both the dependency structure and the ETSs. Experiments show that our model significantly outperforms state-of-the-art methods. Through empirical analysis and manual observation, we find that ETSs bring the following benefits: (1) enriching trigger identification features by introducing structural event information; (2) enriching dependency structures with event semantic information; (3) enhancing the interactions between triggers and candidate arguments by shortening their distances in the dependency graph.
{"title":"Enhancing Chinese Event Extraction with Event Trigger Structures","authors":"Fei Li, Kaifang Deng, Yiwen Mo, Yuanze Ji, Chong Teng, Donghong Ji","doi":"10.1145/3663567","DOIUrl":"https://doi.org/10.1145/3663567","url":null,"abstract":"<p>The dependency syntactic structure is widely used in event extraction. However, the dependency structure reflecting syntactic features is essentially different from the event structure that reflects semantic features, leading to the performance degradation. In this paper, we propose to use Event Trigger Structure for Event Extraction (ETSEE), which can compensate the inconsistency between two structures. First, we leverage the ACE2005 dataset as case study, and annotate 3 kinds of ETSs, i.e., “light verb + trigger”, “preposition structures” and “tense + trigger”. Then we design a graph-based event extraction model that jointly identifies triggers and arguments, where the graph consists of both the dependency structure and ETSs. Experiments show that our model significantly outperforms the state-of-the-art methods. 
Through empirical analysis and manual observation, we find that the ETSs can bring the following benefits: (1) enriching trigger identification features by introducing structural event information; (2) enriching dependency structures with event semantic information; (3) enhancing the interactions between triggers and candidate arguments by shortening their distances in the dependency graph.</p>","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":"62 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-05-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140885519","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Advances in information technology and the rise of online social media platforms have changed the way people communicate. Many people express their feelings, ideas, and emotions on social media sites such as Instagram, Twitter, Gab, Reddit, Facebook, and YouTube. However, some people misuse social media to send hateful messages to specific individuals or groups in order to create chaos. For governance authorities seeking to prevent such chaos, manually identifying hate speech across social media platforms is a difficult task. In this study, a hybrid deep-learning model that combines bidirectional long short-term memory (BiLSTM) and a convolutional neural network (CNN) is proposed to classify hate speech in textual data. The model incorporates GloVe-based word embeddings, dropout, L2 regularization, and global max pooling. The proposed BiLSTM-CNN model has been evaluated on various datasets, where it achieves state-of-the-art performance and outperforms traditional and existing machine learning methods in terms of accuracy, precision, recall, and F1-score.
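Of the components listed in this abstract, global max pooling is the simplest to show in isolation. The sketch below is plain Python rather than the authors' implementation; the function name and the toy feature values are assumptions made for illustration. It reduces a variable-length sequence of per-time-step feature vectors to one fixed-size vector by keeping the maximum of each feature channel.

```python
from typing import List


def global_max_pool(feature_maps: List[List[float]]) -> List[float]:
    """Collapse a [time_steps][channels] matrix to [channels] by taking the
    maximum over time for each channel."""
    if not feature_maps:
        raise ValueError("expected at least one time step")
    # zip(*rows) transposes the matrix so each tuple holds one channel over time.
    return [max(channel) for channel in zip(*feature_maps)]


# Hypothetical convolution outputs: 3 time steps, 2 channels.
features = [[0.1, 0.9], [0.7, 0.2], [0.3, 0.4]]
pooled = global_max_pool(features)  # [0.7, 0.9]
```

Because the output length depends only on the number of channels, comments of different lengths map to vectors of the same size, which is what lets a fixed-size classification layer follow the pooling step.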
{"title":"A Hybrid Deep BiLSTM-CNN for Hate Speech Detection in Multi-social media","authors":"Ashwini Kumar, Santosh Kumar, Kalpdrum Passi, Aniket Mahanti","doi":"10.1145/3657635","DOIUrl":"https://doi.org/10.1145/3657635","url":null,"abstract":"<p>Nowadays, ways of communication among people have changed due to advancements in information technology and the rise of online multi-social media. Many people express their feelings, ideas, and emotions on social media sites such as Instagram, Twitter, Gab, Reddit, Facebook, YouTube, etc. However, people have misused social media to send hateful messages to specific individuals or groups to create chaos. For various Governance authorities, manually identifying hate speech on various social media platforms is a difficult task to avoid such chaos. In this study, a hybrid deep-learning model, where bidirectional long short-term memory (BiLSTM) and convolutional neural network (CNN) are used to classify hate speech in textual data, has been proposed. This model incorporates a GLOVE-based word embedding approach, dropout, L2 regularization, and global max pooling to get impressive results. 
Further, the proposed BiLSTM-CNN model has been evaluated on various datasets to achieve state-of-the-art performance that is superior to the traditional and existing machine learning methods in terms of accuracy, precision, recall, and F1-score.</p>","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":"1 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140885818","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Kamran Aziz, Aizihaierjiang Yusufu, Jun Zhou, Donghong Ji, Muhammad Shahid Iqbal, Shijie Wang, Hassan Jalil Hadi, Zhengming Yuan
Urdu, characterized by its intricate morphological structure and linguistic nuances, presents distinct challenges in computational sentiment analysis. Addressing these, we introduce “UrduAspectNet”, a dedicated model tailored for Aspect-Based Sentiment Analysis (ABSA) in Urdu. Central to our approach is a rigorous preprocessing phase. Leveraging the Stanza library, we extract Part-of-Speech (POS) tags and lemmas, ensuring that Urdu’s linguistic intricacies are aptly represented. To probe the effectiveness of different embeddings, we trained our model using both mBERT and XLM-R embeddings, comparing their performance to identify the most effective representation for Urdu ABSA. Recognizing the nuanced inter-relationships between words, especially in Urdu’s flexible syntactic constructs, our model incorporates a dual Graph Convolutional Network (GCN) layer.
To address the absence of a dedicated Urdu ABSA dataset, we curated our own, collecting over 4,603 news headlines from various domains, such as politics, entertainment, business, and sports. These headlines, sourced from diverse news platforms, are annotated with their prevalent aspects and with the sentiment polarity of each aspect, categorized as positive, negative, or neutral. Despite the inherent complexities of Urdu, such as its colloquial expressions and idioms, “UrduAspectNet” shows remarkable efficacy. Initial comparisons between mBERT and XLM-R embeddings integrated with the dual GCN provide valuable insights into their respective strengths in the context of Urdu ABSA. With broad applications spanning media analytics, business insights, and socio-cultural analysis, “UrduAspectNet” is positioned as a pivotal benchmark in Urdu ABSA research.
{"title":"UrduAspectNet: Fusing Transformers and Dual GCN for Urdu Aspect-Based Sentiment Detection","authors":"Kamran Aziz, Aizihaierjiang Yusufu, Jun Zhou, Donghong Ji, Muhammad Shahid Iqbal, Shijie Wang, Hassan Jalil Hadi, Zhengming Yuan","doi":"10.1145/3663367","DOIUrl":"https://doi.org/10.1145/3663367","url":null,"abstract":"<p>Urdu, characterized by its intricate morphological structure and linguistic nuances, presents distinct challenges in computational sentiment analysis. Addressing these, we introduce ”UrduAspectNet” – a dedicated model tailored for Aspect-Based Sentiment Analysis (ABSA) in Urdu. Central to our approach is a rigorous preprocessing phase. Leveraging the Stanza library, we extract Part-of-Speech (POS) tags and lemmas, ensuring Urdu’s linguistic intricacies are aptly represented. To probe the effectiveness of different embeddings, we trained our model using both mBERT and XLM-R embeddings, comparing their performances to identify the most effective representation for Urdu ABSA. Recognizing the nuanced inter-relationships between words, especially in Urdu’s flexible syntactic constructs, our model incorporates a dual Graph Convolutional Network (GCN) layer.Addressing the challenge of the absence of a dedicated Urdu ABSA dataset, we curated our own, collecting over 4,603 news headlines from various domains, such as politics, entertainment, business, and sports. These headlines, sourced from diverse news platforms, not only identify prevalent aspects but also pinpoints their sentiment polarities, categorized as positive, negative, or neutral. Despite the inherent complexities of Urdu, such as its colloquial expressions and idioms, ”UrduAspectNet” showcases remarkable efficacy. Initial comparisons between mBERT and XLM-R embeddings integrated with dual GCN provide valuable insights into their respective strengths in the context of Urdu ABSA. 
With broad applications spanning media analytics, business insights, and socio-cultural analysis, ”UrduAspectNet” is positioned as a pivotal benchmark in Urdu ABSA research.</p>","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":"56 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140839748","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Automatic speech recognition remains under-researched for low-resource languages, owing to limited training data and the need for new technologies that improve efficiency and performance. The purpose of this work was to study the main aspects of integrated end-to-end speech recognition and the use of modern technologies in the natural language processing of agglutinative languages, including Kazakh. Language models were studied using a combination of comparative, graphic, statistical, and analytical-synthetic methods. This paper addresses automatic speech recognition (ASR) in agglutinative languages, particularly Kazakh, through a unified neural network model that integrates both acoustic and language modeling. Employing advanced techniques such as connectionist temporal classification and attention mechanisms, the study focuses on effective speech-to-text transcription for languages with complex morphologies. Transfer learning from high-resource languages helps mitigate data scarcity in languages such as Kazakh, Kyrgyz, Uzbek, Turkish, and Azerbaijani. The research assesses model performance, underscores ASR challenges, and proposes advancements for these languages. It includes a comparative analysis of phonetic and word-formation features in agglutinative Turkic languages, using statistical data. The findings aid further research in linguistics and technology for enhancing speech recognition and synthesis, contributing to voice identification and automation processes.
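Connectionist temporal classification (CTC), named in this abstract, maps a per-frame label sequence to a shorter output by merging consecutive repeats and then dropping a special blank symbol. The sketch below shows only that standard collapse step used in greedy decoding; it is not the authors' model, and the function name, the choice of "_" as the blank symbol, and the example frames are assumptions.

```python
from itertools import groupby
from typing import List, Sequence

BLANK = "_"  # hypothetical blank symbol; real systems usually reserve an index


def ctc_collapse(frame_labels: Sequence[str], blank: str = BLANK) -> List[str]:
    """Apply the CTC collapse rule: merge consecutive repeats, then remove blanks."""
    merged = [label for label, _ in groupby(frame_labels)]
    return [label for label in merged if label != blank]


# Hypothetical best-path labels for 14 audio frames.
frames = list("__hh_e_ll_l_oo")
decoded = "".join(ctc_collapse(frames))  # "hello"
```

The blank between the two runs of "l" is what preserves the double letter: without it, the repeated frames would merge into a single "l".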
{"title":"Integrated End-to-End automatic speech recognition for languages for agglutinative languages","authors":"Akbayan Bekarystankyzy, Orken Mamyrbayev, Tolganay Anarbekova","doi":"10.1145/3663568","DOIUrl":"https://doi.org/10.1145/3663568","url":null,"abstract":"<p>The relevance of the problem of automatic speech recognition lies in the lack of research for low-resource languages, stemming from limited training data and the necessity for new technologies to enhance efficiency and performance. The purpose of this work was to study the main aspects of integrated end-to-end speech recognition and the use of modern technologies in the natural processing of agglutinative languages, including Kazakh. In this article, the study of language models was carried out using comparative, graphic, statistical and analytical-synthetic methods, which were used in combination. This paper addresses automatic speech recognition (ASR) in agglutinative languages, particularly Kazakh, through a unified neural network model that integrates both acoustic and language modeling. Employing advanced techniques like connectionist temporal classification and attention mechanisms, the study focuses on effective speech-to-text transcription for languages with complex morphologies. Transfer learning from high-resource languages helps mitigate data scarcity in languages such as Kazakh, Kyrgyz, Uzbek, Turkish, and Azerbaijani. The research assesses model performance, underscores ASR challenges, and proposes advancements for these languages. It includes a comparative analysis of phonetic and word-formation features in agglutinative Turkic languages, using statistical data. 
The findings aid further research in linguistics and technology for enhancing speech recognition and synthesis, contributing to voice identification and automation processes.</p>","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":"31 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-05-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140839751","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Recent advances in Natural Language Processing (NLP) have improved the performance of systems for tasks such as Emotion Detection (ED), Information Retrieval, and Translation in resource-rich languages like English and Chinese. However, similar advances have not been made for Malayalam because of the dearth of annotated datasets. Its rich morphology, free word order, and agglutinative character make data preparation in Malayalam highly challenging. In this paper, we employ traditional Machine Learning (ML) techniques such as support vector machines (SVM) and multilayer perceptrons (MLP), deep learning methods such as Recurrent Neural Networks (RNN), and transformer-based methods to train an emotion detection system. This work stands out because all previous attempts to extract emotions from Malayalam text have relied on lexicons, which are ill-suited to handling large amounts of data. By tuning its hyperparameters, we improved the transformer-based model MuRIL to obtain an accuracy of 79%, which we then compared with the only state-of-the-art (SOTA) model. We found that the proposed techniques surpass the SOTA system reported so far for detecting emotions in Malayalam.
{"title":"Emotion Detection System for Malayalam Text using Deep Learning and Transformers","authors":"Anuja K, P. C. Reghu Raj, Remesh Babu K R","doi":"10.1145/3663475","DOIUrl":"https://doi.org/10.1145/3663475","url":null,"abstract":"<p>Recent advances in Natural Language Processing (NLP) have improved the performance of the systems that perform tasks, such as Emotion Detection (ED), Information Retrieval, Translation, etc., in resource-rich languages like English and Chinese. But similar advancements have not been made in Malayalam due to the dearth of annotated datasets. Because of its rich morphology, free word order and agglutinative character, data preparation in Malayalam is highly challenging. In this paper, we employ traditional Machine Learning (ML) techniques such as support vector machines (SVM) and multilayer perceptrons (MLP), and recent deep learning methods such as Recurrent Neural Networks (RNN) and advanced transformer-based methodologies to train an emotion detection system. This work stands out since all the previous attempts to extract emotions from Malayalam text have relied on lexicons, which are inappropriate for handling large amounts of data. By tweaking the hyperparameters, we enhanced the transformer-based model known as MuRIL to obtain an accuracy of 79%, which is then compared with the only state-of-the-art (SOTA) model. 
We found that the proposed techniques surpass the SOTA system available for detecting emotions in Malayalam reported so far.</p>","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":"11 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140839856","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Shuanghong Huang, Chong Feng, Ge Shi, Zhengjun Li, Xuan Zhao, Xinyan Li, Xiaomei Wang
Domain adaptation is an effective solution for inadequate translation performance within specific domains. However, the straightforward approach of mixing data from multiple domains to train a multi-domain neural machine translation (NMT) model can cause parameter interference between domains, degrading overall performance. To address this, we introduce a multi-domain adaptive NMT method that learns a domain-specific sub-layer latent variable and employs the Gumbel-Softmax reparameterization technique to train the model parameters and this latent variable concurrently. This approach facilitates the learning of private domain-specific knowledge while sharing common domain-invariant knowledge, effectively mitigating the parameter interference problem. Experimental results show that the proposed method significantly outperforms the baseline model, by up to 7.68 and 3.71 BLEU on English-German and Chinese-English public multi-domain datasets, respectively.
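The abstract names the Gumbel-Softmax reparameterization but does not spell out how it is applied. As a rough illustration only, the following Python sketch shows the general technique: Gumbel noise is added to categorical logits and a temperature-scaled softmax yields a relaxed, differentiable one-hot sample. The `mix_sublayer` helper, its two-way gate between a shared and a domain-specific sub-layer output, and all parameter names are assumptions made for this example, not the authors' architecture.

```python
import math
import random


def gumbel_softmax(logits, tau=1.0, rng=random):
    """Draw one relaxed (soft) one-hot sample from categorical logits."""
    noisy = []
    for logit in logits:
        # Clamp u strictly inside (0, 1) so both logarithms are defined.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))  # Gumbel(0, 1) noise
        noisy.append((logit + gumbel) / tau)
    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    peak = max(noisy)
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def mix_sublayer(shared_out, domain_out, gate_logits, tau=1.0, rng=random):
    """Hypothetical gate: blend shared and domain-specific sub-layer outputs.

    gate_logits holds two learnable scores (shared, domain-specific); this
    two-way gate is an assumption for illustration, not the paper's design.
    """
    w_shared, w_domain = gumbel_softmax(gate_logits, tau, rng)
    return [w_shared * s + w_domain * d for s, d in zip(shared_out, domain_out)]
```

Lower values of `tau` push the sample toward a hard one-hot choice, while higher values give a smoother mixture. In a real system the gate logits would be trained jointly with the other model parameters through an automatic-differentiation framework.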
{"title":"Learning Domain Specific Sub-layer Latent Variable for Multi-Domain Adaptation Neural Machine Translation","authors":"Shuanghong Huang, Chong Feng, Ge Shi, Zhengjun Li, Xuan Zhao, Xinyan Li, Xiaomei Wang","doi":"10.1145/3661305","DOIUrl":"https://doi.org/10.1145/3661305","url":null,"abstract":"<p>Domain adaptation proves to be an effective solution for addressing inadequate translation performance within specific domains. However, the straightforward approach of mixing data from multiple domains to obtain the multi-domain neural machine translation (NMT) model can give rise to the parameter interference between domains problem, resulting in a degradation of overall performance. To address this, we introduce a multi-domain adaptive NMT method aimed at learning domain specific sub-layer latent variable and employ the Gumbel-Softmax reparameterization technique to concurrently train both model parameters and domain specific sub-layer latent variable. This approach facilitates the learning of private domain-specific knowledge while sharing common domain-invariant knowledge, effectively mitigating the parameter interference problem. 
The experimental results show that our proposed method significantly improved by up to 7.68 and 3.71 BLEU compared with the baseline model in English-German and Chinese-English public multi-domain datasets, respectively.</p>","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":"10 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140811942","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}