In disease diagnosis, medical image classification faces inherent challenges: data imbalance, image quality variability, annotation variability, limited data availability, and limited data representativeness. These challenges degrade an algorithm's ability to classify medical images, leading to biased model outcomes and inaccurate interpretations. In this paper, a novel Discrete Levy Flight Grey Wolf Optimizer (DLFGWO) is combined with the Random Forest (RF) classifier to address these limitations on biomedical datasets and to achieve a better classification rate. The DLFGWO-RF addresses image quality variability in ultrasound images and limits RF classification inaccuracies by handling incomplete and noisy data. An unequal class distribution causes data imbalance and biases a model toward the majority class. The DLFGWO balances this distribution using the grey wolf search strategy, whose exploration and exploitation capabilities are improved with Discrete Levy Flight (DLF). It further optimizes the classifier's performance to achieve a balanced classification rate. DLFGWO-RF is designed to perform classification even on limited datasets, thereby reducing the need for numerous expert annotations. In diabetic retinopathy grading, the DLFGWO-RF reduces the disagreements that arise from annotators' subjective interpretations. However, the diabetic retinopathy dataset does not capture the full diversity of the population, which limits the generalization ability of the proposed DLFGWO-RF. Fine-tuning the RF therefore helps it adapt to the subgroups in the dataset, improving its overall performance. The experiments are conducted on two widely used medical image datasets to test the efficacy of the model. The experimental results show that the DLFGWO-RF classifier achieves a classification accuracy of 90-95%, outperforming existing techniques on various imbalanced datasets.
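The abstract does not spell out how the Levy flight is discretized, so the following is only a minimal sketch under stated assumptions. It draws a heavy-tailed step with Mantegna's algorithm (a common way to simulate Levy flights) and, as a hypothetical discretization, rounds the step to move an index through a list of candidate settings such as RF tree counts. The function names, the `beta` default, and the wrap-around rule are illustrative choices, not the authors' method.

```python
import math
import random


def levy_step(beta=1.5, rng=random):
    """Draw one heavy-tailed step using Mantegna's algorithm (0 < beta <= 2)."""
    sigma_u = (
        math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
        / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))
    ) ** (1 / beta)
    u = rng.gauss(0.0, sigma_u)
    # Guard against an exact zero so the division below is always defined.
    v = abs(rng.gauss(0.0, 1.0)) or 1e-12
    return u / v ** (1 / beta)


def discrete_levy_move(index, n_choices, beta=1.5, rng=random):
    """Hypothetical discretization: round the step and wrap around the list."""
    step = round(levy_step(beta, rng))
    return (index + step) % n_choices


if __name__ == "__main__":
    rng = random.Random(42)
    candidates = [50, 100, 200, 400, 800]  # assumed RF tree counts, for illustration
    position = 2
    for _ in range(5):
        position = discrete_levy_move(position, len(candidates), rng=rng)
        print(candidates[position])
```

Most steps are small, which keeps the search local, while the occasional large jump moves a candidate solution to a distant setting; that mix is the usual motivation for pairing Levy flights with a population-based optimizer.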
{"title":"Handling Imbalance and Limited Data in Thyroid Ultrasound and Diabetic Retinopathy Datasets Using Discrete Levy Flights Grey Wolf Optimizer Based Random Forest for Robust Medical Data Classification","authors":"Shobha Aswal, Neelu Jyothi Ahuja, Ritika Mehra","doi":"10.1145/3648363","DOIUrl":"https://doi.org/10.1145/3648363","url":null,"abstract":"<p>In the field of disease diagnosis, medical image classification faces an inherent challenge due to various factors involving data imbalance, image quality variability, annotation variability, and limited data availability and data representativeness. Such challenges affect the algorithm's classification ability on the medical images in an adverse way, which leads to biased model outcomes and inaccurate interpretations. In this paper, a novel Discrete Levy Flight Grey Wolf Optimizer (DLFGWO) is combined with the Random Forest (RF) classifier to address the above limitations on the biomedical datasets and to achieve better classification rate. The DLFGWO-RF resolves the image quality variability in ultrasound images and limits the inaccuracies on classification using RF by handling the incomplete and noisy data. The sheer focus on the majority class may lead to unequal distribution of classes and thus leads to data imbalance. The DLFGWO balances such distribution by leveraging grey wolves and its exploration and exploitation capabilities are improved using Discrete Levy Flight (DLF). It further optimizes the classifier's performance to achieve balanced classification rate. DLFGWO-RF is designed to perform classification even on limited datasets, thereby the requirement of numerous expert annotations can thus be reduced. In diabetic retinopathy grading, the DLFGWO-RF reduces disagreements in annotation variability using subjective interpretations. 
However, the representativeness of the diabetic retinopathy dataset fails to capture the entire population diversity, which limits the generalization ability of the proposed DLFGWO-RF. Thus, fine-tuning of RF can robustly adapt to the subgroups in the dataset, enhancing its overall performance. The experiments are conducted on two widely used medical image datasets to test the efficacy of the model. The experimental results show that the DLFGWO-RF classifier achieves improved classification accuracy between 90-95%, which outperforms the existing techniques for various imbalanced datasets.</p>","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":"176 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-02-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139752548","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Named Entity Recognition (NER) is an indispensable component of Natural Language Processing (NLP), which aims to identify and classify entities within text data. While Deep Learning (DL) models have excelled in NER for well-resourced languages like English, Spanish, and Chinese, they face significant hurdles when dealing with low-resource languages like Urdu. These challenges stem from the intricate linguistic characteristics of Urdu, including morphological diversity, a context-dependent lexicon, and the scarcity of training data. This study addresses these issues by focusing on Urdu Named Entity Recognition (U-NER) and makes three key contributions. First, various pre-trained embedding methods are employed, encompassing Word2vec (W2V), GloVe, FastText, Bidirectional Encoder Representations from Transformers (BERT), and Embeddings from Language Models (ELMo). In particular, BERT-Base and ELMo are fine-tuned on Urdu Wikipedia and news articles. Second, a novel generative Data Augmentation (DA) technique replaces Named Entities (NEs) with mask tokens and uses pre-trained masked language models to predict the masked tokens, effectively expanding the training dataset. Finally, the study introduces a novel hybrid model combining a Transformer Encoder with a Convolutional Neural Network (CNN) to capture the intricate morphology of Urdu. These modules enable the model to handle polysemy, extract short- and long-range dependencies, and enhance learning capacity. Empirical experiments demonstrate that the proposed model, incorporating BERT embeddings and the DA approach, attains the highest F1-Score of 93.99%, highlighting its efficacy for the U-NER task.
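The mask-and-predict augmentation described above can be sketched as follows. This is an illustration under assumptions, not the authors' pipeline: the BIO tag convention, the function names, and the `predict_fn` callable (a stand-in for a pre-trained masked language model) are all hypothetical, and the toy sentence is in English only for readability.

```python
def mask_entities(tokens, tags, mask_token="[MASK]"):
    """Replace every token that belongs to a named entity (tag != "O") with the mask token."""
    return [mask_token if tag != "O" else tok for tok, tag in zip(tokens, tags)]


def augment(tokens, tags, predict_fn, mask_token="[MASK]"):
    """Build one augmented sentence by refilling the masked entity positions.

    predict_fn(masked_tokens, position) is a hypothetical hook that returns a
    replacement token; in practice it would wrap a masked language model.
    The original tags are reused because only entity surface forms change.
    """
    masked = mask_entities(tokens, tags, mask_token)
    new_tokens = [
        predict_fn(masked, i) if tag != "O" else tok
        for i, (tok, tag) in enumerate(zip(tokens, tags))
    ]
    return new_tokens, list(tags)


if __name__ == "__main__":
    tokens = ["Ali", "lives", "in", "Lahore"]
    tags = ["B-PER", "O", "O", "B-LOC"]
    # Toy predictor used only for demonstration: picks a fixed substitute per position.
    substitutes = {0: "Sara", 3: "Karachi"}
    print(augment(tokens, tags, lambda masked, i: substitutes[i]))
```

Reusing the original tags is the simplest labeling choice; it relies on the predictor returning an entity of the same type, which a real system would need to check.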
{"title":"Enriching Urdu NER with BERT Embedding, Data Augmentation, and Hybrid Encoder-CNN Architecture","authors":"Anil Ahmed, Degen Huang, Syed Yasser Arafat, Imran Hameed","doi":"10.1145/3648362","DOIUrl":"https://doi.org/10.1145/3648362","url":null,"abstract":"<p>Named Entity Recognition (NER) is an indispensable component of Natural Language Processing (NLP), which aims to identify and classify entities within text data. While Deep Learning (DL) models have excelled in NER for well-resourced languages like English, Spanish, and Chinese, they face significant hurdles when dealing with low-resource languages like Urdu. These challenges stem from the intricate linguistic characteristics of Urdu, including morphological diversity, context-dependent lexicon, and the scarcity of training data. This study addresses these issues by focusing on Urdu Named Entity Recognition (U-NER) and introducing three key contributions. First, various pre-trained embedding methods are employed, encompassing Word2vec (W2V), GloVe, FastText, Bidirectional Encoder Representations from Transformers (BERT), and Embeddings from language models (ELMo). In particular, fine-tuning is performed on BERT<sub>BASE</sub> and ELMo using Urdu Wikipedia and news articles. Secondly, a novel generative Data Augmentation (DA) technique replaces Named Entities (NEs) with mask tokens, employing pre-trained masked language models to predict masked tokens, effectively expanding the training dataset. Finally, the study introduces a novel hybrid model combining a Transformer Encoder with a Convolutional Neural Network (CNN) to capture the intricate morphology of Urdu. These modules enable the model to handle polysemy, extract short and long-range dependencies, and enhance learning capacity. 
Empirical experiments demonstrate that the proposed model, incorporating BERT embeddings and an innovative DA approach, attains the highest F1-Score of 93.99%, highlighting its efficacy for the U-NER task.</p>","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":"223 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139752445","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The COVID-19 pandemic in 2020 brought an unprecedented global crisis. After two years of control efforts, life gradually returned to its pre-pandemic state, but localized outbreaks continued to occur. Towards the end of 2022, COVID-19 resurged in China, again disrupting people’s lives and work. Many social media posts reflected people’s views and emotions towards the second outbreak, which differed distinctly from those expressed during the first outbreak in 2020. To explore people’s emotional attitudes towards the pandemic at different stages and the underlying reasons, this study collected microblog data from January to June 2020 and from November 2022 to January 2023, covering Chinese reactions to the COVID-19 pandemic. Based on hesitancy and the Fuzzy Intuition theory, we proposed a hypothesis: hesitancy can be integrated into machine learning models to select suitable corpora for training, which improves both accuracy and model efficiency. Based on this hypothesis, we designed a hesitancy-integrated model. The experimental results demonstrated the model’s positive performance on a self-constructed dataset. By applying this model to analyze people’s attitudes towards the pandemic, we obtained their sentiments in different months. We found that the most negative emotions appeared at the beginning of the pandemic, followed by emotional fluctuations influenced by social events, ultimately showing an overall positive trend. Combining word cloud techniques with the Latent Dirichlet Allocation (LDA) model effectively helped explore the reasons behind the changes in attitudes towards the pandemic.
{"title":"Sentiment Analysis Method of Epidemic-related Microblog Based on Hesitation Theory","authors":"Yang Yu, Dong Qiu, HuanYu Wan","doi":"10.1145/3648360","DOIUrl":"https://doi.org/10.1145/3648360","url":null,"abstract":"<p>The COVID-19 pandemic in 2020 brought an unprecedented global crisis. After two years of control efforts, life gradually returned to the pre-pandemic state, but localized outbreaks continued to occur. Towards the end of 2022, COVID-19 resurged in China, leading to another disruption of people’s lives and work. Many pieces of information on social media reflected people’s views and emotions towards the second outbreak, which showed distinct differences compared to the first outbreak in 2020. To explore people’s emotional attitudes towards the pandemic at different stages and the underlying reasons, this study collected microblog data from November 2022 to January 2023 and from January to June 2020, encompassing Chinese reactions to the COVID-19 pandemic. Based on hesitancy and the Fuzzy Intuition theory, we proposed a hypothesis: hesitancy can be integrated into machine learning models to select suitable corpora for training, which not only improves accuracy but also enhances model efficiency. Based on this hypothesis, we designed a hesitancy-integrated model. The experimental results demonstrated the model’s positive performance on a self-constructed database. By applying this model to analyze people’s attitudes towards the pandemic, we obtained their sentiments in different months. We found that the most negative emotions appeared at the beginning of the pandemic, followed by emotional fluctuations influenced by social events, ultimately showing an overall positive trend. 
Combining word cloud techniques and the Latent Dirichlet Allocation (LDA) model effectively helped explore the reasons behind the changes in pandemic attitude.</p>","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":"198 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-02-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139752446","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Xiangling Ding, Pu Huang, Dengyong Zhang, Wei Liang, Feng Li, Gaobo Yang, Xin Liao, Yue Li
In video frame interpolation, complex motion modeling is the task of determining where the moving objects in a video sequence are located in the interpolated frame and how to maintain the temporal consistency of motion. Existing video frame interpolation methods typically use either a fixed-size motion kernel or a refined optical flow to model complex motions. However, they suffer from data redundancy and inaccurate representation of motion. This paper introduces a unified warping framework, named multi-scale expandable deformable convolution (MSEConv), for simultaneously performing complex motion modeling and frame interpolation. In the proposed framework, a deep fully convolutional neural network with global attention estimates multiple small-scale kernel weights with different expansion degrees and adaptive weight allocation for the synthesis of each pixel. Moreover, most kernel-based interpolation methods can be treated as special cases of the proposed MSEConv, so MSEConv can be easily transferred to other kernel-based frame interpolation methods to improve their performance. To further improve robustness to motion occlusions, a mask occlusion operation is introduced. As a consequence, the proposed MSEConv performs on par with or even better than state-of-the-art kernel-based frame interpolation methods on public datasets. Our source code and visual comparison results are available at https://github.com/Pumpkin123709/MSEConv.
{"title":"MSEConv: A Unified Warping Framework for Video Frame Interpolation","authors":"Xiangling Ding, Pu Huang, Dengyong Zhang, Wei Liang, Feng Li, Gaobo Yang, Xin Liao, Yue Li","doi":"10.1145/3648364","DOIUrl":"https://doi.org/10.1145/3648364","url":null,"abstract":"<p>Within the context of video frame interpolation, complex motion modeling is the task of capturing, in a video sequence, where the moving objects are located in the interpolated frame, and how to maintain the temporal consistency of motion. Existing video frame interpolation methods typically assign either a fixed size of the motion kernel or a refined optical flow to model complex motions. However, they have the limitation of data redundancy and inaccuracy representation of motion. This paper introduces a unified warping framework, named multi-scale expandable deformable convolution (MSEConv), for simultaneously performing complex motion modeling and frame interpolation. In the proposed framework, a deep fully convolutional neural network with global attention is proposed to estimate multiple small-scale kernel weights with different expansion degrees and adaptive weight allocation for each pixel synthesis. Moreover, most of the kernel-based interpolation methods can be treated as the special case of the proposed MSEConv, thus, MSEConv can be easily transferred to other kernel-based frame interpolation methods for performance improvement. To further improve the robustness of motion occlusions, an operation of mask occlusion is introduced. As a consequence, our proposed MSEConv shows strong performance on par or even better than the state-of-the-art kernel-based frame interpolation works on public datasets. 
Our source code and visual comparable results are available at https://github.com/Pumpkin123709/MSEConv.</p>","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":"78 3 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-02-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139752444","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
With the successful application of deep learning, document summarization systems can produce more readable results. However, abstractive summarization still suffers from unfaithful outputs and factual errors, especially in named entities. Current approaches tend to employ external knowledge to improve model performance while neglecting the boundary information and the semantics of the entities. In this paper, we propose an entity-augmented method (EAM) to encourage the model to make full use of the entity boundary information and pay more attention to the critical entities. Experimental results on three Chinese and English summarization datasets show that our method outperforms several strong baselines and achieves state-of-the-art performance on the CLTS dataset. Our method can also improve the faithfulness of the summary and generalize well to different pre-trained language models. Moreover, we propose a method to evaluate the integrity of generated entities. Besides, we adapt the data augmentation method in the FactCC model according to the difference between Chinese and English in grammar and train a new evaluation model for factual consistency evaluation in Chinese summarization.
{"title":"Boundary-Aware Abstractive Summarization with Entity-Augmented Attention for Enhancing Faithfulness","authors":"Jiuyi Li, Junpeng Liu, Jianjun Ma, Wei Yang, Degen Huang","doi":"10.1145/3641278","DOIUrl":"https://doi.org/10.1145/3641278","url":null,"abstract":"<p>With the successful application of deep learning, document summarization systems can produce more readable results. However, abstractive summarization still suffers from unfaithful outputs and factual errors, especially in named entities. Current approaches tend to employ external knowledge to improve model performance while neglecting the boundary information and the semantics of the entities. In this paper, we propose an entity-augmented method (EAM) to encourage the model to make full use of the entity boundary information and pay more attention to the critical entities. Experimental results on three Chinese and English summarization datasets show that our method outperforms several strong baselines and achieves state-of-the-art performance on the CLTS dataset. Our method can also improve the faithfulness of the summary and generalize well to different pre-trained language models. Moreover, we propose a method to evaluate the integrity of generated entities. 
Besides, we adapt the data augmentation method in the FactCC model according to the difference between Chinese and English in grammar and train a new evaluation model for factual consistency evaluation in Chinese summarization.</p>","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":"63 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-02-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139752679","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mya Ei San, Sasiporn Usanavasin, Ye Kyaw Thu, Manabu Okumura
Several methodologies have recently been proposed to enhance the performance of low-resource Neural Machine Translation (NMT). However, these techniques have yet to be explored thoroughly for the low-resource Thai and Myanmar languages. Therefore, we first applied augmentation techniques such as SwitchOut and Ciphertext Based Data Augmentation (CipherDAug) to improve NMT performance in these languages. Second, we enhanced NMT performance by fine-tuning the pre-trained Multilingual Denoising BART model (mBART), where BART denotes Bidirectional and Auto-Regressive Transformer. We implemented three NMT systems, namely Transformer+SwitchOut, Multi-source Transformer+CipherDAug, and fine-tuned mBART, for bidirectional translation of the Thai-English-Myanmar language pairs from the ASEAN-MT corpus. Experimental results showed that Multi-source Transformer+CipherDAug significantly improved BLEU, ChrF, and TER scores over the first baseline, Transformer, and the second baseline, Edit-Based Transformer (EDITOR). The model achieved notable BLEU scores: 37.9 (English-to-Thai), 42.7 (Thai-to-English), 28.9 (English-to-Myanmar), 31.2 (Myanmar-to-English), 25.3 (Thai-to-Myanmar), and 25.5 (Myanmar-to-Thai). The fine-tuned mBART model also considerably outperformed the two baselines, except for the Myanmar-to-English pair. SwitchOut improved over the second baseline in all pairs and performed similarly to the first baseline in most cases. Lastly, we performed detailed analyses indicating that the CipherDAug and mBART models can help improve low-resource NMT performance for the Thai and Myanmar languages.
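To make the word-replacement idea behind SwitchOut concrete, here is a deliberately simplified sketch. It replaces each token independently with a fixed probability, whereas the published SwitchOut method samples how many tokens to replace from a temperature-controlled distribution; the fixed probability, the function name, and the toy vocabulary are assumptions made for illustration.

```python
import random


def switchout_like(tokens, vocab, replace_prob=0.1, rng=random):
    """Replace each token with a uniformly sampled vocabulary word with probability replace_prob.

    Simplification: real SwitchOut samples the number of replacements per
    sentence; here every position is decided independently.
    """
    return [
        rng.choice(vocab) if rng.random() < replace_prob else tok
        for tok in tokens
    ]


if __name__ == "__main__":
    rng = random.Random(7)
    sentence = "the cat sat on the mat".split()
    vocab = ["dog", "tree", "river", "house"]  # toy vocabulary, for illustration
    print(switchout_like(sentence, vocab, replace_prob=0.3, rng=rng))
```

Applying this kind of replacement to both the source and the target side of a sentence pair yields noisy training examples without requiring any additional parallel data.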
{"title":"A Study for Enhancing Low-resource Thai-Myanmar-English Neural Machine Translation","authors":"Mya Ei San, Sasiporn Usanavasin, Ye Kyaw Thu, Manabu Okumura","doi":"10.1145/3645111","DOIUrl":"https://doi.org/10.1145/3645111","url":null,"abstract":"<p>Several methodologies have recently been proposed to enhance the performance of low-resource Neural Machine Translation (NMT). However, these techniques have yet to be explored thoroughly in low-resource Thai and Myanmar languages. Therefore, we first applied augmentation techniques such as SwitchOut and Ciphertext Based Data Augmentation (CipherDAug) to improve NMT performance in these languages. We secondly enhanced the NMT performance by fine-tuning the pre-trained Multilingual Denoising BART model (mBART), where BART denotes Bidirectional and Auto-Regressive Transformer. We implemented three NMT systems: namely, Transformer+SwitchOut, Multi-source Transformer+CipherDAug, and fine-tuned mBART in the bidirectional translations of Thai-English-Myanmar language pairs from the ASEAN-MT corpus. Experimental results showed that Multi-source Transformer+CipherDAug significantly improved BLEU, ChrF, and TER scores over the first baseline Transformer and second baseline Edit-Based Transformer (EDITOR). The model achieved notable BLEU scores: 37.9 (English-to-Thai), 42.7 (Thai-to-English), 28.9 (English-to-Myanmar), 31.2 (Myanmar-to-English), 25.3 (Thai-to-Myanmar), and 25.5 (Myanmar-to-Thai). The fine-tuned mBART model also considerably outperformed the two baselines, except for the Myanmar-to-English pair. SwitchOut improved over the second baseline in all pairs and performed similarly to the first baseline in most cases. 
Lastly, we performed detailed analyses verifying that the CipherDAug and mBART models potentially facilitate improving low-resource NMT performance in Thai and Myanmar languages.</p>","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":"176 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-02-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139752443","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Manipuri is a low-resource, Tibeto-Burman tonal language spoken mainly in Manipur, a northeastern state of India. Tone identification is crucial to speech comprehension for tonal languages, where tone defines the word’s meaning. Automatic Speech Recognition for those languages can perform better by including tonal information from a powerful tone detection system. While significant research has been conducted on tonal languages like Mandarin, Thai, Cantonese and Vietnamese, a notable gap exists in exploring Manipuri within this context. To address this gap, this work expands our previously developed handcrafted speech corpus, ManiTo, which comprises isolated Manipuri tonal contrast word pairs, to study the tones of Manipuri. This extension includes contributions from twenty native speakers. Preliminary findings have confirmed that Manipuri has two unique tones, Falling and Level. The study then conducts a comprehensive acoustic feature analysis. Two sets of features based on Pitch contours, Jitter and Shimmer measurements are investigated to distinguish the two tones of Manipuri. Support Vector Machine, Long Short-Term Memory, Random Forest and k-Nearest Neighbors are the classifiers adopted to validate the selected feature sets. The results indicate that the second set of features consistently outperformed the first set, achieving higher accuracy, particularly with the Random Forest classifier. These findings provide valuable insights for further advancing speech recognition technology for the low-resource tonal language Manipuri.
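The abstract does not say which jitter and shimmer variants were measured, so the sketch below assumes the common "local" definitions: the mean absolute difference between consecutive glottal periods (jitter) or peak amplitudes (shimmer), divided by the mean period or amplitude. The function names and the sample values are illustrative only.

```python
def local_perturbation(values):
    """Mean absolute difference of consecutive values, relative to the mean value."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return (sum(diffs) / len(diffs)) / (sum(values) / len(values))


def jitter_local(periods):
    """Cycle-to-cycle variation of the glottal period (periods in seconds)."""
    return local_perturbation(periods)


def shimmer_local(amplitudes):
    """Cycle-to-cycle variation of the peak amplitude (any linear unit)."""
    return local_perturbation(amplitudes)


if __name__ == "__main__":
    periods = [0.0100, 0.0102, 0.0099, 0.0101]  # made-up values, for illustration
    amplitudes = [0.80, 0.78, 0.82, 0.79]
    print(f"jitter:  {jitter_local(periods):.4f}")
    print(f"shimmer: {shimmer_local(amplitudes):.4f}")
```

Both measures are dimensionless ratios, so they can be combined with pitch-contour features in one feature vector without further unit conversion.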
{"title":"Disambiguation of Isolated Manipuri Tonal Contrast Word Pairs using Acoustic Features","authors":"Thiyam Susma Devi, Pradip K. Das","doi":"10.1145/3643830","DOIUrl":"https://doi.org/10.1145/3643830","url":null,"abstract":"<p>Manipuri is a low-resource, Tibeto-Burman tonal language spoken mainly in Manipur, a northeastern state of India. Tone identification is crucial to speech comprehension for tonal languages, where tone defines the word’s meaning. Automatic Speech Recognition for those languages can perform better by including tonal information from a powerful tone detection system. While significant research has been conducted on tonal languages like Mandarin, Thai, Cantonese and Vietnamese, a notable gap exists in exploring Manipuri within this context. To address this gap, this work expands our previously developed handcrafted speech corpus, ManiTo, which comprises of isolated Manipuri tonal contrast word pairs to study the tones of Manipuri. This extension includes contributions from twenty native speakers. Preliminary findings have confirmed that Manipuri has two unique tones, Falling and Level. The study then conducts a comprehensive acoustic feature analysis. Two sets of features based on Pitch contours, Jitter and Shimmer measurements are investigated to distinguish the two tones of Manipuri. Support Vector Machine, Long Short-Term Memory, Random Forest and k-Nearest Neighbors are the classifiers adopted to validate the selected feature sets. 
The results indicate that the second set of features consistently outperformed the first set, demonstrating higher accuracy, particularly when utilizing the Random Forest classifier, which provides valuable insights for further advancements in speech recognition technology for low-resource tonal language Manipuri.</p>","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":"38 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-02-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139752452","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Current generative knowledge graph construction approaches usually fail to capture structural knowledge because they simply flatten natural language into serialized texts or a specification language. However, large generative language models trained on structured data such as code have demonstrated impressive capability in understanding natural language for structural prediction and reasoning tasks. Building on this observation, we address the task of generative knowledge graph construction with a code language model: given a code-format natural language input, the target is to generate triples, which can be framed as a code completion task. Specifically, we develop schema-aware prompts that effectively utilize the semantic structure within the knowledge graph. As code inherently possesses structure, such as class and function definitions, it serves as a useful model for prior semantic structural knowledge. Furthermore, we employ a rationale-enhanced generation method to boost performance. Rationales provide intermediate steps, thereby improving knowledge extraction abilities. Experimental results indicate that the proposed approach obtains better performance on benchmark datasets than the baselines.
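One way to picture a schema-aware, code-format prompt is sketched below. The class names, the relation constants, and the `build_prompt` helper are hypothetical and are not taken from the paper; the sketch only shows how a schema can be written as class definitions so that triple extraction becomes completion of an unfinished `triples = [` list.

```python
SCHEMA_HEADER = '''class Entity:
    def __init__(self, name: str):
        self.name = name

class Relation:
    def __init__(self, name: str):
        self.name = name

class Triple:
    def __init__(self, head: Entity, relation: Relation, tail: Entity):
        self.head, self.relation, self.tail = head, relation, tail
'''


def build_prompt(relation_names, text):
    """Assemble a code-format prompt that a code language model would complete.

    The schema is expressed as class definitions, the allowed relations as
    constants, and the input sentence as a docstring-style comment.
    """
    relation_defs = "\n".join(
        f'{name.upper()} = Relation("{name}")' for name in relation_names
    )
    return f'{SCHEMA_HEADER}\n{relation_defs}\n\n""" {text} """\ntriples = [\n'


if __name__ == "__main__":
    print(build_prompt(["born_in", "works_for"], "Marie Curie was born in Warsaw."))
```

A model's completion, for example a line such as `Triple(Entity("Marie Curie"), BORN_IN, Entity("Warsaw")),`, can then be parsed back into a triple.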
{"title":"CodeKGC: Code Language Model for Generative Knowledge Graph Construction","authors":"Zhen Bi, Jing Chen, Yinuo Jiang, Feiyu Xiong, Wei Guo, Huajun Chen, Ningyu Zhang","doi":"10.1145/3641850","DOIUrl":"https://doi.org/10.1145/3641850","url":null,"abstract":"<p>Current generative knowledge graph construction approaches usually fail to capture structural knowledge by simply flattening natural language into serialized texts or a specification language. However, large generative language model trained on structured data such as code has demonstrated impressive capability in understanding natural language for structural prediction and reasoning tasks. Intuitively, we address the task of generative knowledge graph construction with code language model: given a code-format natural language input, the target is to generate triples which can be represented as code completion tasks. Specifically, we develop schema-aware prompts that effectively utilize the semantic structure within the knowledge graph. As code inherently possesses structure, such as class and function definitions, it serves as a useful model for prior semantic structural knowledge. Furthermore, we employ a rationale-enhanced generation method to boost the performance. Rationales provide intermediate steps, thereby improving knowledge extraction abilities. 
Experimental results indicate that the proposed approach can obtain better performance on benchmark datasets compared with baselines.</p>","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":"12 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-02-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139752449","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
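To make the code-completion framing of CodeKGC concrete, the following is a minimal sketch and not the authors' actual prompt format. The schema classes, the `build_prompt` and `parse_triples` helpers, the rationale comment, and the example sentence are all hypothetical, chosen only to illustrate how a schema can be rendered as class definitions and how a model's completion might be read back as triples.

```python
import re

# Hypothetical schema: entity types and the triple structure are rendered as
# class definitions, so the prompt carries the structure of the knowledge graph.
SCHEMA = '''class Entity:
    def __init__(self, name: str):
        self.name = name

class Person(Entity):
    pass

class Organization(Entity):
    pass

class Triple:
    def __init__(self, head: Entity, relation: str, tail: Entity):
        self.head = head
        self.relation = relation
        self.tail = tail
'''


def build_prompt(sentence: str) -> str:
    """Wrap the sentence in code so that triple extraction becomes code completion."""
    return (
        SCHEMA
        + "\n"
        + f"text = {sentence!r}\n"
        # Illustrative rationale hint: ask for intermediate steps before the triples.
        + "# Rationale: list the entities first, then the relations between them.\n"
        + "triples = [\n"
    )


# Matches lines such as: Triple(Person("A"), "relation", Person("B"))
TRIPLE_RE = re.compile(
    r'Triple\(\s*(\w+)\("([^"]*)"\)\s*,\s*"([^"]*)"\s*,\s*(\w+)\("([^"]*)"\)\s*\)'
)


def parse_triples(completion: str):
    """Read (head, relation, tail) triples back out of a model completion."""
    return [
        ((head_type, head), relation, (tail_type, tail))
        for head_type, head, relation, tail_type, tail in TRIPLE_RE.findall(completion)
    ]


if __name__ == "__main__":
    print(build_prompt("Ada Lovelace worked with Charles Babbage."))
    # A hand-written stand-in for what a code language model might return.
    completion = '    Triple(Person("Ada Lovelace"), "worked_with", Person("Charles Babbage")),\n]'
    print(parse_triples(completion))
```

The prompt ends on an open list literal, so the most natural continuation for a code model is a sequence of `Triple(...)` constructor calls that respect the schema above.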
Recent years have witnessed a surge of academic interest in knowledge-enhanced pre-trained language models (PLMs) that incorporate factual knowledge to enhance knowledge-driven applications. Nevertheless, existing studies primarily focus on shallow, static, and separately pre-trained entity embeddings, and few explore the potential of deep contextualized knowledge representation for knowledge incorporation. Consequently, the performance gains of such models remain limited. In this paper, we introduce a simple yet effective knowledge-enhanced model, College (Contrastive Language-Knowledge Graph Pre-training), which leverages contrastive learning to incorporate factual knowledge into PLMs. This approach keeps the knowledge in its original graph structure, preserving as much of the available information as possible, and circumvents the issue of heterogeneous embedding fusion. Experimental results demonstrate that our approach achieves better results on several knowledge-intensive tasks than previous state-of-the-art methods. Our code and trained models are available at https://github.com/Stacy027/COLLEGE.
{"title":"Contrastive Language-Knowledge Graph Pre-training","authors":"Xiaowei Yuan, Kang Liu, Yequan Wang","doi":"10.1145/3644820","DOIUrl":"https://doi.org/10.1145/3644820","url":null,"abstract":"<p>Recent years have witnessed a surge of academic interest in knowledge-enhanced pre-trained language models (PLMs) that incorporate factual knowledge to enhance knowledge-driven applications. Nevertheless, existing studies primarily focus on shallow, static, and separately pre-trained entity embeddings, with few delving into the potential of deep contextualized knowledge representation for knowledge incorporation. Consequently, the performance gains of such models remain limited. In this paper, we introduce a simple yet effective knowledge-enhanced model, <span>College</span> (<b>Co</b>ntrastive <b>L</b>anguage-Know<b>le</b>dge <b>G</b>raph Pr<b>e</b>-training), which leverages contrastive learning to incorporate factual knowledge into PLMs. This approach maintains the knowledge in its original graph structure to provide the most available information and circumvents the issue of heterogeneous embedding fusion. Experimental results demonstrate that our approach achieves more effective results on several knowledge-intensive tasks compared to previous state-of-the-art methods. 
Our code and trained models are available at https://github.com/Stacy027/COLLEGE.</p>","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":"176 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-02-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139752676","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
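The abstract for College does not state which contrastive objective is used, so the sketch below assumes a standard InfoNCE loss purely for illustration. The function name `info_nce`, the toy vectors, and the temperature value are assumptions, not details from the paper: `text_vecs[i]` stands in for a text representation and `graph_vecs[i]` for the matching knowledge graph representation.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a vector to unit length so that dot products are cosine similarities."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def info_nce(text_vecs, graph_vecs, temperature=0.1):
    """Mean InfoNCE loss where text_vecs[i] and graph_vecs[i] form the positive pair.

    Every other graph vector in the batch serves as a negative for text i.
    """
    texts = [normalize(v) for v in text_vecs]
    graphs = [normalize(v) for v in graph_vecs]
    total = 0.0
    for i, text in enumerate(texts):
        logits = [dot(text, graph) / temperature for graph in graphs]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denominator - logits[i]
    return total / len(texts)


if __name__ == "__main__":
    text = [[1.0, 0.0], [0.0, 1.0]]
    aligned = [[1.0, 0.0], [0.0, 1.0]]   # each text matches its own graph vector
    shuffled = [[0.0, 1.0], [1.0, 0.0]]  # positives deliberately mismatched
    print(info_nce(text, aligned), info_nce(text, shuffled))
```

The loss is small when each text vector is closest to its own graph vector and large when the pairing is wrong, which is the signal a contrastive pre-training objective of this kind relies on.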
Debajyoty Banik, Rahul Paul, Rajkumar Singh Rathore, Rutvij H. Jhaveri
In this research, we introduce two new machine learning regression methods: the Ensemble Average and the Pipelined Model. These methods aim to enhance traditional regression analysis for predictive tasks and are evaluated on three datasets (Kaggle House Price, Boston House Price, and California Housing) using various performance metrics. The results consistently show that our models outperform existing methods in accuracy, reliability, and robustness across all three datasets. The Pipelined Model, in particular, is notable for its ability to combine predictions from multiple models, leading to higher accuracy and impressive scalability. This scalability allows the models to be applied in diverse fields such as technology, finance, and healthcare. Furthermore, these models can be adapted for real-time and streaming data analysis, making them valuable for applications such as fraud detection, stock market prediction, and IoT sensor data analysis. Enhancements to the models also make them suitable for big data applications involving large datasets and distributed computing environments. The models have limitations in practical scenarios, including potential data biases, specific assumptions, increased complexity, and challenges related to interpretability. Nevertheless, these innovations advance predictive modeling, and our comprehensive evaluation underscores their potential to provide increased accuracy and reliability across a wide range of applications. The source code can be found at https://huggingface.co/DebajyotyBanik/Ensemble-Pipelined-Regression/tree/main.
{"title":"Improved Regression Analysis with Ensemble Pipeline Approach for Applications Across Multiple Domains","authors":"Debajyoty Banik, Rahul Paul, Rajkumar Singh Rathore, Rutvij H. Jhaveri","doi":"10.1145/3645110","DOIUrl":"https://doi.org/10.1145/3645110","url":null,"abstract":"<p>In this research, we introduce two new machine learning regression methods: the Ensemble Average and the Pipelined Model. These methods aim to enhance traditional regression analysis for predictive tasks and have undergone thorough evaluation across three datasets: Kaggle House Price, Boston House Price, and California Housing, using various performance metrics. The results consistently show that our models outperform existing methods in terms of accuracy and reliability across all three datasets. The Pipelined Model, in particular, is notable for its ability to combine predictions from multiple models, leading to higher accuracy and impressive scalability. This scalability allows for their application in diverse fields like technology, finance, and healthcare. Furthermore, these models can be adapted for real-time and streaming data analysis, making them valuable for applications such as fraud detection, stock market prediction, and IoT sensor data analysis. Enhancements to the models also make them suitable for big data applications, ensuring their relevance for large datasets and distributed computing environments. It’s important to acknowledge some limitations of our models, including potential data biases, specific assumptions, increased complexity, and challenges related to interpretability when using them in practical scenarios. Nevertheless, these innovations advance predictive modeling, and our comprehensive evaluation underscores their potential to provide increased accuracy and reliability across a wide range of applications. The results indicate that the proposed models outperform existing models in terms of accuracy and robustness for all three datasets. 
The source code can be found at https://huggingface.co/DebajyotyBanik/Ensemble-Pipelined-Regression/tree/main.</p>","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":"16 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139752692","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
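As a rough illustration of the Ensemble Average idea, the sketch below averages the predictions of two simple regressors. It is an assumption-laden toy and not the authors' implementation: the base models (`fit_mean`, `fit_line`), the `ensemble_average` helper, and the data are hypothetical, and the abstract does not specify which base regressors the paper combines or how the Pipelined Model is built.

```python
from statistics import fmean


def fit_mean(xs, ys):
    """Baseline regressor: always predict the mean of the training targets."""
    mu = fmean(ys)
    return lambda x: mu


def fit_line(xs, ys):
    """Ordinary least squares for a single feature, in closed form."""
    mean_x, mean_y = fmean(xs), fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return lambda x: mean_y + slope * (x - mean_x)


def ensemble_average(models):
    """Combine fitted regressors by averaging their predictions."""
    return lambda x: fmean(model(x) for model in models)


if __name__ == "__main__":
    # Toy data (hypothetical): one feature, target exactly twice the feature.
    xs = [1.0, 2.0, 3.0, 4.0]
    ys = [2.0, 4.0, 6.0, 8.0]
    models = [fit_mean(xs, ys), fit_line(xs, ys)]
    predict = ensemble_average(models)
    print(predict(5.0))
```

Averaging weights every base model equally; weighting by validation error is a common variant, but whether the paper does so is not stated in the abstract.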