Pre-trained language models learn informative word representations from large-scale text corpora through self-supervised learning and have achieved promising performance on natural language processing (NLP) tasks after fine-tuning. These models, however, suffer from poor robustness and a lack of interpretability. We refer to pre-trained language models with knowledge injection as knowledge-enhanced pre-trained language models (KEPLMs). Such models demonstrate deeper understanding and logical reasoning and introduce interpretability. In this survey, we provide a comprehensive overview of KEPLMs in NLP. We first discuss the advancements in pre-trained language models and knowledge representation learning. We then systematically categorize existing KEPLMs from three different perspectives. Finally, we outline some potential directions of KEPLMs for future research.
{"title":"A Survey of Knowledge Enhanced Pre-trained Language Models","authors":"Jian Yang, Xinyu Hu, Gang Xiao, Yulong Shen","doi":"10.1145/3631392","DOIUrl":"https://doi.org/10.1145/3631392","url":null,"abstract":"<p>Pre-trained language models learn informative word representations on a large-scale text corpus through self-supervised learning, which has achieved promising performance in fields of natural language processing (NLP) after fine-tuning. These models, however, suffer from poor robustness and lack of interpretability. We refer to pre-trained language models with knowledge injection as knowledge-enhanced pre-trained language models (KEPLMs). These models demonstrate deep understanding and logical reasoning and introduce interpretability. In this survey, we provide a comprehensive overview of KEPLMs in NLP. We first discuss the advancements in pre-trained language models and knowledge representation learning. Then we systematically categorize existing KEPLMs from three different perspectives. Finally, we outline some potential directions of KEPLMs for future research.</p>","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":null,"pages":null},"PeriodicalIF":2.0,"publicationDate":"2024-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140008632","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This article aimed to address the problems of word-order confusion, context dependency, and ambiguity that traditional machine translation (MT) methods face in verb recognition. By applying advanced artificial intelligence algorithms, verb recognition can be processed better and the quality and accuracy of MT improved. Building on neural machine translation (NMT), basic attention mechanisms, historical attention information (used to dynamically obtain information related to already-generated words), and constraint mechanisms were introduced to embed semantic information, represent polysemy, and annotate the semantic roles of verbs. The article used Workshop on Machine Translation (WMT) data together with the British National Corpus (BNC), Gutenberg, Reuters, and OpenSubtitles corpora, and augmented the data in these corpora. The improved NMT model was compared with traditional NMT models, rule-based machine translation (RBMT), and statistical machine translation (SMT). The experimental results showed that the improved NMT model achieved an average verb semantic matching degree of 0.85 and an average Bilingual Evaluation Understudy (BLEU) score of 0.90 across the five corpora. The improved NMT model can effectively improve the accuracy of verb recognition in MT, providing a new method for this task.
{"title":"Exploration on Advanced Intelligent Algorithms of Artificial Intelligence for Verb Recognition in Machine Translation","authors":"Qinghua Ai, Qingyan Ai, Jun Wang","doi":"10.1145/3649891","DOIUrl":"https://doi.org/10.1145/3649891","url":null,"abstract":"<p>This article aimed to address the problems of word order confusion, context dependency, and ambiguity in traditional machine translation (MT) methods for verb recognition. By applying advanced intelligent algorithms of artificial intelligence, verb recognition can be better processed and the quality and accuracy of MT can be improved. Based on Neural machine translation (NMT), basic attention mechanisms, historical attention information, dynamically obtain information related to the generated words, and constraint mechanisms were introduced to embed semantic information, represent polysemy, and annotate semantic roles of verbs. This article used the Workshop on machine translation (WMT), British National Corpus (BNC), Gutenberg, Reuters Corpus, OpenSubtitles corpus, and enhanced the data in the corpus. The improved NMT model was compared with traditional NMT models, Rule Based machine translation (RBMT), and Statistical machine translation (SMT). The experimental results showed that the average verb semantic matching degree of the improved NMT model in 5 corpora was 0.85, and the average Bilingual Evaluation Understudy (BLEU) score in 5 corpora was 0.90. The improved NMT model in this article can effectively improve the accuracy of verb recognition in MT, providing new methods for verb recognition in MT.</p>","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":null,"pages":null},"PeriodicalIF":2.0,"publicationDate":"2024-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140008685","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Part-of-speech tagging plays a vital role in text processing and natural language understanding. Very few attempts have been made in the past at tagging Pashto parts of speech. In this work, we present an LSTM-based approach for Pashto part-of-speech tagging with a special focus on ambiguity resolution. We first created a corpus of Pashto sentences containing words with multiple meanings, together with their tags. We introduce a powerful sentence representation and a new architecture for Pashto text processing. The accuracy of the proposed approach is compared with a state-of-the-art Hidden Markov Model. Our model achieves 87.60% accuracy on all words excluding punctuation and 95.45% on ambiguous words, whereas the Hidden Markov Model achieves 78.37% and 44.72%, respectively. The results show that our approach outperforms the Hidden Markov Model in part-of-speech tagging for Pashto text.
{"title":"Leveraging Bidirectionl LSTM with CRFs for Pashto tagging","authors":"Farooq Zaman, Onaiza Maqbool, Jaweria Kanwal","doi":"10.1145/3649456","DOIUrl":"https://doi.org/10.1145/3649456","url":null,"abstract":"<p>Part-of-speech tagging plays a vital role in text processing and natural language understanding. Very few attempts have been made in the past for tagging Pashto Part-of-Speech. In this work, we present LSTM based approach for Pashto part-of-speech tagging with special focus on ambiguity resolution. Initially we created a corpus of Pashto sentences having words with multiple meanings and their tags. We introduce a powerful sentences representation and new architecture for Pashto text processing. The accuracy of the proposed approach is compared with state-of-the-art Hidden Markov Model. Our Model shows 87.60% accuracy for all words excluding punctuations and 95.45% for ambiguous words, on the other hand Hidden Markov Model shows 78.37% and 44.72% accuracy respectively. Results show that our approach outperform Hidden Markov Model in Part-of-Speech tagging for Pashto text.</p>","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":null,"pages":null},"PeriodicalIF":2.0,"publicationDate":"2024-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139977793","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this work, we introduce WAFFNet, an attention-centric feature-fusion architecture tailored for word-level multilingual scene-text script identification. Motivated by the limitations of traditional approaches that rely exclusively on feature-based methods or deep-learning strategies, our approach amalgamates statistical and deep features to bridge the gap. At the core of WAFFNet, we fuse Local Binary Pattern features, a prominent descriptor capturing low-level texture, with high-dimensional, semantically rich convolutional features. This fusion is judiciously augmented by a spatial attention mechanism, ensuring targeted emphasis on semantically critical regions of the input image. To address the class-imbalance problem in multi-class classification scenarios, we employ a weighted objective function, which also regularizes the learning process. The architectural integrity of WAFFNet is preserved through an end-to-end training paradigm, leveraging transfer learning to expedite convergence and optimize performance. Considering the under-representation of regional Indian languages in current datasets, we meticulously curated IIITG-STLI2023, a comprehensive dataset encapsulating English alongside six under-represented Indian languages: Hindi, Kannada, Malayalam, Telugu, Bengali, and Manipuri. Rigorous evaluation on IIITG-STLI2023, as well as on the established MLe2e and SIW-13 datasets, underscores WAFFNet's superiority over both traditional feature-engineering approaches and state-of-the-art deep-learning frameworks. The proposed WAFFNet framework thus offers a robust and effective solution for script identification in scene-text images.
{"title":"A Hybrid Scene Text Script Identification Network for regional Indian Languages","authors":"Veronica Naosekpam, Nilkanta Sahu","doi":"10.1145/3649439","DOIUrl":"https://doi.org/10.1145/3649439","url":null,"abstract":"<p>In this work, we introduce WAFFNet, an attention-centric feature fusion architecture tailored for word-level multi-lingual scene text script identification. Motivated by the limitations of traditional approaches that rely exclusively on feature-based methods or deep learning strategies, our approach amalgamates statistical and deep features to bridge the gap. At the core of WAFFNet, we utilized the merits of Local Binary Pattern —a prominent descriptor capturing low-level texture features with high-dimensional, semantically-rich convolutional features. This fusion is judiciously augmented by a spatial attention mechanism, ensuring targeted emphasis on semantically critical regions of the input image. To address the class imbalance problem in multi-class classification scenarios, we employed a weighted objective function. This not only regularizes the learning process but also addresses the class imbalance problem. The architectural integrity of WAFFNet is preserved through an end-to-end training paradigm, leveraging transfer learning to expedite convergence and optimize performance metrics. Considering the under-representation of regional Indian languages in current datasets, we meticulously curated IIITG-STLI2023, a comprehensive dataset encapsulating English alongside six under-represented Indian languages: Hindi, Kannada, Malayalam, Telugu, Bengali, and Manipuri. Rigorous evaluation of the IIITG-STLI2023, as well as the established MLe2e and SIW-13 datasets, underscores WAFFNet’s supremacy over both traditional feature-engineering approaches as well as state-of-the-art deep learning frameworks. Thus, the proposed WAFFNet framework offers a robust and effective solution for language identification in scene text images.</p>","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":null,"pages":null},"PeriodicalIF":2.0,"publicationDate":"2024-02-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139949859","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A classification system for hazards in air traffic control was investigated using the Human Factors Analysis and Classification System (HFACS) framework and natural language processing, with the goal of preventing hazardous situations in air traffic control. Building on the HFACS standard, an air traffic control hazard classification scheme was created. Hazard data from the aviation safety management system were selected, classified, and labeled into five levels. A TF-IDF/TextRank text classification method based on key-content extraction, together with text classification models based on CNN and BERT, was used in the experiments to address the problems of small samples, many labels, and random sampling in air traffic control hazard data. The results show that the overall trade-off between model training time and classification accuracy is best when the number of keywords is around 8. As the number of keywords increases, the time spent on feature dimensioning decreases, which affects accuracy. When the number of keywords reaches about 93, the dimensioning time increases while classification accuracy remains close to 0.7, so the added time cost lowers the overall benefit. The study demonstrates that extracting key content can address small-sample text classification problems and contribute to further research on the development of safety systems.
{"title":"A Natural Language Processing System for Text Classification Corpus Based on Machine Learning","authors":"Yawen Su","doi":"10.1145/3648361","DOIUrl":"https://doi.org/10.1145/3648361","url":null,"abstract":"<p>A classification system for hazardous materials in air traffic control was investigated using the Human Factors Analysis and Classification System (HFACS) framework and natural language processing to prevent hazardous situations in air traffic control. Based on the development of the HFACS standard, an air traffic control hazard classification system will be created. The dangerous data of the aviation safety management system is selected by dead bodies, classified and marked in 5 levels. TFIDF TextRank text classification method based on key content extraction and text classification model based on CNN and BERT model were used in the experiment to solve the problem of small samples, many labels and random samples in hazardous environment of air pollution control. The results show that the total cost of model training time and classification accuracy is the highest when the keywords are around 8. As the number of points increases, the time spent in dimensioning decreases and affects accuracy. When the number of points reaches about 93, the time spent in determining the size increases, but the accuracy of the allocation remains close to 0.7, but the increase in the value of time leads to a decrease in the total cost. It has been proven that extracting key content can solve text classification problems for small companies and contribute to further research in the development of security systems.</p>","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":null,"pages":null},"PeriodicalIF":2.0,"publicationDate":"2024-02-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139910190","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This work proposes an efficient Summary Caption Technique (SCT) that takes a multimodal summary and image captions as input and retrieves, via the captions, the images most relevant to the multimodal summary. Matching a multimodal summary with an appropriate image is a challenging task in the computer vision (CV) and natural language processing (NLP) fields. Merging these fields is tedious, though the research community has steadily focused on cross-modal retrieval. The related problems include visual question answering, matching queries with images, and matching semantic relationships between two modalities to retrieve the corresponding image. Relevant works consider questions to match the relationships in visual information, use object detection to match text with visual information, and employ structural-level representations to align images with text. However, these techniques primarily focus on retrieving images for text or on image captioning; less effort has been spent on retrieving relevant images for a multimodal summary. Hence, our proposed technique extracts and merges features in a Hybrid Image Text (HIT) layer and embeds the captions in a word2vec semantic space, where contextual features and semantic relationships are compared and matched between the modality vectors using cosine similarity. In cross-modal retrieval, we obtain the top five related images and align with the multimodal summary the image that achieves the highest cosine score among those retrieved. The model is trained as a sequence-to-sequence model for 100 epochs, with sparse categorical cross-entropy reducing the information loss. Further experiments on the Multimodal Summarization with Multimodal Output (MSMO) dataset evaluate the quality of image alignment with an image-precision metric, which demonstrates the best results.
{"title":"SCT:Summary Caption Technique for Retrieving Relevant Images in Alignment with Multimodal Abstractive Summary","authors":"Shaik Rafi, Ranjita Das","doi":"10.1145/3645029","DOIUrl":"https://doi.org/10.1145/3645029","url":null,"abstract":"<p>This work proposes an efficient Summary Caption Technique(SCT) which considers the multimodal summary and image captions as input to retrieve the correspondence images from the captions that are highly influential to the multimodal summary. Matching a multimodal summary with an appropriate image is a challenging task in computer vision (CV) and natural language processing (NLP) fields. Merging of these fields are tedious, though the research community has steadily focused on the cross-modal retrieval. These issues include the visual question-answering, matching queries with the images, and semantic relationship matching between two modalities for retrieving the corresponding image. Relevant works consider in questions to match the relationship of visual information, object detection and to match the text with visual information, and employing structural-level representation to align the images with the text. However, these techniques are primarily focused on retrieving the images to text or for the image captioning. But less effort has been spent on retrieving relevant images for the multimodal summary. Hence, our proposed technique extracts and merge features in Hybrid Image Text(HIT) layer and captions in the semantic embeddings with word2vec where the contextual features and semantic relationships are compared and matched with each vector between the modalities, with cosine semantic similarity. In cross-modal retrieval, we achieve top five related images and align the relevant images to the multimodal summary that achieves the highest cosine score among the retrieved images. The model has been trained with seq-to-seq modal with 100 epochs, besides reducing the information loss by the sparse categorical cross entropy. Further, experimenting with the multimodal summarization with multimodal output dataset (MSMO), in cross-modal retrieval, helps to evaluate the quality of image alignment with an image-precision metric that demonstrate the best results.</p>","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":null,"pages":null},"PeriodicalIF":2.0,"publicationDate":"2024-02-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139752680","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In the field of disease diagnosis, medical image classification faces inherent challenges due to factors including data imbalance, image-quality variability, annotation variability, and limited data availability and representativeness. Such challenges adversely affect an algorithm's ability to classify medical images, leading to biased model outcomes and inaccurate interpretations. In this paper, a novel Discrete Levy Flight Grey Wolf Optimizer (DLFGWO) is combined with a Random Forest (RF) classifier to address these limitations on biomedical datasets and achieve a better classification rate. DLFGWO-RF resolves image-quality variability in ultrasound images and limits classification inaccuracies in RF by handling incomplete and noisy data. A sheer focus on the majority class may lead to an unequal distribution of classes and thus to data imbalance. The DLFGWO balances such distributions by leveraging grey wolves, whose exploration and exploitation capabilities are improved using Discrete Levy Flight (DLF). It further optimizes the classifier's performance to achieve a balanced classification rate. DLFGWO-RF is designed to perform classification even on limited datasets, reducing the need for numerous expert annotations. In diabetic retinopathy grading, DLFGWO-RF reduces disagreements arising from annotation variability due to subjective interpretation. However, the diabetic retinopathy dataset fails to capture the full diversity of the population, which limits the generalization ability of the proposed DLFGWO-RF; fine-tuning the RF lets it adapt robustly to subgroups in the dataset, enhancing overall performance. Experiments are conducted on two widely used medical image datasets to test the efficacy of the model. The experimental results show that the DLFGWO-RF classifier achieves improved classification accuracy of 90-95%, outperforming existing techniques on various imbalanced datasets.
{"title":"Handling Imbalance and Limited Data in Thyroid Ultrasound and Diabetic Retinopathy Datasets Using Discrete Levy Flights Grey Wolf Optimizer Based Random Forest for Robust Medical Data Classification","authors":"Shobha Aswal, Neelu Jyothi Ahuja, Ritika Mehra","doi":"10.1145/3648363","DOIUrl":"https://doi.org/10.1145/3648363","url":null,"abstract":"<p>In the field of disease diagnosis, medical image classification faces an inherent challenge due to various factors involving data imbalance, image quality variability, annotation variability, and limited data availability and data representativeness. Such challenges affect the algorithm's classification ability on the medical images in an adverse way, which leads to biased model outcomes and inaccurate interpretations. In this paper, a novel Discrete Levy Flight Grey Wolf Optimizer (DLFGWO) is combined with the Random Forest (RF) classifier to address the above limitations on the biomedical datasets and to achieve better classification rate. The DLFGWO-RF resolves the image quality variability in ultrasound images and limits the inaccuracies on classification using RF by handling the incomplete and noisy data. The sheer focus on the majority class may lead to unequal distribution of classes and thus leads to data imbalance. The DLFGWO balances such distribution by leveraging grey wolves and its exploration and exploitation capabilities are improved using Discrete Levy Flight (DLF). It further optimizes the classifier's performance to achieve balanced classification rate. DLFGWO-RF is designed to perform classification even on limited datasets, thereby the requirement of numerous expert annotations can thus be reduced. In diabetic retinopathy grading, the DLFGWO-RF reduces disagreements in annotation variability using subjective interpretations. However, the representativeness of the diabetic retinopathy dataset fails to capture the entire population diversity, which limits the generalization ability of the proposed DLFGWO-RF. Thus, fine-tuning of RF can robustly adapt to the subgroups in the dataset, enhancing its overall performance. The experiments are conducted on two widely used medical image datasets to test the efficacy of the model. The experimental results show that the DLFGWO-RF classifier achieves improved classification accuracy between 90-95%, which outperforms the existing techniques for various imbalanced datasets.</p>","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":null,"pages":null},"PeriodicalIF":2.0,"publicationDate":"2024-02-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139752548","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Named Entity Recognition (NER) is an indispensable component of Natural Language Processing (NLP), which aims to identify and classify entities within text data. While Deep Learning (DL) models have excelled at NER for well-resourced languages like English, Spanish, and Chinese, they face significant hurdles when dealing with low-resource languages like Urdu. These challenges stem from the intricate linguistic characteristics of Urdu, including morphological diversity, a context-dependent lexicon, and the scarcity of training data. This study addresses these issues by focusing on Urdu Named Entity Recognition (U-NER) and introducing three key contributions. First, various pre-trained embedding methods are employed, encompassing Word2vec (W2V), GloVe, FastText, Bidirectional Encoder Representations from Transformers (BERT), and Embeddings from Language Models (ELMo). In particular, fine-tuning is performed on BERT-Base and ELMo using Urdu Wikipedia and news articles. Second, a novel generative Data Augmentation (DA) technique replaces Named Entities (NEs) with mask tokens and employs pre-trained masked language models to predict the masked tokens, effectively expanding the training dataset. Finally, the study introduces a novel hybrid model combining a Transformer encoder with a Convolutional Neural Network (CNN) to capture the intricate morphology of Urdu. These modules enable the model to handle polysemy, extract short- and long-range dependencies, and enhance learning capacity. Empirical experiments demonstrate that the proposed model, incorporating BERT embeddings and the innovative DA approach, attains the highest F1 score of 93.99%, highlighting its efficacy for the U-NER task.
{"title":"Enriching Urdu NER with BERT Embedding, Data Augmentation, and Hybrid Encoder-CNN Architecture","authors":"Anil Ahmed, Degen Huang, Syed Yasser Arafat, Imran Hameed","doi":"10.1145/3648362","DOIUrl":"https://doi.org/10.1145/3648362","url":null,"abstract":"<p>Named Entity Recognition (NER) is an indispensable component of Natural Language Processing (NLP), which aims to identify and classify entities within text data. While Deep Learning (DL) models have excelled in NER for well-resourced languages like English, Spanish, and Chinese, they face significant hurdles when dealing with low-resource languages like Urdu. These challenges stem from the intricate linguistic characteristics of Urdu, including morphological diversity, context-dependent lexicon, and the scarcity of training data. This study addresses these issues by focusing on Urdu Named Entity Recognition (U-NER) and introducing three key contributions. First, various pre-trained embedding methods are employed, encompassing Word2vec (W2V), GloVe, FastText, Bidirectional Encoder Representations from Transformers (BERT), and Embeddings from language models (ELMo). In particular, fine-tuning is performed on BERT<sub>BASE</sub> and ELMo using Urdu Wikipedia and news articles. Secondly, a novel generative Data Augmentation (DA) technique replaces Named Entities (NEs) with mask tokens, employing pre-trained masked language models to predict masked tokens, effectively expanding the training dataset. Finally, the study introduces a novel hybrid model combining a Transformer Encoder with a Convolutional Neural Network (CNN) to capture the intricate morphology of Urdu. These modules enable the model to handle polysemy, extract short and long-range dependencies, and enhance learning capacity. Empirical experiments demonstrate that the proposed model, incorporating BERT embeddings and an innovative DA approach, attains the highest F1-Score of 93.99%, highlighting its efficacy for the U-NER task.</p>","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":null,"pages":null},"PeriodicalIF":2.0,"publicationDate":"2024-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139752445","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The COVID-19 pandemic in 2020 brought an unprecedented global crisis. After two years of control efforts, life gradually returned to the pre-pandemic state, but localized outbreaks continued to occur. Towards the end of 2022, COVID-19 resurged in China, leading to another disruption of people's lives and work. Much of the information on social media reflected people's views of and emotions towards the second outbreak, which differed distinctly from those during the first outbreak in 2020. To explore people's emotional attitudes towards the pandemic at different stages, and the underlying reasons, this study collected microblog data from November 2022 to January 2023 and from January to June 2020, encompassing Chinese reactions to the COVID-19 pandemic. Based on hesitancy and fuzzy intuition theory, we propose a hypothesis: hesitancy can be integrated into machine learning models to select suitable corpora for training, which not only improves accuracy but also enhances model efficiency. Based on this hypothesis, we designed a hesitancy-integrated model. The experimental results demonstrate the model's positive performance on a self-constructed database. Applying this model to analyze people's attitudes towards the pandemic, we obtained their sentiments in different months. We found that the most negative emotions appeared at the beginning of the pandemic, followed by emotional fluctuations influenced by social events and ultimately an overall positive trend. Combining word-cloud techniques with the Latent Dirichlet Allocation (LDA) model effectively helped explore the reasons behind the changes in attitude towards the pandemic.
{"title":"Sentiment Analysis Method of Epidemic-related Microblog Based on Hesitation Theory","authors":"Yang Yu, Dong Qiu, HuanYu Wan","doi":"10.1145/3648360","DOIUrl":"https://doi.org/10.1145/3648360","url":null,"abstract":"<p>The COVID-19 pandemic in 2020 brought an unprecedented global crisis. After two years of control efforts, life gradually returned to the pre-pandemic state, but localized outbreaks continued to occur. Towards the end of 2022, COVID-19 resurged in China, leading to another disruption of people’s lives and work. Many pieces of information on social media reflected people’s views and emotions towards the second outbreak, which showed distinct differences compared to the first outbreak in 2020. To explore people’s emotional attitudes towards the pandemic at different stages and the underlying reasons, this study collected microblog data from November 2022 to January 2023 and from January to June 2020, encompassing Chinese reactions to the COVID-19 pandemic. Based on hesitancy and the Fuzzy Intuition theory, we proposed a hypothesis: hesitancy can be integrated into machine learning models to select suitable corpora for training, which not only improves accuracy but also enhances model efficiency. Based on this hypothesis, we designed a hesitancy-integrated model. The experimental results demonstrated the model’s positive performance on a self-constructed database. By applying this model to analyze people’s attitudes towards the pandemic, we obtained their sentiments in different months. We found that the most negative emotions appeared at the beginning of the pandemic, followed by emotional fluctuations influenced by social events, ultimately showing an overall positive trend. Combining word cloud techniques and the Latent Dirichlet Allocation (LDA) model effectively helped explore the reasons behind the changes in pandemic attitude.</p>","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":null,"pages":null},"PeriodicalIF":2.0,"publicationDate":"2024-02-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139752446","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Within the context of video frame interpolation, complex motion modeling is the task of capturing, in a video sequence, where the moving objects are located in the interpolated frame and how to maintain the temporal consistency of motion. Existing video frame interpolation methods typically assign either a fixed-size motion kernel or a refined optical flow to model complex motions; however, they suffer from data redundancy and inaccurate representation of motion. This paper introduces a unified warping framework, named multi-scale expandable deformable convolution (MSEConv), for simultaneously performing complex motion modeling and frame interpolation. In the proposed framework, a deep fully convolutional neural network with global attention estimates multiple small-scale kernel weights with different expansion degrees and performs adaptive weight allocation for each pixel synthesis. Moreover, most kernel-based interpolation methods can be treated as special cases of the proposed MSEConv; thus, MSEConv can easily be transferred to other kernel-based frame interpolation methods for performance improvement. To further improve robustness to motion occlusions, an operation of mask occlusion is introduced. As a consequence, our proposed MSEConv performs on par with or even better than state-of-the-art kernel-based frame interpolation works on public datasets. Our source code and visual comparison results are available at https://github.com/Pumpkin123709/MSEConv.
{"title":"MSEConv: A Unified Warping Framework for Video Frame Interpolation","authors":"Xiangling Ding, Pu Huang, Dengyong Zhang, Wei Liang, Feng Li, Gaobo Yang, Xin Liao, Yue Li","doi":"10.1145/3648364","DOIUrl":"https://doi.org/10.1145/3648364","url":null,"abstract":"<p>Within the context of video frame interpolation, complex motion modeling is the task of capturing, in a video sequence, where the moving objects are located in the interpolated frame, and how to maintain the temporal consistency of motion. Existing video frame interpolation methods typically assign either a fixed size of the motion kernel or a refined optical flow to model complex motions. However, they have the limitation of data redundancy and inaccuracy representation of motion. This paper introduces a unified warping framework, named multi-scale expandable deformable convolution (MSEConv), for simultaneously performing complex motion modeling and frame interpolation. In the proposed framework, a deep fully convolutional neural network with global attention is proposed to estimate multiple small-scale kernel weights with different expansion degrees and adaptive weight allocation for each pixel synthesis. Moreover, most of the kernel-based interpolation methods can be treated as the special case of the proposed MSEConv, thus, MSEConv can be easily transferred to other kernel-based frame interpolation methods for performance improvement. To further improve the robustness of motion occlusions, an operation of mask occlusion is introduced. As a consequence, our proposed MSEConv shows strong performance on par or even better than the state-of-the-art kernel-based frame interpolation works on public datasets. Our source code and visual comparable results are available at https://github.com/Pumpkin123709/MSEConv.</p>","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":null,"pages":null},"PeriodicalIF":2.0,"publicationDate":"2024-02-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139752444","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}