ACM Transactions on Asian and Low-Resource Language Information Processing最新文献_第6页

Multization: Multi-Modal Summarization Enhanced by Multi-Contextually Relevant and Irrelevant Attention Alignment 多化：通过多语境相关和不相关注意力对齐增强多模态总结能力

IF 2 4区计算机科学 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

ACM Transactions on Asian and Low-Resource Language Information Processing

Pub Date : 2024-03-09 DOI: 10.1145/3651983

Huan Rong, Zhongfeng Chen, Zhenyu Lu, Fan Xu, Victor S. Sheng

This paper focuses on the task of Multi-Modal Summarization with Multi-Modal Output for China JD.COM e-commerce product description containing both source text and source images. In the context learning of multi-modal (text and image) input, there exists a semantic gap between text and image, especially in the cross-modal semantics of text and image. As a result, capturing shared cross-modal semantics earlier becomes crucial for multi-modal summarization. On the other hand, when generating the multi-modal summarization, based on the different contributions of input text and images, the relevance and irrelevance of multi-modal contexts to the target summary should be considered, so as to optimize the process of learning cross-modal context to guide the summary generation process and to emphasize the significant semantics within each modality. To address the aforementioned challenges, Multization has been proposed to enhance multi-modal semantic information by multi-contextually relevant and irrelevant attention alignment. Specifically, a Semantic Alignment Enhancement mechanism is employed to capture shared semantics between different modalities (text and image), so as to enhance the importance of crucial multi-modal information in the encoding stage. Additionally, the IR-Relevant Multi-Context Learning mechanism is utilized to observe the summary generation process from both relevant and irrelevant perspectives, so as to form a multi-modal context that incorporates both text and image semantic information. The experimental results in the China JD.COM e-commerce dataset demonstrate that the proposed Multization method effectively captures the shared semantics between the input source text and source images, and highlights essential semantics. It also successfully generates the multi-modal summary (including image and text) that comprehensively considers the semantics information of both text and image.

本文主要针对中国 JD.COM 电子商务中同时包含源文本和源图像的产品描述，提出了多模态总结与多模态输出的任务。在多模态（文本和图像）输入的语境学习中，文本和图像之间存在语义鸿沟，尤其是文本和图像的跨模态语义。因此，及早捕捉共享的跨模态语义对于多模态总结至关重要。另一方面，在生成多模态摘要时，基于输入文本和图像的不同贡献，应考虑多模态上下文与目标摘要的相关性和不相关性，从而优化学习跨模态上下文的过程，以指导摘要生成过程，并强调每种模态中的重要语义。为了应对上述挑战，有人提出了多化（Multization）方法，通过多语境相关和不相关注意力对齐来增强多模态语义信息。具体来说，我们采用了语义对齐增强机制来捕捉不同模态（文本和图像）之间的共享语义，从而在编码阶段提高关键多模态信息的重要性。此外，利用红外相关多语境学习机制，从相关和不相关两个角度观察摘要生成过程，从而形成包含文本和图像语义信息的多模态语境。在中国 JD.COM 电子商务数据集中的实验结果表明，所提出的多化方法能有效捕捉输入源文本和源图像之间的共享语义，并突出重要语义。它还成功生成了多模态摘要（包括图像和文本），全面考虑了文本和图像的语义信息。

{"title":"Multization: Multi-Modal Summarization Enhanced by Multi-Contextually Relevant and Irrelevant Attention Alignment","authors":"Huan Rong, Zhongfeng Chen, Zhenyu Lu, Fan Xu, Victor S. Sheng","doi":"10.1145/3651983","DOIUrl":"https://doi.org/10.1145/3651983","url":null,"abstract":"This paper focuses on the task of Multi-Modal Summarization with Multi-Modal Output for China JD.COM e-commerce product description containing both source text and source images. In the context learning of multi-modal (text and image) input, there exists a semantic gap between text and image, especially in the cross-modal semantics of text and image. As a result, capturing shared cross-modal semantics earlier becomes crucial for multi-modal summarization. On the other hand, when generating the multi-modal summarization, based on the different contributions of input text and images, the relevance and irrelevance of multi-modal contexts to the target summary should be considered, so as to optimize the process of learning cross-modal context to guide the summary generation process and to emphasize the significant semantics within each modality. To address the aforementioned challenges, Multization has been proposed to enhance multi-modal semantic information by multi-contextually relevant and irrelevant attention alignment. Specifically, a Semantic Alignment Enhancement mechanism is employed to capture shared semantics between different modalities (text and image), so as to enhance the importance of crucial multi-modal information in the encoding stage. Additionally, the IR-Relevant Multi-Context Learning mechanism is utilized to observe the summary generation process from both relevant and irrelevant perspectives, so as to form a multi-modal context that incorporates both text and image semantic information. The experimental results in the China JD.COM e-commerce dataset demonstrate that the proposed Multization method effectively captures the shared semantics between the input source text and source images, and highlights essential semantics. It also successfully generates the multi-modal summary (including image and text) that comprehensively considers the semantics information of both text and image.","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":"65 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-03-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140072938","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Am I hurt?: Evaluating Psychological Pain Detection in Hindi Text using Transformer-based Models 我受伤了吗？使用基于变换器的模型评估印地语文本中的心理痛苦检测

IF 2 4区计算机科学 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

ACM Transactions on Asian and Low-Resource Language Information Processing

Pub Date : 2024-03-05 DOI: 10.1145/3650206

Ravleen Kaur, M. P. S. Bhatia, Akshi Kumar

The automated evaluation of pain is critical for developing effective pain management approaches that seek to alleviate while preserving patients’ functioning. Transformer-based models can aid in detecting pain from Hindi text data gathered from social media by leveraging their ability to capture complex language patterns and contextual information. By understanding the nuances and context of Hindi text, transformer models can effectively identify linguistic cues, sentiment and expressions associated with pain enabling the detection and analysis of pain-related content present in social media posts. The purpose of this research is to analyse the feasibility of utilizing NLP techniques to automatically identify pain within Hindi textual data, providing a valuable tool for pain assessment in Hindi-speaking populations. The research showcases the HindiPainNet model, a deep neural network that employs the IndicBERT model, classifying the dataset into two class labels {pain, no_pain} for detecting pain in Hindi textual data. The model is trained and tested using a novel dataset, दर्द-ए-शायरी (pronounced as Dard-e-Shayari) curated using posts from social media platforms. The results demonstrate the model's effectiveness, achieving an accuracy of 70.5%. This pioneer research highlights the potential of utilizing textual data from diverse sources to identify and understand pain experiences based on psychosocial factors. This research could pave the path for the development of automated pain assessment tools that help medical professionals comprehend and treat pain in Hindi speaking populations. Additionally, it opens avenues to conduct further NLP-based multilingual pain detection research, addressing the needs of diverse language communities.

疼痛的自动评估对于开发有效的疼痛管理方法至关重要，这种方法既能缓解疼痛，又能保护患者的功能。基于转换器的模型可以利用其捕捉复杂语言模式和上下文信息的能力，帮助从社交媒体收集的印地语文本数据中检测疼痛。通过理解印地语文本的细微差别和上下文，转换器模型可以有效识别与疼痛相关的语言线索、情感和表达方式，从而检测和分析社交媒体帖子中与疼痛相关的内容。本研究旨在分析利用 NLP 技术自动识别印地语文本数据中疼痛的可行性，为印地语人群的疼痛评估提供有价值的工具。该研究展示了 HindiPainNet 模型，这是一个采用 IndicBERT 模型的深度神经网络，可将数据集分为两个类别标签 {pain, no_pain}，用于检测印地语文本数据中的疼痛。该模型使用一个新的数据集进行了训练和测试，该数据集是利用社交媒体平台上的帖子策划的दर्द-ए-शायरी（发音为 Dard-e-Shayari）。研究结果证明了该模型的有效性，准确率达到 70.5%。这项开创性的研究凸显了利用不同来源的文本数据来识别和理解基于社会心理因素的疼痛体验的潜力。这项研究可以为开发自动疼痛评估工具铺平道路，帮助医疗专业人员理解和治疗印地语人群的疼痛。此外，它还为进一步开展基于 NLP 的多语言疼痛检测研究开辟了道路，从而满足不同语言社区的需求。

{"title":"Am I hurt?: Evaluating Psychological Pain Detection in Hindi Text using Transformer-based Models","authors":"Ravleen Kaur, M. P. S. Bhatia, Akshi Kumar","doi":"10.1145/3650206","DOIUrl":"https://doi.org/10.1145/3650206","url":null,"abstract":"The automated evaluation of pain is critical for developing effective pain management approaches that seek to alleviate while preserving patients’ functioning. Transformer-based models can aid in detecting pain from Hindi text data gathered from social media by leveraging their ability to capture complex language patterns and contextual information. By understanding the nuances and context of Hindi text, transformer models can effectively identify linguistic cues, sentiment and expressions associated with pain enabling the detection and analysis of pain-related content present in social media posts. The purpose of this research is to analyse the feasibility of utilizing NLP techniques to automatically identify pain within Hindi textual data, providing a valuable tool for pain assessment in Hindi-speaking populations. The research showcases the HindiPainNet model, a deep neural network that employs the IndicBERT model, classifying the dataset into two class labels {pain, no_pain} for detecting pain in Hindi textual data. The model is trained and tested using a novel dataset, दर्द-ए-शायरी (pronounced as Dard-e-Shayari) curated using posts from social media platforms. The results demonstrate the model's effectiveness, achieving an accuracy of 70.5%. This pioneer research highlights the potential of utilizing textual data from diverse sources to identify and understand pain experiences based on psychosocial factors. This research could pave the path for the development of automated pain assessment tools that help medical professionals comprehend and treat pain in Hindi speaking populations. Additionally, it opens avenues to conduct further NLP-based multilingual pain detection research, addressing the needs of diverse language communities.","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":"108 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-03-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140034668","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

TransVAE-PAM: A Combined Transformer and DAG-based Approach for Enhanced Fake News Detection in Indian Context TransVAE-PAM：基于变压器和 DAG 的组合方法，用于增强印度背景下的假新闻检测能力

IF 2 4区计算机科学 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

ACM Transactions on Asian and Low-Resource Language Information Processing

Pub Date : 2024-03-02 DOI: 10.1145/3651160

Shivani Tufchi, Tanveer Ahmed, Ashima Yadav, Krishna Kant Agrawal, Ankit Vidyarthi

In this study, we introduce a novel method, “TransVAE-PAM”, for the classification of fake news articles, tailored specifically for the Indian context. The approach capitalizes on state-of-the-art contextual and sentence transformer-based embedding models to generate article embeddings. Furthermore, we also try to address the issue of compact model size. In this respect, we employ a Variational Autoencoder (VAE) and β-VAE to reduce the dimensions of the embeddings, thereby yielding compact latent representations. To capture the thematic essence or important topics in the news articles, we use the Pachinko Allocation Model (PAM) model, a Directed Acyclic Graph (DAG) based approach, to generate meaningful topics. These two facets of representation - the reduced-dimension embeddings from the VAE and the extracted topics from the PAM model - are fused together to create a feature set. This representation is subsequently channeled into five different methods for fake news classification. Furthermore, we use eight distinct transformer-based architectures to test the embedding generation. To validate the feasibility of the proposed approach, we have conducted extensive experimentation on a proprietary dataset. The dataset is sourced from “Times of India” and other online media. Considering the size of the dataset, large-scale experiments are conducted on an NVIDIA supercomputer. Through this comprehensive numerical investigation, we have achieved an accuracy of 96.2% and an F1 score of 96% using the DistilBERT transformer architecture. By complementing the method via topic modeling, we record a performance improvement with the accuracy and F1 score both at 97%. These results indicate a promising direction toward leveraging the combination of advanced topic models into existing classification schemes to enhance research on fake news detection.

在本研究中，我们介绍了一种新方法 "TransVAE-PAM"，用于对假新闻文章进行分类，该方法专门针对印度的情况而定制。该方法利用最先进的基于上下文和句子转换器的嵌入模型来生成文章嵌入。此外，我们还尝试解决模型尺寸紧凑的问题。在这方面，我们采用了变异自动编码器（VAE）和 β-VAE 来减少嵌入的维度，从而生成紧凑的潜在表示。为了捕捉新闻文章中的主题本质或重要话题，我们使用了基于有向无环图（DAG）的柏青柯分配模型（PAM）来生成有意义的话题。这两方面的表征--来自 VAE 的降维嵌入和来自 PAM 模型的提取主题--被融合在一起以创建一个特征集。这一表征随后被导入五种不同的假新闻分类方法中。此外，我们还使用了八种不同的基于变换器的架构来测试嵌入生成。为了验证所提方法的可行性，我们在一个专有数据集上进行了广泛的实验。该数据集来自《印度时报》和其他网络媒体。考虑到数据集的规模，我们在英伟达超级计算机上进行了大规模实验。通过全面的数值研究，我们利用 DistilBERT 变换器架构实现了 96.2% 的准确率和 96% 的 F1 分数。通过主题建模对该方法进行补充，我们的准确率和 F1 分数均达到了 97%，性能得到了提高。这些结果表明，将先进的话题模型与现有的分类方案相结合，加强假新闻检测研究是一个很有前景的方向。

{"title":"TransVAE-PAM: A Combined Transformer and DAG-based Approach for Enhanced Fake News Detection in Indian Context","authors":"Shivani Tufchi, Tanveer Ahmed, Ashima Yadav, Krishna Kant Agrawal, Ankit Vidyarthi","doi":"10.1145/3651160","DOIUrl":"https://doi.org/10.1145/3651160","url":null,"abstract":"In this study, we introduce a novel method, “TransVAE-PAM”, for the classification of fake news articles, tailored specifically for the Indian context. The approach capitalizes on state-of-the-art contextual and sentence transformer-based embedding models to generate article embeddings. Furthermore, we also try to address the issue of compact model size. In this respect, we employ a Variational Autoencoder (VAE) and β-VAE to reduce the dimensions of the embeddings, thereby yielding compact latent representations. To capture the thematic essence or important topics in the news articles, we use the Pachinko Allocation Model (PAM) model, a Directed Acyclic Graph (DAG) based approach, to generate meaningful topics. These two facets of representation - the reduced-dimension embeddings from the VAE and the extracted topics from the PAM model - are fused together to create a feature set. This representation is subsequently channeled into five different methods for fake news classification. Furthermore, we use eight distinct transformer-based architectures to test the embedding generation. To validate the feasibility of the proposed approach, we have conducted extensive experimentation on a proprietary dataset. The dataset is sourced from “Times of India” and other online media. Considering the size of the dataset, large-scale experiments are conducted on an NVIDIA supercomputer. Through this comprehensive numerical investigation, we have achieved an accuracy of 96.2% and an F1 score of 96% using the DistilBERT transformer architecture. By complementing the method via topic modeling, we record a performance improvement with the accuracy and F1 score both at 97%. These results indicate a promising direction toward leveraging the combination of advanced topic models into existing classification schemes to enhance research on fake news detection.","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":"119 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-03-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140057217","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Opinion Mining on Social Media Text Using Optimized Deep Belief Networks 使用优化的深度信念网络对社交媒体文本进行观点挖掘

IF 2 4区计算机科学 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

ACM Transactions on Asian and Low-Resource Language Information Processing

Pub Date : 2024-03-02 DOI: 10.1145/3649502

S. Vinayaga Vadivu, P. Nagaraj, B. S. Murugan

In the digital world, most people spend their leisure and precious time on social media networks such as Facebook, Twitter. Instagram, and so on. Moreover, users post their views of products, services, political parties on their social sites. This information is viewed by many other users and brands. With the aid of these posts and tweets, the emotions, polarities of users are extracted to obtain the opinion about products or services. To analyze these posts sentiment analysis or opinion mining techniques are applied. Subsequently, this field rapidly attracts many researchers to conduct their research work due to the availability of an enormous number of data on social media networks. Further, this method can also be used to analyze the text to extract the sentiments which are classified as moderate, neutral, low extreme, and high extreme. However, the extraction of sentiment is an arduous one from the social media datasets, since it includes formal and informal texts, emojis, symbols. Hence to extract the feature vector from the accessed social media datasets and to perform accurate classification to group the texts based on the appropriate sentiments we proposed a novel method known as, Deep Belief Network-based Dynamic Grouping-based Cooperative optimization method DBN based DGCO. Exploiting this method the data are preprocessed to attain the required format of text and henceforth the feature vectors are extracted by the ICS algorithm. Furthermore, the extracted datasets are classified and grouped into moderate, neutral, low extreme, and high extreme with DBN based DGCO method. For experimental analysis, we have taken two social media datasets and analyzed the performance of the proposed method in terms of performance metrics such as accuracy/precision, recall, F1 Score, and ROC with HEMOS, WOA-SITO, PDCNN, and NB-LSVC state-of-art works. The acquired accuracy/precision, recall, and F1 Score, of our proposed ICS-DBN-DGCO method, are 89%, 80%, 98.2%, respectively.

在数字世界里，大多数人都把闲暇和宝贵的时间花在社交媒体网络上，如 Facebook、Twitter、Instagram 等。Instagram 等。此外，用户还会在社交网站上发表他们对产品、服务和政党的看法。许多其他用户和品牌都会浏览这些信息。借助这些帖子和推文，可以提取用户的情绪和两极分化，从而获得对产品或服务的看法。为了分析这些帖子，需要应用情感分析或意见挖掘技术。随后，由于社交媒体网络上存在大量数据，这一领域迅速吸引了许多研究人员开展研究工作。此外，这种方法还可用于分析文本以提取情感，情感可分为温和、中性、低极端和高极端。然而，从社交媒体数据集中提取情感是一项艰巨的工作，因为其中包括正式和非正式文本、表情符号和符号。因此，为了从访问的社交媒体数据集中提取特征向量，并根据适当的情感对文本进行准确分类，我们提出了一种新方法，即基于深度信网络的动态分组合作优化方法 DBN based DGCO。利用这种方法对数据进行预处理，以获得所需的文本格式，然后通过 ICS 算法提取特征向量。此外，还利用基于 DBN 的 DGCO 方法将提取的数据集分类并分为中等、中性、低极端和高极端。在实验分析中，我们选取了两个社交媒体数据集，并与 HEMOS、WOA-SITO、PDCNN 和 NB-LSVC 等最先进的作品一起，从准确率/精度、召回率、F1 分数和 ROC 等性能指标方面分析了所提方法的性能。我们提出的 ICS-DBN-DGCO 方法获得的准确率/精确率、召回率和 F1 分数分别为 89%、80% 和 98.2%。

{"title":"Opinion Mining on Social Media Text Using Optimized Deep Belief Networks","authors":"S. Vinayaga Vadivu, P. Nagaraj, B. S. Murugan","doi":"10.1145/3649502","DOIUrl":"https://doi.org/10.1145/3649502","url":null,"abstract":"In the digital world, most people spend their leisure and precious time on social media networks such as Facebook, Twitter. Instagram, and so on. Moreover, users post their views of products, services, political parties on their social sites. This information is viewed by many other users and brands. With the aid of these posts and tweets, the emotions, polarities of users are extracted to obtain the opinion about products or services. To analyze these posts sentiment analysis or opinion mining techniques are applied. Subsequently, this field rapidly attracts many researchers to conduct their research work due to the availability of an enormous number of data on social media networks. Further, this method can also be used to analyze the text to extract the sentiments which are classified as moderate, neutral, low extreme, and high extreme. However, the extraction of sentiment is an arduous one from the social media datasets, since it includes formal and informal texts, emojis, symbols. Hence to extract the feature vector from the accessed social media datasets and to perform accurate classification to group the texts based on the appropriate sentiments we proposed a novel method known as, Deep Belief Network-based Dynamic Grouping-based Cooperative optimization method DBN based DGCO. Exploiting this method the data are preprocessed to attain the required format of text and henceforth the feature vectors are extracted by the ICS algorithm. Furthermore, the extracted datasets are classified and grouped into moderate, neutral, low extreme, and high extreme with DBN based DGCO method. For experimental analysis, we have taken two social media datasets and analyzed the performance of the proposed method in terms of performance metrics such as accuracy/precision, recall, F1 Score, and ROC with HEMOS, WOA-SITO, PDCNN, and NB-LSVC state-of-art works. The acquired accuracy/precision, recall, and F1 Score, of our proposed ICS-DBN-DGCO method, are 89%, 80%, 98.2%, respectively.","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":"79 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-03-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140034763","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

A Survey of Knowledge Enhanced Pre-trained Language Models 知识增强型预训练语言模型调查

IF 2 4区计算机科学 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

ACM Transactions on Asian and Low-Resource Language Information Processing

Pub Date : 2024-03-01 DOI: 10.1145/3631392

Jian Yang, Xinyu Hu, Gang Xiao, Yulong Shen

Pre-trained language models learn informative word representations on a large-scale text corpus through self-supervised learning, which has achieved promising performance in fields of natural language processing (NLP) after fine-tuning. These models, however, suffer from poor robustness and lack of interpretability. We refer to pre-trained language models with knowledge injection as knowledge-enhanced pre-trained language models (KEPLMs). These models demonstrate deep understanding and logical reasoning and introduce interpretability. In this survey, we provide a comprehensive overview of KEPLMs in NLP. We first discuss the advancements in pre-trained language models and knowledge representation learning. Then we systematically categorize existing KEPLMs from three different perspectives. Finally, we outline some potential directions of KEPLMs for future research.

预训练语言模型通过自我监督学习在大规模文本语料库上学习信息词表征，经过微调后在自然语言处理（NLP）领域取得了可喜的成绩。然而，这些模型存在鲁棒性差和缺乏可解释性的问题。我们将注入知识的预训练语言模型称为知识增强预训练语言模型（KEPLM）。这些模型展示了深度理解和逻辑推理能力，并引入了可解释性。在本调查报告中，我们将全面综述知识增强预训练语言模型在 NLP 中的应用。我们首先讨论了预训练语言模型和知识表示学习的进展。然后，我们从三个不同的角度对现有的 KEPLM 进行了系统分类。最后，我们概述了 KEPLMs 未来研究的一些潜在方向。

引用次数: 0

Exploration on Advanced Intelligent Algorithms of Artificial Intelligence for Verb Recognition in Machine Translation 机器翻译中动词识别的人工智能高级算法探讨

IF 2 4区计算机科学 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

ACM Transactions on Asian and Low-Resource Language Information Processing

Pub Date : 2024-02-28 DOI: 10.1145/3649891

Qinghua Ai, Qingyan Ai, Jun Wang

This article aimed to address the problems of word order confusion, context dependency, and ambiguity in traditional machine translation (MT) methods for verb recognition. By applying advanced intelligent algorithms of artificial intelligence, verb recognition can be better processed and the quality and accuracy of MT can be improved. Based on Neural machine translation (NMT), basic attention mechanisms, historical attention information, dynamically obtain information related to the generated words, and constraint mechanisms were introduced to embed semantic information, represent polysemy, and annotate semantic roles of verbs. This article used the Workshop on machine translation (WMT), British National Corpus (BNC), Gutenberg, Reuters Corpus, OpenSubtitles corpus, and enhanced the data in the corpus. The improved NMT model was compared with traditional NMT models, Rule Based machine translation (RBMT), and Statistical machine translation (SMT). The experimental results showed that the average verb semantic matching degree of the improved NMT model in 5 corpora was 0.85, and the average Bilingual Evaluation Understudy (BLEU) score in 5 corpora was 0.90. The improved NMT model in this article can effectively improve the accuracy of verb recognition in MT, providing new methods for verb recognition in MT.

本文旨在解决传统机器翻译（MT）方法在动词识别中存在的词序混乱、语境依赖和歧义等问题。通过应用人工智能的先进智能算法，可以更好地处理动词识别，提高 MT 的质量和准确性。在神经机器翻译（NMT）的基础上，引入了基本关注机制、历史关注信息、动态获取生成词相关信息和约束机制，以嵌入语义信息、表示多义词和注释动词的语义角色。本文使用了机器翻译研讨会（WMT）、英国国家语料库（BNC）、古腾堡语料库、路透社语料库、OpenSubtitles 语料库，并对语料库中的数据进行了增强。改进后的 NMT 模型与传统 NMT 模型、基于规则的机器翻译（RBMT）和统计机器翻译（SMT）进行了比较。实验结果表明，改进后的 NMT 模型在 5 个语料库中的平均动词语义匹配度为 0.85，在 5 个语料库中的平均双语评估（BLEU）得分为 0.90。本文改进的 NMT 模型能有效提高 MT 中动词识别的准确率，为 MT 中的动词识别提供了新的方法。

{"title":"Exploration on Advanced Intelligent Algorithms of Artificial Intelligence for Verb Recognition in Machine Translation","authors":"Qinghua Ai, Qingyan Ai, Jun Wang","doi":"10.1145/3649891","DOIUrl":"https://doi.org/10.1145/3649891","url":null,"abstract":"This article aimed to address the problems of word order confusion, context dependency, and ambiguity in traditional machine translation (MT) methods for verb recognition. By applying advanced intelligent algorithms of artificial intelligence, verb recognition can be better processed and the quality and accuracy of MT can be improved. Based on Neural machine translation (NMT), basic attention mechanisms, historical attention information, dynamically obtain information related to the generated words, and constraint mechanisms were introduced to embed semantic information, represent polysemy, and annotate semantic roles of verbs. This article used the Workshop on machine translation (WMT), British National Corpus (BNC), Gutenberg, Reuters Corpus, OpenSubtitles corpus, and enhanced the data in the corpus. The improved NMT model was compared with traditional NMT models, Rule Based machine translation (RBMT), and Statistical machine translation (SMT). The experimental results showed that the average verb semantic matching degree of the improved NMT model in 5 corpora was 0.85, and the average Bilingual Evaluation Understudy (BLEU) score in 5 corpora was 0.90. The improved NMT model in this article can effectively improve the accuracy of verb recognition in MT, providing new methods for verb recognition in MT.","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":"1 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140008685","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Leveraging Bidirectionl LSTM with CRFs for Pashto tagging 利用双向 LSTM 和 CRFs 进行普什图语标记

IF 2 4区计算机科学 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

ACM Transactions on Asian and Low-Resource Language Information Processing

Pub Date : 2024-02-27 DOI: 10.1145/3649456

Farooq Zaman, Onaiza Maqbool, Jaweria Kanwal

Part-of-speech tagging plays a vital role in text processing and natural language understanding. Very few attempts have been made in the past for tagging Pashto Part-of-Speech. In this work, we present LSTM based approach for Pashto part-of-speech tagging with special focus on ambiguity resolution. Initially we created a corpus of Pashto sentences having words with multiple meanings and their tags. We introduce a powerful sentences representation and new architecture for Pashto text processing. The accuracy of the proposed approach is compared with state-of-the-art Hidden Markov Model. Our Model shows 87.60% accuracy for all words excluding punctuations and 95.45% for ambiguous words, on the other hand Hidden Markov Model shows 78.37% and 44.72% accuracy respectively. Results show that our approach outperform Hidden Markov Model in Part-of-Speech tagging for Pashto text.

语音部分标记在文本处理和自然语言理解中起着至关重要的作用。过去，很少有人尝试对普什图语部分语音进行标记。在这项工作中，我们提出了基于 LSTM 的普什图语部分语音标记方法，并特别关注歧义解决。最初，我们创建了一个普什图语句子语料库，其中包含多义词及其标签。我们为普什图语文本处理引入了强大的句子表示法和新架构。我们将所提出方法的准确率与最先进的隐马尔可夫模型进行了比较。我们的模型对所有单词（不包括标点符号）的准确率为 87.60%，对模糊单词的准确率为 95.45%，而隐马尔可夫模型的准确率分别为 78.37% 和 44.72%。结果表明，在普什图语文本的语音部分标记方面，我们的方法优于隐马尔可夫模型。

引用次数: 0

A Hybrid Scene Text Script Identification Network for regional Indian Languages 印度地区语言的混合场景文本脚本识别网络

IF 2 4区计算机科学 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

ACM Transactions on Asian and Low-Resource Language Information Processing

Pub Date : 2024-02-24 DOI: 10.1145/3649439

Veronica Naosekpam, Nilkanta Sahu

In this work, we introduce WAFFNet, an attention-centric feature fusion architecture tailored for word-level multi-lingual scene text script identification. Motivated by the limitations of traditional approaches that rely exclusively on feature-based methods or deep learning strategies, our approach amalgamates statistical and deep features to bridge the gap. At the core of WAFFNet, we utilized the merits of Local Binary Pattern —a prominent descriptor capturing low-level texture features with high-dimensional, semantically-rich convolutional features. This fusion is judiciously augmented by a spatial attention mechanism, ensuring targeted emphasis on semantically critical regions of the input image. To address the class imbalance problem in multi-class classification scenarios, we employed a weighted objective function. This not only regularizes the learning process but also addresses the class imbalance problem. The architectural integrity of WAFFNet is preserved through an end-to-end training paradigm, leveraging transfer learning to expedite convergence and optimize performance metrics. Considering the under-representation of regional Indian languages in current datasets, we meticulously curated IIITG-STLI2023, a comprehensive dataset encapsulating English alongside six under-represented Indian languages: Hindi, Kannada, Malayalam, Telugu, Bengali, and Manipuri. Rigorous evaluation of the IIITG-STLI2023, as well as the established MLe2e and SIW-13 datasets, underscores WAFFNet’s supremacy over both traditional feature-engineering approaches as well as state-of-the-art deep learning frameworks. Thus, the proposed WAFFNet framework offers a robust and effective solution for language identification in scene text images.

在这项工作中，我们介绍了 WAFFNet，这是一种以注意力为中心的特征融合架构，专为词级多语言场景文本脚本识别而定制。由于传统方法完全依赖基于特征的方法或深度学习策略存在局限性，我们的方法融合了统计特征和深度特征，从而缩小了差距。在 WAFFNet 的核心中，我们利用了局部二进制模式（Local Binary Pattern）的优点，这是一种捕捉低级纹理特征的著名描述符，具有高维、语义丰富的卷积特征。空间注意力机制对这种融合进行了明智的补充，确保有针对性地强调输入图像的语义关键区域。为了解决多类分类场景中的类不平衡问题，我们采用了加权目标函数。这不仅规范了学习过程，还解决了类不平衡问题。WAFFNet 的结构完整性通过端到端训练范例得以保留，并利用迁移学习加快收敛速度和优化性能指标。考虑到印度地区语言在当前数据集中的代表性不足，我们精心策划了 IIITG-STLI2023，这是一个包含英语和六种代表性不足的印度语言的综合数据集：印地语、卡纳达语、马拉雅拉姆语、泰卢固语、孟加拉语和曼尼普尔语。对 IIITG-STLI2023 以及已建立的 MLe2e 和 SIW-13 数据集的严格评估，彰显了 WAFFNet 超越传统特征工程方法和最先进深度学习框架的优势。因此，所提出的 WAFFNet 框架为场景文本图像中的语言识别提供了一种稳健而有效的解决方案。

{"title":"A Hybrid Scene Text Script Identification Network for regional Indian Languages","authors":"Veronica Naosekpam, Nilkanta Sahu","doi":"10.1145/3649439","DOIUrl":"https://doi.org/10.1145/3649439","url":null,"abstract":"In this work, we introduce WAFFNet, an attention-centric feature fusion architecture tailored for word-level multi-lingual scene text script identification. Motivated by the limitations of traditional approaches that rely exclusively on feature-based methods or deep learning strategies, our approach amalgamates statistical and deep features to bridge the gap. At the core of WAFFNet, we utilized the merits of Local Binary Pattern —a prominent descriptor capturing low-level texture features with high-dimensional, semantically-rich convolutional features. This fusion is judiciously augmented by a spatial attention mechanism, ensuring targeted emphasis on semantically critical regions of the input image. To address the class imbalance problem in multi-class classification scenarios, we employed a weighted objective function. This not only regularizes the learning process but also addresses the class imbalance problem. The architectural integrity of WAFFNet is preserved through an end-to-end training paradigm, leveraging transfer learning to expedite convergence and optimize performance metrics. Considering the under-representation of regional Indian languages in current datasets, we meticulously curated IIITG-STLI2023, a comprehensive dataset encapsulating English alongside six under-represented Indian languages: Hindi, Kannada, Malayalam, Telugu, Bengali, and Manipuri. Rigorous evaluation of the IIITG-STLI2023, as well as the established MLe2e and SIW-13 datasets, underscores WAFFNet’s supremacy over both traditional feature-engineering approaches as well as state-of-the-art deep learning frameworks. Thus, the proposed WAFFNet framework offers a robust and effective solution for language identification in scene text images.","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":"171 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-02-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139949859","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

A Natural Language Processing System for Text Classification Corpus Based on Machine Learning 基于机器学习的文本分类语料库自然语言处理系统

IF 2 4区计算机科学 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

ACM Transactions on Asian and Low-Resource Language Information Processing

Pub Date : 2024-02-19 DOI: 10.1145/3648361

Yawen Su

A classification system for hazardous materials in air traffic control was investigated using the Human Factors Analysis and Classification System (HFACS) framework and natural language processing to prevent hazardous situations in air traffic control. Based on the development of the HFACS standard, an air traffic control hazard classification system will be created. The dangerous data of the aviation safety management system is selected by dead bodies, classified and marked in 5 levels. TFIDF TextRank text classification method based on key content extraction and text classification model based on CNN and BERT model were used in the experiment to solve the problem of small samples, many labels and random samples in hazardous environment of air pollution control. The results show that the total cost of model training time and classification accuracy is the highest when the keywords are around 8. As the number of points increases, the time spent in dimensioning decreases and affects accuracy. When the number of points reaches about 93, the time spent in determining the size increases, but the accuracy of the allocation remains close to 0.7, but the increase in the value of time leads to a decrease in the total cost. It has been proven that extracting key content can solve text classification problems for small companies and contribute to further research in the development of security systems.

利用人为因素分析和分类系统（HFACS）框架和自然语言处理技术，对空中交通管制中的危险品分类系统进行了研究，以防止空中交通管制中出现危险情况。在制定 HFACS 标准的基础上，将创建一个空中交通管制危险分类系统。航空安全管理系统中的危险数据通过死体筛选、分类并标记为 5 个等级。实验中采用了基于关键内容提取的 TFIDF TextRank 文本分类方法和基于 CNN 和 BERT 模型的文本分类模型，解决了空管危险环境中样本少、标签多、随机样本多的问题。结果表明，当关键词数在 8 个左右时，模型训练时间总成本和分类准确率最高。随着点数的增加，维度计算所花费的时间会减少，并影响准确率。当点的数量达到 93 个左右时，确定尺寸所花费的时间会增加，但分配的准确率仍然接近 0.7，但时间值的增加导致总成本的降低。事实证明，提取关键内容可以解决小型公司的文本分类问题，并有助于进一步研究安全系统的开发。

{"title":"A Natural Language Processing System for Text Classification Corpus Based on Machine Learning","authors":"Yawen Su","doi":"10.1145/3648361","DOIUrl":"https://doi.org/10.1145/3648361","url":null,"abstract":"A classification system for hazardous materials in air traffic control was investigated using the Human Factors Analysis and Classification System (HFACS) framework and natural language processing to prevent hazardous situations in air traffic control. Based on the development of the HFACS standard, an air traffic control hazard classification system will be created. The dangerous data of the aviation safety management system is selected by dead bodies, classified and marked in 5 levels. TFIDF TextRank text classification method based on key content extraction and text classification model based on CNN and BERT model were used in the experiment to solve the problem of small samples, many labels and random samples in hazardous environment of air pollution control. The results show that the total cost of model training time and classification accuracy is the highest when the keywords are around 8. As the number of points increases, the time spent in dimensioning decreases and affects accuracy. When the number of points reaches about 93, the time spent in determining the size increases, but the accuracy of the allocation remains close to 0.7, but the increase in the value of time leads to a decrease in the total cost. It has been proven that extracting key content can solve text classification problems for small companies and contribute to further research in the development of security systems.","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":"25 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-02-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139910190","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

SCT:Summary Caption Technique for Retrieving Relevant Images in Alignment with Multimodal Abstractive Summary SCT：根据多模态摘要检索相关图像的摘要标题技术

IF 2 4区计算机科学 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

ACM Transactions on Asian and Low-Resource Language Information Processing

Pub Date : 2024-02-17 DOI: 10.1145/3645029

Shaik Rafi, Ranjita Das

This work proposes an efficient Summary Caption Technique(SCT) which considers the multimodal summary and image captions as input to retrieve the correspondence images from the captions that are highly influential to the multimodal summary. Matching a multimodal summary with an appropriate image is a challenging task in computer vision (CV) and natural language processing (NLP) fields. Merging of these fields are tedious, though the research community has steadily focused on the cross-modal retrieval. These issues include the visual question-answering, matching queries with the images, and semantic relationship matching between two modalities for retrieving the corresponding image. Relevant works consider in questions to match the relationship of visual information, object detection and to match the text with visual information, and employing structural-level representation to align the images with the text. However, these techniques are primarily focused on retrieving the images to text or for the image captioning. But less effort has been spent on retrieving relevant images for the multimodal summary. Hence, our proposed technique extracts and merge features in Hybrid Image Text(HIT) layer and captions in the semantic embeddings with word2vec where the contextual features and semantic relationships are compared and matched with each vector between the modalities, with cosine semantic similarity. In cross-modal retrieval, we achieve top five related images and align the relevant images to the multimodal summary that achieves the highest cosine score among the retrieved images. The model has been trained with seq-to-seq modal with 100 epochs, besides reducing the information loss by the sparse categorical cross entropy. Further, experimenting with the multimodal summarization with multimodal output dataset (MSMO), in cross-modal retrieval, helps to evaluate the quality of image alignment with an image-precision metric that demonstrate the best results.

本作品提出了一种高效的摘要标题技术（SCT），该技术将多模态摘要和图像标题作为输入，从对多模态摘要影响较大的标题中检索对应图像。在计算机视觉（CV）和自然语言处理（NLP）领域，将多模态摘要与合适的图像相匹配是一项具有挑战性的任务。虽然研究界一直在关注跨模态检索，但将这些领域合并在一起是一项繁琐的工作。这些问题包括视觉问题解答、查询与图像匹配，以及两种模式之间的语义关系匹配以检索相应图像。相关作品考虑了在问题中匹配视觉信息的关系、对象检测并将文本与视觉信息匹配，以及采用结构层表示法将图像与文本对齐。然而，这些技术主要集中在检索图像到文本或图像标题。但在为多模态摘要检索相关图像方面所做的努力较少。因此，我们提出的技术通过 word2vec 提取并合并混合图像文本（HIT）层中的特征和语义嵌入中的标题，其中上下文特征和语义关系通过余弦语义相似性与模态之间的每个向量进行比较和匹配。在跨模态检索中，我们获得前五张相关图像，并将相关图像对齐到检索图像中余弦分数最高的多模态摘要。除了通过稀疏分类交叉熵减少信息损失外，该模型还经过了 100 次序列到序列模态训练。此外，在跨模态检索中使用多模态输出数据集（MSMO）进行多模态总结实验，有助于通过图像精度指标评估图像对齐的质量，从而展示最佳结果。

{"title":"SCT:Summary Caption Technique for Retrieving Relevant Images in Alignment with Multimodal Abstractive Summary","authors":"Shaik Rafi, Ranjita Das","doi":"10.1145/3645029","DOIUrl":"https://doi.org/10.1145/3645029","url":null,"abstract":"This work proposes an efficient Summary Caption Technique(SCT) which considers the multimodal summary and image captions as input to retrieve the correspondence images from the captions that are highly influential to the multimodal summary. Matching a multimodal summary with an appropriate image is a challenging task in computer vision (CV) and natural language processing (NLP) fields. Merging of these fields are tedious, though the research community has steadily focused on the cross-modal retrieval. These issues include the visual question-answering, matching queries with the images, and semantic relationship matching between two modalities for retrieving the corresponding image. Relevant works consider in questions to match the relationship of visual information, object detection and to match the text with visual information, and employing structural-level representation to align the images with the text. However, these techniques are primarily focused on retrieving the images to text or for the image captioning. But less effort has been spent on retrieving relevant images for the multimodal summary. Hence, our proposed technique extracts and merge features in Hybrid Image Text(HIT) layer and captions in the semantic embeddings with word2vec where the contextual features and semantic relationships are compared and matched with each vector between the modalities, with cosine semantic similarity. In cross-modal retrieval, we achieve top five related images and align the relevant images to the multimodal summary that achieves the highest cosine score among the retrieved images. The model has been trained with seq-to-seq modal with 100 epochs, besides reducing the information loss by the sparse categorical cross entropy. Further, experimenting with the multimodal summarization with multimodal output dataset (MSMO), in cross-modal retrieval, helps to evaluate the quality of image alignment with an image-precision metric that demonstrate the best results.","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":"7 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-02-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139752680","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0