Cybercrime is projected to cause annual business losses of $10.5 trillion by 2025, a significant concern given that a majority of security breaches are due to human errors, especially through phishing attacks. The rapid increase in daily identified phishing sites over the past decade underscores the pressing need to enhance defenses against such attacks. Social Engineering Drills (SEDs) are essential in raising awareness about phishing, yet face challenges in creating effective and diverse phishing email content. These challenges are exacerbated by the limited availability of public datasets and concerns over using external language models like ChatGPT for phishing email generation. To address these issues, this paper introduces X-Phishing-Writer, a novel cross-lingual Few-Shot phishing email generation framework. X-Phishing-Writer allows for the generation of emails based on minimal user input, leverages single-language datasets for multilingual email generation, and is designed for internal deployment using a lightweight, open-source language model. Incorporating Adapters into an Encoder-Decoder architecture, X-Phishing-Writer marks a significant advancement in the field, demonstrating superior performance in generating phishing emails across 25 languages when compared to baseline models. Experimental results and real-world drills involving 1,682 users showcase a 17.67% email open rate and a 13.33% hyperlink click-through rate, affirming the framework’s effectiveness and practicality in enhancing phishing awareness and defense.
{"title":"X-Phishing-Writer: A Framework for Cross-Lingual Phishing Email Generation","authors":"Shih-Wei Guo, Yao-Chung Fan","doi":"10.1145/3670402","DOIUrl":"https://doi.org/10.1145/3670402","url":null,"abstract":"<p>Cybercrime is projected to cause annual business losses of $10.5 trillion by 2025, a significant concern given that a majority of security breaches are due to human errors, especially through phishing attacks. The rapid increase in daily identified phishing sites over the past decade underscores the pressing need to enhance defenses against such attacks. Social Engineering Drills (SEDs) are essential in raising awareness about phishing, yet face challenges in creating effective and diverse phishing email content. These challenges are exacerbated by the limited availability of public datasets and concerns over using external language models like ChatGPT for phishing email generation. To address these issues, this paper introduces X-Phishing-Writer, a novel cross-lingual Few-Shot phishing email generation framework. X-Phishing-Writer allows for the generation of emails based on minimal user input, leverages single-language datasets for multilingual email generation, and is designed for internal deployment using a lightweight, open-source language model. Incorporating Adapters into an Encoder-Decoder architecture, X-Phishing-Writer marks a significant advancement in the field, demonstrating superior performance in generating phishing emails across 25 languages when compared to baseline models. Experimental results and real-world drills involving 1,682 users showcase a 17.67% email open rate and a 13.33% hyperlink click-through rate, affirming the framework’s effectiveness and practicality in enhancing phishing awareness and defense.</p>","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":null,"pages":null},"PeriodicalIF":2.0,"publicationDate":"2024-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141256198","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In recent years, the number of Algerian Internet users has increased significantly, providing a valuable opportunity for collecting and utilizing opinions and sentiments expressed online. Users now post not only text but also images. However, to benefit from this wealth of information, it is crucial to address the challenge of sarcasm detection, which limits the reliability of sentiment analysis. Sarcasm often involves non-literal and ambiguous language, making its detection complex. To enhance the quality and relevance of sentiment analysis, it is essential to develop effective methods for sarcasm detection. By overcoming this limitation, we can fully harness the opinions expressed online and gain valuable insights into trends and sentiments among the Algerian public. In this work, our aim is to develop a comprehensive system for sarcasm detection in the Algerian dialect, encompassing both text and image analysis. We propose a hybrid approach that combines linguistic features and machine learning techniques for text analysis. For image analysis, we use the VGG-19 deep learning model for image classification and EasyOCR for Arabic text extraction. By integrating these approaches, we build a robust system capable of detecting sarcasm in both textual and visual content in the Algerian dialect. Our system achieved an accuracy of 92.79% for the textual models and 89.28% for the visual model.
{"title":"Automatic Algerian Sarcasm Detection from Texts and Images","authors":"Kheira Zineb Bousmaha, Khaoula Hamadouche, Hadjer Djouabi, Lamia Hadrich-Belguith","doi":"10.1145/3670403","DOIUrl":"https://doi.org/10.1145/3670403","url":null,"abstract":"<p>In recent years, the number of Algerian Internet users has significantly increased, providing a valuable opportunity for collecting and utilizing opinions and sentiments expressed online. They now post not just texts but also images. However, to benefit from this wealth of information, it is crucial to address the challenge of sarcasm detection, which poses a limitation in sentiment analysis. Sarcasm often involves the use of non-literal and ambiguous language, making its detection complex. To enhance the quality and relevance of sentiment analysis, it is essential to develop effective methods for sarcasm detection. By overcoming this limitation, we can fully harness the expressed online opinions and benefit from their valuable insights for a better understanding of trends and sentiments among the Algerian public. In this work, our aim is to develop a comprehensive system that addresses sarcasm detection in Algerian dialect, encompassing both text and image analysis. We propose a hybrid approach that combines linguistic characteristics and machine learning techniques for text analysis. Additionally, for image analysis, we utilized the deep learning model VGG-19 for image classification, and employed the EasyOCR technique for Arabic text extraction. By integrating these approaches, we strive to create a robust system capable of detecting sarcasm in both textual and visual content in the Algerian dialect. Our system achieved an accuracy of 92.79% for the textual models and 89.28% for the visual model.</p>","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":null,"pages":null},"PeriodicalIF":2.0,"publicationDate":"2024-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141256461","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Databases containing lexical properties are of primary importance to psycholinguistic research and speech-language therapy. Several lexical databases for different languages have been developed in the recent past, but Kannada, a language spoken by 50.8 million people, has no comprehensive lexical database yet. To address this, KannadaLex, a Kannada lexical database, is built as a language resource containing orthographic, phonological, and syllabic information about words sourced from newspaper articles from the last decade. Alongside these, vital statistics such as phonological neighbourhood, syllable complexity, summed syllable and bigram syllable frequencies, and lemma and inflectional family information are stored. The database is validated by correlating frequency, a well-established psycholinguistic feature, with the other numerical features. The developed lexical database contains 170K words from varied disciplines, complete with psycholinguistic features. KannadaLex is a comprehensive resource for psycholinguists, speech therapists, and linguistic researchers analyzing Kannada and similar languages. Psycholinguists require lexical data for choosing stimuli to conduct experiments that study the factors enabling humans to acquire, use, comprehend, and produce language. Speech and language therapists query such databases to develop the most effective stimuli for evaluating, diagnosing, and treating communication disorders, and for rehabilitating speech after brain injuries.
{"title":"KannadaLex: A lexical database with psycholinguistic information","authors":"Shreya R. Aithal, Muralikrishna Sn, Raghavendra Ganiga, Ashwath Rao, Govardhan Hegde","doi":"10.1145/3670688","DOIUrl":"https://doi.org/10.1145/3670688","url":null,"abstract":"<p>Databases containing lexical properties are of primary importance to psycholinguistic research and speech-language therapy. Several lexical databases for different languages have been developed in the recent past, but Kannada, a language spoken by 50.8 million people, has no comprehensive lexical database yet. To address this, <i>KannadaLex</i>, a Kannada lexical database is built as a language resource that contains orthographic, phonological, and syllabic information about words that are sourced from newspaper articles from the last decade. Along with these vital statistics like the phonological neighbourhood, syllable complexity summed syllable and bigram syllable frequencies, and lemma and inflectional family information are stored. The database is validated by correlating frequency, a well-established psycholinguistic feature, with other numerical features. The developed lexical database contains 170K words from varied disciplines, complete with psycholinguistic features. This <i>KannadaLex</i> is a comprehensive resource for psycholinguists, speech therapists, and linguistic researchers for analyzing Kannada and other similar languages. Psycholinguists require lexical data for choosing stimuli to conduct experiments that study the factors that enable humans to acquire, use, comprehend, and produce language. Speech and language therapists query these databases for developing the most efficient stimuli for evaluating, diagnosing, and treating communication disorders, and rehabilitation of speech after brain injuries.</p>","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":null,"pages":null},"PeriodicalIF":2.0,"publicationDate":"2024-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141256588","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Document-level relation extraction requires reading, memorization, and reasoning to discover relevant factual information spread across multiple sentences. Current hierarchical-network and graph-network methods struggle to fully capture the structural information behind a document and to reason naturally from the context. Unlike previous methods, this paper reformulates the relation extraction task as a machine reading comprehension task. Each pair of entities and relations is characterized by a question template, and the extraction of entities and relations is translated into identifying answers from the context. To enhance the context comprehension ability of the extraction model and achieve more precise extraction, we introduce large language models (LLMs) during question construction, enabling the generation of exemplary answers. Furthermore, to solve the multi-label and multi-entity problems in documents, we propose a new answer extraction model based on hybrid pointer-sequence labeling, which improves the reasoning ability of the model and enables the extraction of zero or multiple answers from a document. Extensive experiments on three public datasets show that the proposed method is effective.
{"title":"Document-Level Relation Extraction Based on Machine Reading Comprehension and Hybrid Pointer-sequence Labeling","authors":"xiaoyi wang, Jie Liu, Jiong Wang, Jianyong Duan, guixia guan, qing zhang, Jianshe Zhou","doi":"10.1145/3666042","DOIUrl":"https://doi.org/10.1145/3666042","url":null,"abstract":"<p>Document-level relational extraction requires reading, memorization and reasoning to discover relevant factual information in multiple sentences. It is difficult for the current hierarchical network and graph network methods to fully capture the structural information behind the document and make natural reasoning from the context. Different from the previous methods, this paper reconstructs the relation extraction task into a machine reading comprehension task. Each pair of entities and relationships is characterized by a question template, and the extraction of entities and relationships is translated into identifying answers from the context. To enhance the context comprehension ability of the extraction model and achieve more precise extraction, we introduce large language models (LLMs) during question construction, enabling the generation of exemplary answers. Besides, to solve the multi-label and multi-entity problems in documents, we propose a new answer extraction model based on hybrid pointer-sequence labeling, which improves the reasoning ability of the model and realizes the extraction of zero or multiple answers in documents. Extensive experiments on three public datasets show that the proposed method is effective.</p>","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":null,"pages":null},"PeriodicalIF":2.0,"publicationDate":"2024-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141197570","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Stylistic analysis enables open-ended and exploratory observation of languages. To fill the gap in quantitative analysis of the stylistic systems of Middle Chinese, we construct lexical features based on evolutive core word usage and devise a Bayesian method for feature parameter estimation. The lexical features are drawn from the Swadesh list, each entry of which took on different word forms as the language evolved during the Middle Ages. We thus count the varying word forms of these entries across this evolution as the linguistic features. Under the Bayesian formulation, the feature parameters are estimated to construct a high-dimensional random feature vector, from which the pairwise dissimilarity matrix of all the texts is obtained under different distance measures. Finally, we perform spectral embedding and clustering to visualize, categorize, and analyze the linguistic styles of Middle Chinese texts. The quantitative results agree with existing qualitative conclusions and, furthermore, deepen our understanding of the linguistic styles of Middle Chinese from both inter-category and intra-category perspectives. They also help unveil the special styles induced by indirect language contact.
{"title":"Quantitative Stylistic Analysis of Middle Chinese Texts Based on the Dissimilarity of Evolutive Core Word Usage","authors":"Bing Qiu, Jiahao Huo","doi":"10.1145/3665794","DOIUrl":"https://doi.org/10.1145/3665794","url":null,"abstract":"<p>Stylistic analysis enables open-ended and exploratory observation of languages. To fill the gap in the quantitative analysis of the stylistic systems of Middle Chinese, we construct lexical features based on the evolutive core word usage and scheme a Bayesian method for feature parameters estimation. The lexical features are from the Swadesh list, each of which has different word forms along with the language evolution during the Middle Ages. We thus count the varied word of those entries along with the language evolution as the linguistic features. With the Bayesian formulation, the feature parameters are estimated to construct a high-dimensional random feature vector in order to obtain the pair-wise dissimilarity matrix of all the texts based on different distance measures. Finally, we perform the spectral embedding and clustering to visualize, categorize and analyze the linguistic styles of Middle Chinese texts. The quantitative result agrees with the existing qualitative conclusions and furthermore, betters our understanding of the linguistic styles of Middle Chinese from both the inter-category and intra-category aspects. It also helps unveil the special styles induced by the indirect language contact.</p>","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":null,"pages":null},"PeriodicalIF":2.0,"publicationDate":"2024-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141170478","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The Emotional Support Conversation (ESC) task aims to deliver consolation, encouragement, and advice to individuals undergoing emotional distress, thereby assisting them in overcoming difficulties. In the context of emotional support dialogue systems, it is of utmost importance to generate user-relevant and diverse responses. However, previous methods failed to take into account these crucial aspects, resulting in a tendency to produce universal and safe responses (e.g., “I do not know” and “I am sorry to hear that”). To tackle this challenge, a semantic-constrained bidirectional generation (SCBG) framework is utilized for generating more diverse and user-relevant responses. Specifically, we commence by selecting keywords that encapsulate the ongoing dialogue topics based on the context. Subsequently, a bidirectional generator generates responses incorporating these keywords. Two distinct methodologies, namely statistics-based and prompt-based methods, are employed for keyword extraction. Experimental results on the ESConv dataset demonstrate that the proposed SCBG framework improves response diversity and user relevance while ensuring response quality.
{"title":"SCBG: Semantic-Constrained Bidirectional Generation for Emotional Support Conversation","authors":"Yangyang Xu, Zhuoer Zhao, Xiao Sun","doi":"10.1145/3666090","DOIUrl":"https://doi.org/10.1145/3666090","url":null,"abstract":"<p>The Emotional Support Conversation (ESC) task aims to deliver consolation, encouragement, and advice to individuals undergoing emotional distress, thereby assisting them in overcoming difficulties. In the context of emotional support dialogue systems, it is of utmost importance to generate user-relevant and diverse responses. However, previous methods failed to take into account these crucial aspects, resulting in a tendency to produce universal and safe responses (e.g., “I do not know” and “I am sorry to hear that”). To tackle this challenge, a semantic-constrained bidirectional generation (SCBG) framework is utilized for generating more diverse and user-relevant responses. Specifically, we commence by selecting keywords that encapsulate the ongoing dialogue topics based on the context. Subsequently, a bidirectional generator generates responses incorporating these keywords. Two distinct methodologies, namely statistics-based and prompt-based methods, are employed for keyword extraction. Experimental results on the ESConv dataset demonstrate that the proposed SCBG framework improves response diversity and user relevance while ensuring response quality.</p>","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":null,"pages":null},"PeriodicalIF":2.0,"publicationDate":"2024-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141170384","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Robert Lalramhluna, Sandeep Dash, Dr. Partha Pakray
This research investigates the utilization of pre-trained BERT transformers within the context of the Mizo language. BERT, an abbreviation for Bidirectional Encoder Representations from Transformers, denotes Google’s foremost neural network approach to Natural Language Processing (NLP), renowned for its remarkable performance across various NLP tasks. However, its efficacy in handling low-resource languages such as Mizo remains largely unexplored. In this study, we introduce MizBERT, a specialized Mizo language model. Through extensive pre-training on a corpus collected from diverse online platforms, MizBERT has been tailored to accommodate the nuances of the Mizo language. MizBERT’s capabilities are evaluated using two primary metrics: masked language modeling (MLM) accuracy and perplexity, yielding scores of 76.12% and 3.2565, respectively. Additionally, its performance on a text classification task is examined. Results indicate that MizBERT outperforms both the multilingual BERT (mBERT) model and a Support Vector Machine (SVM) baseline, achieving an accuracy of 98.92%. This underscores MizBERT’s proficiency in understanding and processing the intricacies inherent in the Mizo language.
{"title":"MizBERT: A Mizo BERT Model","authors":"Robert Lalramhluna, Sandeep Dash, Dr.Partha Pakray","doi":"10.1145/3666003","DOIUrl":"https://doi.org/10.1145/3666003","url":null,"abstract":"<p>This research investigates the utilization of pre-trained BERT transformers within the context of the Mizo language. BERT, an abbreviation for Bidirectional Encoder Representations from Transformers, symbolizes Google’s forefront neural network approach to Natural Language Processing (NLP), renowned for its remarkable performance across various NLP tasks. However, its efficacy in handling low-resource languages such as Mizo remains largely unexplored. In this study, we introduce <i>MizBERT</i>, a specialized Mizo language model. Through extensive pre-training on a corpus collected from diverse online platforms, <i>MizBERT</i> has been tailored to accommodate the nuances of the Mizo language. Evaluation of <i>MizBERT’s</i> capabilities is conducted using two primary metrics: Masked Language Modeling (MLM) and Perplexity, yielding scores of 76.12% and 3.2565, respectively. Additionally, its performance in a text classification task is examined. Results indicate that <i>MizBERT</i> outperforms both the multilingual BERT (mBERT) model and the Support Vector Machine (SVM) algorithm, achieving an accuracy of 98.92%. This underscores <i>MizBERT’s</i> proficiency in understanding and processing the intricacies inherent in the Mizo language.</p>","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":null,"pages":null},"PeriodicalIF":2.0,"publicationDate":"2024-05-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141151258","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Gayathri G L, Krithika Swaminathan, Divyasri Krishnakumar, Thenmozhi D, Bharathi B
In recent years, a significant portion of the content on various Internet platforms has been found to be offensive or abusive. Abusive comment detection can go a long way in preventing Internet users from facing the adverse effects of coming into contact with abusive language. This problem is particularly challenging when the comments are written in low-resource languages like Tamil or Tamil-English code-mixed text. So far, there has not been any substantial work on abusive comment detection using imbalanced datasets. Furthermore, little work, especially on Tamil code-mixed data, has involved analysing the dataset for classification and accordingly creating a custom vocabulary for preprocessing. This paper proposes a novel approach to classify abusive comments from an imbalanced dataset using a customised training vocabulary and a combination of statistical feature selection with language-agnostic feature selection, while making use of explainable AI for feature refinement. Our model achieved an accuracy of 74% and a macro F1-score of 0.46.
{"title":"Abusive Comment Detection in Tamil Code-Mixed Data by Adjusting Class Weights and Refining Features","authors":"Gayathri G L, Krithika Swaminathan, Divyasri Krishnakumar, Thenmozhi D, Bharathi B","doi":"10.1145/3664619","DOIUrl":"https://doi.org/10.1145/3664619","url":null,"abstract":"<p>In recent years, a significant portion of the content on various platforms on the internet has been found to be offensive or abusive. Abusive comment detection can go a long way in preventing internet users from facing the adverse effects of coming in contact with abusive language. This problem is particularly challenging when the comments are found in low-resource languages like Tamil or Tamil-English code-mixed text. So far, there has not been any substantial work on abusive comment detection using imbalanced datasets. Furthermore, significant work has not been performed, especially for Tamil code-mixed data, that involves analysing the dataset for classification and accordingly creating a custom vocabulary for preprocessing. This paper proposes a novel approach to classify abusive comments from an imbalanced dataset using a customised training vocabulary and a combination of statistical feature selection with language-agnostic feature selection while making use of explainable AI for feature refinement. Our model achieved an accuracy of 74% and a macro F1-score of 0.46.</p>","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":null,"pages":null},"PeriodicalIF":2.0,"publicationDate":"2024-05-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141060706","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Solving a math word problem requires selecting quantities in it and performing appropriate arithmetic operations to obtain the answer. For deep learning-based methods, it is vital to obtain good quantity representations, i.e., to selectively and emphatically aggregate information in the context of quantities. However, existing works have not paid much attention to this aspect. Many works simply encode quantities as ordinary tokens, or use some implicit or rule-based methods to select information in their context. This leads to poor results when dealing with linguistic variations and confounding quantities. This paper proposes a novel method to identify question-related distinguishing features of quantities by contrasting their context with the question and the context of other quantities, thereby enhancing the representation of quantities. Our method not only considers the contrastive relationship between quantities, but also considers multiple relationships jointly. Besides, we propose two auxiliary tasks to further guide the representation learning of quantities: 1) predicting whether a quantity is used in the question; 2) predicting the relations (operators) between quantities given the question. Experimental results show that our method outperforms previous methods on SVAMP and ASDiv-A under similar settings, even some newly released strong baselines. Supplementary experiments further confirm that our method indeed improves the performance of quantity selection by improving the representation of both quantities and questions.
{"title":"Towards Better Quantity Representations for Solving Math Word Problems","authors":"Runxin Sun, Shizhu He, Jun Zhao, Kang Liu","doi":"10.1145/3665644","DOIUrl":"https://doi.org/10.1145/3665644","url":null,"abstract":"<p>Solving a math word problem requires selecting quantities in it and performing appropriate arithmetic operations to obtain the answer. For deep learning-based methods, it is vital to obtain good quantity representations, i.e., to selectively and emphatically aggregate information in the context of quantities. However, existing works have not paid much attention to this aspect. Many works simply encode quantities as ordinary tokens, or use some implicit or rule-based methods to select information in their context. This leads to poor results when dealing with linguistic variations and confounding quantities. This paper proposes a novel method to identify question-related distinguishing features of quantities by contrasting their context with the question and the context of other quantities, thereby enhancing the representation of quantities. Our method not only considers the contrastive relationship between quantities, but also considers multiple relationships jointly. Besides, we propose two auxiliary tasks to further guide the representation learning of quantities: 1) predicting whether a quantity is used in the question; 2) predicting the relations (operators) between quantities given the question. Experimental results show that our method outperforms previous methods on SVAMP and ASDiv-A under similar settings, even some newly released strong baselines. Supplementary experiments further confirm that our method indeed improves the performance of quantity selection by improving the representation of both quantities and questions.</p>","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":null,"pages":null},"PeriodicalIF":2.0,"publicationDate":"2024-05-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141060662","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
F. Jafri, Kritesh Rauniyar, Surendrabikram Thapa, Mohammad Aman Siddiqui, Matloob Khushi, Usman Naseem
In the ever-evolving landscape of online discourse and political dialogue, the rise of hate speech poses a significant challenge to maintaining a respectful and inclusive digital environment. The context becomes particularly complex when considering the Hindi language—a low-resource language with limited available data. To address this pressing concern, we introduce the CHUNAV dataset—a collection of 11,457 Hindi tweets gathered during assembly elections in various states. CHUNAV is purpose-built for hate speech categorization and the identification of target groups. The dataset is a valuable resource for exploring hate speech within the distinctive socio-political context of Indian elections. The tweets within CHUNAV have been meticulously categorized into “Hate” and “Non-Hate” labels, and further subdivided to pinpoint the specific targets of hate speech, including “Individual”, “Organization”, and “Community” labels (as shown in Figure 1). Furthermore, this paper presents multiple benchmark models for hate speech detection, along with an innovative ensemble and oversampling-based method. The paper also delves into the results of topic modeling, all aimed at effectively addressing hate speech and target identification in the Hindi language. This contribution seeks to advance the field of hate speech analysis and foster a safer and more inclusive online space within the distinctive realm of Indian Assembly Elections.
{"title":"CHUNAV: Analyzing Hindi Hate Speech and Targeted Groups in Indian Election Discourse","authors":"F. Jafri, Kritesh Rauniyar, Surendrabikram Thapa, Mohammad Aman Siddiqui, Matloob Khushi, Usman Naseem","doi":"10.1145/3665245","DOIUrl":"https://doi.org/10.1145/3665245","url":null,"abstract":"\u0000 In the ever-evolving landscape of online discourse and political dialogue, the rise of hate speech poses a significant challenge to maintaining a respectful and inclusive digital environment. The context becomes particularly complex when considering the Hindi language—a low-resource language with limited available data. To address this pressing concern, we introduce the\u0000 CHUNAV\u0000 dataset—a collection of 11,457 Hindi tweets gathered during assembly elections in various states.\u0000 CHUNAV\u0000 is purpose-built for hate speech categorization and the identification of target groups. The dataset is a valuable resource for exploring hate speech within the distinctive socio-political context of Indian elections. The tweets within\u0000 CHUNAV\u0000 have been meticulously categorized into “Hate” and “Non-Hate” labels, and further subdivided to pinpoint the specific targets of hate speech, including “Individual”, “Organization”, and “Community” labels (as shown in Figure 1). Furthermore, this paper presents multiple benchmark models for hate speech detection, along with an innovative ensemble and oversampling-based method. The paper also delves into the results of topic modeling, all aimed at effectively addressing hate speech and target identification in the Hindi language. This contribution seeks to advance the field of hate speech analysis and foster a safer and more inclusive online space within the distinctive realm of Indian Assembly Elections.\u0000","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":null,"pages":null},"PeriodicalIF":2.0,"publicationDate":"2024-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140970249","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}