Pub Date : 2024-01-18 | DOI: 10.1016/j.csl.2024.101619
Chang Zeng , Xiaoxiao Miao , Xin Wang , Erica Cooper , Junichi Yamagishi
Conventional automatic speaker verification systems can usually be decomposed into a front-end model such as a time delay neural network (TDNN) for extracting speaker embeddings and a back-end model such as statistics-based probabilistic linear discriminant analysis (PLDA) or neural network-based neural PLDA (NPLDA) for similarity scoring. However, the sequential optimization of the front-end and back-end models may lead to a local minimum, which theoretically prevents the whole system from achieving the best optimization. Although some methods have been proposed for jointly optimizing the two models, such as the generalized end-to-end (GE2E) model and the NPLDA E2E model, most of these methods have not fully investigated how to model the intra-relationship between multiple enrollment utterances. In this paper, we propose a new E2E joint method for speaker verification especially designed for the practical scenario of multiple enrollment utterances. To leverage the intra-relationship among multiple enrollment utterances, our model comes equipped with frame-level and utterance-level attention mechanisms. Additionally, focal loss is utilized to balance the importance of positive and negative samples within a mini-batch and to focus on the difficult samples during the training process. We also utilize several data augmentation techniques, including conventional noise augmentation using the MUSAN and RIR datasets and a unique speaker embedding-level mixup strategy for better optimization.
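The focal loss mentioned here is the standard binary form of Lin et al. (2017); the following is a minimal sketch for intuition only, not the authors' implementation, and the `alpha`/`gamma` values are assumed defaults:

```python
import math

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss for one sample.

    p: predicted probability of the positive class; y: label in {0, 1}.
    alpha balances positive/negative samples; gamma down-weights easy
    samples so training focuses on the difficult ones.
    """
    pt = p if y == 1 else 1.0 - p          # probability of the true class
    at = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    return -at * (1.0 - pt) ** gamma * math.log(pt)

# A confidently correct positive (p=0.9) contributes far less loss
# than a hard positive (p=0.3), which is the point of the gamma term.
easy = focal_loss(0.9, 1)
hard = focal_loss(0.3, 1)
```

With `gamma=0` the expression reduces to alpha-weighted cross-entropy, which is a quick sanity check on the formula.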
Title: Joint speaker encoder and neural back-end model for fully end-to-end automatic speaker verification with multiple enrollment utterances (Computer Speech and Language, IF 4.3, open access)
Pub Date : 2024-01-13 | DOI: 10.1016/j.csl.2024.101618
Yihao Li , Meng Sun , Xiongwei Zhang , Hugo Van hamme
A key step in single-channel speech enhancement is the orthogonal separation of speech and noise. In this paper, a dual-branch complex convolutional recurrent network (DBCCRN) is proposed to separate the complex spectrograms of speech and noise simultaneously. To model both local and global information, we incorporate conformer modules into our network. The orthogonality of the outputs of the two branches can be improved by optimizing Signal-to-Noise Ratio (SNR)-related losses. However, we found that models trained with two existing versions of the SI-SNR loss yield enhanced speech at a very different scale from that of its clean counterpart, and the SNR loss likewise shrinks the amplitude of the enhanced speech. A simple solution is to normalize the output, but this works only for off-line processing, not for streaming: when streaming speech enhancement is required, the scale error degrades speech quality. From an analytical inspection of the weaknesses of models trained with SNR and SI-SNR losses, a new loss function called scale-aware SNR (SA-SNR) is proposed to cope with the scale variations of the enhanced speech. SA-SNR improves over SI-SNR by introducing an extra regularization term that encourages the model to produce signals of a scale similar to the input, which has little influence on the perceptual quality of the enhanced speech. In addition, the commonly used evaluation recipe for speech enhancement may not sufficiently reflect the performance of methods trained with SI-SNR losses, since amplitude variations of the input speech should be carefully considered; a new evaluation recipe called ScaleError is therefore introduced. Experiments show that our proposed method improves over existing baselines on the evaluation sets of the voice bank corpus, DEMAND, and the Interspeech 2020 Deep Noise Suppression Challenge, obtaining higher scores for PESQ, STOI, SSNR, CSIG, CBAK and COVL.
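The scale insensitivity that SA-SNR is designed to counteract can be seen directly in the standard SI-SNR definition, which projects the estimate onto the reference before scoring. This is an illustrative sketch of that standard metric, not the paper's code; the `eps` guard is an assumption:

```python
import math

def si_snr_db(est, ref, eps=1e-12):
    """Scale-invariant SNR in dB. Because `est` is projected onto `ref`,
    rescaling `est` leaves the score unchanged -- the property that lets
    SI-SNR-trained models drift in output scale."""
    dot = sum(e * r for e, r in zip(est, ref))
    ref_energy = sum(r * r for r in ref) + eps
    target = [dot / ref_energy * r for r in ref]   # projection of est onto ref
    noise = [e - t for e, t in zip(est, target)]
    t_pow = sum(t * t for t in target)
    n_pow = sum(n * n for n in noise) + eps
    return 10.0 * math.log10(t_pow / n_pow + eps)

ref = [0.1, -0.4, 0.3, 0.2]
est = [0.1, -0.38, 0.33, 0.18]
scaled = [0.01 * x for x in est]   # 40 dB quieter, yet same SI-SNR
```

Since `est` and `scaled` receive essentially identical SI-SNR scores despite a large amplitude mismatch, a loss built on SI-SNR alone cannot penalize the mismatch; SA-SNR's extra regularization term targets exactly this gap.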
Title: Scale-aware dual-branch complex convolutional recurrent network for monaural speech enhancement
Pub Date : 2024-01-08 | DOI: 10.1016/j.csl.2023.101603
Francesca Alloatti , Francesca Grasso , Roger Ferrod , Giovanni Siragusa , Luigi Di Caro , Federica Cena
Mutual comprehension is a crucial component of a successful conversation. While it is easily reached through the cooperation of the parties in human–human dialogues, such cooperation is often lacking in human–computer interaction due to technical problems, leading to broken conversations. Our goal is to work towards the effective detection of breakdowns in conversations between humans and Conversational Agents (CAs), as well as of the different repair strategies users adopt when such communication problems occur. In this work, we propose a novel tag system designed to map and classify users’ repair attempts while interacting with a CA. We subsequently present a set of machine learning models trained to automate the detection of such repair strategies. The tags are employed in a manual annotation exercise performed on a publicly available dataset of text-based task-oriented conversations. The batch of annotated data was then used to train the neural network-based classifiers. The analysis of the annotations provides interesting insights into users’ behaviour when dealing with breakdowns in a task-oriented dialogue system. The encouraging results obtained from the neural models confirm the possibility of automatically recognizing occurrences of misunderstanding between users and CAs on the fly.
Title: A tag-based methodology for the detection of user repair strategies in task-oriented conversational agents
Pub Date : 2024-01-03 | DOI: 10.1016/j.csl.2023.101617
Asma Mekki, Inès Zribi, Mariem Ellouze, Lamia Hadrich Belguith
Over the last two decades, many efforts have been made to provide resources to support Arabic Natural Language Processing (NLP). Some of these resources target specific NLP tasks such as word tokenization, parsing, or sentiment analysis, while others attempt to tackle numerous tasks at once. In this paper, we present TTK, a toolkit for Tunisian linguistic analysis. It consists of a collection of linguistic analysis tools for orthographic normalization, sentence boundary detection, word tokenization, morphological analysis, parsing, and named entity recognition. This paper focuses on the design and implementation of the TTK tools.
Title: TTK: A toolkit for Tunisian linguistic analysis
Pub Date : 2023-12-28 | DOI: 10.1016/j.csl.2023.101616
Pengfei Chen , Biqing Zeng , Yuwu Lu , Yun Xue , Fei Fan , Mayi Xu , Lingcong Feng
Aspect-level sentiment analysis (ALSA) aims to extract the polarity of different aspect terms in a sentence. Previous works leveraging traditional dependency syntax parsing trees (DSPT) to encode contextual syntactic information obtained state-of-the-art results. However, these works may not learn fine-grained syntactic knowledge efficiently, making it difficult for them to take advantage of local context, and they fail to sufficiently exploit the dependency relations in the DSPT. To solve these problems, we propose a novel method, named LCSA, that enhances local knowledge using two extensions: a Local Context Network based on Proximity Values (LCPV) and Syntax-clusters Attention (SCA). LCPV first obtains induced trees from pre-trained models and generates syntactic proximity values between each context word and the aspect to adaptively determine the extent of the local context. Our improved SCA further extracts fine-grained knowledge: it not only focuses on the clusters essential for the target aspect term but also guides the model to learn the essential words inside each cluster of the DSPT. Extensive experimental results on multiple benchmark datasets demonstrate that LCSA is highly robust and achieves state-of-the-art performance for ALSA.
Title: Enhanced local knowledge with proximity values and syntax-clusters for aspect-level sentiment analysis
Pub Date : 2023-12-26 | DOI: 10.1016/j.csl.2023.101605
Vijay Ravi , Jinhan Wang , Jonathan Flint , Abeer Alwan
Speech signals are valuable biomarkers for assessing an individual’s mental health, including automatically identifying Major Depressive Disorder (MDD). A frequently used approach in this regard is to employ features related to speaker identity, such as speaker embeddings. However, over-reliance on speaker identity features in mental health screening systems can compromise patient privacy. Moreover, some aspects of speaker identity may not be relevant for depression detection and could serve as a bias factor that hampers system performance. To overcome these limitations, we propose disentangling speaker-identity information from depression-related information. Specifically, we present four distinct disentanglement methods to achieve this: adversarial speaker identification (SID)-loss maximization (ADV), SID-loss equalization with variance (LEV), SID-loss equalization using Cross-Entropy (LECE), and SID-loss equalization using KL divergence (LEKLD). Our experiments, which incorporated diverse input features and model architectures, yielded improved F1 scores for MDD detection and voice-privacy attributes, as quantified by Gain in Voice Distinctiveness (G_VD) and De-Identification Scores (DeID). On the DAIC-WOZ dataset (English), LECE using ComParE16 features yields the best F1-score of 80%, the audio-only SOTA for depression detection, along with a G_VD of −1.1 dB and a DeID of 85%. On the EATD dataset (Mandarin), ADV on the raw audio signal achieves an F1-score of 72.38%, surpassing the multi-modal SOTA, along with a G_VD of −0.89 dB and a DeID of 51.21%. By reducing the dependence on speaker-identity-related features, our method offers a promising direction for speech-based depression detection that preserves patient privacy.
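As a rough illustration of the ADV variant, the shared encoder's objective subtracts the speaker-ID loss, so embeddings that remain identity-discriminative are penalized. This toy sketch only conveys the sign of that trade-off; the weight `lam` is an assumption, not a value from the paper:

```python
def adv_objective(dep_loss, sid_loss, lam=0.5):
    """ADV-style (SID-loss maximization) objective, sketched: minimize the
    depression-detection loss while *maximizing* the speaker-ID loss.
    `lam` is an assumed privacy/accuracy trade-off weight."""
    return dep_loss - lam * sid_loss

# Of two encoder states with equal depression loss, the objective prefers
# the one whose embeddings are *worse* for speaker identification.
private = adv_objective(dep_loss=0.40, sid_loss=2.0)  # SID branch confused
leaky = adv_objective(dep_loss=0.40, sid_loss=0.5)    # SID branch accurate
```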
Title: Enhancing accuracy and privacy in speech-based depression detection through speaker disentanglement (open access)
Pub Date : 2023-12-23 | DOI: 10.1016/j.csl.2023.101606
Long Dai, Jiarong Mao, Liaoran Xu, Xuefeng Fan, Xiaoyi Zhou
The popularity of ChatGPT demonstrates the immense commercial value of natural language processing (NLP) technology. However, NLP models like ChatGPT are vulnerable to piracy and redistribution, which can harm the economic interests of model owners. Existing NLP model watermarking schemes struggle to balance robustness and covertness: robust watermarks typically require embedding more information, which compromises their covertness, while covert watermarks can embed only limited information, which limits their robustness. This paper proposes using multi-task learning (MTL) to resolve this conflict. Specifically, a covert trigger set is established to enable remote verification of the watermarked model, and a covert auxiliary network is designed to enhance the watermarked model’s robustness. The proposed watermarking framework is evaluated on two benchmark datasets and three mainstream NLP models. Compared with existing schemes, the framework not only offers excellent covertness and robustness but also has a lower false positive rate and can effectively resist fraudulent ownership claims by adversaries.
Title: SecNLP: An NLP classification model watermarking framework based on multi-task learning
Pub Date : 2023-12-20 | DOI: 10.1016/j.csl.2023.101604
Asalah Thiab , Luay Alawneh , Mohammad AL-Smadi
Emotion detection from online textual information is gaining more attention due to its usefulness in understanding users’ behaviors and their desires. This is driven by the large amounts of texts from different sources such as social media and shopping websites. Recent studies investigated the benefits of deep learning in the detection of emotions from textual conversations. In this paper, we study the performance of several deep learning and transformer-based models in the classification of emotions in English conversations. Further, we apply ensemble learning using a majority voting technique to improve the overall classification performance. We evaluated our proposed models on the SemEval 2019 Task 3 public dataset that categorizes emotions as Happy, Angry, Sad, and Others. The results show that our models can successfully distinguish the three main classes of emotions and separate them from Others in a highly imbalanced dataset. The transformer-based models achieved a micro-averaged F1-score of up to 75.55%, whereas the RNN-based models only reached 67.03%. Further, we show that the ensemble model significantly improves the overall performance and achieves a micro-averaged F1-score of 77.07%.
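The majority-voting step of the ensemble can be sketched as follows; the label set matches the paper's four classes, while the deterministic tie-breaking rule is an assumption (the abstract does not specify one):

```python
from collections import Counter

LABELS = ["happy", "angry", "sad", "others"]

def majority_vote(predictions):
    """Hard majority vote over the labels predicted by each ensemble
    member. Ties are broken by LABELS order (an assumed convention)."""
    counts = Counter(predictions)
    best = max(counts.values())
    for label in LABELS:                 # deterministic tie-break
        if counts.get(label, 0) == best:
            return label

# Three classifiers disagree; the majority label wins.
vote = majority_vote(["sad", "sad", "angry"])
```

Combining transformer-based and RNN-based members this way is what lets the ensemble exceed the best single model's micro-averaged F1.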
Title: Contextual emotion detection using ensemble deep learning
Pub Date : 2023-12-18DOI: 10.1016/j.csl.2023.101599
Souvik Sinha, Spandan Dey, Goutam Saha
The use of voice recognition systems has grown considerably with advances in technology. This has allowed adversaries to falsely claim access to these systems by spoofing the identity of a target speaker. Existing supervised learning (SL)-based countermeasures are yet to provide a complete solution against newly evolving spoofing attacks. To tackle this problem, we explore self-supervised learning (SSL)-based frameworks. First, we implement widely used SSL frameworks, where our target is identifying spoofed speech. We report a considerable performance improvement over the SL state-of-the-art baseline as a whole. Then, we perform an attack-wise comparative analysis between the SL and SSL frameworks. While SSL performs better in most cases, there are certain attacks where SL outperforms it. Hence, we hypothesize that the complementary information captured by the two models can be jointly exploited for better performance. To do so, we first perform conventional weighted score fusion between the SL and best-performing SSL models, which reduces the EER below that of both the state-of-the-art SL model and the best-performing SSL framework. Then, we propose an embedding fusion scheme that minimizes the distance between the distributions of the selected SL and SSL embeddings. To select the appropriate layers, we perform a comprehensive statistical analysis. The proposed fusion scheme outperforms the score fusion method and shows that SSL performance can be improved by effectively incorporating knowledge learned by the SL framework. The final EER achieved on the ASVspoof 2019 logical access (LA) database is 0.177%, a significant improvement over our baseline. Using ASVspoof 2021 LA as a blind evaluation set, our proposed embedding fusion scheme reduces the EER to 2.666%.
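The weighted score fusion step can be sketched as below. The detection scores, the fusion weight, and the simple threshold-sweep EER are illustrative stand-ins; the paper tunes its own weight on a development set and uses the ASVspoof evaluation protocol:

```python
import numpy as np

def eer(scores: np.ndarray, labels: np.ndarray) -> float:
    """Equal error rate via a threshold sweep: return the midpoint of the
    false-acceptance rate (spoof accepted) and false-rejection rate
    (bona fide rejected) where they are closest. labels: 1=bona fide, 0=spoof."""
    candidates = []
    for t in np.sort(scores):
        far = float(np.mean(scores[labels == 0] >= t))
        frr = float(np.mean(scores[labels == 1] < t))
        candidates.append((abs(far - frr), (far + frr) / 2.0))
    return min(candidates)[1]

def fuse_scores(sl: np.ndarray, ssl: np.ndarray, w: float = 0.5) -> np.ndarray:
    """Convex weighted combination of the two countermeasure scores."""
    return w * sl + (1.0 - w) * ssl

labels = np.array([1, 1, 1, 0, 0, 0])
sl  = np.array([0.9, 0.4, 0.7, 0.6, 0.2, 0.1])  # SL confuses one pair of trials
ssl = np.array([0.8, 0.9, 0.3, 0.2, 0.4, 0.1])  # SSL confuses a different pair
fused = fuse_scores(sl, ssl, w=0.5)
# Each individual system has EER 1/3 on this toy set; because their errors
# fall on different trials, the fused scores separate the classes perfectly.
print(eer(sl, labels), eer(ssl, labels), eer(fused, labels))
```

The toy data is constructed so the two systems err on complementary trials, which is exactly the situation the attack-wise analysis in the abstract motivates.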
{"title":"Improving self-supervised learning model for audio spoofing detection with layer-conditioned embedding fusion","authors":"Souvik Sinha, Spandan Dey, Goutam Saha","doi":"10.1016/j.csl.2023.101599","DOIUrl":"10.1016/j.csl.2023.101599","url":null,"abstract":"<div><p>The use of voice recognition<span><span> systems has grown considerably with advances in technology. This has allowed adversaries to falsely claim access to these systems by spoofing the identity of a target speaker. Existing supervised learning (SL)-based countermeasures<span> are yet to provide a complete solution against newly evolving spoofing attacks. To tackle this problem, we explore self-supervised learning (SSL)-based frameworks. First, we implement widely used SSL frameworks, where our target is identifying spoofed speech. We report a considerable performance improvement over the SL state-of-the-art baseline as a whole. Then, we perform an attack-wise comparative analysis between the SL and SSL frameworks. While SSL performs better in most cases, there are certain attacks where SL outperforms it. Hence, we hypothesize that the complementary information captured by the two models can be jointly exploited for better performance. To do so, we first perform conventional weighted score fusion between the SL and best-performing SSL models, which reduces the </span></span>EER below that of both the state-of-the-art SL model and the best-performing SSL framework. Then, we propose an embedding fusion scheme that minimizes the distance between the distributions of the selected SL and SSL embeddings. To select the appropriate layers, we perform a comprehensive statistical analysis. The proposed fusion scheme outperforms the score fusion method and shows that SSL performance can be improved by effectively incorporating knowledge learned by the SL framework. 
The final EER achieved on the ASVspoof 2019 logical access (LA) database is 0.177%, a significant improvement over our baseline. Using the ASVspoof 2021 LA as a blind evaluation dataset, our proposed embedding fusion scheme reduces the EER to 2.666%.</span></p></div>","PeriodicalId":50638,"journal":{"name":"Computer Speech and Language","volume":null,"pages":null},"PeriodicalIF":4.3,"publicationDate":"2023-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138746029","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
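One simple way to realize "minimizing the distance between the SL and SSL embedding distributions" is a mean-squared alignment loss on projected embeddings. Everything below (dimensions, the linear projection, the learning rate) is an illustrative assumption rather than the authors' layer-conditioned scheme:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical batches of embeddings from the selected SL and SSL layers
# (names and dimensions are illustrative, not from the paper).
sl_emb  = rng.standard_normal((16, 64))        # SL representations
ssl_emb = rng.standard_normal((16, 128))       # SSL representations
proj    = rng.standard_normal((128, 64)) * 0.1 # learnable projection

def alignment_loss(ssl_batch: np.ndarray, sl_batch: np.ndarray,
                   W: np.ndarray) -> float:
    """MSE between projected SSL embeddings and SL embeddings: a simple
    proxy for pulling the two embedding distributions together."""
    return float(np.mean((ssl_batch @ W - sl_batch) ** 2))

loss = alignment_loss(ssl_emb, sl_emb, proj)
# One plain gradient step on the projection reduces the alignment loss.
grad_W = 2.0 * ssl_emb.T @ (ssl_emb @ proj - sl_emb) / sl_emb.size
proj -= 0.01 * grad_W
print(loss > alignment_loss(ssl_emb, sl_emb, proj))  # True
```

In a full system this loss would be added to the spoofing-detection objective so the SSL branch absorbs knowledge from the SL branch during training, which is the intuition the abstract describes.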
Pub Date : 2023-12-16DOI: 10.1016/j.csl.2023.101598
Geoffroy Vanderreydt, Kris Demuynck
We propose a novel technique to estimate the channel characteristics for robust speech recognition. The method focuses on reliable time–frequency speech patches that are highly independent of the noise condition. Combined with a root-based approximation of the logarithm in the MFCC computation, this reduces the variance that noise induces on the spectral features, and therefore also the constraint on the acoustic model in a multi-style training setup. We show that, compared to standard mean normalization, the proposed method estimates the channel equally well under clean conditions and better under noisy conditions. When integrated into the feature extraction pipeline, it improves speech recognition accuracy on noisy speech without degrading accuracy on clean speech. Our experiments reveal that the method helps most for generative models, which need to model the complex noise variability, and less so for discriminative models, which can learn to ignore noise instead of accurately modeling it. Our approach outperforms the state of the art on the noisy Aurora4 task.
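The variance-reducing effect of a root-based approximation of the logarithm can be illustrated as follows. The exponent (1/15) and the toy mel-band energies are assumptions for illustration, not the values used in the paper:

```python
import numpy as np

def compress(mel_energies: np.ndarray, r: float = 15.0) -> np.ndarray:
    """Root-based stand-in for the logarithm in the MFCC pipeline:
    x**(1/r) approximates log's dynamic-range compression while staying
    bounded near zero energy, so additive noise perturbs the compressed
    features far less than it perturbs log features."""
    return mel_energies ** (1.0 / r)

# A small additive-noise floor shifts log features drastically at low
# energies, while the root-compressed features move only slightly:
clean = np.array([1e-6, 1e-2, 1.0])
noisy = np.array([1e-3, 2e-2, 1.1])
print(np.abs(np.log(noisy) - np.log(clean)))      # large shift at low energy
print(np.abs(compress(noisy) - compress(clean)))  # bounded, much smaller shift
```

This is why, per the abstract, the compressed features put a weaker constraint on the acoustic model in multi-style training: the noise-induced spread of the features is smaller to begin with.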
{"title":"A novel channel estimate for noise robust speech recognition","authors":"Geoffroy Vanderreydt, Kris Demuynck","doi":"10.1016/j.csl.2023.101598","DOIUrl":"10.1016/j.csl.2023.101598","url":null,"abstract":"<div><p>We propose a novel technique to estimate the channel characteristics for robust speech recognition<span>. The method focuses on reliable time–frequency speech patches that are highly independent of the noise condition. Combined with a root-based approximation<span> of the logarithm in the MFCC computation, this reduces the variance that noise induces on the spectral features<span>, and therefore also the constraint on the acoustic model in a multi-style training setup. We show that, compared to standard mean normalization, the proposed method estimates the channel equally well under clean conditions and better under noisy conditions. When integrated into the feature extraction pipeline, it improves speech recognition accuracy on noisy speech without degrading accuracy on clean speech. Our experiments reveal that the method helps most for generative models, which need to model the complex noise variability, and less so for discriminative models, which can learn to ignore noise instead of accurately modeling it. Our approach outperforms the state of the art on the noisy Aurora4 task.</span></span></span></p></div>","PeriodicalId":50638,"journal":{"name":"Computer Speech and Language","volume":null,"pages":null},"PeriodicalIF":4.3,"publicationDate":"2023-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138745942","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}