Pub Date: 2024-07-09 | DOI: 10.1016/j.csl.2024.101688
Zhen Zhang, Mengqiu Liu, Xiyuan Jia, Gongxun Miao, Xin Wang, Hao Ni, Guohua Wu
In text classification tasks, models have achieved remarkable accuracy across various datasets. However, confusion often arises when certain categories in a dataset are too similar, which causes some samples to be misclassified. This paper proposes a method that addresses this problem by constructing a three-layer text graph for the corpus, which is used to compute a Category Correlation Matrix (CCM). It also introduces category-adaptive contrastive learning for the text embeddings produced by the encoder, improving the model's ability to distinguish between samples from easily confused categories. Soft labels generated from this matrix guide the classifier and prevent it from becoming overconfident, as it tends to when trained on one-hot vectors. The efficacy of the approach was demonstrated through experiments with three text encoders on six datasets.
Title: Improving text classification via computing category correlation matrix from text graph
Journal: Computer Speech and Language, vol. 89, Article 101688
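The abstract does not say how the CCM is turned into soft labels. A minimal sketch, assuming a hypothetical scheme in which the gold class keeps weight `1 - alpha` and the remaining `alpha` is spread over the other classes in proportion to their non-negative CCM correlation with the gold class. `soft_labels`, `soft_cross_entropy` and `alpha` are illustrative names, not the authors' code.

```python
import math


def soft_labels(ccm, gold, alpha=0.1):
    """Blend a one-hot target with the gold class's CCM row.

    Hypothetical scheme: alpha is an assumed smoothing weight, and the
    smoothing mass is distributed by correlation rather than uniformly.
    """
    k = len(ccm[gold])
    row = [max(v, 0.0) for v in ccm[gold]]
    row[gold] = 0.0  # send the smoothing mass only to *other* classes
    total = sum(row)
    if total == 0:
        # no correlated class: fall back to a plain one-hot target
        return [1.0 if i == gold else 0.0 for i in range(k)]
    return [(1 - alpha) * (1.0 if i == gold else 0.0) + alpha * row[i] / total
            for i in range(k)]


def soft_cross_entropy(logits, target):
    """Cross-entropy between softmax(logits) and a soft target distribution."""
    m = max(logits)  # subtract the max for a numerically stable log-softmax
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return -sum(t * (x - log_z) for t, x in zip(target, logits))
```

With a CCM in which class 0 correlates mainly with class 1, most of the smoothing mass goes to class 1, so the classifier is penalized less for hedging between the two confusable categories than it would be with one-hot targets.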
Pub Date: 2024-07-08 | DOI: 10.1016/j.csl.2024.101689
Diange Zhou, Shengwen Li, Lijun Dong, Renyao Chen, Xiaoyue Peng, Hong Yao
Knowledge graph embedding (KGE) aims to embed the entities and relations of knowledge graphs (KGs) into a continuous, low-dimensional vector space. It has proven to be an effective tool for integrating knowledge graphs into intelligent applications such as question answering and information extraction. However, previous KGE models ignore the hidden natural order of knowledge learning when learning entity and relation embeddings, which leaves room for improvement in their performance. Inspired by the easy-to-hard pattern of human learning, this paper proposes a Curriculum learning-based KGE (C-KGE) model, which learns the embeddings of entities and relations progressing from “basic knowledge” to “domain knowledge”. Specifically, a seed set representing the basic knowledge and several knowledge subsets are first identified from the KG. Entity overlap is then used to score the learning difficulty of each subset. Finally, C-KGE trains the entities and relations of each subset in the order given by the subsets' difficulty scores. C-KGE uses the trained embeddings of the seed set as prior knowledge and learns the knowledge subsets iteratively, transferring knowledge between the seed set and the subsets and smoothing the learning of knowledge facts. Experimental results on real-world datasets demonstrate that the proposed model improves embedding performance while also reducing training time. Our code and data will be released later.
Title: C-KGE: Curriculum learning-based Knowledge Graph Embedding
Journal: Computer Speech and Language, vol. 89, Article 101689
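The abstract states that entity overlap scores each subset's difficulty but gives no formula. The sketch below assumes difficulty is the fraction of a subset's entities not yet covered by already-learned knowledge. As a further assumption, it re-scores the remaining subsets after each one is "learned"; a static variant would score every subset once against the seed set. Embedding training itself is omitted, and `difficulty` and `curriculum_order` are hypothetical helpers.

```python
def difficulty(subset_entities, known_entities):
    """Hypothetical difficulty: share of a subset's entities that are not yet
    covered by learned knowledge (0.0 = fully known, 1.0 = entirely new)."""
    if not subset_entities:
        return 0.0
    overlap = len(subset_entities & known_entities)
    return 1.0 - overlap / len(subset_entities)


def curriculum_order(seed_entities, subsets):
    """Greedy easy-to-hard schedule over named entity subsets.

    seed_entities: set of entities in the seed ("basic knowledge") set.
    subsets: dict mapping a subset name to its set of entities.
    """
    known = set(seed_entities)
    remaining = dict(subsets)  # copy so the caller's dict is not modified
    order = []
    while remaining:
        # easiest subset first; ties broken by name for determinism
        name = min(remaining, key=lambda n: (difficulty(remaining[n], known), n))
        order.append(name)
        # pretend the subset has been trained: its entities are now known
        known |= remaining.pop(name)
    return order
```

For a seed set `{"a", "b", "c"}` and subsets `A = {"a", "b", "d"}`, `B = {"x", "y"}`, `C = {"d", "x"}`, the schedule is `A`, then `C` (now half known through `d`), then `B`.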
Pub Date: 2024-07-06 | DOI: 10.1016/j.csl.2024.101687
Di Wu, Peng Cheng, Yuying Zheng
Long text generation is an active topic in natural language processing. To address insufficient semantic representation and incoherent text generation in existing long-text models, this paper proposes a Seq2Seq dynamic planning network for progressive text generation (DPPG-BART). In the data pre-processing stage, a lexical division sorting algorithm is used: to obtain hierarchical keyword sequences with clear information content, word weights are computed with TF-IDF over word embeddings and then ranked. To enhance the input representation, a dynamic planning progressive generation network is constructed, which integrates positional features and word embedding vector features at the input side of the model. In addition, to enrich the semantic information and expand the content of the text, a concept expansion module generates related concept words; a scoring network and a feedback mechanism are used to adjust this module. Experimental results on long-text datasets from two different domains, CNN and Writing Prompts, show that DPPG-BART outperforms GPT2-S, GPT2-L, BART and ProGen-2 on the MSJ, B-BLEU and FBD metrics.
Title: Seq2Seq dynamic planning network for progressive text generation
Journal: Computer Speech and Language, vol. 89, Article 101687
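The pre-processing step ranks words by TF-IDF weight to build keyword sequences. The abstract computes these weights "by TF-IDF of word embedding" without giving the exact form, so the sketch below shows only plain token-level TF-IDF ranking (an assumption, not the paper's embedding-based variant). `tfidf_keywords` is a hypothetical helper.

```python
import math
from collections import Counter


def tfidf_keywords(docs, doc_index, top_k=3):
    """Rank the words of docs[doc_index] by plain TF-IDF.

    docs: list of tokenized documents (lists of strings).
    Returns the top_k words, highest weight first.
    """
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency: count each word once per doc
    tf = Counter(docs[doc_index])
    length = len(docs[doc_index])
    scores = {w: (c / length) * math.log(n_docs / df[w]) for w, c in tf.items()}
    # highest score first; alphabetical tie-break keeps the order deterministic
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_k]
```

Words that occur in every document (such as "the" in a small corpus) get an IDF of zero and drop to the bottom of the ranking, which is what makes the top of the list informative keywords.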
Pub Date: 2024-07-06 | DOI: 10.1016/j.csl.2024.101686
Yuhua Wang, Junying Hu, Yongli Su, Bo Zhang, Kai Sun, Hai Zhang
The objective of the relation classification task is to extract relations between entities. Recent studies have found that R-BERT (Wu and He, 2019), which is based on pre-trained BERT (Devlin et al., 2019), achieves very good results on relation classification. However, this method takes into account neither the semantic differences between different kinds of entities nor global semantic information. In this paper, we use two different fully connected layers to capture the semantic difference between subject and object entities. In addition, we build a new module, the Concat Module, to fully fuse the semantic information of the subject entity vector, the object entity vector, and the representation vector of the whole sentence. We also apply average pooling to obtain a better representation of each entity, and add an activation operation and a new fully connected layer after the Concat Module. By modifying R-BERT in this way, we propose a new model, BERT with Global Semantic Information (GSR-BERT), for relation classification. We evaluate our approach on two datasets, the SemEval-2010 Task 8 dataset and the Chinese character relationship classification dataset, and it achieves significant improvements on both, which indicates that it transfers across datasets. Furthermore, we show that the techniques used in our approach are also applicable to the named entity recognition task.
Title: Modified R-BERT with global semantic information for relation classification task
Journal: Computer Speech and Language, vol. 89, Article 101686
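To make the architecture description concrete, here is a toy forward pass of a GSR-BERT-style classification head over pre-computed token vectors (no BERT, no training). Assumptions: entity vectors are average-pooled over their token spans, the subject and object each get their own fully connected layer, the Concat Module is approximated by plain concatenation with the sentence ([CLS]) vector, and tanh is the activation before the final layer. All names and the parameter layout are hypothetical.

```python
import math


def mean_pool(hidden, start, end):
    """Average the token vectors hidden[start:end] (end exclusive)."""
    span = hidden[start:end]
    dim = len(span[0])
    return [sum(tok[d] for tok in span) / len(span) for d in range(dim)]


def linear(weights, bias, x):
    """Fully connected layer: weights is a list of rows (out_dim x in_dim)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def tanh_vec(v):
    return [math.tanh(x) for x in v]


def gsr_head(hidden, subj_span, obj_span, params):
    """Relation logits from token vectors, with position 0 taken as [CLS].

    params maps "subj_fc", "obj_fc", "out_fc" to (weights, bias) tuples;
    using separate subject/object layers reflects the subject-object
    asymmetry the abstract describes.
    """
    cls_vec = hidden[0]
    subj = linear(*params["subj_fc"], tanh_vec(mean_pool(hidden, *subj_span)))
    obj = linear(*params["obj_fc"], tanh_vec(mean_pool(hidden, *obj_span)))
    fused = cls_vec + subj + obj  # list concatenation: stand-in for the Concat Module
    return linear(*params["out_fc"], tanh_vec(fused))  # activation, then new FC layer
```

With hidden size 2, the fused vector has length 6, so `out_fc` needs one 6-element weight row per relation label.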
Pub Date: 2024-07-06 | DOI: 10.1016/j.csl.2024.101685
Simon Leglaive, Matthieu Fraticelli, Hend ElGhazaly, Léonie Borne, Mostafa Sadeghi, Scott Wisdom, Manuel Pariente, John R. Hershey, Daniel Pressnitzer, Jon P. Barker
Supervised models for speech enhancement are trained on artificially generated mixtures of clean speech and noise signals. However, these synthetic training conditions may not accurately reflect the real-world conditions encountered at test time, which can result in poor performance when the test domain differs significantly from the synthetic training domain. To tackle this issue, the UDASE task of the 7th CHiME challenge aimed to leverage real-world noisy speech recordings from the test domain for unsupervised domain adaptation of speech enhancement models. Specifically, the test domain corresponds to the CHiME-5 dataset, which consists of real multi-speaker conversational speech recorded in noisy and reverberant domestic environments and for which ground-truth clean speech signals are not available. In this paper, we present the objective and subjective evaluations of the systems submitted to the CHiME-7 UDASE task and analyze the results. This analysis reveals a limited correlation between subjective ratings and several supervised nonintrusive performance metrics recently proposed for speech enhancement. Conversely, the results suggest that more traditional intrusive objective metrics can be used for in-domain performance evaluation on the reverberant LibriCHiME-5 dataset developed for the challenge. The subjective evaluation indicates that all systems successfully reduced the background noise, but always at the expense of increased distortion. Of the four speech enhancement methods evaluated subjectively, only one improved overall quality compared with the unprocessed noisy speech, highlighting the difficulty of the task. The tools and audio material created for the CHiME-7 UDASE task are shared with the community.
Title: Objective and subjective evaluation of speech enhancement methods in the UDASE task of the 7th CHiME challenge
Journal: Computer Speech and Language, vol. 89, Article 101685
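The abstract contrasts nonintrusive metrics with "more traditional intrusive objective metrics", which need a clean reference signal and can therefore only be computed on simulated data such as LibriCHiME-5. As an illustration, here is a sketch of the scale-invariant signal-to-distortion ratio (SI-SDR), a widely used intrusive metric. Treating it as representative of the challenge's exact evaluation setup is an assumption.

```python
import math


def si_sdr(estimate, reference, eps=1e-12):
    """Scale-invariant SDR in dB between an enhanced signal and its clean reference.

    Intrusive: the clean reference must be available.
    """
    n = len(reference)
    if n == 0 or len(estimate) != n:
        raise ValueError("signals must be non-empty and of equal length")
    # remove the mean of both signals, as is common practice for SI-SDR
    mr, me = sum(reference) / n, sum(estimate) / n
    ref = [r - mr for r in reference]
    est = [e - me for e in estimate]
    # optimal scaling of the reference: projection of the estimate onto it
    alpha = sum(e * r for e, r in zip(est, ref)) / (sum(r * r for r in ref) + eps)
    target = [alpha * r for r in ref]
    residual = [e - t for e, t in zip(est, target)]
    num = sum(t * t for t in target)
    den = sum(x * x for x in residual)
    return 10 * math.log10((num + eps) / (den + eps))
```

Because the reference is rescaled by `alpha` before comparison, multiplying the estimate by any positive gain leaves the score unchanged. Only the shape of the distortion matters, not the output level.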
Pub Date: 2024-07-03 | DOI: 10.1016/j.csl.2024.101684
Jana Roßbach, Kirsten C. Wagener, Bernd T. Meyer
Speech intelligibility (SI) prediction models are a valuable tool for developing speech processing algorithms for hearing aids and consumer electronics. For use in realistic environments, the SI model should ideally be non-intrusive (i.e., not require separate input of original and degraded speech, transcripts, or a-priori knowledge about the signals) and should process the audio signals binaurally. Most existing SI models do not fulfill all of these criteria. In this study, we propose an SI model based on phone probabilities obtained from a deep neural network. The model includes a binaural enhancement stage for predicting the speech recognition threshold (SRT) in realistic acoustic scenes. In the first part of the study, SRT predictions in different spatial configurations are compared with results from normal-hearing listeners. On average, our approach produces lower errors and higher correlations than three intrusive baseline models. In the second part, we explore whether measures relevant to spatial hearing, i.e., the intelligibility level difference (ILD) and the binaural ILD (BILD), can be predicted with our modeling approach. We also investigate whether a language mismatch between training and testing the model plays a role when predicting ILD and BILD. This point is especially important for low-resource languages, for which thousands of hours of language material are not available for training. Our model predicts binaural benefits with an error of 1.5 dB. This is slightly higher than the error of a competitive baseline, MBSTOI (1.1 dB), but our model does not require separate input of original and degraded speech. We also find that good binaural predictions can be obtained with models that are not specifically trained on the target language.
Title: Multilingual non-intrusive binaural intelligibility prediction based on phone classification
Journal: Computer Speech and Language, vol. 89, Article 101684
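The SRT is typically defined as the SNR at which 50% of the speech is recognized, and binaural measures such as the ILD and BILD are expressed as SRT differences between listening conditions. The sketch below assumes a model that outputs a predicted intelligibility (0 to 1) at a set of SNRs. It reads off the SRT by linear interpolation and computes a benefit as an SRT difference. How the paper maps phone probabilities to intelligibility is not reproduced here, and the example conditions and helper names are hypothetical.

```python
def srt_from_curve(snrs_db, intelligibility, target=0.5):
    """SNR (dB) at which predicted intelligibility crosses `target`.

    Assumes intelligibility increases with SNR; uses linear interpolation
    between neighbouring measurement points.
    """
    points = sorted(zip(snrs_db, intelligibility))
    for (s0, p0), (s1, p1) in zip(points, points[1:]):
        if p0 <= target <= p1 and p1 != p0:
            return s0 + (target - p0) * (s1 - s0) / (p1 - p0)
    raise ValueError("intelligibility curve does not cross the target")


def spatial_benefit(srt_reference_db, srt_test_db):
    """Benefit in dB of a test condition over a reference condition.

    Positive means the test condition needs a lower SNR (is easier).
    """
    return srt_reference_db - srt_test_db
```

For example, if predictions for co-located speech and noise cross 50% at -7.5 dB and those for a spatially separated configuration cross at -13.5 dB, the predicted spatial benefit is 6 dB. An error of 1.5 dB, as reported above, would then be measured on such differences.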
Pub Date: 2024-06-23 | DOI: 10.1016/j.csl.2024.101683
Rajae Bensoltane, Taher Zaki
Most existing aspect-based sentiment analysis (ABSA) methods perform aspect extraction and sentiment classification independently, assuming that the aspect terms are already known when handling the aspect sentiment classification task. However, this setting is neither practical nor appropriate for real-life applications, where aspects must be extracted before their sentiment can be classified. This study aims to overcome this shortcoming by jointly identifying aspect terms and their corresponding sentiments using a multi-task learning approach based on a unified tagging scheme. The proposed model uses Bidirectional Encoder Representations from Transformers (BERT) to produce the input representations, followed by a Bidirectional Gated Recurrent Unit (BiGRU) layer for further contextual and semantic encoding. An attention layer on top of the BiGRU makes the model focus on the important parts of the sentence. Finally, a Conditional Random Field (CRF) layer handles inter-label dependencies. Experiments on a reference Arabic hotel dataset show that the proposed model significantly outperforms the baseline and related models.
Title: Neural multi-task learning for end-to-end Arabic aspect-based sentiment analysis
Journal: Computer Speech and Language, vol. 89, Article 101683
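Under a unified tagging scheme, aspect boundaries and sentiment are predicted as a single tag sequence, so both subtasks become one sequence-labeling problem that a CRF layer can decode. The abstract does not give the exact tag set. The sketch below assumes a BIO-style variant (`B-POS`, `I-NEG`, `O`, ...) and shows how gold spans are encoded into tags and how predicted tags are decoded back into (aspect span, polarity) pairs. Function names are hypothetical.

```python
def encode_unified(tokens, aspects):
    """Unified tags: B-/I- plus polarity on aspect tokens, O elsewhere.

    aspects: list of (start, end, polarity) with `end` exclusive.
    """
    tags = ["O"] * len(tokens)
    for start, end, polarity in aspects:
        tags[start] = f"B-{polarity}"
        for i in range(start + 1, end):
            tags[i] = f"I-{polarity}"
    return tags


def decode_unified(tags):
    """Recover (start, end, polarity) spans from a unified tag sequence."""
    spans = []
    start, polarity = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" flushes the last span
        if tag.startswith("I-") and start is not None and tag[2:] == polarity:
            continue  # the current aspect span continues
        if start is not None:
            spans.append((start, i, polarity))
            start, polarity = None, None
        if tag.startswith("B-"):
            start, polarity = i, tag[2:]
    return spans
```

A stray `I-` tag with no open span, or with a polarity that differs from the open span, is simply dropped by the decoder; a CRF's learned transition scores are what make such invalid sequences unlikely in the first place.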
Social media platforms are now not only a medium for expressing users' views, feelings, emotions and sentiments, but are also being abused to propagate unpleasant and hateful content. Consequently, research efforts have been made to develop techniques and models that automatically detect and identify hateful, abusive, vulgar, and offensive content on different platforms. Although significant progress has been made on this task, research on methods for detecting misogynistic attitudes in non-English and code-mixed languages is still not well developed. One main reason for this is the lack of suitable datasets and resources. This paper therefore attempts to bridge this research gap by presenting a high-quality curated dataset in Hindi-English code-mixed language. The dataset includes 12,698 YouTube comments and replies, each annotated with a two-level scheme: first as optimistic or pessimistic, and then, at the second level, into different types based on the content. Inter-annotator agreement is 0.84 for the first subtask and 0.79 for the second, indicating reasonably high annotation quality. Different algorithmic models are explored for automatically detecting the misogynistic attitude expressed in the comments; the mBERT model gives the best performance on both subtasks (macro-average F1 scores of 0.59 and 0.52, and weighted-average F1 scores of 0.66 and 0.65, respectively). The analysis and results suggest that the dataset can support further research on the topic and that the developed models can be applied to automatically detect misogynistic attitudes in social media conversations and posts.
{"title":"Misogynistic attitude detection in YouTube comments and replies: A high-quality dataset and algorithmic models","authors":"Aakash Singh , Deepawali Sharma , Vivek Kumar Singh","doi":"10.1016/j.csl.2024.101682","DOIUrl":"https://doi.org/10.1016/j.csl.2024.101682","url":null,"abstract":"<div><p>Social media platforms are now not only a medium for expressing users views, feelings, emotions and sentiments but are also being abused by people to propagate unpleasant and hateful content. Consequently, research efforts have been made to develop techniques and models for automatically detecting and identifying hateful, abusive, vulgar, and offensive content on different platforms. Although significant progress has been made on the task, the research on design of methods to detect misogynistic attitude of people in non-English and code-mixed languages is not very well-developed. Non-availability of suitable datasets and resources is one main reason for this. Therefore, this paper attempts to bridge this research gap by presenting a high-quality curated dataset in the Hindi-English code-mixed language. The dataset includes 12,698 YouTube comments and replies, with each comment annotated under two-level categories, first as optimistic and pessimistic, and then into different types at second level based on the content. The inter-annotator agreement in the dataset is found to be 0.84 for the first subtask, and 0.79 for the second subtask, indicating the reasonably high quality of annotations. Different algorithmic models are explored for the task of automatic detection of the misogynistic attitude expressed in the comments, with the mBERT model giving best performance on both subtasks (reported macro average F1 scores of 0.59 and 0.52, and weighted average F1 scores of 0.66 and 0.65, respectively). The analysis and results suggest that the dataset can be used for further research on the topic and that the developed algorithmic models can be applied for automatic detection of misogynistic attitude in social media conversations and posts.</p></div>","PeriodicalId":50638,"journal":{"name":"Computer Speech and Language","volume":"89 ","pages":"Article 101682"},"PeriodicalIF":3.1,"publicationDate":"2024-06-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0885230824000652/pdfft?md5=1fb50b1ad09f16299853e9624ad9718d&pid=1-s2.0-S0885230824000652-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141483686","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
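The gap between the reported macro-average and weighted-average F1 scores comes from how per-class scores are combined:

* **Macro F1** gives every class equal weight.
* **Weighted F1** weights each class by its support (number of gold examples).

A model that handles the majority class well therefore scores higher on the weighted average. Below is a minimal pure-Python sketch of both averages. The labels and predictions are hypothetical toy data, not drawn from the dataset, and the function names are illustrative.

```python
from collections import Counter
from typing import Sequence, Tuple


def per_class_f1(y_true: Sequence[str], y_pred: Sequence[str], label: str) -> float:
    """F1 for one class, treating it as positive and all other classes as negative."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_and_weighted_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> Tuple[float, float]:
    """Return (macro F1, support-weighted F1) over all labels seen in gold or predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)  # number of gold examples per class
    f1 = {label: per_class_f1(y_true, y_pred, label) for label in labels}
    macro = sum(f1.values()) / len(labels)  # every class counts equally
    weighted = sum(f1[label] * support[label] for label in labels) / len(y_true)
    return macro, weighted


# Hypothetical toy data (not from the dataset): 4 optimistic and 2 pessimistic comments;
# the model mislabels one pessimistic comment as optimistic.
Y_TRUE = ["optimistic"] * 4 + ["pessimistic"] * 2
Y_PRED = ["optimistic"] * 5 + ["pessimistic"]

if __name__ == "__main__":
    macro, weighted = macro_and_weighted_f1(Y_TRUE, Y_PRED)
    print(f"macro F1 = {macro:.3f}, weighted F1 = {weighted:.3f}")
```

On this toy data, macro F1 is 7/9 ≈ 0.778 and weighted F1 is 22/27 ≈ 0.815. The direction of the gap is the same as in the reported mBERT results. This averaging convention matches scikit-learn's `f1_score` with `average="macro"` and `average="weighted"`, though the abstract does not say which implementation was used.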
Pub Date : 2024-06-18DOI: 10.1016/j.csl.2024.101681
Tuğba Pamay Arslan, Gülşen Eryiğit
Coreference resolution (CR), the identification of in-text mentions that refer to the same entity, is a crucial step in natural language understanding. CR in English has been studied for a long time, but CR for pro-drop and morphologically rich languages is an active research area that has yet to reach sufficient maturity. Turkish, a morphologically highly rich language, poses interesting challenges for natural language processing tasks, including CR, due to its agglutinative nature and the consequent pronoun-dropping phenomenon. This article explores different neural CR architectures (i.e., mention-pair, mention-ranking, and end-to-end) on Turkish by formulating multiple research questions around the impacts of dropped pronouns, data quality, and interlingual transfer. The preparations made to explore these questions, and the findings obtained, yielded the following:
- the first Turkish CR dataset that includes dropped-pronoun annotations (4K entities/22K mentions);
- new state-of-the-art results on Turkish CR;
- the first neural end-to-end Turkish CR results (70.4% F-score);
- the first multilingual end-to-end CR results including Turkish (a 1.0 percentage point improvement on Turkish);
- for the first time in the literature, a demonstration of the positive impact of dropped pronouns on CR for pro-drop and morphologically rich languages.

Our research has brought Turkish end-to-end CR performance (72.0% F-score) to levels comparable with those of other languages, surpassing the baseline scores by 32.1 percentage points.
{"title":"Enhancing Turkish Coreference Resolution: Insights from deep learning, dropped pronouns, and multilingual transfer learning","authors":"Tuğba Pamay Arslan, Gülşen Eryiğit","doi":"10.1016/j.csl.2024.101681","DOIUrl":"https://doi.org/10.1016/j.csl.2024.101681","url":null,"abstract":"<div><p>Coreference resolution (CR), which is the identification of in-text mentions that refer to the same entity, is a crucial step in natural language understanding. While CR in English has been studied for quite a long time, studies for pro-dropped and morphologically rich languages is an active research area which has yet to reach sufficient maturity. Turkish, a morphologically highly-rich language, poses interesting challenges for natural language processing tasks, including CR, due to its agglutinative nature and consequent pronoun-dropping phenomenon. This article explores the use of different neural CR architectures (i.e., mention-pair, mention-ranking, and end-to-end) on Turkish, a morphologically highly-rich language, by formulating multiple research questions around the impacts of dropped pronouns, data quality, and interlingual transfer. The preparations made to explore these research questions and the findings obtained as a result of our explorations revealed the first Turkish CR dataset that includes dropped pronoun annotations (of size 4K entities/22K mentions), new state-of-the-art results on Turkish CR, the first neural end-to-end Turkish CR results (70.4% F-score), the first multilingual end-to-end CR results including Turkish (yielding 1.0 percentage points improvement on Turkish) and the demonstration of the positive impact of dropped pronouns on CR of pro-dropped and morphologically rich languages, for the first time in the literature. Our research has brought Turkish end-to-end CR performances (72.0% F-score) to similar levels with other languages, surpassing the baseline scores by 32.1 percentage points.</p></div>","PeriodicalId":50638,"journal":{"name":"Computer Speech and Language","volume":"89 ","pages":"Article 101681"},"PeriodicalIF":3.1,"publicationDate":"2024-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0885230824000640/pdfft?md5=75cd60c63807520ee823be3bbb1025ae&pid=1-s2.0-S0885230824000640-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141444378","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
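Of the three neural architectures named above, mention ranking is the easiest to show in a few lines. The following is a minimal, hypothetical sketch, not the authors' implementation:

* Each mention picks its highest-scoring preceding mention as its antecedent.
* If every candidate scores at or below a dummy antecedent with fixed score 0, the mention starts a new entity instead.
* The chosen links are then merged into clusters.

The pairwise scores are made-up numbers standing in for a neural scorer. The function names are illustrative. The dropped subject in the Turkish example is represented, as an assumption, by an explicit placeholder mention `∅`.

```python
from typing import Dict, List, Tuple

DUMMY = -1  # index standing for the dummy antecedent ("starts a new entity")


def rank_antecedents(num_mentions: int, scores: Dict[Tuple[int, int], float]) -> List[int]:
    """For each mention j, return the best-scoring antecedent i < j, or DUMMY.

    The dummy antecedent has a fixed score of 0.0, so a real antecedent is chosen
    only if its score is strictly positive. Pairs missing from `scores` get -inf.
    """
    links = []
    for j in range(num_mentions):
        best_i, best_score = DUMMY, 0.0
        for i in range(j):
            score = scores.get((i, j), float("-inf"))
            if score > best_score:
                best_i, best_score = i, score
        links.append(best_i)
    return links


def clusters_from_links(links: List[int]) -> List[List[int]]:
    """Follow antecedent links left to right and group mentions into entities."""
    cluster_of: Dict[int, int] = {}
    clusters: List[List[int]] = []
    for j, i in enumerate(links):
        if i == DUMMY:
            cluster_of[j] = len(clusters)
            clusters.append([j])
        else:  # i < j, so mention i has already been assigned a cluster
            cluster_of[j] = cluster_of[i]
            clusters[cluster_of[i]].append(j)
    return clusters


if __name__ == "__main__":
    # "Ali eve geldi. ∅ Yorgundu." ("Ali came home. [He] was tired.")
    mentions = ["Ali", "eve", "∅"]  # ∅: dropped subject of "Yorgundu" (hypothetical encoding)
    scores = {(0, 1): -2.0, (0, 2): 3.1, (1, 2): -0.5}  # hypothetical scorer output
    links = rank_antecedents(len(mentions), scores)
    for cluster in clusters_from_links(links):
        print([mentions[m] for m in cluster])
```

Here the dropped subject links to "Ali" and "eve" (home) starts its own entity.

The other two architectures differ from this design:

* A mention-pair model typically classifies each (i, j) pair independently and needs a separate clustering step to reconcile the decisions.
* An end-to-end model additionally learns which spans are mentions, rather than taking them as given.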
Pub Date : 2024-06-12DOI: 10.1016/j.csl.2024.101668
Mamta , Asif Ekbal
Social media, e-commerce, and other online platforms have witnessed tremendous growth in multilingual users. Providing a rich native user experience therefore requires addressing the code-mixing phenomenon, i.e., the mixing of more than one language. User reviews and comments can help service providers with customer management. Aspect-based Sentiment Analysis (ABSA) provides a fine-grained analysis of such reviews by identifying the aspects mentioned and classifying their polarities (i.e., positive, negative, neutral, and conflict). Research in this direction has mainly focused on resource-rich monolingual languages such as English, which does not suffice for analyzing multilingual code-mixed reviews. In this paper, we introduce a new task to facilitate research on code-mixed ABSA. We offer a benchmark setup by creating a code-mixed Hinglish (i.e., Hindi mixed with English) dataset for ABSA, annotated with aspect terms and their sentiment values. To demonstrate the effective usage of the dataset, we develop several deep-learning-based models for aspect term extraction and sentiment analysis, and establish them as baselines for further research in this direction.
{"title":"Quality achhi hai (is good), satisfied! Towards aspect based sentiment analysis in code-mixed language","authors":"Mamta , Asif Ekbal","doi":"10.1016/j.csl.2024.101668","DOIUrl":"10.1016/j.csl.2024.101668","url":null,"abstract":"<div><p>Social media, e-commerce, and other online platforms have witnessed tremendous growth in multilingual users. This requires addressing the code-mixing phenomenon, i.e. mixing of more than one language for providing a rich native user experience. User reviews and comments may benefit service providers in terms of customer management. Aspect based Sentiment Analysis (ABSA) provides a fine-grained analysis of these reviews by identifying the aspects mentioned and classifies the polarities (i.e., positive, negative, neutral, and conflict). The research in this direction has mainly focused on resource-rich monolingual languages like English, which does not suffice for analyzing multilingual code-mixed reviews. In this paper, we introduce a new task to facilitate the research on code-mixed ABSA. We offer a benchmark setup by creating a code-mixed Hinglish (i.e., mixing of Hindi and English) dataset for ABSA, which is annotated with aspect terms and their sentiment values. To demonstrate the effective usage of the dataset, we develop several deep learning based models for aspect term extraction and sentiment analysis, and establish them as the baselines for further research in this direction. <span><sup>1</sup></span></p></div>","PeriodicalId":50638,"journal":{"name":"Computer Speech and Language","volume":"89 ","pages":"Article 101668"},"PeriodicalIF":4.3,"publicationDate":"2024-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0885230824000512/pdfft?md5=d4cf7f510d6f46e21b19e99b8421ebc3&pid=1-s2.0-S0885230824000512-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141399023","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
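Aspect term extraction is commonly cast as per-token BIO sequence tagging. The abstract does not say which tagging scheme the baselines use, so that framing is an assumption here. The sketch below decodes B/I/O tags into aspect-term spans. In a two-stage pipeline, each extracted span would then go to a separate classifier that assigns one of the four polarities (positive, negative, neutral, conflict); that stage is not shown. The tokens, tags and function name are hypothetical.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int, str]  # (start token, end token exclusive, surface text)


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Span]:
    """Decode per-token B/I/O tags into aspect-term spans.

    An 'I' with no open span is treated as a 'B' (a common lenient choice).
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    spans: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for k, tag in enumerate(tags):
        if tag == "B":
            if start is not None:  # a new B closes the previous span
                spans.append((start, k))
            start = k
        elif tag == "I":
            if start is None:  # ill-formed I without a preceding B: open a span
                start = k
        elif tag == "O":
            if start is not None:
                spans.append((start, k))
                start = None
        else:
            raise ValueError(f"unknown tag: {tag!r}")
    if start is not None:  # span running to the end of the sentence
        spans.append((start, len(tags)))
    return [(s, e, " ".join(tokens[s:e])) for s, e in spans]


if __name__ == "__main__":
    # Hypothetical Hinglish review: "The battery life is very bad, but the camera is good."
    tokens = ["Battery", "life", "bahut", "kharab", "hai", "par", "camera", "achha", "hai"]
    tags = ["B", "I", "O", "O", "O", "O", "B", "O", "O"]
    for start, end, text in bio_to_spans(tokens, tags):
        print(start, end, text)
```

Extraction here yields the two aspect terms "Battery life" and "camera". A downstream polarity classifier would presumably label the first negative and the second positive.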