RESPECT: A framework for promoting inclusive and respectful conversations in online communications

Shaina Raza, Abdullah Y. Muaad, Emrul Hasan, Muskan Garg, Zainab Al-Zanbouri, Syed Raza Bashir

Natural Language Processing Journal, Volume 10, Article 100126 (published 2025-01-16). DOI: 10.1016/j.nlp.2025.100126
Abstract
Toxicity and bias in online conversations hinder respectful interactions, leading to issues such as harassment and discrimination. While advancements in natural language processing (NLP) have improved the detection and mitigation of toxicity on digital platforms, the evolving nature of social media conversations demands continuous innovation. Previous efforts have made strides in identifying and reducing toxicity; however, a unified and adaptable framework for managing toxic content across diverse online discourse remains essential. This paper introduces RESPECT, a comprehensive framework designed to effectively identify and mitigate toxicity in online conversations. The framework comprises two components: an encoder-only model for detecting toxicity and a decoder-only model for generating debiased versions of the text. Leveraging transformer-based models, the framework treats toxicity detection as a binary classification problem. Subsequently, open-source and proprietary large language models are used through prompt-based approaches to rewrite toxic text into non-toxic, contextually accurate alternatives. Empirical results demonstrate that this approach significantly reduces toxicity across various conversational styles, fostering safer and more respectful communication in online environments.
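To make the first component concrete, below is a minimal sketch of the detection stage: an encoder-only transformer used as a binary toxic/non-toxic classifier. The abstract does not name the specific encoder, label order, or decision threshold, so the checkpoint (`bert-base-uncased`), the assumption that index 1 is the "toxic" class, and the 0.5 threshold are all illustrative; in practice the head would first be fine-tuned on labeled toxic/non-toxic data.

```python
# Sketch of the detection stage (assumed setup, not the paper's exact one):
# an encoder-only transformer with a 2-way classification head.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "bert-base-uncased"  # assumed encoder; the paper does not pin one here

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
# num_labels=2 frames toxicity as binary classification; the head is randomly
# initialized and must be fine-tuned on toxic/non-toxic examples before use.
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)
model.eval()

def is_toxic(text: str, threshold: float = 0.5) -> bool:
    """Return True if the (fine-tuned) classifier scores the text as toxic."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        logits = model(**inputs).logits
    p_toxic = torch.softmax(logits, dim=-1)[0, 1].item()  # assumed: index 1 = toxic
    return p_toxic >= threshold
```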
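The second component, the prompt-based rewrite stage, can be sketched in the same spirit. The paper evaluates several open-source and proprietary large language models rather than one fixed API call, so the client, model name (`gpt-4o-mini`), prompt wording, and temperature below are assumptions chosen only to illustrate the idea of prompting a decoder-only LLM for a non-toxic, meaning-preserving paraphrase.

```python
# Illustrative sketch of the rewrite stage: prompt a decoder-only LLM to
# produce a respectful paraphrase that preserves the original meaning.
# Model name and prompt text are assumptions, not the paper's exact setup.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

DETOX_PROMPT = (
    "Rewrite the following message so it is respectful and free of toxic, "
    "offensive, or biased language, while preserving its original meaning "
    "and intent as closely as possible.\n\nMessage: {text}\n\nRewrite:"
)

def detoxify(text: str, model: str = "gpt-4o-mini") -> str:
    """Ask the LLM for a contextually faithful, non-toxic rewrite."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": DETOX_PROMPT.format(text=text)}],
        temperature=0.2,  # low temperature favors faithful paraphrase over creativity
    )
    return response.choices[0].message.content.strip()
```

In a full pipeline the two stages compose naturally: a message flagged by the classifier (e.g., `if is_toxic(message): message = detoxify(message)`) is replaced by its rewritten, non-toxic version before being shown to other users.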