Echoes of authenticity: Reclaiming human sentiment in the large language model era.

PNAS nexus (IF 4.8, Q2, Multidisciplinary Sciences) | Pub Date: 2025-02-25 | eCollection Date: 2025-02-01 | DOI: 10.1093/pnasnexus/pgaf034

Yifei Wang, Ashkan Eshghi, Yi Ding, Ram Gopal


Citations: 0

Abstract

This paper scrutinizes the unintended consequences of employing large language models (LLMs) like ChatGPT for editing user-generated content (UGC), particularly focusing on alterations in sentiment. Through a detailed analysis of a climate change tweet dataset, we uncover that LLM-rephrased tweets tend to display a more neutral sentiment than their original counterparts. By replicating an established study on public opinions regarding climate change, we illustrate how such sentiment alterations can potentially skew the results of research relying on UGC. To counteract the biases introduced by LLMs, our research outlines two effective strategies. First, we employ predictive models capable of retroactively identifying the true human sentiment underlying the original communications, utilizing the altered sentiment expressed in LLM-rephrased tweets as a basis. While useful, this approach faces limitations when the origin of the text (whether directly crafted by a human or modified by an LLM) remains uncertain. To address such scenarios where the text's provenance is ambiguous, we develop a second approach based on the fine-tuning of LLMs. This fine-tuning process not only helps in aligning the sentiment of LLM-generated texts more closely with human sentiment but also offers a robust solution to the challenges posed by the indeterminate origins of digital content. This research highlights the impact of LLMs on the linguistic characteristics and sentiment of UGC, and more importantly, offers practical solutions to mitigate these biases, thereby ensuring the continued reliability of sentiment analysis in research and policy.
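The first strategy, predicting the original human sentiment from the sentiment of the LLM-rephrased text, can be illustrated with a very small sketch. The abstract does not say which predictive models the authors use, so the one-variable least-squares fit below is only an assumption chosen for simplicity. The function names `fit_linear` and `recover_original` are hypothetical, and the sentiment scores are made-up toy values in [-1, 1], not data from the paper.

```python
from statistics import mean


def fit_linear(x, y):
    """Ordinary least squares for y ~ intercept + slope * x (one predictor)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def recover_original(rephrased_score, intercept, slope):
    """Estimate the original sentiment score from a rephrased-text score."""
    return intercept + slope * rephrased_score


# Toy, made-up scores (NOT from the paper): the rephrased scores are the
# original scores pulled halfway toward neutral (0.0), mimicking the
# neutralizing shift the abstract describes.
original = [-0.8, -0.4, 0.0, 0.4, 0.8]
rephrased = [-0.4, -0.2, 0.0, 0.2, 0.4]

# Learn the mapping rephrased -> original on paired examples.
intercept, slope = fit_linear(rephrased, original)

# Apply it to a new rephrased score to estimate the pre-edit sentiment.
estimate = recover_original(0.3, intercept, slope)
print(f"intercept={intercept:.3f} slope={slope:.3f} estimate={estimate:.3f}")
```

A mapping like this only makes sense when the text is known to have been rephrased by an LLM. Applied to an unedited human text, it would exaggerate the sentiment, which is the limitation the abstract gives as the motivation for the second, fine-tuning-based strategy.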


Source journal

PNAS nexus

CiteScore

1.80

Self-citation rate

0.00%

Number of articles published