Echoes of authenticity: Reclaiming human sentiment in the large language model era.

PNAS nexus (IF 4.8; JCR Q2, Multidisciplinary Sciences). Pub Date: 2025-02-25. eCollection Date: 2025-02-01. DOI: 10.1093/pnasnexus/pgaf034
Yifei Wang, Ashkan Eshghi, Yi Ding, Ram Gopal
PNAS nexus, Volume 4, Issue 2, pgaf034. Journal Article. Full-text PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11852273/pdf/
Citations: 0

Abstract

Echoes of authenticity: Reclaiming human sentiment in the large language model era.

This paper scrutinizes the unintended consequences of employing large language models (LLMs) like ChatGPT for editing user-generated content (UGC), particularly focusing on alterations in sentiment. Through a detailed analysis of a climate change tweet dataset, we uncover that LLM-rephrased tweets tend to display a more neutral sentiment than their original counterparts. By replicating an established study on public opinions regarding climate change, we illustrate how such sentiment alterations can potentially skew the results of research relying on UGC. To counteract the biases introduced by LLMs, our research outlines two effective strategies. First, we employ predictive models capable of retroactively identifying the true human sentiment underlying the original communications, utilizing the altered sentiment expressed in LLM-rephrased tweets as a basis. While useful, this approach faces limitations when the origin of the text (whether directly crafted by a human or modified by an LLM) remains uncertain. To address such scenarios where the text's provenance is ambiguous, we develop a second approach based on the fine-tuning of LLMs. This fine-tuning process not only helps in aligning the sentiment of LLM-generated texts more closely with human sentiment but also offers a robust solution to the challenges posed by the indeterminate origins of digital content. This research highlights the impact of LLMs on the linguistic characteristics and sentiment of UGC, and more importantly, offers practical solutions to mitigate these biases, thereby ensuring the continued reliability of sentiment analysis in research and policy.
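
The first strategy, predicting the original human sentiment from the sentiment of the LLM-rephrased text, can be illustrated with a minimal sketch. The abstract does not specify the form of the predictive models, so the one-variable linear fit below is only an assumption, as are the helper names `fit_linear` and `recover`. The scores are made-up illustrative values, not data from the paper, and the "rephrasing halves the score" rule is a hypothetical stand-in for the neutralizing effect described above.

```python
from statistics import mean


def fit_linear(x, y):
    """Ordinary least squares for y ≈ intercept + slope * x (one predictor)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope


# Made-up sentiment scores in [-1, 1] for original tweets (illustrative only).
original = [-0.8, -0.5, -0.2, 0.1, 0.4, 0.9]

# Hypothetical neutralizing effect: rephrasing pulls every score toward 0.
rephrased = [0.5 * s for s in original]

# Learn a mapping from rephrased sentiment back to original sentiment.
intercept, slope = fit_linear(rephrased, original)


def recover(rephrased_score):
    """Estimate the original sentiment from a rephrased-text score."""
    return intercept + slope * rephrased_score


print(f"slope={slope:.3f}, intercept={intercept:.3f}")
print(f"rephrased 0.25 -> estimated original {recover(0.25):.3f}")
```

A mapping like this is only usable when the text is known to have been rephrased. Applied to an unmodified human tweet, it would exaggerate the sentiment, which is the limitation the abstract cites as motivation for the second, fine-tuning-based strategy.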
