Echoes of authenticity: Reclaiming human sentiment in the large language model era.
Authors: Yifei Wang, Ashkan Eshghi, Yi Ding, Ram Gopal
Journal: PNAS Nexus, volume 4, issue 2, article pgaf034 (published 2025-02-25)
DOI: https://doi.org/10.1093/pnasnexus/pgaf034
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11852273/pdf/
This paper examines an unintended consequence of using large language models (LLMs) such as ChatGPT to edit user-generated content (UGC): alterations in sentiment. Analyzing a climate change tweet dataset, we find that LLM-rephrased tweets tend to display more neutral sentiment than their original counterparts. By replicating an established study of public opinion on climate change, we illustrate how such sentiment alterations can skew the results of research that relies on UGC. To counteract the biases introduced by LLMs, we outline two effective strategies. First, we employ predictive models that retroactively estimate the true human sentiment underlying the original communications, using the altered sentiment expressed in LLM-rephrased tweets as input. While useful, this approach is limited when the origin of a text (whether directly crafted by a human or modified by an LLM) remains uncertain. To address such cases of ambiguous provenance, we develop a second approach based on fine-tuning LLMs. Fine-tuning not only aligns the sentiment of LLM-generated text more closely with human sentiment but also offers a robust solution to the challenges posed by the indeterminate origins of digital content. This research highlights the impact of LLMs on the linguistic characteristics and sentiment of UGC and, more importantly, offers practical solutions to mitigate these biases, thereby helping to ensure the continued reliability of sentiment analysis in research and policy.
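The first strategy, estimating original human sentiment from the sentiment of LLM-rephrased text, can be illustrated with a toy example. The abstract does not say which predictive model the authors use, so the sketch below assumes a plain least-squares line fitted on paired sentiment scores in [-1, 1]. The function names (`fit_linear`, `recover_original`) and the score values are hypothetical and are not taken from the paper.

```python
from statistics import mean


def fit_linear(x, y):
    """Ordinary least squares fit of y ~ slope * x + intercept."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must contain at least two distinct values")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def recover_original(rephrased_score, slope, intercept):
    """Estimate the original sentiment score, clipped to [-1, 1]."""
    estimate = slope * rephrased_score + intercept
    return max(-1.0, min(1.0, estimate))


# Toy paired scores; illustrative values only, NOT data from the paper.
# The rephrased scores are constructed to sit closer to neutral (0.0)
# than the originals, mimicking the neutralizing effect described above.
original_scores = [-0.8, -0.4, 0.0, 0.4, 0.8]
rephrased_scores = [-0.4, -0.2, 0.0, 0.2, 0.4]

# Fit the mapping from rephrased sentiment back to original sentiment.
slope, intercept = fit_linear(rephrased_scores, original_scores)

# Apply the fitted mapping to a new LLM-rephrased score.
estimated = recover_original(0.3, slope, intercept)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, estimate={estimated:.2f}")
```

A correction of this kind presupposes that the text is known to be LLM-rephrased. Applied to text a human wrote directly, it would exaggerate the sentiment, which is the limitation that motivates the second, fine-tuning-based strategy.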