Decoding AI and Human Authorship: Nuances Revealed Through NLP and Statistical Analysis

Mayowa Akinwande, Oluwaseyi Adeliyi, Toyyibat Yussuph
arXiv - CS - Digital Libraries, published 2024-07-15
DOI: arxiv-2408.00769 (https://doi.org/arxiv-2408.00769)
Citations: 0

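Abstract

This research explores the nuanced differences between texts produced by AI and texts written by humans, aiming to elucidate how the two express language differently. Through comprehensive statistical data analysis, the study investigates various linguistic traits, patterns of creativity, and potential biases inherent in human-written and AI-generated texts. The significance of this research lies in its contribution to understanding AI's creative capabilities and its impact on literature, communication, and societal frameworks. By examining a meticulously curated dataset of 500K essays spanning diverse topics and genres, either generated by LLMs or written by humans, the study uncovers the deeper layers of linguistic expression and provides insights into the cognitive processes underlying both AI-driven and human-driven textual composition. The analysis revealed that human-authored essays tend to have a higher total word count on average than AI-generated essays, but a shorter average word length. Both groups exhibit high levels of fluency, yet the vocabulary diversity of human-authored content is higher than that of AI-generated content. However, AI-generated essays show a slightly higher level of novelty, suggesting the potential for generating more original content through AI systems. The paper addresses challenges in assessing the language generation capabilities of AI models and emphasizes the importance of datasets that reflect the complexities of human-AI collaborative writing. Through systematic preprocessing and rigorous statistical analysis, this study offers valuable insights into the evolving landscape of AI-generated content and informs future developments in natural language processing (NLP).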
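The surface features named in the abstract (total word count, average word length, and vocabulary diversity) can be illustrated with a minimal sketch. The abstract does not say how the authors define or compute these metrics, so everything below is an assumption made for illustration: the regex tokenizer, the use of the type-token ratio as a stand-in for vocabulary diversity, and the hypothetical function names `tokenize`, `essay_features`, and `group_means`.

```python
import re
from statistics import mean


def tokenize(text: str) -> list[str]:
    """Lowercase the text and extract word tokens (assumed tokenizer, not the paper's)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def essay_features(text: str) -> dict[str, float]:
    """Compute word count, average word length, and type-token ratio for one essay."""
    words = tokenize(text)
    if not words:
        return {"word_count": 0, "avg_word_length": 0.0, "type_token_ratio": 0.0}
    return {
        "word_count": len(words),
        "avg_word_length": mean(len(w) for w in words),
        # Type-token ratio: distinct words divided by total words.
        # Assumed proxy for "vocabulary diversity"; the paper may use another measure.
        "type_token_ratio": len(set(words)) / len(words),
    }


def group_means(essays: list[str]) -> dict[str, float]:
    """Average each feature over a group of essays (e.g. human-written or AI-generated)."""
    if not essays:
        raise ValueError("need at least one essay")
    rows = [essay_features(e) for e in essays]
    return {key: mean(row[key] for row in rows) for key in rows[0]}
```

Calling `group_means` once on the human-written essays and once on the AI-generated essays would give per-group averages that can be compared directly. The type-token ratio tends to fall as texts get longer, so a comparison between groups with different average lengths would probably need a length-normalized diversity measure.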