Heather Desaire, Aleesa E Chua, Madeline Isom, Romana Jarosova, David Hua
Cell Reports Physical Science, volume 4, issue 6. Published 2023-06-21. DOI: 10.1016/j.xcrp.2023.101426 (https://doi.org/10.1016/j.xcrp.2023.101426)
Open-access PDF: https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/9e/2a/nihms-1911044.PMC10328544.pdf
Distinguishing academic science writing from humans or ChatGPT with over 99% accuracy using off-the-shelf machine learning tools.
ChatGPT has enabled access to artificial intelligence (AI)-generated writing for the masses, initiating a culture shift in the way people work, learn, and write. The need to discriminate human writing from AI is now both critical and urgent. Addressing this need, we report a method for discriminating text generated by ChatGPT from (human) academic scientists, relying on prevalent and accessible supervised classification methods. The approach uses new features for discriminating (these) humans from AI; as examples, scientists write long paragraphs and have a penchant for equivocal language, frequently using words like "but," "however," and "although." With a set of 20 features, we built a model that assigns the author, as human or AI, at over 99% accuracy. This strategy could be further adapted and developed by others with basic skills in supervised classification, enabling access to many highly accurate and targeted models for detecting AI usage in academic writing and beyond.
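The kind of features the abstract mentions (paragraph length and equivocal words such as "but," "however," and "although") can be illustrated with a minimal sketch. This is not the authors' code: the abstract does not list the full 20-feature set, so the function name `paragraph_features`, the feature names, and the crude sentence splitter below are all assumptions made for illustration.

```python
import re

# Words the abstract cites as examples of equivocal language.
EQUIVOCAL_WORDS = ("but", "however", "although")


def paragraph_features(paragraph: str) -> dict:
    """Return a few illustrative stylometric features for one paragraph.

    Hypothetical feature names; not the feature set used in the paper.
    """
    # Lowercased word tokens (letters and apostrophes only).
    words = re.findall(r"[a-z']+", paragraph.lower())
    # Crude sentence split: break after ., !, or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]
    features = {
        "n_words": len(words),
        "n_sentences": len(sentences),
    }
    # One binary feature per equivocal word: 1 if present, else 0.
    for w in EQUIVOCAL_WORDS:
        features[f"has_{w}"] = int(w in words)
    return features


sample = "The signal was weak. However, the trend held, although noise was high."
print(paragraph_features(sample))
```

Feature dictionaries like this, computed per paragraph and labeled human or AI, could then be passed to any standard supervised classifier; which classifier and which features work best would need to be checked against the paper itself.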
About the journal:
Cell Reports Physical Science, a premium open-access journal from Cell Press, features high-quality, cutting-edge research spanning the physical sciences. It serves as an open forum fostering collaboration among physical scientists while championing open science principles. Published works must signify significant advancements in fundamental insight or technological applications within fields such as chemistry, physics, materials science, energy science, engineering, and related interdisciplinary studies. In addition to longer articles, the journal considers impactful short-form reports and short reviews covering recent literature in emerging fields. Continually adapting to the evolving open science landscape, the journal reviews its policies to align with community consensus and best practices.