ScamGen: Unveiling psychological patterns in tele-scam through advanced template-augmented corpus generation

IF: 9; Zone 1 (Psychology); JCR Q1 (PSYCHOLOGY, EXPERIMENTAL); Journal: Computers in Human Behavior; Pub Date: 2024-09-21; DOI: 10.1016/j.chb.2024.108451
Xu Han, Qiang Li, Yaling Qi, Hongbo Cao, Witold Pedrycz, Wei Wang
Citations: 0

Abstract

Telephone scams, with their profound psychological impact, often compel victims to make hasty and severe decisions. Studying these scams is challenging due to the scarcity of comprehensive datasets, a result of the private nature of telephone interactions. In this paper, we introduce ScamGen, a template-based data augmentation technique designed to enhance Chinese telephone scam data. ScamGen leverages psychological insights to generate diverse and realistic scam scenarios, focusing on the psychological dynamics between scammers and victims. This novel approach integrates psychological theory with data augmentation, diverging from traditional methods by emphasizing scammer–victim interactions. Our method begins with a multi-source data collection framework, compiling an initial seed dataset of tele-scam samples. Using sentence- and word-level perturbations, we expand this seed data to create a comprehensive and diverse dataset covering a wide range of scam scenarios. Rigorous evaluations demonstrate that ScamGen outperforms large language models in generating high-quality, varied datasets. Additionally, we develop five deep learning models for intent detection on this dataset, with BERT achieving the highest precision at 86.68%. The dataset, which will be made publicly available, marks a significant step toward understanding scammer tactics and improving tele-scam detection systems.
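The abstract describes expanding a seed dataset through template filling plus sentence- and word-level perturbations, but gives no implementation details. The following is only a minimal sketch of that general idea, not the authors' ScamGen pipeline: the seed template, slot names, fillers, synonym table, and all function names are hypothetical and invented for illustration, and English placeholder text stands in for the Chinese data.

```python
import random

# Hypothetical seed template: one scam script split into sentences with {slots}.
SEED_TEMPLATE = [
    "Hello, this is {agency}.",
    "Your {account} shows suspicious activity.",
    "Please transfer your funds within {deadline}.",
]

# Hypothetical slot fillers (assumed for illustration, not from the paper).
SLOT_FILLERS = {
    "agency": ["the bank security team", "the courier service"],
    "account": ["bank card", "parcel record"],
    "deadline": ["one hour", "today"],
}

# Hypothetical synonym table used for word-level perturbation.
SYNONYMS = {
    "suspicious": ["unusual", "abnormal"],
    "transfer": ["move", "send"],
}


def fill_template(template, rng):
    """Fill every {slot} with a randomly chosen filler."""
    chosen = {slot: rng.choice(options) for slot, options in SLOT_FILLERS.items()}
    return [sentence.format(**chosen) for sentence in template]


def perturb_words(sentence, rng, prob=0.5):
    """Word level: swap known words for a synonym with probability `prob`."""
    words = []
    for word in sentence.split():
        if word in SYNONYMS and rng.random() < prob:
            words.append(rng.choice(SYNONYMS[word]))
        else:
            words.append(word)
    return " ".join(words)


def perturb_sentences(sentences, rng):
    """Sentence level: keep the opening sentence, shuffle the rest."""
    head, tail = sentences[:1], sentences[1:]
    rng.shuffle(tail)
    return head + tail


def augment(template, n, seed=0):
    """Generate n augmented samples from one seed template."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        sentences = fill_template(template, rng)
        sentences = perturb_sentences(sentences, rng)
        samples.append(" ".join(perturb_words(s, rng) for s in sentences))
    return samples


if __name__ == "__main__":
    for sample in augment(SEED_TEMPLATE, 3):
        print(sample)
```

Seeding the random generator keeps the generated samples reproducible, which makes it easier to compare augmented datasets across runs.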
Source journal
CiteScore: 19.10
Self-citation rate: 4.00%
Articles published: 381
Review time: 40 days
About the journal: Computers in Human Behavior is a scholarly journal that explores the psychological aspects of computer use. It covers original theoretical works, research reports, literature reviews, and software and book reviews. The journal examines both the use of computers in psychology, psychiatry, and related fields, and the psychological impact of computer use on individuals, groups, and society. Articles discuss topics such as professional practice, training, research, human development, learning, cognition, personality, and social interactions. It focuses on human interactions with computers, considering the computer as a medium through which human behaviors are shaped and expressed. Professionals interested in the psychological aspects of computer use will find this journal valuable, even with limited knowledge of computers.
Latest articles in this journal
What drives AI-based risk information-seeking intent? Insufficiency of risk information versus (Un)certainty of AI chatbots
Frustrated cyber-abuser: Narcissistic traits in the context of the basic psychological needs and cyber dating abuse
Continuous measures of decision-difficulty captured remotely: Mouse-tracking sensitivity extends to tablets and smartphones
The negative consequences of networking through social network services: A social comparison perspective
Can online behaviors be linked to mental health? Active versus passive social network usage on depression via envy and self-esteem