ScamGen: Unveiling psychological patterns in tele-scam through advanced template-augmented corpus generation

Computers in Human Behavior, Volume 162, Article 108451 (IF 8.9; Zone 1, Psychology; JCR Q1, Psychology, Experimental). Pub Date: 2025-01-01. Epub Date: 2024-09-21. DOI: 10.1016/j.chb.2024.108451

Xu Han, Qiang Li, Yaling Qi, Hongbo Cao, Witold Pedrycz, Wei Wang


Citations: 0



ScamGen: Unveiling psychological patterns in tele-scam through advanced template-augmented corpus generation

Telephone scams, with their profound psychological impact, often compel victims to make hasty and severe decisions. Studying these scams is challenging due to the scarcity of comprehensive datasets, a result of the private nature of telephone interactions. In this paper, we introduce ScamGen, a template-based data augmentation technique designed to enhance Chinese telephone scam data. ScamGen leverages psychological insights to generate diverse and realistic scam scenarios, focusing on the psychological dynamics between scammers and victims. This novel approach integrates psychological theory with data augmentation, diverging from traditional methods by emphasizing scammer–victim interactions. Our method begins with a multi-source data collection framework, compiling an initial seed dataset of tele-scam samples. Using sentence- and word-level perturbations, we expand this seed data to create a comprehensive and diverse dataset covering a wide range of scam scenarios. Rigorous evaluations demonstrate that ScamGen outperforms large language models in generating high-quality, varied datasets. Additionally, we develop five deep learning models for intent detection on this dataset, with BERT achieving the highest precision at 86.68%. The dataset, which will be made publicly available, marks a significant step toward understanding scammer tactics and improving tele-scam detection systems.
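To make the idea of template-based augmentation with sentence- and word-level perturbations concrete, the following is a minimal sketch in Python. It is not the authors' implementation: the template, the slot fillers, the synonym table, and the function names (`fill_template`, `perturb_words`, `perturb_sentences`, `generate`) are all hypothetical, the example text is in English rather than Chinese, and none of the strings come from the ScamGen dataset. It only illustrates how a small seed template might be expanded into varied samples.

```python
import random

# Hypothetical seed template: a list of sentences with {slot} placeholders.
# None of these strings come from the ScamGen dataset.
TEMPLATE = [
    "Hello, this is {agency}.",
    "Your {asset} has been linked to suspicious activity.",
    "You must {action} within {deadline}.",
]

# Hypothetical slot fillers (template-level variation).
SLOTS = {
    "agency": ["the bank security team", "the public security bureau"],
    "asset": ["bank card", "account"],
    "action": ["transfer your funds to a safe account", "verify your identity"],
    "deadline": ["two hours", "today"],
}

# Hypothetical synonym table used for word-level perturbation.
SYNONYMS = {
    "suspicious": ["unusual", "abnormal"],
    "must": ["need to", "have to"],
}


def fill_template(template, slots, rng):
    """Fill every {slot} with a randomly chosen filler."""
    choice = {name: rng.choice(values) for name, values in slots.items()}
    return [sentence.format(**choice) for sentence in template]


def perturb_words(sentence, synonyms, rng, p=0.5):
    """Word-level perturbation: swap known words for a synonym with probability p."""
    out = []
    for token in sentence.split(" "):
        core = token.strip(".,")
        if core in synonyms and rng.random() < p:
            token = token.replace(core, rng.choice(synonyms[core]))
        out.append(token)
    return " ".join(out)


def perturb_sentences(sentences, rng, p_drop=0.3):
    """Sentence-level perturbation: keep the opening line, randomly drop later ones."""
    head, tail = sentences[:1], sentences[1:]
    kept = [s for s in tail if rng.random() >= p_drop]
    # Never return the opening line alone: fall back to the full tail.
    return head + (kept or tail)


def generate(n, seed=0):
    """Produce n augmented samples from the seed template, reproducibly."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        sentences = fill_template(TEMPLATE, SLOTS, rng)
        sentences = perturb_sentences(sentences, rng)
        sentences = [perturb_words(s, SYNONYMS, rng) for s in sentences]
        samples.append(" ".join(sentences))
    return samples


if __name__ == "__main__":
    for sample in generate(3, seed=1):
        print(sample)
```

Seeding the generator keeps the output reproducible, which makes it easier to compare augmented corpora across runs. A real pipeline would presumably draw its templates and substitution lexicon from collected seed data rather than hand-written lists like these.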


Source journal: Computers in Human Behavior Multiple-

CiteScore: 19.10

Self-citation rate: 4.00%

Articles published: 381

Review time: 40 days

About the journal: Computers in Human Behavior is a scholarly journal that explores the psychological aspects of computer use. It covers original theoretical works, research reports, literature reviews, and software and book reviews. The journal examines both the use of computers in psychology, psychiatry, and related fields, and the psychological impact of computer use on individuals, groups, and society. Articles discuss topics such as professional practice, training, research, human development, learning, cognition, personality, and social interactions. It focuses on human interactions with computers, considering the computer as a medium through which human behaviors are shaped and expressed. Professionals interested in the psychological aspects of computer use will find this journal valuable, even with limited knowledge of computers.