Codeword Detection, Focusing on Differences in Similar Words Between Two Corpora of Microblogs

Takuro Hada, Y. Sei, Yasuyuki Tahara, Akihiko Ohsuga
Journal: Annals of Emerging Technologies in Computing (JCR Q2, Computer Science)
Publication date: 2021-04-01
Publication type: Journal Article
DOI: 10.33166/AETIC.2021.02.008 (https://doi.org/10.33166/AETIC.2021.02.008)
Citations: 0

Abstract

Codeword Detection, Focusing on Differences in Similar Words Between Two Corpora of Microblogs
Recently, the use of microblogs in drug trafficking has surged and become a social problem. A common method applied by cyber patrols to repress crimes such as drug trafficking involves searching for crime-related keywords. However, criminals who post crime-inducing messages rely heavily on “codewords” in place of explicit keywords such as enjo kosai, marijuana, and methamphetamine, to camouflage their criminal intentions. Research suggests that these codewords change once they gain popularity; thus, effective codeword detection requires significant effort to keep track of the latest codewords. In this study, we focused on the appearance of codewords and those likely to be included in incriminating posts to detect codewords with a high likelihood of inclusion in incriminating posts. We proposed new methods for detecting codewords based on differences in word usage and conducted experiments on concealed-word detection to evaluate the effectiveness of the method. The results showed that the proposed method could detect concealed words other than those in the initial list and to a better degree than the baseline methods. These findings demonstrated the ability of the proposed method to rapidly and automatically detect codewords that change over time and blog posts that instigate crimes, thereby potentially reducing the burden of continuous codeword surveillance.
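The idea of scoring a word by how differently it is used in two corpora can be illustrated with a small sketch. This is not the authors' method, which the abstract does not specify in detail. It is a minimal stand-in that assumes simple co-occurrence counts as word representations and Jaccard distance between each corpus's set of most similar words as the difference score. All function names, parameters, and the toy posts below are hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt


def cooccurrence_vectors(posts, window=2):
    """Map each word to a Counter of the words seen within `window` tokens of it."""
    vectors = defaultdict(Counter)
    for post in posts:
        tokens = post.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[key] for key, count in a.items() if key in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similar_words(word, vectors, k=3):
    """Return up to k words whose context vectors are closest to `word`."""
    if word not in vectors:
        return set()
    scored = [(cosine(vectors[word], vec), other)
              for other, vec in vectors.items() if other != word]
    # Sort by descending similarity, breaking ties alphabetically.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return {other for score, other in scored[:k] if score > 0}


def usage_difference(word, vectors_a, vectors_b, k=3):
    """Jaccard distance between the word's similar-word sets in two corpora."""
    near_a = similar_words(word, vectors_a, k)
    near_b = similar_words(word, vectors_b, k)
    union = near_a | near_b
    if not union:
        return 0.0
    return 1.0 - len(near_a & near_b) / len(union)


# Toy data, invented for illustration only.
general_posts = ["ice melts in warm water", "cold ice floats on water"]
suspect_posts = ["ice for sale cheap delivery", "cheap ice sale tonight delivery"]

general_vectors = cooccurrence_vectors(general_posts)
suspect_vectors = cooccurrence_vectors(suspect_posts)

print(usage_difference("ice", general_vectors, suspect_vectors))
```

A word that keeps the same neighbours in both corpora scores near 0, while a word whose neighbours differ completely scores 1. In practice, learned word embeddings trained on each corpus would likely replace the raw co-occurrence counts assumed here.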
Source journal: Annals of Emerging Technologies in Computing (Computer Science, all)
CiteScore: 3.50
Self-citation rate: 0.00%
Articles published: 26