Codewords Detection in Microblogs Focusing on Differences in Word Use Between Two Corpora

Takuro Hada, Y. Sei, Yasuyuki Tahara, Akihiko Ohsuga
2020 International Conference on Computing, Electronics & Communications Engineering (iCCECE)
Publication date: 2020-08-17
DOI: 10.1109/iCCECE49321.2020.9231109 (https://doi.org/10.1109/iCCECE49321.2020.9231109)
Citations: 5

Abstract

In recent years, drug trafficking via microblogs has increased and become a social problem. Cyber patrols that crack down on crimes such as drug trafficking commonly search for crime-related keywords. However, criminals who post crime-inducing messages rely heavily on "codewords" rather than explicit keywords such as enjo kosai, marijuana, and methamphetamine, in order to camouflage their criminal intentions. Research suggests that these codewords change once they become popular; therefore, keyword search requires significant effort to keep track of the latest codewords. In this study, we focused on the appearance of codewords and on those likely to be included in incriminating posts, with the aim of detecting codewords that have a high likelihood of appearing in such posts. We proposed new methods for detecting codewords based on differences in word usage and conducted experiments on concealed-word detection to evaluate their effectiveness. The results showed that the proposed method detected concealed words beyond those in the initial list and did so more effectively than the baseline methods. These findings demonstrate that the proposed method can rapidly and automatically detect codewords that change over time, as well as blog posts that induce crimes, thereby potentially reducing the burden of continuously monitoring codewords.
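The abstract does not spell out how "differences in word usage" between the two corpora are measured, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes that a word's usage can be approximated by the words that co-occur near it, and it scores each shared word by the cosine distance between its context counts in a suspect corpus and in a reference corpus. The function names, the whitespace tokenizer, the window size, and the toy posts are all hypothetical; Japanese microblog text would need a proper morphological tokenizer.

```python
from collections import Counter
from math import sqrt


def context_counts(posts, target, window=2):
    """Count the words that appear within `window` tokens of `target`."""
    counts = Counter()
    for post in posts:
        tokens = post.lower().split()  # assumption: whitespace tokenization
        for i, tok in enumerate(tokens):
            if tok == target:
                lo, hi = max(0, i - window), i + window + 1
                counts.update(
                    t for j, t in enumerate(tokens[lo:hi], lo) if j != i
                )
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def usage_shift(word, suspect_posts, reference_posts, window=2):
    """1 - cosine similarity of context vectors; higher means more divergent usage."""
    return 1.0 - cosine(
        context_counts(suspect_posts, word, window),
        context_counts(reference_posts, word, window),
    )


def rank_candidates(suspect_posts, reference_posts, min_count=2, window=2):
    """Rank words frequent in both corpora by how differently they are used."""
    freq_s = Counter(t for p in suspect_posts for t in p.lower().split())
    freq_r = Counter(t for p in reference_posts for t in p.lower().split())
    shared = [
        w for w in freq_s
        if freq_s[w] >= min_count and freq_r[w] >= min_count
    ]
    scored = [
        (w, usage_shift(w, suspect_posts, reference_posts, window))
        for w in shared
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Synthetic toy data (hypothetical): "candy" stands in for a codeword.
suspect_posts = [
    "selling candy tonight dm for price",
    "candy in stock dm for price",
    "fresh candy delivery dm now",
]
reference_posts = [
    "my kids love candy after school",
    "bought candy for the kids today",
    "dm me for the meeting notes",
    "price of milk went up today",
]

ranked = rank_candidates(suspect_posts, reference_posts)
```

In this toy setup the stand-in word shares no context words across the two corpora, so it ranks first. A real system would likely need far larger corpora, frequency normalization, and a stronger representation than raw co-occurrence counts.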