码字检测，关注两个微博语料库相似词的差异

Q2 Computer Science Annals of Emerging Technologies in Computing Pub Date : 2021-04-01 DOI:10.33166/AETIC.2021.02.008

Takuro Hada, Y. Sei, Yasuyuki Tahara, Akihiko Ohsuga

{"title":"码字检测，关注两个微博语料库相似词的差异","authors":"Takuro Hada, Y. Sei, Yasuyuki Tahara, Akihiko Ohsuga","doi":"10.33166/AETIC.2021.02.008","DOIUrl":null,"url":null,"abstract":"Recently, the use of microblogs in drug trafficking has surged and become a social problem. A common method applied by cyber patrols to repress crimes, such as drug trafficking, involves searching for crime-related keywords. However, criminals who post crime-inducing messages maximally exploit “codewords” rather than keywords, such as enjo kosai, marijuana, and methamphetamine, to camouflage their criminal intentions. Research suggests that these codewords change once they gain popularity; thus, effective codeword detection requires significant effort to keep track of the latest codewords. In this study, we focused on the appearance of codewords and those likely to be included in incriminating posts to detect codewords with a high likelihood of inclusion in incriminating posts. We proposed new methods for detecting codewords based on differences in word usage and conducted experiments on concealed-word detection to evaluate the effectiveness of the method. The results showed that the proposed method could detect concealed words other than those in the initial list and to a better degree than the baseline methods. These findings demonstrated the ability of the proposed method to rapidly and automatically detect codewords that change over time and blog posts that instigate crimes, thereby potentially reducing the burden of continuous codeword surveillance.","PeriodicalId":36440,"journal":{"name":"Annals of Emerging Technologies in Computing","volume":" ","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2021-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Codeword Detection, Focusing on Differences in Similar Words Between Two Corpora of Microblogs\",\"authors\":\"Takuro Hada, Y. Sei, Yasuyuki Tahara, Akihiko Ohsuga\",\"doi\":\"10.33166/AETIC.2021.02.008\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Recently, the use of microblogs in drug trafficking has surged and become a social problem. A common method applied by cyber patrols to repress crimes, such as drug trafficking, involves searching for crime-related keywords. However, criminals who post crime-inducing messages maximally exploit “codewords” rather than keywords, such as enjo kosai, marijuana, and methamphetamine, to camouflage their criminal intentions. Research suggests that these codewords change once they gain popularity; thus, effective codeword detection requires significant effort to keep track of the latest codewords. In this study, we focused on the appearance of codewords and those likely to be included in incriminating posts to detect codewords with a high likelihood of inclusion in incriminating posts. We proposed new methods for detecting codewords based on differences in word usage and conducted experiments on concealed-word detection to evaluate the effectiveness of the method. The results showed that the proposed method could detect concealed words other than those in the initial list and to a better degree than the baseline methods. These findings demonstrated the ability of the proposed method to rapidly and automatically detect codewords that change over time and blog posts that instigate crimes, thereby potentially reducing the burden of continuous codeword surveillance.\",\"PeriodicalId\":36440,\"journal\":{\"name\":\"Annals of Emerging Technologies in Computing\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-04-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Annals of Emerging Technologies in Computing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.33166/AETIC.2021.02.008\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"Computer Science\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Annals of Emerging Technologies in Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.33166/AETIC.2021.02.008","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"Computer Science","Score":null,"Total":0}

引用次数: 0

摘要

最近，微博在毒品交易中的使用激增，成为一个社会问题。网络巡逻队用来打击毒品走私等犯罪的常用方法是搜索与犯罪相关的关键词。然而，犯罪分子在发布诱导犯罪的信息时，最大限度地利用“暗语”，而不是关键词，如“enjo kosai”、“大麻”、“甲基苯丙胺”，来掩饰他们的犯罪意图。研究表明，这些码字一旦流行起来，就会发生变化;因此，有效的码字检测需要大量的工作来跟踪最新的码字。在本研究中，我们将重点放在码字的外观和可能被包含在犯罪帖子中的码字上，以检测可能被包含在犯罪帖子中的码字。我们提出了基于词使用差异的码字检测新方法，并进行了隐藏词检测实验来评估该方法的有效性。实验结果表明，该方法能够有效地检测出初始列表之外的隐藏词，且检测效果优于基线方法。这些发现证明了所提出的方法能够快速自动地检测随时间变化的码字和煽动犯罪的博客帖子，从而潜在地减少持续码字监视的负担。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Codeword Detection, Focusing on Differences in Similar Words Between Two Corpora of Microblogs

Recently, the use of microblogs in drug trafficking has surged and become a social problem. A common method applied by cyber patrols to repress crimes, such as drug trafficking, involves searching for crime-related keywords. However, criminals who post crime-inducing messages maximally exploit “codewords” rather than keywords, such as enjo kosai, marijuana, and methamphetamine, to camouflage their criminal intentions. Research suggests that these codewords change once they gain popularity; thus, effective codeword detection requires significant effort to keep track of the latest codewords. In this study, we focused on the appearance of codewords and those likely to be included in incriminating posts to detect codewords with a high likelihood of inclusion in incriminating posts. We proposed new methods for detecting codewords based on differences in word usage and conducted experiments on concealed-word detection to evaluate the effectiveness of the method. The results showed that the proposed method could detect concealed words other than those in the initial list and to a better degree than the baseline methods. These findings demonstrated the ability of the proposed method to rapidly and automatically detect codewords that change over time and blog posts that instigate crimes, thereby potentially reducing the burden of continuous codeword surveillance.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Annals of Emerging Technologies in Computing Computer Science-Computer Science (all)

CiteScore

3.50

自引率

0.00%

发文量