Takuro Hada, Y. Sei, Yasuyuki Tahara, Akihiko Ohsuga
2020 International Conference on Computing, Electronics & Communications Engineering (iCCECE), published 2020-08-17
DOI: https://doi.org/10.1109/iCCECE49321.2020.9231109
Codewords Detection in Microblogs Focusing on Differences in Word Use Between Two Corpora
In recent years, drug trafficking via microblogs has increased and become a social problem. Cyber patrols that crack down on crimes such as drug trafficking commonly search for crime-related keywords. However, criminals who post crime-inducing messages camouflage their intentions by using "codewords" in place of explicit keywords such as enjo kosai, marijuana, and methamphetamine. Research suggests that these codewords change once they become popular, so searching for specific words requires significant effort to keep track of the latest codewords. In this study, we focused on the appearance of codewords and of words likely to be included in incriminating posts, with the aim of detecting codewords that have a high likelihood of appearing in such posts. We proposed new methods for detecting codewords based on differences in word usage and conducted experiments on concealed-word detection to evaluate their effectiveness. The results showed that the proposed method detected concealed words other than those in the initial list, and did so to a better degree than the baseline methods. These findings demonstrate that the proposed method can rapidly and automatically detect codewords that change over time, as well as blog posts that induce crimes, thereby potentially reducing the burden of continuously monitoring codewords.
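The abstract does not specify how differences in word usage between the two corpora are measured. As a rough illustration of the general idea only, and not the authors' method, the sketch below ranks words by a smoothed log-ratio of their relative frequencies in a target corpus versus a reference corpus. The function names, the smoothing parameter, and the toy posts are all hypothetical.

```python
import math
from collections import Counter


def usage_shift_scores(target_docs, reference_docs, smoothing=1.0):
    """Score each word by the smoothed log-ratio of its relative frequency
    in the target corpus versus the reference corpus.

    This is a simple frequency-based stand-in (an assumption), not the
    method proposed in the paper.
    """
    target_counts = Counter(w for doc in target_docs for w in doc.split())
    reference_counts = Counter(w for doc in reference_docs for w in doc.split())
    vocab = set(target_counts) | set(reference_counts)

    # Additive smoothing keeps the ratio finite for words seen in one corpus only.
    target_total = sum(target_counts.values()) + smoothing * len(vocab)
    reference_total = sum(reference_counts.values()) + smoothing * len(vocab)

    scores = {}
    for word in vocab:
        p_target = (target_counts[word] + smoothing) / target_total
        p_reference = (reference_counts[word] + smoothing) / reference_total
        scores[word] = math.log(p_target / p_reference)
    return scores


def top_candidates(scores, k=5):
    """Return the k highest-scoring words, breaking ties alphabetically."""
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


# Toy, made-up posts used only to exercise the functions.
target = ["fresh ice for sale dm me", "ice in stock dm now"]
reference = [
    "ice cream for dessert",
    "fresh snow and ice on the road",
    "dm me the photo",
]

scores = usage_shift_scores(target, reference)
print(top_candidates(scores, k=4))
```

A positive score means a word is relatively more frequent in the target corpus. Pure frequency ratios also promote ordinary words that happen to be absent from a small reference corpus, so a practical system would need larger corpora and a richer notion of usage than this sketch provides.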