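Information Extraction from the Long Tail: A Socio-Technical AI Approach for Criminology Investigations into the Online Illegal Plant Trade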

S. Middleton, A. Lavorgna, Geoff Neumann, David Whitehead
Published in: Companion Publication of the 12th ACM Conference on Web Science, 2020-07-06
DOI: 10.1145/3394332.3402838 (https://doi.org/10.1145/3394332.3402838)
Citations: 7

Abstract

In today's online forums and marketplaces, cybercrime activity can often be found lurking in plain sight behind legitimate posts. Most popular criminology techniques are either manually intensive, and so do not scale well, or focus on statistical summaries across websites and can miss infrequent behaviour patterns. We present an interdisciplinary (computer science, criminology and conservation science) socio-technical artificial intelligence (AI) approach to information extraction from the long tail of online forums around internet-facilitated illegal trade in endangered species. Our methodology is highly iterative: entities of interest (e.g. endangered plant species, suspects, locations) identified by a criminologist direct computer science tools, including crawling, searching and information extraction, over many steps until an acceptable intelligence package is produced. We evaluate our approach in two case study experiments, each based on a one-week criminology investigation aided by conservation science experts, comparing named entity (NE) directed graph visualization with Latent Dirichlet Allocation (LDA) topic modelling. NE directed graph visualization consistently outperforms topic modelling for discovering connected entities in the long tail of online forums and marketplaces.
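
The iterative, entity-directed idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the posts are invented toy data, the `GAZETTEER` lookup is a hypothetical stand-in for a real named-entity tagger, and the names `entities_in` and `expand` are chosen here for illustration. The sketch starts from seed entities supplied by an investigator, searches posts that mention them, and records a directed edge from each searched entity to every entity discovered alongside it.

```python
from collections import defaultdict, deque

# Toy, invented posts as (author, text). Not data from the study.
POSTS = [
    ("seller_a", "Rare Ariocarpus seeds, ships from Lisbon"),
    ("seller_b", "Ariocarpus and Saussurea roots available, contact seller_a"),
    ("buyer_c", "Looking for orchids in Madrid"),
    ("seller_d", "Saussurea costus powder, ships from Delhi"),
]

# Hypothetical gazetteer standing in for a real named-entity tagger.
GAZETTEER = {
    "ariocarpus": "SPECIES", "saussurea": "SPECIES",
    "lisbon": "LOCATION", "madrid": "LOCATION", "delhi": "LOCATION",
    "seller_a": "SUSPECT",
}


def entities_in(author, text):
    """Return the gazetteer entities mentioned in a post, plus its author."""
    tokens = {t.strip(",.").lower() for t in text.split()}
    found = {t for t in tokens if t in GAZETTEER}
    found.add(author)
    return found


def expand(seeds, posts, max_depth=2):
    """Breadth-first expansion from seed entities.

    An edge a -> b means b appeared in a post matched when searching for a.
    Entities first seen at max_depth are recorded but not searched further.
    """
    edges = defaultdict(set)
    depth = {s: 0 for s in seeds}
    queue = deque(seeds)
    while queue:
        current = queue.popleft()
        if depth[current] >= max_depth:
            continue
        for author, text in posts:
            mentioned = entities_in(author, text)
            if current not in mentioned:
                continue
            for other in mentioned - {current}:
                edges[current].add(other)
                if other not in depth:
                    depth[other] = depth[current] + 1
                    queue.append(other)
    return edges, depth


if __name__ == "__main__":
    graph, hops = expand(["ariocarpus"], POSTS)
    for source in sorted(graph):
        print(source, "->", sorted(graph[source]))
```

In this toy run, an entity that appears in only one post (such as `delhi`) is still reached through the chain of co-mentions, which is the kind of infrequent, long-tail connection that an aggregate summary could easily miss. In a real investigation, the depth limit and the choice of which discovered entities to follow would be made by the criminologist at each iteration rather than automatically.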