A Multi-Input Machine Learning Approach to Classifying Sex Trafficking from Online Escort Advertisements

IF 4 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Machine learning and knowledge extraction Pub Date : 2023-05-10 DOI:10.3390/make5020028
L. Summers, Alyssa N. Shallenberger, John Cruz, Lawrence V. Fulton
{"title":"A Multi-Input Machine Learning Approach to Classifying Sex Trafficking from Online Escort Advertisements","authors":"L. Summers, Alyssa N. Shallenberger, John Cruz, Lawrence V. Fulton","doi":"10.3390/make5020028","DOIUrl":null,"url":null,"abstract":"Sex trafficking victims are often advertised through online escort sites. These ads can be publicly accessed, but law enforcement lacks the resources to comb through hundreds of ads to identify those that may feature sex-trafficked individuals. The purpose of this study was to implement and test multi-input, deep learning (DL) binary classification models to predict the probability of an online escort ad being associated with sex trafficking (ST) activity and aid in the detection and investigation of ST. Data from 12,350 scraped and classified ads were split into training and test sets (80% and 20%, respectively). Multi-input models that included recurrent neural networks (RNN) for text classification, convolutional neural networks (CNN, specifically EfficientNetB6 or ENET) for image/emoji classification, and neural networks (NN) for feature classification were trained and used to classify the 20% test set. The best-performing DL model included text and imagery inputs, resulting in an accuracy of 0.82 and an F1 score of 0.70. More importantly, the best classifier (RNN + ENET) correctly identified 14 of 14 sites that had classification probability estimates of 0.845 or greater (1.0 precision); precision was 96% for the multi-input model (NN + RNN + ENET) when only the ads associated with the highest positive classification probabilities (>0.90) were considered (n = 202 ads). The models developed could be productionalized and piloted with criminal investigators, as they could potentially increase their efficiency in identifying potential ST victims.","PeriodicalId":93033,"journal":{"name":"Machine learning and knowledge extraction","volume":"101 1","pages":"460-472"},"PeriodicalIF":4.0000,"publicationDate":"2023-05-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Machine learning and knowledge extraction","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3390/make5020028","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

Abstract

Sex trafficking victims are often advertised through online escort sites. These ads can be publicly accessed, but law enforcement lacks the resources to comb through hundreds of ads to identify those that may feature sex-trafficked individuals. The purpose of this study was to implement and test multi-input, deep learning (DL) binary classification models to predict the probability of an online escort ad being associated with sex trafficking (ST) activity and aid in the detection and investigation of ST. Data from 12,350 scraped and classified ads were split into training and test sets (80% and 20%, respectively). Multi-input models that included recurrent neural networks (RNN) for text classification, convolutional neural networks (CNN, specifically EfficientNetB6 or ENET) for image/emoji classification, and neural networks (NN) for feature classification were trained and used to classify the 20% test set. The best-performing DL model included text and imagery inputs, resulting in an accuracy of 0.82 and an F1 score of 0.70. More importantly, the best classifier (RNN + ENET) correctly identified 14 of 14 sites that had classification probability estimates of 0.845 or greater (1.0 precision); precision was 96% for the multi-input model (NN + RNN + ENET) when only the ads associated with the highest positive classification probabilities (>0.90) were considered (n = 202 ads). The models developed could be productionalized and piloted with criminal investigators, as they could potentially increase their efficiency in identifying potential ST victims.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
一种多输入机器学习方法对在线应召广告中的性交易进行分类
性交易的受害者通常是通过在线陪护网站发布广告的。这些广告可以公开访问,但执法部门缺乏资源来梳理数以百计的广告,以识别那些可能涉及性交易的个人。本研究的目的是实施和测试多输入、深度学习(DL)二元分类模型,以预测在线陪游广告与性交易(ST)活动相关的概率,并帮助检测和调查ST。来自12,350个收集和分类广告的数据被分为训练集和测试集(分别为80%和20%)。多输入模型包括用于文本分类的循环神经网络(RNN)、用于图像/表情符号分类的卷积神经网络(CNN,特别是effentnetb6或ENET)和用于特征分类的神经网络(NN),并用于对20%测试集进行分类。表现最好的深度学习模型包括文本和图像输入,其准确率为0.82,F1得分为0.70。更重要的是,最佳分类器(RNN + ENET)正确识别了14个站点中的14个,分类概率估计为0.845或更高(精度为1.0);当只考虑具有最高正分类概率(>0.90)的广告(n = 202个广告)时,多输入模型(NN + RNN + ENET)的准确率为96%。开发的模型可以在刑事调查人员中进行生产和试点,因为它们可能提高他们识别潜在性传播疾病受害者的效率。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
CiteScore
6.30
自引率
0.00%
发文量
0
审稿时长
7 weeks
期刊最新文献
Knowledge Graph Extraction of Business Interactions from News Text for Business Networking Analysis Machine Learning for an Enhanced Credit Risk Analysis: A Comparative Study of Loan Approval Prediction Models Integrating Mental Health Data A Data Mining Approach for Health Transport Demand Predicting Wind Comfort in an Urban Area: A Comparison of a Regression- with a Classification-CNN for General Wind Rose Statistics An Evaluative Baseline for Sentence-Level Semantic Division
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1