Detecting Human Trafficking: Automated Classification of Online Customer Reviews of Massage Businesses

Ruoting Li, Margaret Tobey, M. Mayorga, Sherrie Caltagirone, Osman Y. Özaltın
Journal: Manufacturing & Service Operations Management
Published: 2023-02-22 (Journal Article)
DOI: 10.1287/msom.2023.1196 (https://doi.org/10.1287/msom.2023.1196)
Citations: 5

Abstract

Problem definition: Approximately 11,000 alleged illicit massage businesses (IMBs) exist across the United States hidden in plain sight among legitimate businesses. These illicit businesses frequently exploit workers, many of whom are victims of human trafficking, forced or coerced to provide commercial sex.

Academic/practical relevance: Although IMB review boards like Rubmaps.ch can provide first-hand information to identify IMBs, these sites are likely to be closed by law enforcement. Open websites like Yelp.com provide more accessible and detailed information about a larger set of massage businesses. Reviews from these sites can be screened for risk factors of trafficking.

Methodology: We develop a natural language processing approach to detect online customer reviews that indicate a massage business is likely engaged in human trafficking. We label data sets of Yelp reviews using knowledge of known IMBs. We develop a lexicon of key words/phrases related to human trafficking and commercial sex acts. We then build two classification models based on this lexicon. We also train two classification models using embeddings from the bidirectional encoder representations from transformers (BERT) model and the Doc2Vec model.

Results: We evaluate the performance of these classification models and various ensemble models. The lexicon-based models achieve high precision, whereas the embedding-based models have relatively high recall. The ensemble models provide a compromise and achieve the best performance on the out-of-sample test. Our results verify the usefulness of ensemble methods for building robust models to detect risk factors of human trafficking in reviews on open websites like Yelp.

Managerial implications: The proposed models can save countless hours in IMB investigations by automatically sorting through large quantities of data to flag potential illicit activity, eliminating the need for manual screening of these reviews by law enforcement and other stakeholders.

Funding: This work was supported by the National Science Foundation [Grant 1936331].

Supplemental Material: The online appendix is available at https://doi.org/10.1287/msom.2023.1196.
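
The methodology above combines lexicon-based and embedding-based classifiers into ensembles and compares them on precision and recall. The following is a minimal Python sketch of that general idea, not the authors' implementation. The lexicon entries, the threshold, and the names `lexicon_score`, `lexicon_classifier`, `ensemble_predict`, and `precision_recall` are all hypothetical, and the ensemble rule shown (flag a review when at least `min_votes` member classifiers flag it) is an assumed voting scheme that the abstract does not specify.

```python
import re
from typing import Callable, List, Sequence, Tuple

# Hypothetical placeholder lexicon; the paper's actual lexicon is not reproduced here.
LEXICON: List[str] = ["placeholder phrase", "example keyword"]


def lexicon_score(review: str, lexicon: Sequence[str] = LEXICON) -> int:
    """Count how many lexicon entries occur in the review (case-insensitive, whole words)."""
    text = review.lower()
    return sum(
        1
        for term in lexicon
        if re.search(r"\b" + re.escape(term.lower()) + r"\b", text)
    )


def lexicon_classifier(review: str, threshold: int = 1) -> int:
    """Flag the review (1) when at least `threshold` lexicon entries match, else 0."""
    return int(lexicon_score(review) >= threshold)


def ensemble_predict(
    review: str, classifiers: Sequence[Callable[[str], int]], min_votes: int
) -> int:
    """Flag the review when at least `min_votes` member classifiers flag it.

    Each member is any callable mapping review text to 0 or 1; an
    embedding-based model would be wrapped as one such callable.
    """
    votes = [clf(review) for clf in classifiers]
    return int(sum(votes) >= min_votes)


def precision_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Compute precision and recall for binary labels (1 = flagged)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall
```

Under this assumed scheme, raising `min_votes` makes the ensemble stricter, which tends to favor precision, while lowering it tends to favor recall. That is one simple way to realize the compromise between the two model families that the Results describe.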