Detecting Human Trafficking: Automated Classification of Online Customer Reviews of Massage Businesses
Ruoting Li, Margaret Tobey, M. Mayorga, Sherrie Caltagirone, Osman Y. Özaltın
Manufacturing & Service Operations Management, published 2023-02-22
DOI: 10.1287/msom.2023.1196 (https://doi.org/10.1287/msom.2023.1196)
Citations: 5
Abstract
Problem definition: Approximately 11,000 alleged illicit massage businesses (IMBs) exist across the United States, hidden in plain sight among legitimate businesses. These illicit businesses frequently exploit workers, many of whom are victims of human trafficking, forced or coerced to provide commercial sex.

Academic/practical relevance: Although IMB review boards like Rubmaps.ch can provide first-hand information to identify IMBs, these sites are likely to be closed by law enforcement. Open websites like Yelp.com provide more accessible and detailed information about a larger set of massage businesses. Reviews from these sites can be screened for risk factors of trafficking.

Methodology: We develop a natural language processing approach to detect online customer reviews that indicate a massage business is likely engaged in human trafficking. We label data sets of Yelp reviews using knowledge of known IMBs. We develop a lexicon of key words/phrases related to human trafficking and commercial sex acts. We then build two classification models based on this lexicon. We also train two classification models using embeddings from the bidirectional encoder representations from transformers (BERT) model and the Doc2Vec model.

Results: We evaluate the performance of these classification models and various ensemble models. The lexicon-based models achieve high precision, whereas the embedding-based models have relatively high recall. The ensemble models provide a compromise and achieve the best performance on the out-of-sample test. Our results verify the usefulness of ensemble methods for building robust models to detect risk factors of human trafficking in reviews on open websites like Yelp.

Managerial implications: The proposed models can save countless hours in IMB investigations by automatically sorting through large quantities of data to flag potential illicit activity, eliminating the need for manual screening of these reviews by law enforcement and other stakeholders.

Funding: This work was supported by the National Science Foundation [Grant 1936331].

Supplemental Material: The online appendix is available at https://doi.org/10.1287/msom.2023.1196.
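
The Methodology and Results portions of the abstract describe combining a lexicon-based classifier (high precision) with an embedding-based classifier (higher recall) in an ensemble. The abstract does not specify the lexicon, the model forms, or the ensemble rule, so the following is only a minimal illustrative sketch under stated assumptions: the lexicon entries are neutral placeholders, `embedding_prob` stands in for the output of some separately trained embedding-based model (for example one built on BERT or Doc2Vec vectors), and the weighted soft vote is one possible ensemble rule, not necessarily the authors'. All function names, weights, and thresholds are hypothetical.

```python
import re
from typing import Iterable

# Hypothetical placeholder lexicon; the paper's actual key words/phrases
# are not listed in the abstract.
PLACEHOLDER_LEXICON = ("alpha phrase", "beta")


def lexicon_score(review: str, lexicon: Iterable[str] = PLACEHOLDER_LEXICON) -> int:
    """Count how many lexicon entries appear in the review as whole words or phrases."""
    text = review.lower()
    return sum(
        1
        for term in lexicon
        if re.search(r"\b" + re.escape(term.lower()) + r"\b", text)
    )


def ensemble_score(
    review: str,
    embedding_prob: float,
    lexicon: Iterable[str] = PLACEHOLDER_LEXICON,
    weight: float = 0.5,
) -> float:
    """Weighted soft vote (an assumed rule): lexicon hit (0 or 1) blended with
    the probability supplied by an embedding-based model."""
    lexicon_vote = 1.0 if lexicon_score(review, lexicon) >= 1 else 0.0
    return weight * lexicon_vote + (1.0 - weight) * embedding_prob


def ensemble_flag(
    review: str,
    embedding_prob: float,
    lexicon: Iterable[str] = PLACEHOLDER_LEXICON,
    weight: float = 0.5,
    threshold: float = 0.5,
) -> bool:
    """Flag the review for human follow-up when the blended score reaches the threshold."""
    return ensemble_score(review, embedding_prob, lexicon, weight) >= threshold
```

In this sketch, a review with a lexicon hit is flagged even when the embedding model is unsure, while a review without a hit is flagged only when the embedding probability alone pushes the blended score to the threshold. Adjusting `weight` and `threshold` trades precision against recall, which is the kind of compromise the abstract attributes to its ensemble models.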