A Machine Learning Model Helps Process Interviewer Comments in Computer-assisted Personal Interview Instruments: A Case Study
Catherine Billington, Gonzalo Rivero, Andrew Jannett, Jiating Chen
Field Methods, published 2022-06-21. DOI: 10.1177/1525822X221107053 (https://doi.org/10.1177/1525822X221107053)
Citation count: 1
Abstract
During data collection, field interviewers often append notes or comments to a case in open text fields to request updates to case-level data. Processing these comments can improve data quality, but many are non-actionable, and processing remains a costly manual task. This article presents a case study using a novel application of machine learning tools to assist in the evaluation of these comments. Using over 5,000 comments from the Medical Expenditure Panel Survey, we built features that were fed to a machine learning model to predict a grouping category for each comment as previously assigned by data technicians to expedite processing. The model achieved high top-3 accuracy and was incorporated into a production tool for editing. A qualitative evaluation of the tool also provided encouraging results. This application of machine learning tools allowed a small but worthwhile increase in processing efficiency, while maintaining exacting standards for data quality.
About the journal:
Field Methods (formerly Cultural Anthropology Methods) is devoted to articles about the methods used by field workers in the social and behavioral sciences and humanities for the collection, management, and analysis of data about human thought and/or human behavior in the natural world. Articles should focus on innovations and issues in the methods used, rather than on the reporting of research or theoretical/epistemological questions about research. High-quality articles using qualitative and quantitative methods, from scientific or interpretative traditions, dealing with data collection and analysis in applied and scholarly research from writers in the social sciences, humanities, and related professions are all welcome in the pages of the journal.