{"title":"Case reports unlocked: Harnessing large language models to advance research on child maltreatment","authors":"Dragan Stoll , Samuel Wehrli , David Lätsch","doi":"10.1016/j.chiabu.2024.107202","DOIUrl":null,"url":null,"abstract":"<div><h3>Background</h3><div>Research on child protective services (CPS) is impeded by a lack of high-quality structured data. Crucial information on cases is often documented in case files, but only in narrative form. Researchers have applied automated language processing to extract structured data from these narratives, but this has been limited to classification tasks of fairly low complexity. Large language models (LLMs) may work for more challenging tasks.</div></div><div><h3>Objective</h3><div>We aimed to extract structured data from narrative casework reports by applying LLMs to distinguish between different subtypes of violence: child sexual abuse, child physical abuse, a child witnessing domestic violence, and a child being physically aggressive.</div></div><div><h3>Methods</h3><div>We developed a four-stage pipeline comprising of (1) text segmentation, (2) text segment classification, and subsequent labeling of (3) casework reports, and (4) cases. All CPS reports (<em>N</em> = 29,770) between 2008 and 2022 from Switzerland's largest CPS provider were collected. 28,223 text segments were extracted based on pre-defined keywords. Two human reviewers annotated random samples of text segments and reports for training and validation. Model performance was compared against human-coded test data.</div></div><div><h3>Results</h3><div>The best-performing LLM (Mixtral-8x7B) classified text segments with an accuracy of 87 %, outperforming agreement between the two human reviewers (77 %). The model also correctly labelled casework reports with an accuracy of 87 %, but only when disregarding non-extracted text segments in stage (1).</div></div><div><h3>Conclusions</h3><div>LLMs can replicate human coding of text documents even for highly complex tasks that require contextual information. This may considerably advance research on CPS. Transparency can be achieved by backtracking labeling decisions to individual text segments. Keyword-based text segmentation was identified as a weak point, and the potential for bias that may occur at several stages of the process requires attention.</div></div>","PeriodicalId":51343,"journal":{"name":"Child Abuse & Neglect","volume":"160 ","pages":"Article 107202"},"PeriodicalIF":3.4000,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Child Abuse & Neglect","FirstCategoryId":"102","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0145213424005957","RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"FAMILY STUDIES","Score":null,"Total":0}
Citations: 0
Abstract
Background
Research on child protective services (CPS) is impeded by a lack of high-quality structured data. Crucial information on cases is often documented in case files, but only in narrative form. Researchers have applied automated language processing to extract structured data from these narratives, but this has been limited to classification tasks of fairly low complexity. Large language models (LLMs) may work for more challenging tasks.
Objective
We aimed to extract structured data from narrative casework reports by applying LLMs to distinguish between different subtypes of violence: child sexual abuse, child physical abuse, a child witnessing domestic violence, and a child being physically aggressive.
Methods
We developed a four-stage pipeline comprising (1) text segmentation, (2) text segment classification, and subsequent labeling of (3) casework reports and (4) cases. All CPS reports (N = 29,770) between 2008 and 2022 from Switzerland's largest CPS provider were collected. A total of 28,223 text segments were extracted based on pre-defined keywords. Two human reviewers annotated random samples of text segments and reports for training and validation. Model performance was compared against human-coded test data.
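To make the four-stage structure concrete, the following is a minimal sketch in Python. The keyword list, label set, and the classify_segment() stub are hypothetical placeholders standing in for the authors' actual keyword dictionary and LLM prompt; it illustrates only how segment-level predictions propagate up to report and case labels.

```python
# Hedged sketch of the four-stage pipeline: keyword-based segmentation,
# segment classification, then aggregation to report and case labels.
import re

KEYWORDS = ["abuse", "violence", "hit", "assault"]  # hypothetical keyword list


def segment_report(report: str) -> list[str]:
    """Stage 1: split a report into sentences and keep those matching a keyword."""
    sentences = re.split(r"(?<=[.!?])\s+", report)
    return [s for s in sentences if any(k in s.lower() for k in KEYWORDS)]


def classify_segment(segment: str) -> str:
    """Stage 2: placeholder for an LLM call that assigns one violence subtype
    to the segment; a real implementation would prompt a model such as Mixtral."""
    return "physical_abuse"  # stub label, for illustration only


def label_report(report: str) -> set[str]:
    """Stage 3: a report receives every label assigned to any of its segments."""
    return {classify_segment(s) for s in segment_report(report)}


def label_case(reports: list[str]) -> set[str]:
    """Stage 4: a case aggregates the labels of all its reports."""
    labels: set[str] = set()
    for r in reports:
        labels |= label_report(r)
    return labels


if __name__ == "__main__":
    demo_case = ["The child reported being hit repeatedly.", "No incidents were noted."]
    print(label_case(demo_case))
```

Because every report- and case-level label is derived from identifiable segments, this structure also supports the traceability discussed in the conclusions.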
Results
The best-performing LLM (Mixtral-8x7B) classified text segments with an accuracy of 87 %, outperforming agreement between the two human reviewers (77 %). The model also correctly labeled casework reports with an accuracy of 87 %, but only when disregarding text segments not extracted in stage (1).
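Both figures reported above are percent agreement: model predictions against human-coded test labels, and reviewer 1 against reviewer 2. The short sketch below shows the computation on made-up label lists; the data are illustrative only and not taken from the study.

```python
# Illustrative percent-agreement computation (toy labels, not the study data).
def percent_agreement(a: list[str], b: list[str]) -> float:
    """Share of items on which two label lists agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


gold       = ["sexual", "physical", "witness", "physical", "aggressive"]
model      = ["sexual", "physical", "witness", "witness",  "aggressive"]
reviewer_2 = ["sexual", "physical", "physical", "physical", "aggressive"]

print(f"model vs. human-coded test data: {percent_agreement(model, gold):.0%}")
print(f"reviewer 1 vs. reviewer 2:       {percent_agreement(reviewer_2, gold):.0%}")
```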
Conclusions
LLMs can replicate human coding of text documents even for highly complex tasks that require contextual information. This may considerably advance research on CPS. Transparency can be achieved by tracing labeling decisions back to individual text segments. Keyword-based text segmentation was identified as a weak point, and the potential for bias at several stages of the process requires attention.
Journal introduction:
Official publication of the International Society for Prevention of Child Abuse and Neglect. Child Abuse & Neglect: The International Journal provides an international, multidisciplinary forum on all aspects of child abuse and neglect, with special emphasis on prevention and treatment; the scope extends further to all those aspects of life which either favor or hinder child development. While contributions will primarily be from the fields of psychology, psychiatry, social work, medicine, nursing, law enforcement, legislature, education, and anthropology, the journal encourages the concerned lay individual and child-oriented advocacy organizations to contribute.