Evaluation synthesis analysis can be accelerated through text mining, searching, and highlighting: A case-study on data extraction from 631 UNICEF evaluation reports

Lena Schmidt, Pauline Addis, Erica Mattellone, Hannah OKeefe, Kamilla Nabiyeva, Uyen Kim Huynh, Nabamallika Dehingia, Dawn Craig, Fiona Campbell
medRxiv - Health Informatics. Published 2024-08-28. DOI: 10.1101/2024.08.27.24312630 (https://doi.org/10.1101/2024.08.27.24312630)

Abstract

Background: The United Nations Children's Fund (UNICEF) is the United Nations agency dedicated to promoting and advocating for the protection of children's rights, meeting their basic needs, and expanding their opportunities to reach their full potential. It does this by working with governments, communities, and other partners through programmes that safeguard children from violence, provide access to quality education, ensure that children survive and thrive, provide access to water, sanitation, and hygiene, and provide life-saving support in emergency contexts. Programmes are evaluated under UNICEF's Evaluation Policy, and the publicly available reports contain a wealth of information on results, recommendations, and lessons learned.

Objective: To critically explore UNICEF's impact, a systematic synthesis of evaluations was conducted. It aimed to summarise UNICEF's main achievements and areas for improvement, as reflected in key recommendations, lessons learned, enablers, and barriers to achieving its goals, and to help steer its future direction and strategy. Because the evaluations are extensive, manual analysis was not feasible, so a semi-automated approach was taken.

Methods: This paper examines the automation techniques used to make broad evaluation synthesis analyses more feasible. Our semi-automated, human-in-the-loop methods supported data extraction for 64 outcomes across 631 evaluation reports, each of which comprised hundreds of pages of text. The outcomes are derived from the five goal areas of UNICEF's 2022-2025 Strategic Plan. For text pre-processing, we implemented PDF-to-text extraction, section parsing, and sentence mining via a neural network. Data extraction was supported by SWIFT-Review, a freely available text-mining workbench. We describe the use of comprehensive adjacency-search-based queries to rapidly filter reports by outcome and to highlight relevant sections of text, expediting data extraction.

Results: Although the methods were not expected to produce 100% complete results for each outcome, they offer useful automation for researchers facing otherwise infeasible evaluation synthesis tasks. We reduced the text volume to 8% using deep learning (recall 0.93) and rapidly identified relevant evaluations across outcomes with a median precision of 0.6. All code is available and open-source.

Conclusions: When the classic approach of systematically extracting information on all outcomes across all texts exceeds available resources, the proposed automation methods can be employed to speed up the process while retaining scientific rigour and reproducibility.
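To make the idea of adjacency-search-based filtering and highlighting more concrete, the following is a minimal Python sketch. It is not the authors' code and does not reproduce SWIFT-Review's query syntax, which the abstract does not show. The function names (`adjacency_hits`, `filter_and_highlight`, `precision`), the window size, the search terms, and the example sentences are all hypothetical and chosen only for illustration. The sketch assumes that an adjacency query means "a term from one set occurs within a fixed number of tokens of a term from another set".

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def adjacency_hits(text, terms_a, terms_b, window=3):
    """Return (i, j) token-index pairs where a term from terms_a and a term
    from terms_b occur at most `window` tokens apart, in either order."""
    tokens = tokenize(text)
    pos_a = [i for i, tok in enumerate(tokens) if tok in terms_a]
    pos_b = [j for j, tok in enumerate(tokens) if tok in terms_b]
    return [(i, j) for i in pos_a for j in pos_b
            if i != j and abs(i - j) <= window]


def filter_and_highlight(sentences, terms_a, terms_b, window=3):
    """Keep sentences with at least one adjacency hit and wrap every
    query term in those sentences in ** markers."""
    all_terms = sorted(terms_a | terms_b)
    pattern = r"\b(" + "|".join(re.escape(t) for t in all_terms) + r")\b"
    kept = []
    for sentence in sentences:
        if adjacency_hits(sentence, terms_a, terms_b, window):
            kept.append(re.sub(pattern, r"**\1**", sentence,
                               flags=re.IGNORECASE))
    return kept


def precision(retrieved, relevant):
    """Fraction of retrieved items that are truly relevant."""
    if not retrieved:
        return 0.0
    return len(set(retrieved) & set(relevant)) / len(set(retrieved))


if __name__ == "__main__":
    # Toy sentences, invented for this example (not taken from any report).
    sentences = [
        "The programme improved access to safe drinking water in rural schools.",
        "Water was discussed; much later, after many unrelated remarks "
        "about budgets, access came up.",
    ]
    for line in filter_and_highlight(sentences, {"access"}, {"water"}, window=4):
        print(line)
```

In this toy run only the first sentence is kept, because "access" and "water" are four tokens apart there and eleven tokens apart in the second. A wider window trades precision for recall, which is one reason a human reviewer stays in the loop to check the highlighted text.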