Optimising Human-Machine Collaboration for Efficient High-Precision Information Extraction from Text Documents

Bradley Butcher, Miri Zilka, Jiri Hron, Darren Cook, Adrian Weller
Journal: ACM Journal on Responsible Computing
Publication date: 2024-03-26
Publication type: Journal Article
DOI: 10.1145/3652591 (https://doi.org/10.1145/3652591)

Abstract

From science to law enforcement, many research questions are answerable only by poring over a large amount of unstructured text documents. While people can extract information from such documents with high accuracy, this is often too time-consuming to be practical. On the other hand, automated approaches produce nearly-immediate results, but are not reliable enough for applications where near-perfect precision is essential. Motivated by two use cases from criminal justice, we consider the benefits and drawbacks of various human-only, human-machine, and machine-only approaches. Finding no tool well suited for our use cases, we develop a human-in-the-loop method for fast but accurate extraction of structured data from unstructured text. The tool is based on automated extraction followed by human validation, and is particularly useful in cases where purely manual extraction is not practical. Testing on three criminal justice datasets, we find that the combination of the computer speed and human understanding yields precision comparable to manual annotation while requiring only a fraction of time, and significantly outperforms the precision of all fully automated baselines.