Evaluating expertise and sample bias effects for privilege classification in e-discovery

J. K. Vinjumur
Proceedings of the 15th International Conference on Artificial Intelligence and Law, published 2015-06-08
DOI: 10.1145/2746090.2746101 (https://doi.org/10.1145/2746090.2746101)
Citations: 4

Abstract

In civil litigation, documents that are found to be relevant to a production request are usually subjected to an exhaustive manual review for privilege (e.g., attorney-client privilege or the attorney work-product doctrine) to ensure that material that could be withheld is not inadvertently revealed. Most of the cost of this review comes from having human annotators linearly review, for privilege, the documents that the classifier predicts as responsive. This paper investigates the extent to which the privilege judgments produced by these annotators are useful for training privilege classifiers. The judgments used here are derived from the privilege test collection created for the 2010 TREC Legal Track. The collection contains judgments from two classes of annotators: "expert" judges, the topic originators known as the Topic Authority (TA), and "non-expert" judges called assessors. The paper asks two questions: (1) Are cheaper, non-expert annotations from assessors sufficient for classifier training? (2) Does selecting special (adjudicated) documents for training affect classifier results? To answer them, it studies the effect of training classifiers on annotations from annotators with different expertise and on training sets with and without selection bias. The findings show that automated privilege classifiers trained on the unbiased set of annotations yield the best results, while the biased annotations from experts and from non-experts are comparably useful for classifier training.
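The training conditions compared in the abstract can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the paper's method: the one-dimensional synthetic feature, the logistic-regression classifier, the F1 metric, the 20% label-flip rate used to model a non-expert assessor, and the rule that the hypothetical "adjudicated" subset contains only borderline documents are all illustrative assumptions. Expert (TA) labels are assumed to be noise-free and double as the held-out gold standard. All function names are hypothetical.

```python
import math
import random

# Toy simulation of the abstract's training conditions; every modelling choice is an assumption.

def make_population(n, rng):
    """Hypothetical documents: one numeric feature x and a gold privilege label y."""
    docs = []
    for _ in range(n):
        y = 1 if rng.random() < 0.3 else 0      # assumed 30% of documents are privileged
        x = rng.gauss(1.0 if y else -1.0, 1.0)   # the two classes overlap around x = 0
        docs.append((x, y))
    return docs

def noisy_labels(docs, flip_prob, rng):
    """Model a non-expert assessor: each gold label is flipped with probability flip_prob."""
    return [(x, 1 - y) if rng.random() < flip_prob else (x, y) for x, y in docs]

def train_logistic(data, epochs=200, lr=0.5):
    """Fit a 1-D logistic regression (weight w, bias b) by full-batch gradient descent."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in data:
            z = max(-30.0, min(30.0, w * x + b))  # clamp to avoid overflow in exp
            p = 1.0 / (1.0 + math.exp(-z))
            gw += (p - y) * x
            gb += p - y
        w -= lr * gw / len(data)
        b -= lr * gb / len(data)
    return w, b

def f1(model, data):
    """F1 for the privileged class, predicting 1 when w*x + b > 0."""
    w, b = model
    tp = fp = fn = 0
    for x, y in data:
        pred = 1 if w * x + b > 0 else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 1:
            fn += 1
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def run_experiment(seed=0, n_train=300):
    """Train one classifier per condition and return its F1 on held-out gold labels."""
    rng = random.Random(seed)
    population = make_population(4000, rng)
    pool, test = population[:3000], population[3000:]
    # Hypothetical adjudicated subset: only borderline documents (|x| < 0.5), i.e. selection bias.
    adjudicated = [d for d in pool if abs(d[0]) < 0.5][:n_train]
    # Unbiased alternative: a simple random sample of the same size from the whole pool.
    random_sample = rng.sample(pool, len(adjudicated))
    conditions = {
        "expert_biased": adjudicated,                              # TA labels = gold
        "nonexpert_biased": noisy_labels(adjudicated, 0.2, rng),   # simulated assessor labels
        "nonexpert_unbiased": noisy_labels(random_sample, 0.2, rng),
    }
    return {name: f1(train_logistic(data), test) for name, data in conditions.items()}

if __name__ == "__main__":
    for name, score in run_experiment().items():
        print(f"{name}: F1 = {score:.3f}")
```

Any scores this prints describe only the synthetic setup and say nothing about the TREC 2010 results; changing the selection rule, the flip rate, or the sample size will change them. The sketch is useful mainly for seeing how the three training conditions differ in who labels the data and how the training documents are chosen.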
Latest papers in Proceedings of the 15th International Conference on Artificial Intelligence and Law:
Factors, issues and values: revisiting reasoning with cases
Toward machine-assisted participation in eRulemaking: an argumentation model of evaluability
Tax non-compliance detection using co-evolution of tax evasion risk and audit likelihood
Representation of an actual divorce dispute in the parenting plan support system