Huiyuan Xie, Felix Steffek, Joana Ribeiro de Faria, Christine Carter, Jonathan Rutherford
arXiv - CS - Computation and Language, published 2024-09-12, doi: arxiv-2409.08098 (https://doi.org/arxiv-2409.08098)
The CLC-UKET Dataset: Benchmarking Case Outcome Prediction for the UK Employment Tribunal
This paper explores the intersection of technological innovation and access
to justice by developing a benchmark for predicting case outcomes in the UK
Employment Tribunal (UKET). To address the challenge of extensive manual
annotation, the study employs a large language model (LLM) for automatic
annotation, resulting in the creation of the CLC-UKET dataset. The dataset
consists of approximately 19,000 UKET cases and their metadata. Comprehensive
legal annotations cover facts, claims, precedent references, statutory
references, case outcomes, reasons and jurisdiction codes. Using the
CLC-UKET data, we examine a multi-class case outcome prediction task in the
UKET. Human predictions are collected to establish a performance reference for
model comparison. Empirical results from baseline models indicate that
finetuned transformer models outperform zero-shot and few-shot LLMs on the UKET
prediction task. The performance of zero-shot LLMs can be enhanced by
integrating task-related information into few-shot examples. We hope that the
CLC-UKET dataset, along with human annotations and empirical findings, can
serve as a valuable benchmark for employment-related dispute resolution.
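
To make the few-shot setting concrete, the following is a minimal sketch of how annotated cases might be assembled into a prompt. It is not the authors' implementation: the abstract does not specify a record layout, a prompt format, or the outcome label set. The class `UKETCase`, the function `build_few_shot_prompt`, the choice of facts and claims as model inputs, and the placeholder labels are all assumptions made for illustration; only the annotation field names come from the abstract.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class UKETCase:
    """Hypothetical record; fields mirror the annotation types named in the abstract."""
    facts: str
    claims: str
    outcome: str  # the label vocabulary is an assumption, not taken from the paper
    reasons: str = ""
    precedent_references: List[str] = field(default_factory=list)
    statutory_references: List[str] = field(default_factory=list)
    jurisdiction_codes: List[str] = field(default_factory=list)


def build_few_shot_prompt(examples: List[UKETCase], query: UKETCase,
                          labels: List[str]) -> str:
    """Build a plain-text prompt: instruction, labelled examples, then the query case."""
    lines = [
        "Predict the outcome of the case. Answer with one of: "
        + ", ".join(labels) + ".",
        "",
    ]
    # Each example shows its facts and claims followed by the known outcome.
    for ex in examples:
        lines += [f"Facts: {ex.facts}", f"Claims: {ex.claims}",
                  f"Outcome: {ex.outcome}", ""]
    # The query case ends with an open slot for the model to complete.
    lines += [f"Facts: {query.facts}", f"Claims: {query.claims}", "Outcome:"]
    return "\n".join(lines)


if __name__ == "__main__":
    labels = ["label_a", "label_b"]  # placeholder labels
    example = UKETCase(facts="Example facts.", claims="Example claim.",
                       outcome="label_a")
    query = UKETCase(facts="New facts.", claims="New claim.", outcome="")
    print(build_few_shot_prompt([example], query, labels))
```

Passing an empty list of examples yields the zero-shot variant of the same prompt, which keeps the two settings directly comparable in this sketch.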