Using Unlabeled Data for US Supreme Court Case Classification

2020 International Conference on Data Mining Workshops (ICDMW). Pub Date: 2020-11-01. DOI: 10.1109/ICDMW51313.2020.00116

George Sanchez

Abstract

The Supreme Court Database, provided by the Washington University (in St. Louis) School of Law, is an essential legal research tool. It is organized and categorized into Issue Areas so that legal researchers can easily find on-point cases for an area of law. This paper uses a semi-supervised learning approach to automatically categorize the Supreme Court's opinions into Issue Areas. An inductive cluster-then-label method was used, employing a fast Hierarchical Navigable Small World (HNSW) graph index over a nonmetric space containing USE (Universal Sentence Encoder) embeddings. After obtaining labels from the semi-supervised approach, we evaluated several classification approaches on the data, achieving the following weighted-average F1 scores: SVM with Max Norm Features 0.75, RNN 0.78, and BERT 0.68.
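To make the two technical ideas in the abstract concrete, the following is a minimal sketch and not the paper's implementation. It assumes small hand-written 2-D vectors in place of USE embeddings, a brute-force cosine nearest-neighbor search in place of the HNSW index, and a simple majority vote among the k nearest labeled opinions as the labeling rule. The function names (`propagate_labels`, `weighted_f1`) and the two issue-area labels are illustrative choices, not names taken from the paper. The second function shows how a weighted-average F1 score is computed: per-class F1 weighted by each class's share of the true labels.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def propagate_labels(labeled, unlabeled, k=3):
    """Assign each unlabeled vector the majority label of its k most
    similar labeled vectors (brute-force stand-in for an HNSW query).

    labeled:   list of (vector, label) pairs
    unlabeled: list of vectors
    """
    assigned = []
    for vec in unlabeled:
        nearest = sorted(
            labeled, key=lambda item: cosine(vec, item[0]), reverse=True
        )[:k]
        votes = Counter(label for _, label in nearest)
        assigned.append(votes.most_common(1)[0][0])
    return assigned


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to class support."""
    support = Counter(y_true)
    total = 0.0
    for cls, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        total += f1 * n
    return total / len(y_true)


# Toy data: hypothetical embeddings and illustrative issue-area labels.
labeled = [
    ([1.0, 0.0], "Criminal Procedure"),
    ([0.9, 0.1], "Criminal Procedure"),
    ([0.0, 1.0], "Civil Rights"),
    ([0.1, 0.9], "Civil Rights"),
]
unlabeled = [[0.8, 0.2], [0.2, 0.8]]

pseudo_labels = propagate_labels(labeled, unlabeled, k=3)
print(pseudo_labels)
```

In a realistic setting the brute-force search would be replaced by an approximate nearest-neighbor index, since scanning every labeled opinion for every query does not scale; the voting and scoring logic would stay the same.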
