{"title":"Using Unlabeled Data for US Supreme Court Case Classification","authors":"George Sanchez","doi":"10.1109/ICDMW51313.2020.00116","DOIUrl":null,"url":null,"abstract":"The Supreme Court Database provided by Washington University (in St. Louis) School of Law is an essential legal research tool. The Supreme Court Database is organized and categorized to Issue Areas to make it easy for legal researchers to find on-point cases for an area of law. This paper used a semi-supervised learning approach to automatically categorize the Supreme Court's opinions to Issue Areas. An inductive method of clustering then labeling approach was used by employing a nonmetric space of a fast Hierarchical Navigable Small World graph index containing USE (Universal Sentence Encoder) embeddings. After obtaining the labels from the semi-supervised approach, we evaluate several classification approaches to use with the data achieving the weighted average F1-Scores: SVM with Max Norm Features 0.75, RNN 0.78, and BERT 0.68","PeriodicalId":426846,"journal":{"name":"2020 International Conference on Data Mining Workshops (ICDMW)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 International Conference on Data Mining Workshops (ICDMW)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDMW51313.2020.00116","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
The Supreme Court Database provided by Washington University (in St. Louis) School of Law is an essential legal research tool. The Supreme Court Database is organized and categorized to Issue Areas to make it easy for legal researchers to find on-point cases for an area of law. This paper used a semi-supervised learning approach to automatically categorize the Supreme Court's opinions to Issue Areas. An inductive method of clustering then labeling approach was used by employing a nonmetric space of a fast Hierarchical Navigable Small World graph index containing USE (Universal Sentence Encoder) embeddings. After obtaining the labels from the semi-supervised approach, we evaluate several classification approaches to use with the data achieving the weighted average F1-Scores: SVM with Max Norm Features 0.75, RNN 0.78, and BERT 0.68