Anomaly Detection Between Judicial Text-Based Documents

Mukhsimbayev Bobur, Kuralbayev Aibek, Bekbaganbetov Abay, Fuad Hajiyev
DOI: 10.1109/AICT50176.2020.9368621
Published in: 2020 IEEE 14th International Conference on Application of Information and Communication Technologies (AICT)
Publication date: 2020-10-07
Citations: 1

Abstract

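Searching for anomalies, or outliers, is important in many fields, including fraud detection, crime research, network reliability analysis and medical diagnostics. What is an anomaly in the judicial system? A court case is considered an anomaly if the judge's decision differs significantly from the decisions in existing similar cases. Most existing outlier search methods operate in high-dimensional spaces, where data can have hundreds of dimensions; such an approach is resource-intensive and inefficient.

Objectives: In this article, the authors:
• present two methods (or two models) for searching for anomalies in judicial practice;
• give a comparative analysis of the effectiveness of both methods.

Methodology: The first method combines two kinds of models: classification algorithms and similarity algorithms. Logistic regression, Extreme Gradient Boosting (XGBoost) and TensorFlow are used for classification, and Latent Dirichlet Allocation (LDA) and Latent Semantic Indexing (LSI) are used to find similar documents. The second method uses the Bidirectional Encoder Representations from Transformers (BERT) embedding model together with the Annoy approximate nearest-neighbor index.

Findings: The second method finds outliers faster and with better results.

Data source: The authors used a set of acts provided by the Supreme Court of the Republic of Kazakhstan. The dataset contains 1 million text documents with metadata.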
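The abstract does not spell out the exact decision rule, so the following is only a minimal sketch of the general idea behind the second method, under stated assumptions. Each case is represented by an embedding vector (in the paper these come from BERT; here they are hand-made toy 2-D vectors), its k most similar cases are retrieved (the paper uses Annoy; here an exact brute-force cosine search stands in for it), and a case is flagged when most of its neighbors reached a different decision. The function names, the majority-disagreement rule, the `min_agreement` threshold and the example data are all hypothetical and are not the authors' method.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_neighbors(i: int, vectors: List[Sequence[float]], k: int) -> List[int]:
    """Indices of the k cases most similar to case i (excluding i itself).

    Exact brute-force search; a stand-in for an approximate index such as Annoy.
    """
    scored = [(cosine_similarity(vectors[i], v), j) for j, v in enumerate(vectors) if j != i]
    # Highest similarity first; ties broken by index so results are deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [j for _, j in scored[:k]]


def find_anomalies(
    vectors: List[Sequence[float]],
    decisions: List[str],
    k: int = 3,
    min_agreement: float = 0.5,
) -> List[int]:
    """Hypothetical rule: flag case i when fewer than `min_agreement` of its
    k most similar cases reached the same decision as case i."""
    anomalies = []
    for i in range(len(vectors)):
        neighbors = nearest_neighbors(i, vectors, k)
        if not neighbors:
            continue  # a single case has nothing to compare against
        agreeing = sum(1 for j in neighbors if decisions[j] == decisions[i])
        if agreeing / len(neighbors) < min_agreement:
            anomalies.append(i)
    return anomalies


if __name__ == "__main__":
    # Toy 2-D "embeddings": two groups of similar cases (hypothetical data).
    vectors = [
        [1.0, 0.1], [0.9, 0.2], [1.0, 0.0], [0.95, 0.05],  # group 1
        [0.1, 1.0], [0.0, 0.9], [0.2, 1.0], [0.05, 0.95],  # group 2
    ]
    decisions = ["fine", "fine", "fine", "imprisonment",
                 "imprisonment", "imprisonment", "imprisonment", "imprisonment"]
    print(find_anomalies(vectors, decisions, k=3))  # → [3]
```

Case 3 is flagged because its three most similar cases were all decided differently, while every other case agrees with at least two of its three neighbors. The brute-force search compares every case with every other case, so its cost grows quadratically with the number of documents; on a corpus of a million acts, an approximate index such as Annoy is the practical substitute.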