Information Extraction from Arabic Law Documents
Samah Abu Shamma, Aseel Ayasa, Wala’ Sleem, A. Yahya
2020 IEEE 14th International Conference on Application of Information and Communication Technologies (AICT), 2020-10-07
DOI: https://doi.org/10.1109/AICT50176.2020.9368577
Citations: 2
Abstract
Information hidden in unstructured or semi-structured law documents can be very useful but may not be readily accessible. Retrieving it requires an information extraction (IE) system. Making the extracted information available in structured form enables complex queries that can go well beyond simple keyword search, and may therefore be of interest to law professionals. In this paper we address information extraction from Arabic law documents. We describe a system we developed to extract important information that may be of interest to potential users of these documents, with minimal human intervention. We employ a hybrid approach that combines machine learning, rule-based methods, and Arabic NLP to facilitate the extraction of the needed information. The approach was applied to a limited class of Arabic law documents, and we are working on extending it to other document types and other fields.
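The abstract does not detail the system's rules, models, or target fields, so the following is only a minimal Python sketch of the general idea: rule-based patterns pull fields out of Arabic legal text into structured records, a pluggable classifier (standing in for the machine learning component) labels each document, and the records can then be queried in ways a keyword search cannot express. The patterns, the `LawReference` fields, the `keyword_doc_type` placeholder, and the sample sentences are all assumptions made for illustration, not taken from the paper.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical Arabic cue words: "law", "number", "of year", "article".
LAW, NUMBER, YEAR, ARTICLE = "قانون", "رقم", "لسنة", "المادة"

# Rule-based patterns (illustrative only, not the paper's rules).
LAW_RE = re.compile(
    rf"{LAW}\s+(?P<name>.+?)\s+{NUMBER}\s+(?P<number>\d+)\s+{YEAR}\s+(?P<year>\d{{4}})"
)
ARTICLE_RE = re.compile(rf"{ARTICLE}\s+(?P<article>\d+)")


@dataclass
class LawReference:
    """One structured record extracted from a piece of legal text."""
    law_name: str
    law_number: int
    year: int
    articles: List[int]
    doc_type: str


def keyword_doc_type(text: str) -> str:
    """Placeholder for a trained classifier: labels text by one keyword ("court")."""
    return "court_decision" if "محكمة" in text else "legislation"


def extract(
    text: str, classify: Callable[[str], str] = keyword_doc_type
) -> Optional[LawReference]:
    """Apply the rules, then let the classifier label the document."""
    law = LAW_RE.search(text)
    if law is None:
        return None  # no rule fired; a real system might fall back to a model
    articles = [int(m.group("article")) for m in ARTICLE_RE.finditer(text)]
    return LawReference(
        law_name=law.group("name"),
        law_number=int(law.group("number")),
        year=int(law.group("year")),
        articles=articles,
        doc_type=classify(text),
    )


# Made-up sample sentences, not drawn from any real corpus.
docs = [
    "قررت محكمة الاستئناف تطبيق المادة 5 من قانون العمل رقم 7 لسنة 2000",
    "تعدل المادة 12 والمادة 14 من قانون الشركات رقم 12 لسنة 1964",
]
records = [r for r in (extract(d) for d in docs) if r is not None]

# A structured query that keyword search cannot express directly:
# court decisions citing a law issued in or after 2000.
recent_court = [
    r for r in records if r.doc_type == "court_decision" and r.year >= 2000
]
for r in recent_court:
    print(r)
```

Passing the classifier as a parameter keeps the rule-based stage independent of the learned stage, so a trained model could replace `keyword_doc_type` without touching the patterns.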