B. D. Martino, Luigi Colucci Cante, Salvatore D'Angelo, A. Esposito, Mariangela Graziano, F. Marulli, Pietro Lupi, Alessandra Cataldi
Title: A Big Data Pipeline and Machine Learning for Uniform Semantic Representation of Data and Documents From IT Systems of the Italian Ministry of Justice
Journal: International Journal of Grid and High Performance Computing, pp. 1-31
Publication type: Journal Article
Published: 2022-01-01
DOI: https://doi.org/10.4018/ijghpc.301579
Impact factor: 0.6 (JCR Q4, Computer Science, Theory & Methods)
Citations: 1
Abstract
This paper presents a Big Data Pipeline that takes into consideration both structured and unstructured data made available by the Italian Ministry of Justice regarding its Telematic Civil Process. The complexity and volume of the data provided by the Ministry require Big Data analysis techniques, in concert with Machine Learning and Deep Learning frameworks, in order to analyse the data correctly and to obtain meaningful information that could support the Ministry in better managing Civil Processes. The Pipeline has two main objectives: to provide a consistent workflow of activities to be applied to the incoming data, aimed at extracting information useful for the Ministry's decision-making tasks; and to homogenize the incoming data so that they can be stored in a centralized and coherent Datalake that serves as a reference for further analysis.