An architectural proposal for the interactive publication of the data classification obtained through a Differentially Private Random Decision Forest
Authors: Rosinei Cristiano Pereira, F. Lopes
Published in: 2019 XLV Latin American Computing Conference (CLEI), 2019-09-01
DOI: https://doi.org/10.1109/CLEI47609.2019.235070
Abstract
Data are generated in many contexts and by many devices, and are collected by organizations that aim to extract as much information as possible to add value to their business. The purposes involved, ethical and otherwise, include identifying consumers' needs in order to recommend products and services, developing new business, conducting health-related research to reduce medical errors, and assessing people's risk of developing diseases. Organizations' concerns about the risks and impacts of potential privacy leaks have increased dramatically. Applying data mining to process optimization without compromising sensitive data, while providing a strong privacy standard, is therefore a challenge for data stewards, who rely on privacy techniques and models during the data release process. This study proposes a decision tree classification application developed under the Differential Privacy model. Its architecture follows the interactive data release model, which places a barrier between users and the data so that users cannot access the data in raw format. In addition, a self-tuning feature that controls forest growth was implemented; it yielded better classification performance than a forest with a fixed number of trees, at the cost of increased processing time. For most of the datasets used in the experiment, it was also observed that beyond a threshold, increasing the number of trees in the forest reduces classification performance.
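The abstract gives no implementation details, so the following is only a minimal sketch of the general idea, not the authors' application. It assumes categorical attributes with a publicly known schema, trees whose structure is chosen at random without looking at the data, Laplace noise on every leaf class count, and a privacy budget split evenly across trees (sequential composition, add/remove-one-record neighbors). The class name `PrivateForestService` and all parameters are hypothetical, and the self-tuning of forest growth is not modeled.

```python
import itertools
import random


def laplace(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


class PrivateForestService:
    """Hypothetical barrier: raw records are consumed at build time and
    never stored; callers can only request a predicted label."""

    def __init__(self, records, labels, schema, classes,
                 epsilon, n_trees, depth, seed=None):
        rng = random.Random(seed)
        eps_tree = epsilon / n_trees      # assumption: even budget split
        scale = 1.0 / eps_tree            # count queries have sensitivity 1
        self._classes = list(classes)
        self._trees = []
        for _ in range(n_trees):
            # Tree structure is random and independent of the data.
            attrs = rng.sample(sorted(schema), depth)
            counts = {}
            # Enumerate every leaf from the public schema so that empty
            # leaves also receive noise.
            for leaf in itertools.product(*(schema[a] for a in attrs)):
                for c in self._classes:
                    counts[(leaf, c)] = 0.0
            for rec, lab in zip(records, labels):
                counts[(tuple(rec[a] for a in attrs), lab)] += 1.0
            noisy = {k: v + laplace(scale, rng) for k, v in counts.items()}
            self._trees.append((attrs, noisy))

    def classify(self, record):
        """Return the class with the largest summed noisy leaf count."""
        votes = {c: 0.0 for c in self._classes}
        for attrs, noisy in self._trees:
            leaf = tuple(record[a] for a in attrs)
            for c in self._classes:
                votes[c] += noisy[(leaf, c)]
        return max(votes, key=votes.get)


# Toy data, invented for illustration: label is "yes" only for red items.
schema = {"color": ["red", "blue"], "size": ["small", "large"]}
records = [{"color": c, "size": s}
           for c in schema["color"] for s in schema["size"]] * 5
labels = ["yes" if r["color"] == "red" else "no" for r in records]

service = PrivateForestService(records, labels, schema, ["yes", "no"],
                               epsilon=1.0, n_trees=3, depth=2)
label = service.classify({"color": "red", "size": "small"})
```

Because each tree receives only `epsilon / n_trees` of the budget, adding trees increases the noise on every leaf count. That is one plausible reason why accuracy can fall once the forest grows past some size, in line with the observation reported above.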