An architectural proposal for the interactive publication of the data classification obtained through a Differentially Private Random Decision Forest

Rosinei Cristiano Pereira, F. Lopes
2019 XLV Latin American Computing Conference (CLEI), published 2019-09-01. DOI: 10.1109/CLEI47609.2019.235070 (https://doi.org/10.1109/CLEI47609.2019.235070)
Citations: 0

Abstract

Data are generated in many contexts and by various devices, and are collected by organizations that aim to obtain as much information as possible to add value to their business. The purposes involved, both ethical and unethical, are many: identifying consumers' needs in order to recommend products and services, developing new business, conducting health-related research to reduce medical errors, assessing people's risk of developing diseases, and so on. Organizations' concerns about the risks associated with potential privacy leaks and their impacts have increased dramatically. Thus, applying data mining to process optimization without compromising sensitive data, while providing a strong privacy standard, is a challenge for data stewards, who rely on privacy techniques and models during the data release process. This study proposes a decision tree classification application, developed under the Differential Privacy model, whose architecture follows the interactive data release model: a barrier prevents users from accessing the data in raw format. In addition, a self-tuning feature that controls the forest growth was put in place, yielding better classification performance than a fixed number of trees in the forest, at the cost of increased processing time. It was also observed, in most of the datasets used in the experiment, that beyond a threshold, increasing the number of trees in the forest reduces classification performance.
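
The abstract does not give implementation details, so the following Python sketch is only an illustration of the ideas it names, not the authors' method. It assumes that leaf labels are released through the Laplace mechanism on per-class counts, that a total privacy budget is split evenly across trees, and that forest growth stops once a caller-supplied score stops improving. All function names and the `patience` parameter are hypothetical.

```python
import random


def per_tree_epsilon(total_epsilon, n_trees):
    # Assumption: every tree sees the same records, so by sequential
    # composition the trees share the total budget. More trees means
    # less budget, and therefore more noise, per tree.
    return total_epsilon / n_trees


def noisy_leaf_label(class_counts, epsilon, rng):
    # Laplace mechanism on a leaf's class histogram (sensitivity 1).
    # Laplace(0, 1/epsilon) is sampled as the difference of two
    # exponential variables with rate epsilon.
    noisy = {
        label: count + rng.expovariate(epsilon) - rng.expovariate(epsilon)
        for label, count in class_counts.items()
    }
    return max(noisy, key=noisy.get)


def choose_forest_size(score_for_size, max_trees, patience=2):
    # Hypothetical self-tuning rule: try growing forest sizes and stop
    # after `patience` consecutive sizes that fail to beat the best score.
    best_size, best_score, stale = 1, score_for_size(1), 0
    for size in range(2, max_trees + 1):
        score = score_for_size(size)
        if score > best_score:
            best_size, best_score, stale = size, score, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best_size
```

In this sketch, `score_for_size` stands in for whatever evaluation the application performs (for example, accuracy of a forest of the given size on held-out data). Shrinking the per-tree budget as the forest grows is one plausible reason why performance can drop beyond a certain number of trees, as the abstract reports.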