Scalability of Privacy-Preserving Linear Regression in Epidemiological Studies

Hiroaki Kikuchi, H. Hashimoto, H. Yasunaga, Takamichi Saito
Published in: 2015 IEEE 29th International Conference on Advanced Information Networking and Applications Workshops, pp. 510-514
Publication date: 2015-03-24
DOI: 10.1109/AINA.2015.229 (https://doi.org/10.1109/AINA.2015.229)
Citations: 1

Abstract

In many hospitals, patient data are recorded and collected in a central database for medical research. For instance, the DPC dataset (DPC stands for Disease, Procedure and Combination) covers the medical records of more than 7 million patients in more than 1000 hospitals. Using the distributed DPC dataset, a number of epidemiological studies become feasible and can reveal useful knowledge about medical treatments. Because such records contain personal data, cryptography is used to preserve their privacy. Privacy-Preserving Data Mining (PPDM) is the field that aims to run data mining algorithms while preserving the confidentiality of the datasets. This paper studies the scalability of privacy-preserving data mining in epidemiological studies. As the data mining algorithm, we focus on linear regression, since it is used in many applications and is simple to evaluate. We try to identify a linear model that estimates the length of hospital stay from distributed datasets containing patient and disease information. The contributions of this paper are (1) privacy-preserving protocols for linear regression over horizontally or vertically partitioned datasets, and (2) a clarification of the limits on the problem size that can be handled. This information is useful for determining the dominant element in PPDM and for setting the direction of further improvement.
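
To make the horizontally partitioned case concrete, here is a minimal Python sketch. It is not the protocol proposed in the paper, whose details the abstract does not give. It relies on the standard fact that when each hospital holds a disjoint set of patient rows, the normal equations decompose into per-hospital sums of X^T X and X^T y, so only those sums need to be combined. The pairwise additive masking used for the secure sum is a stand-in technique chosen for brevity. The function names, the constants `SCALE` and `MODULUS`, and the toy data are all assumptions made for illustration.

```python
import secrets

SCALE = 10**6      # fixed-point precision (assumed value)
MODULUS = 2**64    # masked values live in the integers mod 2**64 (assumed value)

def encode(v):
    """Map a float to a fixed-point residue mod MODULUS."""
    return round(v * SCALE) % MODULUS

def decode(n):
    """Map a residue back to a signed float."""
    if n >= MODULUS // 2:
        n -= MODULUS
    return n / SCALE

def local_stats(X, y):
    """One party's contribution: X^T X flattened row by row, followed by X^T y."""
    d = len(X[0])
    xtx = [sum(r[i] * r[j] for r in X) for i in range(d) for j in range(d)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(d)]
    return xtx + xty

def secure_sum(vectors):
    """Sum the parties' vectors; each pair of parties shares masks that cancel."""
    masked = [[encode(v) for v in vec] for vec in vectors]
    k, n = len(masked), len(masked[0])
    for i in range(k):
        for j in range(i + 1, k):
            for t in range(n):
                r = secrets.randbelow(MODULUS)   # mask known only to parties i and j
                masked[i][t] = (masked[i][t] + r) % MODULUS
                masked[j][t] = (masked[j][t] - r) % MODULUS
    # The aggregator sees only masked vectors; the masks vanish in the total.
    return [decode(sum(col) % MODULUS) for col in zip(*masked)]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x

def fit_horizontal(parties):
    """Least-squares fit over row-partitioned data given as a list of (X, y)."""
    d = len(parties[0][0][0])
    total = secure_sum([local_stats(X, y) for X, y in parties])
    A = [total[i * d:(i + 1) * d] for i in range(d)]
    return solve(A, total[d * d:])

# Invented toy data: three hospitals, rows are [1, covariate], y = 2 + 3 * covariate.
parties = [
    ([[1, 1], [1, 2]], [5, 8]),
    ([[1, 3], [1, 4]], [11, 14]),
    ([[1, 5]], [17]),
]
beta = fit_horizontal(parties)
print(beta)  # approximately [2.0, 3.0]
```

In this sketch the aggregator still learns the summed X^T X and X^T y, and the cost of the exchange grows with the square of the number of covariates. The vertically partitioned case is harder, because cross-party products of columns are needed, and it is not covered here.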