Using binning to maintain confidentiality of medical data
Authors: Zhen Lin, Michael Hewett, Russ B Altman
Journal: Proceedings. AMIA Symposium
Publication date: 2002-01-01
Publication type: Journal Article
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2244360/pdf/procamiasymp00001-0495.pdf
Citation count: 0
Abstract
Biomedical informatics in general, and pharmacogenomics in particular, require a research platform that enables discovery while protecting research subjects' privacy and the confidentiality of their information. The development of inexpensive DNA sequencing and analysis technologies promises unprecedented database access to very specific information about individuals. To allow analysis of these data without compromising the research subjects' privacy, we must develop methods for removing identifying information from medical and genomic data. In this paper, we build upon the idea that binned database records are more difficult to trace back to individuals. We represent symbolic and numeric data hierarchically and bin them by generalizing the records. We measure the information loss due to binning using an information-theoretic measure called mutual information. The results show that we can bin the data to different levels of precision and use the bin size to control the tradeoff between privacy and data resolution.
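The tradeoff described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy age values, the drug hierarchy, and the choice to compute mutual information between an attribute's original and binned values are all assumptions made for illustration. Numeric values are generalized to fixed-width intervals, symbolic values are generalized by climbing a child-to-parent hierarchy, and information loss shows up as a drop in mutual information as the bins grow.

```python
import math
from collections import Counter

def bin_numeric(value, width):
    """Generalize a number to the half-open interval [lo, lo + width) containing it."""
    lo = (value // width) * width
    return (lo, lo + width)

def generalize(value, hierarchy, levels):
    """Climb `levels` steps up a child -> parent map, stopping at the root."""
    for _ in range(levels):
        if value not in hierarchy:
            break
        value = hierarchy[value]
    return value

def entropy(values):
    """Empirical Shannon entropy in bits."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), estimated from paired samples."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

def smallest_bin(binned):
    """Size of the least-populated bin: a rough proxy for how traceable a record is."""
    return min(Counter(binned).values())

# Hypothetical toy data, not taken from the paper.
ages = [23, 27, 31, 34, 38, 42, 45, 49]
drug_hierarchy = {
    "warfarin": "anticoagulant",
    "heparin": "anticoagulant",
    "anticoagulant": "drug",
}

for width in (1, 10, 20):
    binned = [bin_numeric(a, width) for a in ages]
    print(width, smallest_bin(binned), round(mutual_information(ages, binned), 3))

print(generalize("warfarin", drug_hierarchy, 1))
```

In this sketch, wider bins put more records in each bin, which makes individual records harder to single out, while the mutual information between original and binned values falls. The bin width is the single knob that trades privacy against data resolution.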