Alexander Hsieh, S. Doan, Michael Conway, Ko-Wei Lin, Hyeon-eui Kim
Lack of standardization in representing phenotype data generated in different studies is a major barrier to data reuse for cross-study analyses. To address this issue, we developed DIVER, a tool that identifies and standardizes demographic variables in dbGaP using simple natural language processing and standardized terminology mapping. In an evaluation on 3,565 variables from a range of pulmonary studies in dbGaP, DIVER identified demographic variables with high recall and precision (98% and 94%, respectively). In addition, DIVER correctly modeled 79% of the identified demographic variables at the core semantic level. Examining the variables that DIVER could not handle showed where the tool needs enhancement to improve its semantic modeling accuracy. DIVER is an important component of a system for phenotype discovery in dbGaP studies.
{"title":"Demographics Identification: Variable Extraction Resource (DIVER)","authors":"Alexander Hsieh, S. Doan, Michael Conway, Ko-Wei Lin, Hyeon-eui Kim","doi":"10.1109/HISB.2012.17","DOIUrl":"https://doi.org/10.1109/HISB.2012.17","url":null,"abstract":"Lack of standardization in representing phenotype data generated in different studies is a major barrier to data reuse for cross study analyses. To address this issue, we developed DIVER, a tool that identifies and standardizes demographic variables in dbGaP, based on simple natural language processing and standardized terminology mapping. In its evaluation using variables (N=3,565) from a range of pulmonary studies in dbGaP, DIVER proved to be an effective approach to standardizing dbGaP variables by successfully identifying demographic variables with high rates of recall and precision (98% and 94%, respectively). In addition, DIVER correctly modeled 79% of the identified demographic variables at the core semantic level. Examination of variables that DIVER could not handle shed light on where our tool needs enhancement so it can further improve its semantic modeling accuracy. DIVER is an important component of a system for phenotype discovery in dbGaP studies.","PeriodicalId":375089,"journal":{"name":"2012 IEEE Second International Conference on Healthcare Informatics, Imaging and Systems Biology","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116324328","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
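For readers less familiar with the recall and precision figures quoted in the DIVER abstract, the following minimal sketch shows how the two metrics are computed from raw counts. The function name and the counts are hypothetical and chosen only so that the arithmetic lands on the same percentages; they are not taken from the paper.

```python
def precision_recall(true_positives: int, false_positives: int, false_negatives: int):
    """Return (precision, recall) computed from raw counts."""
    predicted = true_positives + false_positives   # everything the tool flagged
    actual = true_positives + false_negatives      # everything it should have flagged
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


# Hypothetical counts for illustration only (not the paper's data).
precision, recall = precision_recall(true_positives=47, false_positives=3, false_negatives=1)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Precision penalizes non-demographic variables that are wrongly flagged, while recall penalizes demographic variables that are missed, which is why the abstract reports both.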
Louise Deléger, Holly Brodzinski, Haijun Zhai, Qi Li, T. Lingren, E. Kirkendall, E. Alessandrini, I. Solti
This study evaluated an automated approach for appendicitis risk stratification of pediatric Emergency Department patients using Conditional Random Fields, rules and Support Vector Machines. The results show that the approach is very promising for appendicitis risk stratification.
{"title":"Using Natural Language Processing and the Electronic Health Record for Appendicitis Risk Stratification","authors":"Louise Deléger, Holly Brodzinski, Haijun Zhai, Qi Li, T. Lingren, E. Kirkendall, E. Alessandrini, I. Solti","doi":"10.1109/HISB.2012.15","DOIUrl":"https://doi.org/10.1109/HISB.2012.15","url":null,"abstract":"This study evaluated an automated approach for appendicitis risk stratification of pediatric Emergency Department patients using Conditional Random Fields, rules and Support Vector Machines. The results show that the approach is very promising for appendicitis risk stratification.","PeriodicalId":375089,"journal":{"name":"2012 IEEE Second International Conference on Healthcare Informatics, Imaging and Systems Biology","volume":"98 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132294351","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
P. Desikan, Nisheeth Srivastava, T. Winden, Tammie Lindquist, Heather Britt, J. Srivastava
Ambulatory care sensitive conditions (ACSCs) are characterized as health conditions for which good outpatient care can potentially prevent the need for hospitalization, or for which early intervention can prevent complications or more severe disease. Currently, there are 16 identified ACSCs within the US health system: diabetes short-term complication, perforated appendix, diabetes long-term complication, pediatric asthma, chronic obstructive pulmonary disease, pediatric gastroenteritis, hypertension, congestive heart failure, low birth weight rate, dehydration, bacterial pneumonia, urinary tract infection, angina admission without procedure, uncontrolled diabetes, adult asthma, and lower-extremity amputation among patients with diabetes. Potentially preventable acute health events (PPEs) for such diagnosis codes represent a straightforward opportunity for reducing medical costs while concomitantly improving quality of care. While claims data have previously been used to predict future health outcomes of patients, we report here a novel approach, using data mining techniques, towards supplementing such data with patients' electronic health records (EHR) to develop a clinical decision support system that satisfactorily predicts the onset of PPEs in a large population of patients.
{"title":"Early Prediction of Potentially Preventable Events in Ambulatory Care Sensitive Admissions from Clinical Data","authors":"P. Desikan, Nisheeth Srivastava, T. Winden, Tammie Lindquist, Heather Britt, J. Srivastava","doi":"10.1109/HISB.2012.49","DOIUrl":"https://doi.org/10.1109/HISB.2012.49","url":null,"abstract":"Ambulatory care sensitive conditions (ACSCs) are characterized as health conditions for which good outpatient care can potentially prevent the need for hospitalization, or for which early intervention can prevent complications or more severe disease. Currently, there are 16 identified ACSCs within the US health system: diabetes short-term complication, perforated appendix, diabetes long-term complication, pediatric asthma, chronic obstructive pulmonary disease, pediatric gastroenteritis, hypertension, congestive heart failure, low birth weight rate, dehydration, bacterial pneumonia, urinary tract infection, angina admission without procedure, uncontrolled diabetes, adult asthma, and lower-extremity amputation among patients with diabetes. Potentially preventable acute health events (PPEs) for such diagnosis codes represent a straightforward opportunity for reducing medical costs while concomitantly improving quality of care. While claims data have previously been used to predict future health outcomes of patients, we report here a novel approach, using data mining techniques, towards supplementing such data with patients' electronic health records (EHR) to develop a clinical decision support system that satisfactorily predicts the onset of PPEs in a large population of patients.","PeriodicalId":375089,"journal":{"name":"2012 IEEE Second International Conference on Healthcare Informatics, Imaging and Systems Biology","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130837034","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Seeking drug-related information is one of the top activities of today's online health consumers. To facilitate consumers' access to trustworthy drug information, we first improve drug search effectiveness by adding a rich, up-to-date list of brand names to drug content that is typically classified by its active ingredients. Once a consumer finds a drug of interest, we further provide integrated access to other relevant healthcare resources. The results of our computational methods are integrated into a production system and have been used by millions of health consumers.
{"title":"Improving Online Access to Drug-Related Information","authors":"Jiao Li, Ritu Khare, Zhiyong Lu","doi":"10.1109/HISB.2012.73","DOIUrl":"https://doi.org/10.1109/HISB.2012.73","url":null,"abstract":"Seeking drug-related information is one of the top activities of today's online health consumers. To facilitate consumers' access to trustworthy drug information, we first improve the drug search effectiveness by adding a list of rich and up-to-date brand names to drug content that is typically classified with its active ingredients. Once the consumer finds a drug of interest, we further provide them with an integrated access to other relevant healthcare resources. The results of our computational methods are integrated into a production system and have been used by millions of health consumers.","PeriodicalId":375089,"journal":{"name":"2012 IEEE Second International Conference on Healthcare Informatics, Imaging and Systems Biology","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123642312","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper proposes a simple automated algorithm for detecting human embryonic stem cell (hESC) regions in phase contrast images. The algorithm uses both spatial information and the intensity distribution for cell region detection. The image is modeled as a mixture of two Gaussians: one for hESC regions and one for substrate regions. The paper validates the method on videos acquired under different microscope objectives.
{"title":"Automated Human Embryonic Stem Cell Detection","authors":"B. X. Guan, B. Bhanu, P. Talbot, Sabrina Lin","doi":"10.1109/HISB.2012.25","DOIUrl":"https://doi.org/10.1109/HISB.2012.25","url":null,"abstract":"This paper proposes an automated detection method with simple algorithm for detecting human embryonic stem cell (hESC) regions in phase contrast images. The algorithm uses both the spatial information as well as the intensity distribution for cell region detection. The method is modeled as a mixture of two Gaussians; hESC and substrate regions. The paper validates the method with various videos acquired under different microscope objectives.","PeriodicalId":375089,"journal":{"name":"2012 IEEE Second International Conference on Healthcare Informatics, Imaging and Systems Biology","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127205045","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
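The two-Gaussian model mentioned in the hESC detection abstract can be illustrated with a generic expectation-maximization (EM) fit. This is only a sketch under stated assumptions: it works on 1-D intensity values alone (the paper also uses spatial information, which is omitted here), the function names are hypothetical, the data are synthetic, and which component corresponds to cells versus substrate would depend on the imaging setup.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_two_gaussians(values, iterations=50):
    """Fit a two-component 1-D Gaussian mixture with EM.

    Returns (weights, means, variances). Component 0 is initialized at the
    lower end of the intensity range, component 1 at the upper end.
    """
    n = len(values)
    lo, hi = min(values), max(values)
    means = [lo + 0.25 * (hi - lo), lo + 0.75 * (hi - lo)]
    var0 = max(((hi - lo) / 4.0) ** 2, 1e-6)
    variances = [var0, var0]
    weights = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: responsibility of component 1 for each value.
        resp = []
        for x in values:
            p0 = weights[0] * gaussian_pdf(x, means[0], variances[0])
            p1 = weights[1] * gaussian_pdf(x, means[1], variances[1])
            total = p0 + p1
            resp.append(p1 / total if total > 0.0 else 0.5)
        n1 = sum(resp)
        n0 = n - n1
        if n0 < 1e-9 or n1 < 1e-9:
            break  # one component collapsed; keep the last valid estimate
        # M-step: re-estimate means, variances, and mixing weights.
        means = [
            sum((1.0 - r) * x for r, x in zip(resp, values)) / n0,
            sum(r * x for r, x in zip(resp, values)) / n1,
        ]
        variances = [
            max(sum((1.0 - r) * (x - means[0]) ** 2 for r, x in zip(resp, values)) / n0, 1e-6),
            max(sum(r * (x - means[1]) ** 2 for r, x in zip(resp, values)) / n1, 1e-6),
        ]
        weights = [n0 / n, n1 / n]
    return weights, means, variances


def label(x, weights, means, variances):
    """Assign an intensity to the more probable component (0 or 1)."""
    p0 = weights[0] * gaussian_pdf(x, means[0], variances[0])
    p1 = weights[1] * gaussian_pdf(x, means[1], variances[1])
    return 1 if p1 > p0 else 0


# Synthetic intensities standing in for two region types (assumption, not real image data).
rng = random.Random(0)
pixels = [rng.gauss(0.3, 0.05) for _ in range(300)] + [rng.gauss(0.7, 0.05) for _ in range(300)]
weights, means, variances = fit_two_gaussians(pixels)
print("estimated means:", [round(m, 3) for m in means])
```

Once the mixture is fitted, each pixel can be labeled by comparing the two weighted densities, as `label` does; a real pipeline would combine this with spatial cues before deciding on region boundaries.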
A. Calvitti, Neal Farber, Yunan Chen, Danielle Zuest, Lin Liu, Kristin Bell, Barbara Gray, Z. Agha
We develop temporal data mining and visualization methods to quantitatively profile physicians' Electronic Health Record (EHR) workflow and to compare time-at-task versus click-count distributions for top-level EHR functionality. The temporal data are based on time-resolved activity during outpatient visits, captured by usability software and audio-video recording, with manual coding of physicians' activities.
{"title":"Temporal Analysis of Physicians' EHR Workflow during Outpatient Visits","authors":"A. Calvitti, Neal Farber, Yunan Chen, Danielle Zuest, Lin Liu, Kristin Bell, Barbara Gray, Z. Agha","doi":"10.1109/HISB.2012.65","DOIUrl":"https://doi.org/10.1109/HISB.2012.65","url":null,"abstract":"We develop temporal data mining and visualization methods to quantitatively profile physician Electronic Health Records (EHR) workflow and compare time-at-task versus click count distributions for top-level EHR functionality. The temporal data is based on time-resolved activity during outpatient visits, captured by usability software and audio-video recording and manual coding to physicians' activities.","PeriodicalId":375089,"journal":{"name":"2012 IEEE Second International Conference on Healthcare Informatics, Imaging and Systems Biology","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115266263","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Xiaoqian Jiang, Shuang Wang, Zhanglong Ji, L. Ohno-Machado, Li Xiong
Public dissemination of medical data encourages meaningful research and quality improvement. However, there is serious concern that improper disclosure may put sensitive personal information at risk. To maintain the research benefits and customize the privacy protection, we propose a novel and practical randomized response model (k-shuffle) and a statistical information recovery procedure. The former mixes the distribution of patient records with samples drawn from k-1 pre-determined distributions to ensure differential privacy. The latter allows data receivers to recover statistical properties (e.g., the mean and variance) of sub-populations of interest with accuracy proportional to the size of the sub-population. That is, our algorithm provides stronger privacy protection to smaller groups and offers high data usability to studies targeting larger populations. Most importantly, under the differential privacy guarantee, data receivers cannot reconstruct the record-to-identity mapping for any individual. In summary, our approach offers a scalable privacy-preserving data dissemination mechanism that can be applied in both centralized and distributed fashion, which makes it possible to outsource perturbed data (to the cloud) with mitigated privacy risks. Our experimental results demonstrate the performance of our model in terms of privacy protection, information loss, and classification accuracy on both synthetic and real-world datasets.
{"title":"A Randomized Response Model for Privacy-Preserving Data Dissemination","authors":"Xiaoqian Jiang, Shuang Wang, Zhanglong Ji, L. Ohno-Machado, Li Xiong","doi":"10.1109/HISB.2012.63","DOIUrl":"https://doi.org/10.1109/HISB.2012.63","url":null,"abstract":"Public dissemination of medical data encourages meaningful research and quality improvement. However, there is a big concern that improper disclosure may put sensitive personal information at risk. To maintain the research benefits and customize the privacy protection, we propose a novel and practical randomized response model (k-shuffle) and a statistical information recovery procedure. The former mixes distribution of patient records with samples drawn from k-1 pre-determined distributions to ensure differential privacy. The latter allows data receivers to recover statistical properties (e.g., the mean and variance) of interested sub-populations with accuracy proportional to the size of the sub-population. That is, our algorithm provides stronger privacy protection to smaller groups, and offers high data usability to studies targeted at larger population. Most importantly, with differential privacy guarantee, data receiver cannot reconstruct the record-to-identity mapping for each individual. In summary, our approach offers a scalable privacy-preserving data dissemination mechanism that can be applied in both centralized and distributed fashion, which makes it possible for perturbed data to be outsourced (in the cloud) with mitigated privacy risks. Our experimental results demonstrated the performance of our model in terms of privacy protection, information loss, and classification accuracy using both synthetic and real-world datasets.","PeriodicalId":375089,"journal":{"name":"2012 IEEE Second International Conference on Healthcare Informatics, Imaging and Systems Biology","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115400438","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
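The idea in the randomized response abstract of mixing true records with draws from known distributions, then recovering aggregate statistics, can be illustrated with a generic mixture scheme. This sketch is not the authors' k-shuffle algorithm and makes no differential privacy claim: the keep probability, the decoy distributions, and all function names are assumptions introduced for illustration. If each released value is the true value with probability p and otherwise a draw from one of the k-1 decoys chosen uniformly, then E[released] = p * mean_true + (1 - p) * mean_decoys, which can be solved for mean_true.

```python
import random
import statistics


def perturb(values, decoy_samplers, keep_prob, rng):
    """Release each value unchanged with probability keep_prob; otherwise
    replace it with a draw from a uniformly chosen decoy distribution."""
    released = []
    for v in values:
        if rng.random() < keep_prob:
            released.append(v)
        else:
            sampler = rng.choice(decoy_samplers)
            released.append(sampler(rng))
    return released


def recover_mean(released, decoy_means, keep_prob):
    """Invert E[released] = p * mean_true + (1 - p) * mean_decoys."""
    decoy_mean = statistics.fmean(decoy_means)
    return (statistics.fmean(released) - (1.0 - keep_prob) * decoy_mean) / keep_prob


# Synthetic "patient" measurements and two hypothetical decoy distributions (k = 3).
rng = random.Random(42)
true_values = [rng.gauss(120.0, 15.0) for _ in range(20000)]
decoys = [lambda r: r.gauss(80.0, 10.0), lambda r: r.gauss(160.0, 10.0)]
released = perturb(true_values, decoys, keep_prob=0.6, rng=rng)

true_mean = statistics.fmean(true_values)
estimate = recover_mean(released, decoy_means=[80.0, 160.0], keep_prob=0.6)
print(f"true mean={true_mean:.2f} recovered mean={estimate:.2f}")
```

Because the estimate is an average over noisy released values, its error shrinks as the group grows, which matches the abstract's point that larger sub-populations are recovered more accurately while small groups stay better protected.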
Haijun Zhai, T. Lingren, Louise Deléger, Qi Li, M. Kaiser, Laura Stoutenborough, I. Solti
Building upon previous general crowdsourcing research, this study investigates the usability of crowdsourcing in the clinical NLP domain for annotating medical named entities and entity linkages in a clinical trial announcement (CTA) corpus. The results indicate that crowdsourcing is a feasible, inexpensive, fast, and practical approach to annotating clinical text (without PHI) at large scale for medical named entities. The crowdsourcing program code was released publicly.
{"title":"Cheap, Fast, and Good Enough for the Non-biomedical Domain but is It Usable for Clinical Natural Language Processing? Evaluating Crowdsourcing for Clinical Trial Announcement Named Entity Annotations","authors":"Haijun Zhai, T. Lingren, Louise Deléger, Qi Li, M. Kaiser, Laura Stoutenborough, I. Solti","doi":"10.1109/HISB.2012.31","DOIUrl":"https://doi.org/10.1109/HISB.2012.31","url":null,"abstract":"Building upon previous work from the general crowdsourcing research, this study investigates the usability of crowdsourcing in the clinical NLP domain for annotating medical named entities and entity linkages in a clinical trial announcement (CTA) corpus. The results indicate that crowdsourcing is a feasible, inexpensive, fast, and practical approach to annotate clinical text (without PHI) on large scale for medical named entities. The crowdsourcing program code was released publicly.","PeriodicalId":375089,"journal":{"name":"2012 IEEE Second International Conference on Healthcare Informatics, Imaging and Systems Biology","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122550048","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Qi Li, Haijun Zhai, Louise Deléger, T. Lingren, M. Kaiser, Laura Stoutenborough, I. Solti
The goal of this work is to evaluate binary classification and sequence labeling methods for medication-attribute linkage detection in two clinical corpora. The results show that, with parsimonious feature sets, both the Support Vector Machine (SVM)-based binary classification and the Conditional Random Field (CRF)-based multi-layered sequence labeling methods achieve high performance.
{"title":"Linking Medications and Their Attributes in Clinical Notes and Clinical Trial Announcements for Information Extraction: A Sequence Labeling Approach","authors":"Qi Li, Haijun Zhai, Louise Deléger, T. Lingren, M. Kaiser, Laura Stoutenborough, I. Solti","doi":"10.1109/HISB.2012.27","DOIUrl":"https://doi.org/10.1109/HISB.2012.27","url":null,"abstract":"The goal of this work is to evaluate binary classification and sequence labeling methods for medication-attribute linkage detection in two clinical corpora. The results show that with parsimonious feature sets both the Support Vector Machine (SVM)-based binary classification and Conditional Random Field (CRF)-based multi-layered sequence labeling methods are achieving high performance.","PeriodicalId":375089,"journal":{"name":"2012 IEEE Second International Conference on Healthcare Informatics, Imaging and Systems Biology","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126224523","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Nafise Barzigar, Aminmohammad Roozgard, P. Verma, Samuel Cheng
This paper proposes an efficient medical image denoising method based on low-rank matrix completion and block matching filtering. The results demonstrate the effectiveness of the algorithm in removing mixed noise, including noise in regular structures. The method achieves comparable performance with significantly lower computational complexity.
{"title":"Removing Mixture Noise from Medical Images Using Block Matching Filtering and Low-Rank Matrix Completion","authors":"Nafise Barzigar, Aminmohammad Roozgard, P. Verma, Samuel Cheng","doi":"10.1109/HISB.2012.59","DOIUrl":"https://doi.org/10.1109/HISB.2012.59","url":null,"abstract":"In this paper, an efficient medical image denoising method based on low-rank matrix completion and block matching filtering is proposed. The effectiveness of the algorithm in removing the mixed noise is demonstrated through the results. The results also proved the effectiveness of this algorithm in removing noise from regular structures. This method results in comparable performance with significantly lower computation complexity.","PeriodicalId":375089,"journal":{"name":"2012 IEEE Second International Conference on Healthcare Informatics, Imaging and Systems Biology","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126829297","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}