With wearable, relatively unobtrusive health monitors and smartphone sensors, it is increasingly easy to collect continuously streaming physiological data passively, without placing much burden on participants. At the same time, smartphones make it possible to survey participants for "ground-truth" reports of psychological states, although this comes at an increased cost in participant burden. In this paper, we examined how analytical approaches from machine learning could distill the collected physiological data into actionable decision rules about each individual's psychological state, with the eventual goal of identifying important psychological states (e.g., risk moments) without ongoing, burdensome active assessment (e.g., self-report). As a first step towards this goal, we compared two methods for predicting low and high affective arousal states: (1) a k-nearest neighbor classifier that uses dynamic time warping distance, and (2) a random forest classifier based on features extracted with the tsfresh Python package. We then compared random-forest-based predictive models tailored to the individual with individual-general models. Results showed that the individual-specific model outperformed the general one. Our results support the feasibility of using passively collected wearable data to predict psychological states, suggesting that, by relying on both types of data, active collection can be reduced or eliminated.
Title: Individualized Modeling to Distinguish Between High and Low Arousal States Using Physiological Data. Authors: Ame Osotsi, Zita Oravecz, Qunhua Li, Joshua Smyth, Timothy R Brick. Journal of Healthcare Informatics Research, 4(1), 91-109. Publication date: 2020-01-22. DOI: 10.1007/s41666-019-00064-1
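The first method compared in this study is a k-nearest neighbor classifier with dynamic time warping (DTW) distance. It can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation:

- It assumes univariate physiological series.
- It uses an absolute-difference local cost.
- It applies no warping-window constraint.
- It labels by majority vote among the k nearest series.

The function names `dtw_distance` and `knn_dtw_predict` and the toy data are hypothetical.

```python
from collections import Counter
import math


def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) DTW with absolute-difference local cost (assumed)."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # step in a only
                                 cost[i][j - 1],      # step in b only
                                 cost[i - 1][j - 1])  # step in both
    return cost[n][m]


def knn_dtw_predict(train_series, train_labels, query, k=1):
    """Label `query` by majority vote among its k nearest training series under DTW."""
    dists = sorted(
        (dtw_distance(s, query), label) for s, label in zip(train_series, train_labels)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy windows of a physiological signal with arousal labels.
train = [[0, 0, 1, 2, 1, 0], [0, 1, 2, 1, 0, 0], [3, 3, 4, 5, 4, 3], [3, 4, 5, 4, 3, 3]]
labels = ["low", "low", "high", "high"]
query = [0, 0, 0, 1, 2, 1]
print(knn_dtw_predict(train, labels, query, k=1))  # prints: low
```

The query is labeled "low" because its smallest DTW distance (1.0) is to the first training series. Every aligned pair with a "high" series costs at least 1, so those distances are at least 6. The full DTW table costs O(nm) per comparison, and practical implementations often restrict the warping window to reduce this.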
Pub Date: 2019-09-01; Epub Date: 2018-11-06; DOI: 10.1007/s41666-018-0042-9
Feichen Shen, David W Larson, James M Naessens, Elizabeth B Habermann, Hongfang Liu, Sunghwan Sohn
Postsurgical complications (PSCs) are deviations from the normal postsurgical course and are categorized by severity and treatment requirements. Surgical site infection (SSI) is one of the major PSCs and the most common healthcare-associated infection, resulting in increased length of hospital stay and cost. In this work, we proposed an automated way to generate keyword features, using sublanguage analysis with heuristics, to detect SSI in a cohort's clinical notes, and we evaluated these keywords with medical experts. To further validate our approach, we also applied different machine learning algorithms to the cohort using the automatically generated keywords. The results showed that our approach was able to identify SSI keywords from clinical narratives and can serve as a foundation for developing an information extraction system or for supporting search-based natural language processing (NLP) approaches by augmenting search queries.
Title: Detection of Surgical Site Infection Utilizing Automated Feature Generation in Clinical Notes. Journal of Healthcare Informatics Research, 3(3), 267-282.
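The SSI abstract does not spell out the sublanguage-analysis heuristics used to generate keywords. The sketch below therefore illustrates only the downstream step: turning a keyword list into binary features per clinical note, which a standard classifier could then consume.

Several parts are hypothetical assumptions:

- the keyword list;
- the function names `keyword_features` and `feature_matrix`;
- the whole-word matching rule.

Plain matching also ignores negation (e.g., "no erythema"), which a real clinical NLP pipeline would need to handle.

```python
import re


def keyword_features(note, keywords):
    """Binary vector: 1 if a keyword phrase occurs as whole words in the note, else 0."""
    text = note.lower()
    return [
        1 if re.search(r"\b" + re.escape(kw.lower()) + r"\b", text) else 0
        for kw in keywords
    ]


def feature_matrix(notes, keywords):
    """One row per note, one column per keyword (hypothetical classifier input)."""
    return [keyword_features(note, keywords) for note in notes]


# Hypothetical keywords for illustration only; not the list generated in the study.
SSI_KEYWORDS = ["wound infection", "purulent drainage", "erythema", "cellulitis"]

notes = [
    "Purulent drainage noted at incision; mild erythema.",
    "Incision clean, dry and intact.",
]
for row in feature_matrix(notes, SSI_KEYWORDS):
    print(row)  # prints [0, 1, 1, 0], then [0, 0, 0, 0]
```

Keeping the features binary makes each column directly interpretable as "keyword present in note", which suits expert review of the generated keywords.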
Pub Date: 2019-03-21; eCollection Date: 2019-09-01; DOI: 10.1007/s41666-019-00050-7
Ebrahim Oshni Alvandi, George Van Doorn, Mark Symmons
Emotional awareness has previously been investigated among clinicians. In this work, we investigate clinicians' emotional awareness during tele-mental health sessions. The study reported here aimed to determine whether clinicians process their own emotions, as well as those of the client, in a computer-mediated context. Clinicians' decision-making was also assessed, because it appears to be related to how they feel and to how they recognise the ways those emotions may change their thinking and affect their interactions with clients. We expected this ability to differ depending on the technology through which the psychotherapy session is conducted. Participants were presented with stimuli in different modes of delivery (e.g., text, audio, and video). The experiment indicates that the ability to manage, perceive, and utilise emotions was satisfactory across all modes of delivery. In essence, the findings contribute to the field of remote therapy, suggesting that emotional awareness is a key cognitive factor in diagnosis.
Title: Emotional Awareness and Decision-Making in the Context of Computer-Mediated Psychotherapy. Journal of Healthcare Informatics Research, 3(3), 345-370.
Pub Date: 2019-02-08; eCollection Date: 2019-12-01; DOI: 10.1007/s41666-019-00045-4
Gaurav N Pradhan, Jamie M Bogle, Michael J Cevette, Jan Stepanek
In this paper, we focus on the application of oculometric patterns extracted from raw eye movements during a mental workload task to assess changes in cognitive performance in healthy youth athletes over the course of a typical sport season. Oculometric features pertaining to fixations and saccades were measured on 116 athletes in pre- and post-season testing. Participants were between 7 and 14 years of age at pre-season testing. Because of varied developmental rates, there were large interindividual performance differences during a mental workload task consisting of reading numbers. Based on different reading speeds, we classified participants into three profiles (slow, moderate, and fast) and established corresponding baselines for the oculometric data. Within each profile, we describe changes in oculomotor function based on changes in cognitive performance during the season. To visualize these changes in multidimensional oculometric data, we also present a multidimensional visualization tool named DiViTo (diagnostic visualization tool). These experimental, computational informatics, and visualization methodologies may allow oculometric information to be used to detect changes in cognitive performance due to mild or severe cognitive impairment, such as concussion/mild traumatic brain injury, and possibly other conditions such as attention-deficit/hyperactivity disorder, learning/reading disabilities, and impaired alertness or neurocognitive function.
Title: Discovering Oculometric Patterns to Detect Cognitive Performance Changes in Healthy Youth Football Athletes. Journal of Healthcare Informatics Research, 3(4), 371-392.
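The oculometric abstract does not say how fixations and saccades were segmented from raw eye movements. One widely used approach is velocity-threshold identification (I-VT): intervals between consecutive gaze samples whose angular velocity exceeds a threshold are labeled saccadic, and the rest fixational. The sketch below applies I-VT to gaze positions given in degrees of visual angle and summarizes fixation count and mean fixation duration. The 30 deg/s threshold, the 100 Hz sampling rate, and all names are assumptions for illustration, not the study's settings.

```python
import math


def classify_ivt(samples, hz, velocity_threshold=30.0):
    """Label each interval between consecutive gaze samples as 'fixation' or 'saccade'.

    samples: list of (x, y) gaze positions in degrees of visual angle.
    hz: sampling rate in Hz; velocity_threshold in deg/s (assumed value).
    """
    dt = 1.0 / hz
    labels = []
    for (x0, y0), (x1, y1) in zip(samples, samples[1:]):
        velocity = math.hypot(x1 - x0, y1 - y0) / dt  # angular velocity, deg/s
        labels.append("saccade" if velocity > velocity_threshold else "fixation")
    return labels


def group_runs(labels):
    """Collapse consecutive identical labels into (label, n_intervals) runs."""
    runs = []
    for label in labels:
        if runs and runs[-1][0] == label:
            runs[-1][1] += 1
        else:
            runs.append([label, 1])
    return [(label, n) for label, n in runs]


def fixation_summary(samples, hz, velocity_threshold=30.0):
    """Simple oculometric features: fixation/saccade counts and mean fixation duration."""
    runs = group_runs(classify_ivt(samples, hz, velocity_threshold))
    # Duration approximated as number of intervals times the sampling period.
    fix_ms = [n * 1000.0 / hz for label, n in runs if label == "fixation"]
    return {
        "fixations": len(fix_ms),
        "saccades": sum(1 for label, _ in runs if label == "saccade"),
        "mean_fixation_ms": sum(fix_ms) / len(fix_ms) if fix_ms else 0.0,
    }


# Hypothetical 100 Hz trace: fixate, jump about 10 degrees, fixate again.
trace = [(0.0, 0.0), (0.05, 0.0), (0.1, 0.0), (5.0, 0.0), (10.0, 0.0), (10.05, 0.0), (10.1, 0.0)]
print(fixation_summary(trace, hz=100))
```

Per-participant summaries like these could then be compared against the baselines of each reading-speed profile.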
Pub Date: 2019-01-28; eCollection Date: 2019-06-01; DOI: 10.1007/s41666-019-00044-5
Sungrim Moon, Sijia Liu, David Chen, Yanshan Wang, Douglas L Wood, Rajeev Chaudhry, Hongfang Liu, Paul Kingsbury
Outside medical records (OMRs) accompanying referred patients are frequently sent as faxes from external healthcare providers. Accessing useful and relevant information from these OMRs in a timely manner is challenging because of machine-illegible content and the limited system interoperability inherent in healthcare. Little research has examined the information contained in OMRs. This paper evaluated overlapping and non-overlapping medical concepts captured from digitally faxed OMRs for patients transferred to the Department of Cardiovascular Medicine and from clinical consultant notes generated at the Mayo Clinic. We used optical character recognition (OCR) to make the faxed OMRs machine-readable and natural language processing (NLP) to capture clinical concepts from both the machine-readable OMRs and the Mayo clinical notes. We quantitatively measured the overlap in medical concepts between OMRs and Mayo clinical narratives, and we assessed the salience of concepts specific to Cardiovascular Medicine by calculating the ratio of the mentioned concepts relative to an independent clinical corpus. Of the concepts collected from the OMRs, 11.19% were also present in Mayo clinical narratives generated within 3 months after the patient's initial encounter at the Mayo Clinic. Of these shared concepts, 73.97% were identified in initial consultant notes (ICNs) and 26.03% were captured in subsequent follow-up consultant notes (FCNs). These findings imply that information collected from the OMRs is potentially informative for patient care, but that some valuable OMR information (identified only later, in FCNs) is not fully used at an earlier stage of the care process. Concepts collected from the ICNs had the highest salience to Cardiovascular Medicine (0.112) compared with concepts in OMRs and in FCNs. Additionally, concepts unique to ICNs (unseen in OMRs or FCNs) carried the most salient information (0.094), indicating that ICNs provided the most informative concepts for the care of transferred patients.
Title: Salience of Medical Concepts of Inside Clinical Texts and Outside Medical Records for Referred Cardiovascular Patients. Journal of Healthcare Informatics Research, 3(2), 200-219.
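The OMR overlap percentages in this study are set computations over extracted concepts. The minimal sketch below rests on the following assumptions:

- Each document group has already been reduced to a set of normalized concept identifiers. The toy strings are hypothetical.
- Mayo notes are already restricted to the 3-month window.
- A shared concept is attributed to the ICNs whenever it appears there, and otherwise to the FCNs. This is consistent with the two reported shares summing to 100%, but it is still a guess about the method.

The salience ratio against the independent corpus is not reproduced, because the abstract does not give its exact formula.

```python
def concept_overlap(omr, icn, fcn):
    """Overlap of OMR concepts with Mayo notes, split between ICNs and FCNs.

    omr, icn, fcn: iterables of normalized concept identifiers (assumed to be
    already restricted to notes written within the 3-month window).
    """
    omr, icn, fcn = set(omr), set(icn), set(fcn)
    shared = omr & (icn | fcn)  # OMR concepts seen anywhere in Mayo notes
    in_icn = shared & icn       # attributed to initial consultant notes
    fcn_only = shared - icn     # seen only in follow-up consultant notes

    def pct(part, whole):
        return 100.0 * len(part) / len(whole) if whole else 0.0

    return {
        "shared_pct_of_omr": pct(shared, omr),
        "icn_pct_of_shared": pct(in_icn, shared),
        "fcn_only_pct_of_shared": pct(fcn_only, shared),
    }


# Toy concept sets (hypothetical, for illustration only).
result = concept_overlap(
    omr=["atrial fibrillation", "heart failure", "hypertension", "diabetes"],
    icn=["atrial fibrillation", "heart failure", "coronary artery disease"],
    fcn=["hypertension", "stroke"],
)
print(result)
```

On the toy sets, three of the four OMR concepts also appear in Mayo notes (75%). Two of those three appear in ICNs, and one appears only in FCNs.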
Pub Date: 2018-10-12; eCollection Date: 2019-03-01; DOI: 10.1007/s41666-018-0036-7
Sabita Acharya, Andrew D Boyd, Richard Cameron, Karen Dunn Lopez, Pamela Martyn-Nemeth, Carolyn Dickens, Amer Ardati, Jose D Flores, Matt Baumann, Betty Welland, Barbara Di Eugenio
Comprehending medical information is a challenging task, especially for people who have not received formal medical education. When patients are discharged from the hospital, they are given lengthy medical documents that contain intricate terminology. Studies have shown that if people do not understand the content of their health documents, they will neither look for new information regarding their illness nor take action to prevent or recover from their health issue. In this article, we highlight the need for generating personalized hospital-stay summaries and discuss several research challenges associated with this task. The proposed directions are directly informed by our ongoing work in generating concise and comprehensible hospitalization summaries that are tailored to the patient's understanding of medical terminology and level of engagement in improving their own health. Our preliminary evaluation shows that our summaries effectively present the required medical concepts.
Title: What Happened to Me while I Was in the Hospital? Challenges and Opportunities for Generating Patient-Friendly Hospitalization Summaries. Journal of Healthcare Informatics Research, 3(1), 107-123.
Pub Date: 2018-10-10; eCollection Date: 2019-03-01; DOI: 10.1007/s41666-018-0035-8
Thomas R Kirchner, Hong Gao, Daniel J Lewis, Andrew Anesetti-Rothermel, Heather A Carlos, Brian House
There is growing interest in how exposure to neighborhood risk and protective factors affects the health of residents. Although multiple approaches have been reported, empirical methods for contrasting the spatial uncertainty of exposure estimates are not well established. The objective of this paper was to contrast real-time versus neighborhood-approximated exposure to the landscape of tobacco outlets across the contiguous US. A nationwide density surface of tobacco retail outlet locations was generated using kernel density estimation (KDE). This surface was linked to participants' ($N_p = 363$) inferred residential locations, as well as to their real-time geographic locations, recorded every 10 min over 180 days. Real-time exposure was estimated as the hourly product of the radius of gyration and the average tobacco outlet density ($N_{\text{hour}} = 304{,}164$ h). Ordinal logit modeling was used to assess the distribution of real-time exposure estimates as a function of each participant's residential exposure. Overall, 61.3% of real-time, hourly exposures were of relatively low intensity, and after controlling for temporal and seasonal variation, 72.8% of the variance among these low-level exposures was accounted for by residence in one of the two lowest residential exposure quintiles. Most moderate- to high-intensity exposures (38.7% of all real-time, hourly exposures) were no more likely to have been contributed by subjects from any single residential exposure cluster than from another. Altogether, 55.2% of the variance in real-time exposures was not explained by participants' residential exposure cluster. Calculating hourly exposure estimates made it possible to directly contrast real-time observations with static residential exposure estimates. Results document the substantial degree to which real-time exposures can be misclassified by residential approximations, especially in residential areas characterized by moderate to high retail density levels.
Title: Individual Mobility and Uncertain Geographic Context: Real-time Versus Neighborhood Approximated Exposure to Retail Tobacco Outlets Across the US. Journal of Healthcare Informatics Research, 3(1), 70-85.
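Two computations in the tobacco-outlet exposure design can be sketched together: the KDE density surface and the hourly radius-of-gyration exposure. This is a minimal illustration with these assumptions:

- coordinates are planar (projected), e.g. in km;
- the kernel is an isotropic Gaussian with a hypothetical bandwidth;
- radius of gyration has its standard definition, the root-mean-square distance of location fixes from their centroid.

With fixes recorded every 10 min, one hour contributes about six points. Names and data are hypothetical. The study's actual projection, kernel, and bandwidth are not stated in the abstract.

```python
import math


def gaussian_kde_2d(points, bandwidth):
    """Return f(x, y): isotropic 2-D Gaussian kernel density estimate over `points`."""
    n = len(points)
    h2 = bandwidth ** 2
    norm = 1.0 / (n * 2.0 * math.pi * h2)

    def density(x, y):
        return norm * sum(
            math.exp(-((x - px) ** 2 + (y - py) ** 2) / (2.0 * h2)) for px, py in points
        )

    return density


def radius_of_gyration(fixes):
    """Root-mean-square distance of location fixes from their centroid."""
    n = len(fixes)
    cx = sum(x for x, _ in fixes) / n
    cy = sum(y for _, y in fixes) / n
    return math.sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in fixes) / n)


def hourly_exposure(fixes, density):
    """Exposure for one hour: radius of gyration times mean outlet density at the fixes."""
    mean_density = sum(density(x, y) for x, y in fixes) / len(fixes)
    return radius_of_gyration(fixes) * mean_density


# Hypothetical outlet locations and one hour of 10-minute fixes (projected km).
outlets = [(0.0, 0.0), (1.0, 0.5), (5.0, 5.0)]
density = gaussian_kde_2d(outlets, bandwidth=1.0)
hour_fixes = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (1.0, 0.5), (0.5, 0.5), (0.0, 0.5)]
print(hourly_exposure(hour_fixes, density))
```

One consequence of the product form, as sketched here, concerns hours spent at a single location. Such an hour has a radius of gyration of 0 and therefore zero estimated exposure, whatever the local outlet density.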
Pub Date: 2018-08-30; eCollection Date: 2019-03-01; DOI: 10.1007/s41666-018-0032-y
Yan Hu, Rui Wang, Feng Chen
Online health discussion forums serve as information-exchange repositories in which different patient groups share experiences and seek advice. Their accessibility has expanded tremendously over the last decade with the rapid growth of the mobile internet. Among many popular topics, forums on drug-drug interactions (DDIs) contain a large number of DDI hazards that patients have experienced but that have not been published. In this paper, we aim to uncover potential DDIs from online forums and formulate the task as a sub-graph detection problem, in which co-mentioned drugs and symptoms are modeled as vertices and their co-occurrences are modeled as weighted edges. A connected sub-graph containing both symptom and drug vertices therefore indicates a DDI occurrence. We then propose a novel bi-submodular function to characterize the likelihood of DDI occurrence within a connected sub-graph and apply an approximation algorithm to solve the bi-submodular optimization (BSMO) problem. The algorithm has nearly linear complexity. Our extensive experiments demonstrate the effectiveness and efficiency of the proposed approach.
Title: Bi-submodular Optimization (BSMO) for Detecting Drug-Drug Interactions (DDIs) from On-line Health Forums. Journal of Healthcare Informatics Research, 3(1), 19-42.
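The graph formulation for DDI detection can be sketched as follows. The sketch covers only the graph construction and the structural condition a candidate sub-graph must meet. It does not implement the paper's bi-submodular scoring function or its approximation algorithm, which the abstract does not specify.

The sketch rests on these assumptions:

- Drug and symptom terms are lowercase and are matched by naive substring search in each post.
- Every co-mentioned pair of terms gets an edge whose weight counts the posts mentioning both.
- A candidate DDI sub-graph must be connected and contain at least two drugs and at least one symptom.

All names and data are hypothetical.

```python
from collections import defaultdict, deque
from itertools import combinations


def build_graph(posts, drugs, symptoms):
    """Edge weights: weights[(u, v)] = number of posts co-mentioning terms u and v."""
    vocab = set(drugs) | set(symptoms)  # terms assumed to be lowercase
    weights = defaultdict(int)
    for post in posts:
        text = post.lower()
        found = sorted(t for t in vocab if t in text)  # naive substring matching
        for u, v in combinations(found, 2):
            weights[(u, v)] += 1
    return dict(weights)


def is_candidate_ddi(subset, weights, drugs, symptoms):
    """True if `subset` induces a connected sub-graph with >= 2 drugs and >= 1 symptom."""
    subset = set(subset)
    if len(subset & set(drugs)) < 2 or not subset & set(symptoms):
        return False
    adj = defaultdict(set)
    for u, v in weights:
        if u in subset and v in subset:
            adj[u].add(v)
            adj[v].add(u)
    start = next(iter(subset))
    seen, queue = {start}, deque([start])
    while queue:  # breadth-first search restricted to the subset
        node = queue.popleft()
        for nb in adj[node] - seen:
            seen.add(nb)
            queue.append(nb)
    return seen == subset


# Hypothetical forum posts and vocabularies.
DRUGS = ["warfarin", "aspirin", "ibuprofen"]
SYMPTOMS = ["bleeding", "headache"]
posts = ["Took warfarin and aspirin, now bleeding.", "Aspirin gave me a headache.", "Ibuprofen is fine."]
weights = build_graph(posts, DRUGS, SYMPTOMS)
print(is_candidate_ddi({"warfarin", "aspirin", "bleeding"}, weights, DRUGS, SYMPTOMS))  # prints: True
```

In the full method, a scoring function over the edge weights would rank such connected candidates rather than merely accept or reject them.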
Pub Date: 2018-05-22; eCollection Date: 2018-09-01; DOI: 10.1007/s41666-018-0019-8
Keith Feldman, Reid A Johnson, Nitesh V Chawla
Coupled with the rise of data science and machine learning, the increasing availability of digitized health and wellness data has provided an exciting opportunity for complex analyses of problems throughout the healthcare domain. Whereas many early works focused on a particular aspect of patient care, often drawing on data from a specific clinical or administrative source, it has become clear that such a single-source approach is insufficient to capture the complexity of the human condition. Instead, adequately modeling health and wellness problems requires the ability to draw upon data spanning multiple facets of an individual's biology, their care, and the social aspects of their life. Although such an awareness has greatly expanded the breadth of health and wellness data collected, the diverse array of data sources and intended uses often leaves researchers and practitioners with a scattered and fragmented view of any particular patient. As a result, there is a clear need to catalogue and organize the range of healthcare data available for analysis. This work represents an effort to develop such an organization, presenting a patient-centric framework termed the Healthcare Data Spectrum (HDS). Composed of six layers, the HDS begins with the innermost micro-level omics and macro-level demographic data that directly characterize a patient, and extends at its outermost to aggregate population-level data derived from attributes of care for each individual patient. For each level of the HDS, this manuscript examines the specific types of constituent data, provides examples of how the data aid in a broad set of research problems, and identifies the primary terminology and standards used to describe the data.
Title: The State of Data in Healthcare: Path Towards Standardization. Journal: Journal of healthcare informatics research, vol. 2, no. 3, pp. 248-271 (published 2018-05-22). DOI: https://doi.org/10.1007/s41666-018-0019-8. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8982788/pdf/
Pub Date: 2017-11-06; eCollection Date: 2017-12-01; DOI: 10.1007/s41666-017-0010-9
S M Shamimul Hasan, Edward A Fox, Keith Bisset, Madhav V Marathe
Computational epidemiology seeks to develop computational methods to study the distribution and determinants of health-related states or events (including disease), and the application of this study to the control of diseases and other health problems. Recent advances in computing and data sciences have led to the development of innovative modeling environments to support this important goal. The datasets used to drive the dynamic models, as well as the data produced by these models, present unique challenges owing to their size, heterogeneity, and diversity. These datasets form the basis of effective, easy-to-use decision support and analytical environments. As a result, it is important to develop scalable data management systems to store, manage, and integrate these datasets. In this paper, we develop EpiK, a knowledge base that facilitates the development of decision support and analytical environments to support epidemic science. An important goal is to develop a framework that links the input as well as output datasets to facilitate effective spatio-temporal and social reasoning, which is critical in planning and intervention analysis before and during an epidemic. The data management framework links modeling workflow data and its metadata using a controlled vocabulary. The metadata captures information about storage, the mapping between the linked model and the physical layout, and relationships to support services. EpiK is designed to support agent-based modeling and analytics frameworks; aggregate models can be seen as special cases and are thus also supported. We use semantic web technologies to create a representation of the datasets that encapsulates both the location and the schema heterogeneity. The choice of the Resource Description Framework (RDF) as a representation language is motivated by the diversity and growth of the datasets that need to be integrated. A query bank is developed; the queries capture a broad range of questions that can be posed and answered during a typical case study pertaining to disease outbreaks. The queries are constructed using the SPARQL Protocol and RDF Query Language (SPARQL) over EpiK. EpiK can hide schema and location heterogeneity while efficiently supporting queries that span the computational epidemiology modeling pipeline: from model construction to simulation output. We show that the performance of benchmark queries varies significantly with respect to the choice of hardware underlying the database and RDF engine.
Title: EpiK: A Knowledge Base for Epidemiological Modeling and Analytics of Infectious Diseases. Journal: Journal of healthcare informatics research, vol. 1, no. 2, pp. 260-303 (published 2017-11-06). DOI: https://doi.org/10.1007/s41666-017-0010-9. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8982844/pdf/
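EpiK represents linked model inputs and simulation outputs in RDF and answers questions about them with SPARQL. The core idea behind a SPARQL `WHERE` clause, a basic graph pattern whose `?variables` must bind consistently across several triple patterns, can be sketched in plain Python. This is a toy in-memory matcher, not EpiK and not a real RDF engine, and the `ex:` vocabulary (`ex:generatedBy`, `ex:usesPopulation`, `ex:outputFile`) is hypothetical rather than EpiK's controlled vocabulary.

```python
# Toy in-memory triple store; the ex: vocabulary is hypothetical, not EpiK's schema.
TRIPLES = {
    ("ex:model1", "rdf:type", "ex:AgentBasedModel"),
    ("ex:model1", "ex:usesPopulation", "ex:synthPopVA"),
    ("ex:model2", "ex:usesPopulation", "ex:synthPopNC"),
    ("ex:run42", "ex:generatedBy", "ex:model1"),
    ("ex:run42", "ex:outputFile", "ex:infections_run42.csv"),
    ("ex:run43", "ex:generatedBy", "ex:model2"),
}


def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):  # a query variable
            if extended.get(term, value) != value:
                return None  # variable already bound to a different value
            extended[term] = value
        elif term != value:  # a constant that does not match
            return None
    return extended


def query(patterns, triples):
    """Evaluate a conjunction of triple patterns, like a SPARQL basic graph pattern."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


if __name__ == "__main__":
    # Which output files came from a model that used the synthPopVA population?
    rows = query(
        [
            ("?run", "ex:generatedBy", "?model"),
            ("?model", "ex:usesPopulation", "ex:synthPopVA"),
            ("?run", "ex:outputFile", "?file"),
        ],
        TRIPLES,
    )
    print(rows)
```

The example query walks from a simulation output back to the model and input population that produced it, the kind of input-to-output linkage the abstract describes. A production knowledge base would delegate this to an RDF store with indexes and a query optimizer rather than the nested loops used here.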