Next-Generation Sequencing (NGS) technologies offer new insights to researchers in the field of oncogenomics. These technologies provide valuable genetic information by rapidly detecting and identifying expected mutations to improve clinical treatments. To be used effectively, this large amount of data has to be processed, explored and interpreted carefully and quickly. Meanwhile, cancer research continues to publish new theories and findings based on large-scale collaborative projects that provide publicly available genomic and clinical cancer data. However, researchers have a hard time using this data to its full potential even though it is readily available. Between the growing output size and complexity of NGS technologies and the growing number of publicly available heterogeneous databases, processing and exploring this data can become a challenge for the average researcher. This paper presents the functionality of GenomeViewer, a tool that specializes in the visualization of somatic mutations in cancer genomics. This easy-to-use software will enable cancer researchers to seamlessly compare their data against publicly available resources. GenomeViewer uses "Big Data" technologies such as Spark and Parquet, and is based on UC Berkeley's Analysis Data Model (ADAM) genomic format for cloud-scale computing. Our hope is that GenomeViewer will become the preferred tool for viewing somatic mutations for researchers in cancer genomics.
GENOMEVIEWER: An Interactive Genomic Somatic Mutation Visualizer. Beatriz S. Kanzki, A. April. Proceedings of the 2017 International Conference on Digital Health, 2017-07-02. https://doi.org/10.1145/3079452.3079477
A. Galletta, A. Celesti, F. Tusa, M. Fazio, P. Bramanti, M. Villari
Nowadays, we are observing an explosion in the proliferation of clinical data. In this context, a typical example of the well-known big data problem is the huge number of Magnetic Resonance Imaging (MRI) files that need to be stored and analysed. Although Cloud computing technology can address such a demanding problem, data reliability, availability and privacy are three of the major concerns standing against the large-scale adoption of Cloud storage systems in the healthcare context; this is why hospitals are reluctant to move patients' data to the Cloud. In this paper, we focus on data reliability and availability, and we discuss an approach that allows healthcare centres to store clinical data in a Multi-Cloud storage environment while guaranteeing patients' privacy. Experiments proved the feasibility of our approach.
Big MRI Data Dissemination and Retrieval in a Multi-Cloud Hospital Storage System. A. Galletta, A. Celesti, F. Tusa, M. Fazio, P. Bramanti, M. Villari. Proceedings of the 2017 International Conference on Digital Health, 2017-07-02. https://doi.org/10.1145/3079452.3079507
Public health is crucial for managing and monitoring threats to the health of the population. In recent years, Twitter has been successfully applied to disease monitoring thanks to its ability to provide near real-time data, and it has proved to be an asset to the domain. This research aims to further explore the capabilities of Twitter in the disease surveillance field by focusing on its geolocation feature and on health mentions, which are identifiable through disease-specific language patterns present in Twitter messages.
Text Mining from Social Media for Public Health Applications. Joana M. Barros. Proceedings of the 2017 International Conference on Digital Health, 2017-07-02. https://doi.org/10.1145/3079452.3079475
Chadi Helwe, Shady Elbassuoni, Mirabelle Geha, E. Hitti, C. Obermeyer
A standard procedure in the medical domain is to code discharge diagnoses into a set of manageable categories known as the CCS codes. This is typically done by first manually coding the discharge diagnoses into the standard ICD codes and then using a one-to-one mapping between ICD and CCS codes. In this paper, we study the applicability of deep learning to perform automatic coding of discharge diagnoses into CCS codes. In particular, we build an LSTM network combined with a dense neural network that uses medically-trained word embeddings to code discharge diagnoses into single-level CCS codes. We also investigate the advantage of mapping discharge diagnoses into UMLS concepts before coding is carried out. Experimental results based on a large dataset of manually coded discharge diagnoses show that our deep-learning model outperforms the state-of-the-art automatic coding approaches and that the mapping to UMLS concepts consistently results in significant improvement in the coding accuracy.
CCS Coding of Discharge Diagnoses via Deep Neural Networks. Chadi Helwe, Shady Elbassuoni, Mirabelle Geha, E. Hitti, C. Obermeyer. Proceedings of the 2017 International Conference on Digital Health, 2017-07-02. https://doi.org/10.1145/3079452.3079498
Vaccine hesitancy, traditionally linked to issues of trust, misinformation and prior beliefs, has been increasingly fuelled by influential groups on social media (SM) and the Internet. Analysis of news media and social networks (SN) accessible in real-time provides a new opportunity for detecting changes in public confidence in vaccines. However, different concerns are important in different regions, and reasons for hesitancy and the role of opinion leaders vary between sub-controversies in the broader vaccination debates. It is therefore important for public health professionals to gain an overview of the emerging debates in cyberspace, identify influential users and rumours, and assess their impact in order to know how to respond. The VAC Medi+Board project aims to visualise the diffusion of rumours through SN and assess the impact of key individuals. We include, as a case study, discussions during winter 2015-16 pertaining to the alleged side-effects of the HPV vaccine.
Who is Spreading Rumours about Vaccines?: Influential User Impact Modelling in Social Networks. P. Kostkova, Vino Mano, H. Larson, W. Schulz. Proceedings of the 2017 International Conference on Digital Health, 2017-07-02. https://doi.org/10.1145/3079452.3079505
C. Stephens, R. Rodríguez-Ramírez, V. Mireles, Sergio Hernández López, Concepción Garcia-Aguirre, J. Ortiz, N. Mantilla-Beniers
Internet-based monitoring of influenza-like illnesses (ILI) has become more common since its beginnings nearly a decade ago, either through estimates based on the number of searches for influenza-related terms (e.g., Google Flu Trends) or by means of participatory surveillance systems. The latter, often seen as ways of engaging people in matters of scientific and public health importance, gather a wealth of potentially valuable epidemiological information that is complementary to that obtained through the established disease surveillance networks and is usually absent from search-based web algorithms. We present a statistical analysis of the data from the Mexican monitoring website "Reporta", which determines the risk factors linked to the reporting of ILI symptoms among its participants, and we interpret these results based on current knowledge of the factors that influence transmission of infection resulting in disease. Besides standard factors associated with enhanced susceptibility to infection, some novel behavioral factors linked to high risk were: (i) use of public transport; (ii) frequent contact with animals; and (iii) use of non-standard interventions, such as homeopathy. While close contact with large groups of people in public transportation is generally assumed to be important in disease spread, frequent contact with animals is not. Our results are consistent with previous observations that animals may serve as mobile fomites and hence increase the propensity to develop disease. We conclude that analysis of rich information sets from Internet-based systems may suggest novel ideas on disease spread that are worth following up with field research.
Risk Factors Linked to Influenza-like Illness as Identified from the Mexican Participatory Surveillance System: Risk Factors in ILI. C. Stephens, R. Rodríguez-Ramírez, V. Mireles, Sergio Hernández López, Concepción Garcia-Aguirre, J. Ortiz, N. Mantilla-Beniers. Proceedings of the 2017 International Conference on Digital Health, 2017-07-02. https://doi.org/10.1145/3079452.3079471
This paper is devoted to mathematical modelling of the progression and stages of breast cancer. The "Consolidated mathematical growth Model of primary tumor (PT) and secondary distant metastases (MTS) in patients with lymph nodes MTS (Stage III)" (CoM-III) is proposed as a new research tool. The CoM-III rests on an exponential tumor growth model and consists of a system of determinate nonlinear and linear equations. The CoM-III correctly describes primary tumor growth (parameter T) and distant metastases growth (parameter M, parameter N). The CoM-III model and predictive software: a) detect different growth periods of the primary tumor and distant metastases in patients with lymph nodes MTS; b) forecast the period in which distant metastases appear in patients with lymph nodes MTS; c) have higher average prediction accuracy than the other tools; d) can improve forecasts of breast cancer survival and facilitate optimisation of diagnostic tests. The CoM-III enables us, for the first time, to predict the whole natural history of PT and secondary distant MTS growth in patients with or without lymph nodes MTS at each stage, relying only on PT sizes.
On Consolidated Predictive Model of the Natural History of Breast Cancer: Primary Tumor and Secondary Metastases in Patients with Lymph Nodes Metastases. E. Tyuryumina, A. Neznanov. Proceedings of the 2017 International Conference on Digital Health, 2017-07-02. https://doi.org/10.1145/3079452.3079461
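The CoM-III abstract above says only that the model rests on exponential tumor growth; it does not give the equations. For orientation, the generic exponential growth law and two quantities that follow from it are written out below. The symbols V, V_0, k and t_d are assumptions introduced here for illustration and are not the paper's parameters T, M or N.

```latex
% Generic exponential growth (illustrative; not the CoM-III system itself).
% V(t): tumor volume at time t, V_0: initial volume, k > 0: growth rate.
\begin{align}
  V(t) &= V_0 \, e^{k t}, \\
  t_d  &= \frac{\ln 2}{k}
        && \text{(volume doubling time)}, \\
  t    &= \frac{1}{k} \ln\!\frac{V}{V_0}
        && \text{(time to grow from } V_0 \text{ to } V\text{)}.
\end{align}
```

Under this law, a measured tumor size together with an assumed growth rate determines an elapsed growth time, which is the kind of relation a forecast based only on PT sizes would need.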
The U.S. Food and Drug Administration uses the Center for Food Safety and Applied Nutrition (CFSAN) Adverse Event Reporting System (CAERS) as the primary tool for identifying new and emerging dietary supplement adverse events. Despite mandatory and voluntary reporting of dietary supplement adverse events to CAERS, many continue to go unreported. The availability of social media has enabled dietary supplement consumers to freely share their concerns and experiences online. Such consumer-generated information can be a useful source for further monitoring the safety of dietary supplements. To study the usefulness of social media (Twitter in particular) for safety surveillance of dietary supplements, we developed a computational processing pipeline: 1) machine-learning-based identification of potential Twitter posts (tweets) describing personal experiences related to the use of dietary supplements, 2) detection of potential supplement events from these tweets using the medpie open source tool, and 3) mapping of detected events to effects through the taxonomy provided in SNOMED CT. Using our pipeline, we identified, from a group of 1,244,661 collected tweets, a total of 17,346 personal experience tweets pertaining to 4 dietary supplements. A total of 191 effects were mapped to SNOMED CT, and we discovered that 48 of the 191 effects are not listed in either of the two online sources we referenced. However, the effects discovered from the social media data will need to be verified and confirmed with other sources and/or clinical evidence.
Discovering Potential Effects of Dietary Supplements from Twitter Data. Keyuan Jiang. Proceedings of the 2017 International Conference on Digital Health, 2017-07-02. https://doi.org/10.1145/3079452.3079467
Karthik Srinivasan, Faiz Currim, S. Ram, M. Mehl, Casey Lindberg, Esther Sternberg, Perry Skeath, Davida Herzl, Reuben Herzl, M. Lunden, Nicole Goebel, Scott Andrews, B. Najafi, J. Razjouyan, Hyo-Ki Lee, Brian Gilligan, J. Heerwagen, Kevin Kampschroer, Kelli Canada
Recent developments in wearable sensor technologies have made it possible to capture concurrent data streams for the ambient environment and the instantaneous physiological stress response at a fine granularity. Characterizing the delay in physiological stress response to each environmental stimulus is as important as capturing the magnitude of the effect. In this paper, we discuss and evaluate a new regularization-based statistical method to determine the ideal lagged effect of five environmental factors (carbon dioxide, temperature, relative humidity, atmospheric pressure and noise levels) on instantaneous stress response. Using this method, we infer that the first four environmental variables have a cumulative lagged effect, of approximately 60 minutes, on stress response, whereas noise level has an instantaneous effect on stress response. The proposed transformations of the inputs result in models with better fit and predictive performance. This study not only informs the field of environment-wellbeing research about the cumulative lagged effects of the specified environmental factors, but also proposes a new method for determining the optimal feature transformation in similar smart health studies.
A Regularization Approach for Identifying Cumulative Lagged Effects in Smart Health Applications. Karthik Srinivasan, Faiz Currim, S. Ram, M. Mehl, Casey Lindberg, Esther Sternberg, Perry Skeath, Davida Herzl, Reuben Herzl, M. Lunden, Nicole Goebel, Scott Andrews, B. Najafi, J. Razjouyan, Hyo-Ki Lee, Brian Gilligan, J. Heerwagen, Kevin Kampschroer, Kelli Canada. Proceedings of the 2017 International Conference on Digital Health, 2017-07-02. https://doi.org/10.1145/3079452.3079503
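To make the idea of a cumulative lagged effect in the abstract above concrete, here is a minimal Python sketch. It is not the authors' regularization method: it swaps in a plain grid search that picks the trailing-window length whose averaged input best fits the response under ordinary least squares. The function names (`cumulative_lag`, `sse_of_linear_fit`, `best_window`) and the toy data are hypothetical.

```python
from statistics import mean


def cumulative_lag(series, window):
    """Trailing mean over the last `window` samples, current sample included.

    Early positions with fewer than `window` samples average what is available.
    """
    out = []
    running = 0.0
    for i, x in enumerate(series):
        running += x
        if i >= window:
            running -= series[i - window]  # drop the sample that left the window
        out.append(running / min(i + 1, window))
    return out


def sse_of_linear_fit(x, y):
    """Sum of squared errors of the least-squares line y ~ a + b * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:  # constant input: the best fit is the mean of y
        return sum((yi - my) ** 2 for yi in y)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


def best_window(series, response, candidates):
    """Candidate window length whose lagged feature fits the response best."""
    return min(
        candidates,
        key=lambda w: sse_of_linear_fit(cumulative_lag(series, w), response),
    )


# Toy data (assumed): the response depends on a 3-sample trailing mean.
co2 = [1, 4, 2, 8, 5, 7, 3, 9, 6, 2]
stress = [2 * v + 1 for v in cumulative_lag(co2, 3)]
chosen = best_window(co2, stress, candidates=[1, 2, 3, 5])
```

A window of 1 corresponds to an instantaneous effect, while larger windows model a cumulative lag; a regularized approach would additionally penalize model complexity when comparing windows.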
Majed M. Al-Jefri, R. Evans, Pietro Ghezzi, Gulden Uchyigit
Automatic assessment of the quality of online health information is needed, especially given the massive growth of online content. In this paper, we present an approach to assessing the quality of health webpages based on their content rather than on purely technical features, by applying machine learning techniques to the automatic identification of evidence-based health information. Several machine learning approaches were applied to learn classifiers using different combinations of features. Three datasets were used in this study, covering three different diseases, namely shingles, flu and migraine. The results obtained using the classifiers were promising in terms of precision and recall, especially for diseases with few distinct pathogenic mechanisms.
Using Machine Learning for Automatic Identification of Evidence-Based Health Information on the Web. Majed M. Al-Jefri, R. Evans, Pietro Ghezzi, Gulden Uchyigit. Proceedings of the 2017 International Conference on Digital Health, 2017-07-02. https://doi.org/10.1145/3079452.3079470
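The abstract above reports classifier quality in terms of precision and recall without defining them. The sketch below shows the standard definitions for a binary task such as "evidence-based" versus "not evidence-based"; the function name `precision_recall` and the example labels are hypothetical and not taken from the paper.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for the `positive` class.

    precision = TP / (TP + FP): share of predicted positives that are correct.
    recall    = TP / (TP + FN): share of actual positives that were found.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Assumed labels: 1 = evidence-based page, 0 = not evidence-based.
labels = [1, 1, 0, 0, 1]
predictions = [1, 0, 1, 0, 1]
precision, recall = precision_recall(labels, predictions)
```

Here two of the three predicted positives are correct and two of the three true positives are recovered, so both scores equal 2/3.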