The increasing challenges and requirements of medical image retrieval systems are leading the scientific community to explore modern representation methods as a means to improve clinical information retrieval. While current research tackles medical image retrieval through text-based, visual-based, or mixed approaches, representation learning can play an important role in improving retrieval capabilities by encoding medical image content into compact representations, addressing the problem of dimensionality. This paper introduces the potential of representation learning for the retrieval of high-dimensional imaging studies through automatically learned representations for regions of interest. Preliminary results are presented for feature learning through adversarial auto-encoding, based on the VISCERAL medical image retrieval benchmark.
{"title":"Volumetric Feature Learning for Query-by-Example in Medical Imaging Archives","authors":"Eduardo Pinho, J. F. Silva, C. Costa","doi":"10.1109/CBMS.2019.00038","DOIUrl":"https://doi.org/10.1109/CBMS.2019.00038","journal":{"name":"2019 IEEE 32nd International Symposium on Computer-Based Medical Systems (CBMS)"},"publicationDate":"2019-06-05"}
W. Hulme, Charlotte Stockton-Powdrell, S. Lewis, G. Martin, S. Bucci, B. Parsia, A. Casson, I. Habli, Niels Peek
Ecological Momentary Assessment (EMA) tools are used to monitor the thoughts and feelings of people in their everyday lives over time. In this paper we examine the feasibility of multi-item, multi-subject Hidden Markov Models (HMMs) to identify response clusters in people with schizophrenia. Data comprise 49 participants from two randomised clinical trials using the mobile app ClinTouch, an EMA tool for daily monitoring of schizophrenia symptoms. The app was used for up to 12 weeks (median follow-up 83 days, 78% response rate). We find that a 3-cluster model with 3 states per cluster performs best amongst the configurations tested, and the feasibility of HMMs as applied to multi-item EMA data is demonstrated. However, there is substantial heterogeneity between participants within each hidden state for which sampling error due to short observation periods is a likely contributor. More data are needed to validate and refine the modelling approach taken here.
{"title":"Cluster Hidden Markov Models: An Application to Ecological Momentary Assessment of Schizophrenia","authors":"W. Hulme, Charlotte Stockton-Powdrell, S. Lewis, G. Martin, S. Bucci, B. Parsia, A. Casson, I. Habli, Niels Peek","doi":"10.1109/CBMS.2019.00030","DOIUrl":"https://doi.org/10.1109/CBMS.2019.00030","journal":{"name":"2019 IEEE 32nd International Symposium on Computer-Based Medical Systems (CBMS)"},"publicationDate":"2019-06-05"}
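The Hidden Markov Models mentioned in the EMA abstract above are compared by how well each configuration explains the observed response sequences. As a minimal sketch of that building block, the scaled forward algorithm below computes the log-likelihood of one discrete observation sequence. It is a single-item, single-subject illustration, not the multi-item, multi-subject cluster model of the paper, and all parameter values in the usage lines are made up.

```python
import math

def forward_log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: return log P(obs) under a discrete HMM.

    obs   : list of observation symbol indices
    start : start[s] = P(first state is s)
    trans : trans[r][s] = P(next state is s | current state is r)
    emit  : emit[s][o] = P(symbol o | state s)
    """
    n_states = len(start)
    # Initialise with the start distribution weighted by the first emission.
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate one step through the transition matrix, then emit.
            alpha = [
                sum(alpha[r] * trans[r][s] for r in range(n_states)) * emit[s][obs[t]]
                for s in range(n_states)
            ]
        # Rescale to avoid underflow on long sequences; accumulate the log scale.
        scale = sum(alpha)
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_lik

# Hypothetical two-state model with two response symbols.
example_ll = forward_log_likelihood(
    [0, 1],
    start=[0.6, 0.4],
    trans=[[0.7, 0.3], [0.4, 0.6]],
    emit=[[0.9, 0.1], [0.2, 0.8]],
)
```

Competing configurations (for example, different numbers of states) could then be compared by summing this quantity over all sequences, usually with a penalty for model size.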
Vedhas Pandit, Maximilian Schmitt, N. Cummins, Björn Schuller
Affective computing 'in the wild' is of huge relevance to healthcare, as it is to many industries today. Directly relevant applications include patient monitoring (e.g., emotional state, depression and pain monitoring), health information mining, diagnosis and opinion mining (e.g., from medical reports and drug reviews). Text is the most prevalent modality in the medical field for several reasons, e.g., privacy laws and the high costs and prohibitive memory requirements of audio and video data. Departing from the traditional sample-level classification task, the promising baseline results for the Audio/Visual Emotion Challenge (AVEC) 2017 make a strong case for the suitability of text data for 'time-continuous' affect estimation. For the very first time, we present insights into the inner workings of a deep learning, 'in the wild' affect-predicting, time-continuous regression model. We compute the relevance of the sparse text-based bag-of-words (BoTW) features of the AVEC 2017 challenge in estimating the three affect labels, viz. arousal, valence and liking, using a layerwise relevance propagation (LRP) method. Interestingly, the trained models are found to rely more on adjectives and adverbs with positive or negative connotations, such as 'schlecht', 'gut' and 'genau', and on action descriptors, quite analogous to the human perception of emotion expression.
{"title":"I Know How you Feel Now, and Here's why!: Demystifying Time-Continuous High Resolution Text-Based Affect Predictions in the Wild","authors":"Vedhas Pandit, Maximilian Schmitt, N. Cummins, Björn Schuller","doi":"10.1109/CBMS.2019.00096","DOIUrl":"https://doi.org/10.1109/CBMS.2019.00096","journal":{"name":"2019 IEEE 32nd International Symposium on Computer-Based Medical Systems (CBMS)"},"publicationDate":"2019-06-05"}
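The affect-prediction abstract above names layerwise relevance propagation but not the propagation rule used. As an illustration only, the sketch below applies the commonly used epsilon rule to a single dense layer with a bag-of-words input; the choice of rule, the function name and the toy numbers are all assumptions, not details taken from the paper.

```python
def lrp_epsilon_dense(x, weights, bias, relevance_out, eps=1e-6):
    """Redistribute output relevance to the inputs of one dense layer.

    x             : input activations (e.g. word counts), length n_in
    weights       : weights[i][j] connects input i to output j
    bias          : bias per output unit, length n_out
    relevance_out : relevance assigned to each output unit
    """
    n_in, n_out = len(x), len(bias)
    # Pre-activations z_j = sum_i x_i * w_ij + b_j.
    z = [sum(x[i] * weights[i][j] for i in range(n_in)) + bias[j] for j in range(n_out)]
    relevance_in = [0.0] * n_in
    for j in range(n_out):
        # The stabiliser keeps the denominator away from zero and preserves its sign.
        denom = z[j] + (eps if z[j] >= 0 else -eps)
        for i in range(n_in):
            # Each input receives relevance in proportion to its contribution x_i * w_ij.
            relevance_in[i] += x[i] * weights[i][j] / denom * relevance_out[j]
    return relevance_in

# Hypothetical three-word vocabulary; the second word does not occur in the input.
word_relevance = lrp_epsilon_dense(
    x=[1.0, 0.0, 2.0],
    weights=[[0.5], [1.0], [0.25]],
    bias=[0.0],
    relevance_out=[1.0],
)
```

A word that does not occur in the input contributes nothing to the pre-activation and therefore receives zero relevance, which is why sparse bag-of-words inputs yield sparse relevance maps under this rule.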
Robin Kraft, Ferdinand Birk, M. Reichert, A. Deshpande, W. Schlee, B. Langguth, H. Baumeister, T. Probst, M. Spiliopoulou, R. Pryss
Smart devices and low-powered sensors are becoming increasingly ubiquitous, and nowadays almost all of these devices are connected, which is a promising foundation for crowdsensing of data related to various environmental phenomena. The resulting data is especially meaningful when it is related to time and location. Interestingly, many existing approaches build their solutions on monolithic backends that process data on a per-request basis. For many scenarios, however, such a technical setting is not suitable for managing the data requests of a large crowd. For example, when dealing with millions of data points, modern smartphones still face many challenges if calculations or advanced visualization features must be carried out directly on the device. Therefore, the work at hand proposes an architectural design for managing geospatial data of tinnitus patients, which combines a cloud-native approach with Big Data concepts used in the Internet of Things. The presented architectural design shall serve as a generic foundation to implement (1) a scalable backend for a platform that covers the aforementioned crowdsensing requirements and (2) a sophisticated stream processing concept to calculate and pre-aggregate incoming measurement data of tinnitus patients. In addition, this paper presents a visualization feature that gives users a comprehensive overview of noise levels in their environment based on noise measurements. This shall help tinnitus or hearing-impaired patients to avoid locations with a burdensome sound level.
{"title":"Design and Implementation of a Scalable Crowdsensing Platform for Geospatial Data of Tinnitus Patients","authors":"Robin Kraft, Ferdinand Birk, M. Reichert, A. Deshpande, W. Schlee, B. Langguth, H. Baumeister, T. Probst, M. Spiliopoulou, R. Pryss","doi":"10.1109/CBMS.2019.00068","DOIUrl":"https://doi.org/10.1109/CBMS.2019.00068","journal":{"name":"2019 IEEE 32nd International Symposium on Computer-Based Medical Systems (CBMS)"},"publicationDate":"2019-06-05"}
Deregulated splicing factors have been shown to be associated with the development of several types of cancer and, therefore, the determination of such alterations can help the development of tumor-specific molecular targets for early prognosis and therapy. Determining the relevant splicing factors, however, is not a straightforward task, mainly due to the heterogeneity of tumors and the variability across samples. In this work, a methodology based on supervised machine learning methods is proposed, allowing the determination of subsets of relevant factors that best discriminate samples. The methodology comprises three main phases: first, a ranking of splicing factors is determined by applying feature weighting algorithms; second, the best subset of factors that allows the induction of an accurate classifier is detected; third, the confidence in the induced classifier is assessed by explaining its individual predictions. Finally, the utility and benefit of the proposed methodology are illustrated by analyzing a small dataset of neuroendocrine lung carcinoids, and the results show that there exist small subsets of deregulated factors which can effectively distinguish between tumor samples and their respective adjacent non-tumor tissues.
{"title":"A Supervised Methodology for Analyzing Dysregulation in Splicing Machinery: An Application in Cancer Diagnosis","authors":"O. R. Pupo, R. Luque, J. Castaño, S. Ventura","doi":"10.1109/CBMS.2019.00035","DOIUrl":"https://doi.org/10.1109/CBMS.2019.00035","journal":{"name":"2019 IEEE 32nd International Symposium on Computer-Based Medical Systems (CBMS)"},"publicationDate":"2019-06-05"}
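The first two phases of the splicing-factor methodology above (rank the factors, then find the best subset for a classifier) can be sketched as follows. The abstract does not name the feature weighting algorithms or the classifier, so the mean-difference weight, the nearest-centroid classifier and the leave-one-out evaluation used here are stand-ins chosen for brevity, and the data in the usage lines are invented.

```python
from statistics import mean, pstdev

def rank_features(samples, labels):
    """Rank feature indices by a simple weight: |class mean difference| / spread."""
    n_features = len(samples[0])
    weights = []
    for f in range(n_features):
        pos = [s[f] for s, y in zip(samples, labels) if y == 1]
        neg = [s[f] for s, y in zip(samples, labels) if y == 0]
        spread = pstdev(pos) + pstdev(neg) + 1e-9  # small constant avoids division by zero
        weights.append(abs(mean(pos) - mean(neg)) / spread)
    return sorted(range(n_features), key=lambda f: weights[f], reverse=True)

def loo_accuracy(samples, labels, features):
    """Leave-one-out accuracy of a nearest-centroid classifier on the chosen features."""
    correct = 0
    for i in range(len(samples)):
        train = [(s, y) for j, (s, y) in enumerate(zip(samples, labels)) if j != i]
        centroids = {}
        for cls in (0, 1):
            rows = [s for s, y in train if y == cls]
            centroids[cls] = [mean([r[f] for r in rows]) for f in features]
        point = [samples[i][f] for f in features]
        # Predict the class whose centroid is closest in squared Euclidean distance.
        pred = min(centroids, key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])))
        correct += pred == labels[i]
    return correct / len(samples)

def best_top_k(samples, labels):
    """Return the smallest top-k prefix of the ranking that maximises LOO accuracy."""
    ranking = rank_features(samples, labels)
    scored = [(loo_accuracy(samples, labels, ranking[:k]), -k) for k in range(1, len(ranking) + 1)]
    acc, neg_k = max(scored)  # ties on accuracy are broken in favour of smaller k
    return ranking[:-neg_k], acc

# Invented expression levels: feature 0 separates tumor (1) from non-tumor (0).
toy_samples = [[5.0, 1.0, 0.3], [5.2, 0.2, 0.9], [4.9, 0.8, 0.1],
               [1.0, 0.9, 0.4], [1.1, 0.1, 0.8], [0.8, 0.7, 0.2]]
toy_labels = [1, 1, 1, 0, 0, 0]
subset, accuracy = best_top_k(toy_samples, toy_labels)
```

Note that ranking on the full dataset before cross-validating, as done here for brevity, leaks label information into the evaluation; on a small dataset the ranking step would normally be repeated inside each fold.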
Anastasia Krithara, F. Aisopos, Vassiliki Rentoumi, A. Nentidis, K. Bougiatiotis, Maria-Esther Vidal, Ernestina Menasalvas Ruiz, A. R. González, E. Samaras, P. Garrard, M. Torrente, M. P. Pulla, Nikos Dimakopoulos, R. Mauricio, Jordi Rambla De Argila, G. Tartaglia, G. Paliouras
The vision of the iASiS project is to turn the wave of big biomedical data heading our way into actionable knowledge for decision makers. This is achieved by integrating data from disparate sources, including genomics, electronic health records and bibliography, and applying advanced analytics methods to discover useful patterns. The goal is to turn large amounts of available data into actionable information for authorities planning public health activities and policies. The integration and analysis of these heterogeneous sources of information will enable the best decisions to be made, allowing for diagnosis and treatment to be personalised to each individual. The project offers a common representation schema for the heterogeneous data sources. The iASiS infrastructure is able to convert clinical notes into usable data, combine them with genomic data, related bibliography, image data and more, and create a global knowledge base. This facilitates the use of intelligent methods to discover useful patterns across different resources. Semantic integration of data gives the opportunity to generate information that is rich, auditable and reliable. This information can be used to provide better care, reduce errors and create more confidence in sharing data, thus providing more insights and opportunities. Data resources for two different disease categories, dementia and lung cancer, are explored within the iASiS use cases.
{"title":"iASiS: Towards Heterogeneous Big Data Analysis for Personalized Medicine","authors":"Anastasia Krithara, F. Aisopos, Vassiliki Rentoumi, A. Nentidis, K. Bougiatiotis, Maria-Esther Vidal, Ernestina Menasalvas Ruiz, A. R. González, E. Samaras, P. Garrard, M. Torrente, M. P. Pulla, Nikos Dimakopoulos, R. Mauricio, Jordi Rambla De Argila, G. Tartaglia, G. Paliouras","doi":"10.1109/CBMS.2019.00032","DOIUrl":"https://doi.org/10.1109/CBMS.2019.00032","journal":{"name":"2019 IEEE 32nd International Symposium on Computer-Based Medical Systems (CBMS)"},"publicationDate":"2019-06-05"}
Chronic diseases require ongoing care to improve patients' quality of life. Large amounts of public and private investment are consumed in dealing with issues like employee absenteeism, early retirement and social spending. It is currently estimated that 12% of natural deaths occur suddenly, of which 88% are of cardiac origin. Early detection of heartbeat anomalies plays a key role in preventing cardiac diseases. This paper proposes the use of time series data mining to extract relevant electrocardiogram (ECG) features to predict the probability of ventricular fibrillation (VF) events. Decision trees, k-nearest neighbors, support vector machines, logistic regression and neural networks have been applied to ECG data. Different feature sets have been proposed and evaluated, combining different beat sequence lengths (1, 3, 6 or 9 beats), ECG data points (P, Q, R, S, T) and segments (PS, QT, ST, PR and RR). These data mining models could be implemented in computer-aided diagnosis (CAD) systems to evaluate long-term ECG data of a patient and identify VF events in advance.
{"title":"ECG Feature Extraction and Ventricular Fibrillation (VF) Prediction using Data Mining Techniques","authors":"Allan Calderon, A. Pérez-Pérez, J. P. Valente","doi":"10.1109/CBMS.2019.00014","DOIUrl":"https://doi.org/10.1109/CBMS.2019.00014","journal":{"name":"2019 IEEE 32nd International Symposium on Computer-Based Medical Systems (CBMS)"},"publicationDate":"2019-06-05"}
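The ECG abstract above combines beat sequences of several lengths with segments such as RR. As a rough illustration of what one such feature set might look like, the sketch below derives RR intervals from R-peak timestamps and summarises them over non-overlapping windows. Treating a "beat sequence" as a run of consecutive RR intervals, the two summary statistics, and the millisecond timestamps are all assumptions made for this example.

```python
def rr_intervals(r_peak_times):
    """Differences between consecutive R-peak timestamps."""
    return [later - earlier for earlier, later in zip(r_peak_times, r_peak_times[1:])]

def beat_window_features(r_peak_times, beats_per_window):
    """Mean and range of the RR interval for each non-overlapping window of beats."""
    rr = rr_intervals(r_peak_times)
    features = []
    # Step through the RR series one full window at a time; a trailing partial window is dropped.
    for start in range(0, len(rr) - beats_per_window + 1, beats_per_window):
        window = rr[start:start + beats_per_window]
        features.append({
            "mean_rr": sum(window) / len(window),
            "rr_range": max(window) - min(window),
        })
    return features

# Hypothetical R-peak times in milliseconds: the rhythm slows after the third interval.
peaks_ms = [0, 800, 1600, 2400, 3400, 4400, 5400]
window_features = beat_window_features(peaks_ms, beats_per_window=3)
```

Each resulting dictionary could serve as one row of the feature matrix passed to a classifier; window lengths of 1, 3, 6 or 9 would correspond to the sequence lengths listed in the abstract.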
By a trajectory, in the medical world, we mean the sequence of clinical events that occur to a patient in some time frame, as implicitly stored in patients' Electronic Health Records. A set of trajectories can be summarized in a trajectory graph, whose paths contain the most common trajectories followed by patients. The graph contains events on its nodes, and its edges indicate the temporal relations. Previous works on building trajectory graphs only allow for one event at each node, and conversely for an event type to appear in only one node, thus losing information and potentially mixing different groups of patients. Here we develop a procedure to extract trajectory graphs that goes beyond both limitations, thus more accurately reflecting the original dataset. In addition, it is close to a notion of patient state which clinicians use intuitively, facilitating interpretation. We evaluate the procedure on two real-world datasets, one related to diagnostics at hospital admissions, and the other on prescriptions in intensive care units, with reasonable and potentially useful results. The method is described here in the medical context only, but it is of general applicability for sequences of events.
{"title":"Interpretable Patient Trajectories from Temporally Annotated Health Records","authors":"Martí Zamora, Ricard Gavaldà","doi":"10.1109/CBMS.2019.00112","DOIUrl":"https://doi.org/10.1109/CBMS.2019.00112","journal":{"name":"2019 IEEE 32nd International Symposium on Computer-Based Medical Systems (CBMS)"},"publicationDate":"2019-06-05"}
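To make the idea of a trajectory graph in the abstract above concrete, the sketch below builds a simple frequency-based graph in which each node is a set of co-occurring events, so a node may hold more than one event. This is a plain counting baseline written for illustration, not the extraction procedure of the paper; the support threshold and the event labels are hypothetical.

```python
from collections import Counter

def transition_counts(trajectories):
    """Count how often one event set is directly followed by another."""
    edges = Counter()
    for trajectory in trajectories:
        # Each time step becomes an order-independent set of events (a graph node).
        states = [frozenset(events) for events in trajectory]
        for current, following in zip(states, states[1:]):
            edges[(current, following)] += 1
    return edges

def frequent_edges(trajectories, min_support):
    """Keep only transitions followed by at least min_support trajectories' steps."""
    counts = transition_counts(trajectories)
    return {edge: count for edge, count in counts.items() if count >= min_support}

# Hypothetical patient trajectories: each inner list is the set of events at one time step.
toy_trajectories = [
    [["flu"], ["pneumonia", "fever"], ["discharge"]],
    [["flu"], ["fever", "pneumonia"], ["discharge"]],
    [["flu"], ["discharge"]],
]
common = frequent_edges(toy_trajectories, min_support=2)
```

Because nodes are sets, the first two trajectories map to the same path even though their events are listed in a different order, while the rarer direct transition from "flu" to "discharge" is filtered out.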
Yihan Deng, Peter Dolog, Jörn-Markus Gass, K. Denecke
The postoperative health status of a patient with obesity indicates the outcome of the surgical treatment. At each postoperative revisit, physicians need to go through the previous patient records to recall the patient status and to evaluate the postoperative risk of readmission. To support this process, we develop a method to extract indicators and to analyse weight changes, so that potential complications and risks of clinical readmission can be recognized in a timely manner. In this paper, we compare two approaches that are based on traditional machine learning and neural networks. Aspects referring to a health status change or relevant to treatment are extracted from the outpatient medical records as they are generated for each postoperative revisit. The performance of traditional machine learning on the task of obesity-related entity extraction is compared with one variation of attentive recurrent neural networks. The ensemble classifier of binary attentive bi-LSTMs with data balancing using conditional generative adversarial networks (CGAN) achieved an F1 measure of 86.5% on the task of classifying eight classes of obesity-related entities. We conclude that, for processing a small data set using neural networks, a data balancing method should first be applied to obtain an extended corpus and a general representation, which can increase the differentiability of the input data. Fine-tuning the networks can further enhance the performance.
{"title":"Obesity Entity Extraction from Real Outpatient Records: When Learning-Based Methods Meet Small Imbalanced Medical Data Sets","authors":"Yihan Deng, Peter Dolog, Jörn-Markus Gass, K. Denecke","doi":"10.1109/CBMS.2019.00087","DOIUrl":"https://doi.org/10.1109/CBMS.2019.00087","journal":{"name":"2019 IEEE 32nd International Symposium on Computer-Based Medical Systems (CBMS)"},"publicationDate":"2019-06-05"}
Jordi Carrere-Molina, L. Subirats, Jordi Casas-Roma
Single Nucleotide Polymorphisms (SNPs) are the most common inter-individual variations in humans. With the emergence of Next Generation Sequencing (NGS), they gained popularity as disease biomarkers for diagnosis and/or prognosis using Genome-Wide Association Studies. They occur throughout the genome, but mostly in non-coding regions. In these cases, SNPs may affect regulatory regions, such as promoters, enhancers or microRNA (miRNA) binding sites. miRNAs are short non-coding RNAs that are estimated to regulate up to 60% of gene expression at the post-transcriptional level. They are well known to be implicated in many diseases by misregulating the expression of genes. New computational technologies allow more information to be extracted from RNA-Seq data, making it possible not only to measure gene expression but also to map SNPs on the genome. To understand and model the effects of this type of RNA on disease phenotype, machine learning algorithms will be trained using SNPs located in the 3'UTR (UnTranslated Region) of deregulated genes to find biomarkers and describe the mechanism of action.
{"title":"Towards an Analysis of Post-Transcriptional Gene Regulation in Psoriasis via microRNAs using Machine Learning Algorithms","authors":"Jordi Carrere-Molina, L. Subirats, Jordi Casas-Roma","doi":"10.1109/CBMS.2019.00125","DOIUrl":"https://doi.org/10.1109/CBMS.2019.00125","journal":{"name":"2019 IEEE 32nd International Symposium on Computer-Based Medical Systems (CBMS)"},"publicationDate":"2019-06-05"}