Sequential pattern detection for identifying courses of treatment and anomalous claim behaviour in medical insurance
James Kemp, Christopher Barker, Norm M. Good, Michael Bain
Pub Date: 2022-12-06 | DOI: 10.1109/BIBM55620.2022.9995541
Fraud and waste are costly problems in medical insurance, and utilising sequence information for anomaly detection is under-explored in this domain. We present a multi-part method employing sequential pattern mining for identifying and grouping comparable courses of treatment, finding patterns within those courses, calculating the cost of possible additional or upcoded claims in unusual patterns, and ranking providers by potentially recoverable costs. We applied this method to real-world radiation therapy data. Results were assessed by experts at the Australian Government Department of Health and were found to be interpretable and informative. Previously unknown anomalous claim patterns were discovered, and a previously suspected anomalous claim pattern was confirmed. Outlying providers each claimed up to $486,617.60 in potentially recoverable costs. Our method identified anomalous claims as well as the patterns in which they were anomalous, making the results easily interpretable. The method is currently being implemented for another problem involving sequential data at the Department of Health.
Reassembling Consistent-Complementary Constraints in Triplet Network for Multi-view Learning of Medical Images
Xingyue Wang, Jiansheng Fang, Na Zeng, Jingqi Huang, Hanpei Miao, W. Kwapong, Ziyi Zhang, Shuting Zhang, Jiang Liu
Pub Date: 2022-12-06 | DOI: 10.1109/BIBM55620.2022.9995213
Existing multi-view learning methods based on the information bottleneck principle exhibit impressive generalization by capturing inter-view consistency and complementarity. They leverage cross-view joint information (consistency) and view-specific information (complementarity) while discarding redundant information. By fusing visual features, multi-view learning methods help medical image processing produce more reliable predictions. However, multiple views of medical images often have low consistency and high complementarity due to modal differences in imaging or different projection depths, making it challenging for existing methods to balance the two to the maximal extent. To mitigate this issue, we improve the information bottleneck (IB) loss function with a balanced regularization term, termed IBB loss, reassembling the constraints of multi-view consistency and complementarity. In particular, the balanced regularization term, with a unique trade-off factor in the IBB loss, helps minimize the mutual information on consistency and complementarity to strike a balance. In addition, we devise a triplet multi-view network named TM net to learn consistent and complementary features from multi-view medical images. Evaluations on two datasets demonstrate the superiority of our method over several counterparts, and extensive experiments confirm that our IBB loss significantly improves multi-view learning for medical images.
Feature Selection for Microarray Data via Community Detection Fusing Multiple Gene Relation Networks Information
Shoujia Zhang, Wei Li, Weidong Xie, Linjie Wang
Pub Date: 2022-12-06 | DOI: 10.1109/BIBM55620.2022.9994959
In recent decades, the rapid development of gene sequencing and computer technology has accelerated the growth of high-dimensional microarray data, and machine learning methods have been successfully applied to it to help classify cancer. In most cases, the high dimensionality and small sample size of microarray data restrict the performance of cancer classification. This problem is usually addressed by feature selection methods; however, most of them neglect the relations among genes. This paper proposes a novel feature selection method that fuses multiple gene relation network information based on community detection (MGRCD). The proposed method divides all genes into different communities and then selects from each community the genes most associated with cancer classification. The selected optimal feature subset satisfies both maximum relevance of genes to cancer and minimum redundancy among genes. The experimental results show that the proposed gene selection method can effectively improve classification performance.
Integrating Multi-scale Feature Representation and Ensemble Learning for Schizophrenia Diagnosis
Manna Xiao, Hulin Kuang, Jin Liu, Yan Zhang, Yizhen Xiang, Jianxin Wang
Pub Date: 2022-12-06 | DOI: 10.1109/BIBM55620.2022.9994950
Resting-state functional magnetic resonance imaging (rs-fMRI) has been widely used for the diagnosis of schizophrenia. With rs-fMRI, most existing diagnostic methods reveal schizophrenia's functional abnormalities at three scales: regional neural activity alterations, functional connectivity abnormalities, and brain network dysfunctions. However, many schizophrenia diagnosis methods do not consider fusing features from all three scales. In this study, we propose a schizophrenia diagnostic method based on multi-scale feature representation and ensemble learning. Firstly, features at the three scales (region, connectivity, and network) are extracted from rs-fMRI images using the Brainnetome atlas. For each scale, feature selection with the least absolute shrinkage and selection operator (LASSO) is applied, with a grid search, to identify effective sub-features related to schizophrenia classification. The selected sub-features of each scale are then input to a support vector machine with a linear kernel to classify schizophrenia patients versus healthy controls. To further improve diagnostic performance, an ensemble learning framework named E-RCN averages the probabilities produced by the classifiers of each scale at the decision level. Under leave-one-out cross-validation on the Center for Biomedical Research Excellence (COBRE) dataset, our proposed method achieves encouraging diagnostic performance, outperforming several state-of-the-art methods. In addition, ranking brain regions by their occurrence frequency within the leave-one-out cross-validation experiments identifies regions related to schizophrenia, i.e., the thalamus and middle temporal gyrus, and important subregions, i.e., Tha_L_8_8, MTG_L_4_4 and MTG_R_4_4.
A Random Feature Augmentation for Domain Generalization in Medical Image Segmentation
Xu Zhao, Yuxin Kang, Hansheng Li, Jiayu Luo, Lei Cui, Jun Feng, Lin Yang
Pub Date: 2022-12-06 | DOI: 10.1109/BIBM55620.2022.9994999
Deep convolutional neural networks (DCNNs) significantly improve the performance of medical image segmentation. Nevertheless, medical images frequently exhibit distribution discrepancies, so trained models fail to remain robust when applied to unseen clinical data. To address this problem, domain generalization methods have been proposed to enhance the generalization ability of DCNNs. Feature-space data augmentation methods have proven effective at improving domain generalization. However, existing methods still rely mainly on certain prior knowledge or assumptions, which limits their ability to enrich the diversity of source-domain data. In this paper, we propose a random feature augmentation (RFA) method to diversify source-domain data at the feature level without prior knowledge. Specifically, we explore the effectiveness of random convolution at the feature level for the first time and show experimentally that it can adequately preserve domain-invariant information while perturbing domain-specific information. Furthermore, to capture the same domain-invariant information from the augmented features of RFA, we present a domain-invariant consistent learning strategy that enables DCNNs to learn a more generalized representation. Our proposed method achieves state-of-the-art performance on two medical image segmentation tasks: optic cup/disc segmentation on fundus images and prostate segmentation on MRI images.
Multi-View Representation Learning for Multi-Instance Learning with Applications to Medical Image Classification
Lu Zhao, Liming Yuan, Zhenliang Li, Xianbin Wen
Pub Date: 2022-12-06 | DOI: 10.1109/BIBM55620.2022.9995079
Multi-Instance Learning (MIL) is a weakly supervised learning paradigm in which every training example is a labeled bag of unlabeled instances. In typical MIL applications, instances describe the features of regions/parts of a whole object, e.g., regional patches/lesions in an eye-fundus image. However, for a (semantically) complex part, the standard MIL formulation puts a heavy burden on the representation ability of the corresponding instance. To alleviate this pressure, we still adopt a bag of instances as an example in this paper, but extract from each instance a set of representations using $1\times1$ convolutions. The advantages of this tactic are two-fold: i) the set of representations can be regarded as multi-view representations of an instance; ii) compared to building multi-view representations directly from scratch, extracting them automatically using $1\times1$ convolutions is more economical, and may be more effective since $1\times1$ convolutions can be embedded into the whole network. Furthermore, we apply two consecutive multi-instance pooling operations to the reconstituted bag, which has effectively become a bag of sets of multi-view representations. We have conducted extensive experiments on several canonical MIL data sets from different application domains. The experimental results show that the proposed framework outperforms the standard MIL formulation in terms of classification performance and has good interpretability.
Enhanced CT Image Generation by GAN for Improving Thyroid Anatomy Detection
Jianyu Shi, Xiaohong Liu, Guoxing Yang, Guangyu Wang
Pub Date: 2022-12-06 | DOI: 10.1109/BIBM55620.2022.9995366
Computed tomography (CT) is one of the most widely used imaging methods for locating lesions such as nodules, tumors, and cysts, and for making primary diagnoses. For clearer imaging of anatomical structures or lesions, contrast-enhanced CT (CECT) scans are acquired by injecting a contrast agent into the patient during the examination. However, iodinated contrast injection has limitations, so CECT scans are less convenient than non-contrast-enhanced CT (NECT). Recently, deep learning models have produced impressive results in computer vision, including image translation. We therefore apply image translation methods to generate CECT images from the more accessible NECT images, and evaluate the effect of the generated images on detection tasks. In this study, we propose a cross-modal enhancement training strategy for thyroid anatomy detection, which employs CycleGAN to translate non-contrast-enhanced CT images into enhanced-CT-style images with content preserved. The experiments are conducted on thyroid CT images with anatomy object annotations. The experimental results show that adding translated images to the training dataset effectively improves thyroid anatomy detection: we achieve a best mAP of 82.5%, compared to 73.2% when training on non-contrast-enhanced CT alone.
A knowledge graph embedding-based method for predicting the synergistic effects of drug combinations
Peng Zhang, Shikui Tu
Pub Date: 2022-12-06 | DOI: 10.1109/BIBM55620.2022.9995466
Predicting the synergistic effects of drug combinations can accelerate the identification of novel potential combination therapies for clinical studies. Although extensive efforts have been made in the field, the problem remains challenging due to the high sparsity of drug combination synergy data and the existence of false positive combinations resulting from experimental noise. In this paper, we develop a Knowledge Graph Embedding-based method for predicting the synergistic effects of Drug Combinations, namely KGE-DC, which fully extracts the features of drug combinations. Firstly, a large-scale knowledge graph including drugs, targets, enzymes and transporters is constructed, which reduces the sparsity of the drug combination data and increases its reliability. Then, knowledge graph embeddings, which are capable of capturing the complex semantic information of the various entities in the knowledge graph, are adopted to learn low-dimensional representations of the drugs and cell lines. Finally, the synergy scores of drug combinations are predicted from the drug and cell line embeddings of the drug combination synergy data. Extensive experiments on a benchmark dataset with four different synergy types demonstrate that KGE-DC outperforms state-of-the-art methods on both the regression and classification tasks, namely predicting the synergy scores of drug combinations and predicting whether combinations are synergistic. Our results indicate that KGE-DC is a valuable tool for facilitating the discovery of novel combination therapies for cancer treatment. The implemented code and experimental dataset are available online at https://github.com/yushenshashen/KGE-DC.
Markov Guided Spatio-Temporal Networks for Brain Image Classification
Yupei Zhang, Yunan Xu, Rui An, Yuxin Li, Shuhui Liu, Xuequn Shang
Pub Date: 2022-12-06 | DOI: 10.1109/BIBM55620.2022.9995528
This paper proposes a representation learning model that identifies task-state fMRIs for knowledge-concept recognition, with the potential to model the human cognitive expression system. The traditional CNN-LSTM is usually employed to learn deep features from fMRIs, where the CNN extracts the spatial structure and the LSTM accounts for the temporal structure. However, the manifold smoothness of the latent features induced by the fMRI sequence is often ignored, leading to unsteady data representations. In this paper, we model the latent features as a hidden Markov chain and introduce a Markov-guided Spatio-Temporal Network (MSTNet) for brain image representation. Concretely, MSTNet has three parts: a CNN that learns latent features from 3D fMRI frames, where a Markov regularization enforces neighboring frames to have similar features; an LSTM that integrates all frames of an fMRI sequence into a feature vector; and a fully connected network (FCN) that implements the brain image classification. The model is trained by minimizing the cross-entropy (CE) loss. Our experiments are conducted on brain fMRI datasets acquired by scanning college students while they learned five concepts of computer science. The results show that the proposed MSTNet benefits from the introduced Markov regularization and thus improves performance on brain activity classification. This study not only presents an effective fMRI classification model with Markov regularization but also offers the potential to understand brain intelligence and help patients with language disabilities.
An interleaved hardware-accelerated k-mer parser
F. Milicchio, Marco Oliva, Mattia C. F. Prosperi
Pub Date: 2022-12-06 | DOI: 10.1109/BIBM55620.2022.9995126
Advances in next-generation sequencing (NGS) have not only increased the overall throughput of genomic content (e.g., Illumina NovaSeq, up to 6,000 GB) but also brought technology miniaturization (e.g., Oxford Nanopore MinION), enabling real-time, mobile experiments. Single Instruction/Multiple Data (SIMD) hardware acceleration is increasingly used to improve the performance of NGS data processing tools, while generic template programming libraries make it easier to adapt to fast changes in sequencing and computing platforms. We present a novel k-mer parser written in ISO C++ that exploits an interleaved, non-sequential, hardware-accelerated SIMD implementation within a generic programming framework called libseq. We benchmarked our k-mer parser on different NGS experimental datasets against two other popular k-mer counting tools (DSK and KMC3). On an Intel machine with AVX2 (quad-core Intel Core i5 CPU, 32 GB RAM), using simulated in-memory reads, DSK and KMC3 were on average 3.6x and 1.03x slower than our parser across k values of 35-63. On real sequencing experiments, DSK and KMC3 were on average 8.3x and 28.8x slower than ours in file/read parsing and k-mer building. Since our tool uses generic programming, other methods that rely on k-mers (e.g., de Bruijn graphs) can directly benefit from its SIMD acceleration. Our k-mer parser and libseq 2.0 are released under the BSD license and available at https://zenodo.org/record/7015294.