Pub Date: 2022-12-06 | DOI: 10.1109/BIBM55620.2022.9994950
Manna Xiao, Hulin Kuang, Jin Liu, Yan Zhang, Yizhen Xiang, Jianxin Wang
Resting-state functional magnetic resonance imaging (rs-fMRI) has been widely used for the diagnosis of schizophrenia. With rs-fMRI, most existing schizophrenia diagnostic methods have revealed the functional abnormalities of schizophrenia at three scales: regional neural activity alterations, functional connectivity abnormalities, and brain network dysfunctions. However, many schizophrenia diagnosis methods do not consider fusing features from all three scales. In this study, we propose a schizophrenia diagnostic method based on multi-scale feature representation and ensemble learning. First, features at the three scales (region, connectivity, and network) are extracted from rs-fMRI images using the Brainnetome atlas. For each scale, feature selection with the least absolute shrinkage and selection operator (LASSO), tuned by a grid search, identifies effective sub-features related to schizophrenia classification. The selected sub-features of each scale are then input to a linear-kernel support vector machine to classify schizophrenia patients and healthy controls. To further improve diagnostic performance, an ensemble learning framework named E-RCN averages the probabilities obtained by the classifiers of the three scales at the decision level. Under leave-one-out cross-validation on the Center for Biomedical Research Excellence (COBRE) dataset, the proposed method achieves encouraging diagnostic performance, outperforming several state-of-the-art methods. In addition, ranking brain regions by their occurrence frequency across the leave-one-out cross-validation experiments identifies regions related to schizophrenia, including the thalamus and middle temporal gyrus, as well as important subregions, i.e., Tha_L_8_8, MTG_L_4_4, and MTG_R_4_4.
{"title":"Integrating Multi-scale Feature Representation and Ensemble Learning for Schizophrenia Diagnosis","authors":"Manna Xiao, Hulin Kuang, Jin Liu, Yan Zhang, Yizhen Xiang, Jianxin Wang","doi":"10.1109/BIBM55620.2022.9994950","DOIUrl":"https://doi.org/10.1109/BIBM55620.2022.9994950","url":null,"abstract":"Resting-state functional magnetic resonance imaging (rs-fMRI) images have been widely used for diagnosis of schizophrenia. With rs-fMRI, most existing schizophrenia diagnostic methods have revealed schizophrenia’s functional abnormalities from the following three scales, i.e., regional neural activity alterations, functional connectivity abnormalities and brain network dysfunctions. However, many schizophrenia diagnosis methods do not consider the fusion of features from the three scales. In this study, we propose a schizophrenia diagnostic method based on multi-scale feature representation and ensemble learning. Firstly, features including the three scales (region, connectivity and network) are extracted from rs-fMRI images using the brainnetome atlas. For each scale, feature selection, i.e., least absolute shrinkage and selection operator, is applied to identify effective sub-features related to schizophrenia classification by a grid search. Then the selected sub-features of each scale are input to support vector machine with linear kernel to classify schizophrenia patients and healthy controls respectively. To further improve the schizophrenia diagnostic performance, an ensemble learning framework named E-RCN is proposed to average the probabilities obtained by the classifiers of each scale in decision level. By leave-one-out cross-validation on the center for biomedical research excellence dataset (COBRE), our proposed method achieves encouraging diagnosis performance, outperforming several state-of-the-art methods. 
In addition, ranked by the occurence frequency of each brain region within the leave-one-out cross-validation experiments, some brain regions related to schizophrenia, i.e., thalamus and middle temporal gyrus, and important elaborate subregions, i.e., Tha_L_8_8, MTG_L_4_4 and MTG_R_4_4, are found.","PeriodicalId":210337,"journal":{"name":"2022 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117268811","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-12-06 | DOI: 10.1109/BIBM55620.2022.9994999
Xu Zhao, Yuxin Kang, Hansheng Li, Jiayu Luo, Lei Cui, Jun Feng, Lin Yang
Deep convolutional neural networks (DCNNs) have significantly improved the performance of medical image segmentation. Nevertheless, medical images frequently exhibit distribution discrepancies, so trained models fail to remain robust when applied to unseen clinical data. To address this problem, domain generalization methods have been proposed to enhance the generalization ability of DCNNs. Feature-space data augmentation methods have proven effective for domain generalization. However, existing methods still mainly rely on prior knowledge or assumptions, which limits their ability to enrich the diversity of source domain data. In this paper, we propose a random feature augmentation (RFA) method to diversify source domain data at the feature level without prior knowledge. Specifically, we explore the effectiveness of random convolution at the feature level for the first time and show experimentally that it can adequately preserve domain-invariant information while perturbing domain-specific information. Furthermore, to capture the same domain-invariant information from the augmented features of RFA, we present a domain-invariant consistent learning strategy that enables DCNNs to learn a more generalized representation. Our proposed method achieves state-of-the-art performance on two medical image segmentation tasks: optic cup/disc segmentation on fundus images and prostate segmentation on MRI images.
{"title":"A Random Feature Augmentation for Domain Generalization in Medical Image Segmentation","authors":"Xu Zhao, Yuxin Kang, Hansheng Li, Jiayu Luo, Lei Cui, Jun Feng, Lin Yang","doi":"10.1109/BIBM55620.2022.9994999","DOIUrl":"https://doi.org/10.1109/BIBM55620.2022.9994999","url":null,"abstract":"Deep convolutional neural networks (DCNNs) significantly improve the performance of medical image segmentation. Nevertheless, medical images frequently experience distribution discrepancies, which fails to maintain their robustness when applying trained models to unseen clinical data. To address this problem, domain generalization methods were proposed to enhance the generalization ability of DCNNs. Feature space-based data augmentation methods have proven their effectiveness to improve domain generalization. However, existing methods still mainly rely on certain prior knowledge or assumption, which has limitations in enriching the diversity of source domain data. In this paper, we propose a random feature augmentation (RFA) method to diversify source domain data at the feature level without prior knowledge. Specifically, we explore the effectiveness of random convolution at the feature level for the first time and prove experimentallyt hat itc an adequately preserve domain-invariant information while perturbing domainspecific information. Furthermore, tocapture the same domain-invariant information from the augmented features of RFA, we present a domain-invariant consistent learning strategy to enable DCNNs to learn a more generalized representation. 
Our proposed method achieves state-of-the-art performance on two medical image segmentation tasks, including optic cup/disc segmentation on fundus images and prostate segmentation on MRI images.","PeriodicalId":210337,"journal":{"name":"2022 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123223379","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
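The core idea — perturbing intermediate features with a freshly sampled random convolution — can be sketched roughly as below. Note the assumptions: the paper uses random convolution generically, while this sketch uses a random 1x1 channel-mixing matrix as a stand-in, and the `strength` interpolation parameter is mine, not the paper's.

```python
import numpy as np

def random_feature_augment(feats, rng, strength=0.5):
    """Perturb intermediate feature maps with a freshly sampled random
    1x1 convolution (a random channel-mixing matrix). feats: (C, H, W).
    Interpolating with the original features keeps content recognizable."""
    c = feats.shape[0]
    w = rng.standard_normal((c, c)) / np.sqrt(c)   # random kernel, variance-scaled
    mixed = np.einsum('oc,chw->ohw', w, feats)     # mix channels at every pixel
    return (1 - strength) * feats + strength * mixed

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4, 4))      # toy feature map: 8 channels, 4x4 spatial
x_aug = random_feature_augment(x, rng)
print(x_aug.shape)  # (8, 4, 4)
```

In a consistency-learning setup, the network would be asked to produce the same prediction for `x` and `x_aug`, pushing it toward domain-invariant features.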
Pub Date: 2022-12-06 | DOI: 10.1109/BIBM55620.2022.9995079
Lu Zhao, Liming Yuan, Zhenliang Li, Xianbin Wen
Multi-Instance Learning (MIL) is a weakly supervised learning paradigm in which every training example is a labeled bag of unlabeled instances. In typical MIL applications, instances describe the features of regions or parts of a whole object, e.g., regional patches/lesions in an eye-fundus image. However, for a (semantically) complex part, the standard MIL formulation puts a heavy burden on the representation ability of the corresponding instance. To alleviate this pressure, we still represent each example as a bag of instances, but extract from each instance a set of representations using $1\times 1$ convolutions. The advantages of this tactic are two-fold: i) this set of representations can be regarded as multi-view representations of an instance; ii) compared to building multi-view representations directly from scratch, extracting them automatically using $1\times 1$ convolutions is more economical, and may be more effective since $1\times 1$ convolutions can be embedded into the whole network. Furthermore, we apply two consecutive multi-instance pooling operations on the reconstituted bag, which has effectively become a bag of sets of multi-view representations. We have conducted extensive experiments on several canonical MIL datasets from different application domains. The experimental results show that the proposed framework outperforms the standard MIL formulation in terms of classification performance and has good interpretability.
{"title":"Multi-View Representation Learning for Multi-Instance Learning with Applications to Medical Image Classification","authors":"Lu Zhao, Liming Yuan, Zhenliang Li, Xianbin Wen","doi":"10.1109/BIBM55620.2022.9995079","DOIUrl":"https://doi.org/10.1109/BIBM55620.2022.9995079","url":null,"abstract":"Multi-Instance Learning (MIL) is a weakly supervised learning paradigm, in which every training example is a labeled bag of unlabeled instances. In typical MIL applications, instances are often used for describing the features of regions/parts in a whole object, e.g., regional patches/lesions in an eye-fundus image. However, for a (semantically) complex part the standard MIL formulation puts a heavy burden on the representation ability of the corresponding instance. To alleviate this pressure, we still adopt a bag-of-instances as an example in this paper, but extract from each instance a set of representations using $1 times1$ convolutions. The advantages of this tactic are two-fold: i) This set of representations can be regarded as multi-view representations for an instance; ii) Compared to building multi-view representations directly from scratch, extracting them automatically using $1 times1$ convolutions is more economical, and may be more effective since $1 times1$ convolutions can be embedded into the whole network. Furthermore, we apply two consecutive multi-instance pooling operations on the reconstituted bag that has actually become a bag of sets of multi-view representations. We have conducted extensive experiments on several canonical MIL data sets from different application domains. 
The experimental results show that the proposed framework outperforms the standard MIL formulation in terms of classification performance and has good interpretability.","PeriodicalId":210337,"journal":{"name":"2022 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121965445","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
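The two consecutive pooling steps over a bag of multi-view instances can be sketched as below. This is a schematic, not the paper's implementation: the per-view linear maps stand in for $1\times 1$ convolutions, and max-pooling is one plausible choice of multi-instance pooling operator (the paper does not fix it here).

```python
import numpy as np

def bag_representation(instances, view_weights):
    """instances: (n, d) bag of n instances; view_weights: (V, d, d') --
    one 1x1-conv-style linear map per view. Two consecutive max-pooling
    steps: first over views, then over instances, yielding a bag vector."""
    views = np.einsum('nd,vde->nve', instances, view_weights)  # (n, V, d')
    per_instance = views.max(axis=1)   # pool the V views of each instance
    return per_instance.max(axis=0)    # pool over instances -> (d',)

rng = np.random.default_rng(1)
bag = rng.standard_normal((5, 16))    # 5 instances, 16-dim features
W = rng.standard_normal((3, 16, 8))   # 3 views, projected to 8 dims
z = bag_representation(bag, W)
print(z.shape)  # (8,)
```

Because the view maps are linear, they can sit inside the network and be trained end-to-end, which is exactly the economy argument made above.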
Pub Date: 2022-12-06 | DOI: 10.1109/BIBM55620.2022.9995366
Jianyu Shi, Xiaohong Liu, Guoxing Yang, Guangyu Wang
Computed tomography (CT) is one of the most widely used imaging methods for locating lesions such as nodules, tumors, and cysts, and for making a primary diagnosis. To image anatomy or lesions more clearly, contrast-enhanced CT (CECT) scans are acquired by injecting a contrast agent into the patient during the examination. However, iodine contrast injection has restrictions, so CECT is not as convenient as non-contrast-enhanced CT (NECT). Recently, deep learning models have produced impressive results in computer vision, including image translation. We therefore apply image translation methods to generate CECT images from the more accessible NECT images and evaluate the effect of the generated images on detection tasks. In this study, we propose a cross-modal enhancement training strategy for thyroid anatomy detection, which employs CycleGAN to translate non-contrast-enhanced CT images into enhanced-CT-style images with content preserved. The experiments are conducted on thyroid CT images with anatomy object annotations. The results show that adding translated images to the training dataset effectively improves thyroid anatomy detection. We achieve a best mAP of 82.5%, compared to 73.2% when training on non-contrast-enhanced CT alone.
{"title":"Enhanced CT Image Generation by GAN for Improving Thyroid Anatomy Detection","authors":"Jianyu Shi, Xiaohong Liu, Guoxing Yang, Guangyu Wang","doi":"10.1109/BIBM55620.2022.9995366","DOIUrl":"https://doi.org/10.1109/BIBM55620.2022.9995366","url":null,"abstract":"Computed tomography (CT) is one of the most imaging methods widely used to locate lesions such as nodules, tumors, and cysts, and make primary diagnosis. For clearer imaging of anatomical or lesions, contrast-enhanced CT (CECT) scans are imaging with injecting a contrast agent into a patient during examination. But there are limits to iodine contrast injections so that CECT scans are not convenient like non-contrast enhanced CT (NECT). Recently, deep learning models bring impressive results in computer vision, including image translation. So, we would like to apply image translation methods to generate CECT images from the more accessible NECT images, and evaluate the effects of generated images on image detection tasks. In this study, we propose a method called cross-modal enhancement training strategy for thyroid anatomy detection, which employs CycleGAN to translate non-constrast enhanced CT images to enhanced CT style images with content reserved. The experiments are conducted on thyroid CT images with anatomy object annotation. The experimental results show that by adding translated images into the training dataset, the performance of thyroid anatomy detection can be effectively improved. 
We achieve the best mAP of 82.5% compared to 73.2% in the along non-contrast enhanced CT training.","PeriodicalId":210337,"journal":{"name":"2022 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116991912","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
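The augmentation side of the training strategy — extending the detector's NECT training pool with CECT-style translations that share the original annotations — can be sketched as follows. Everything named here is hypothetical scaffolding: `build_training_set`, `mix_ratio`, and the placeholder generator are illustrations, not the paper's API.

```python
import numpy as np

def build_training_set(nect_images, translator, mix_ratio=1.0):
    """Cross-modal enhancement (sketch): extend the NECT training pool with
    CECT-style translations of a fraction of its images. `translator` stands
    in for the trained CycleGAN NECT->CECT generator; because the translation
    preserves content, the original anatomy annotations carry over unchanged."""
    n_extra = int(len(nect_images) * mix_ratio)
    translated = [translator(img) for img in nect_images[:n_extra]]
    return nect_images + translated

# Placeholder generator: a trivial intensity shift standing in for CycleGAN.
fake_gan = lambda img: img * 1.2 + 0.1
imgs = [np.zeros((4, 4)) for _ in range(10)]
train = build_training_set(imgs, fake_gan, mix_ratio=0.5)
print(len(train))  # 15
```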
Pub Date: 2022-12-06 | DOI: 10.1109/BIBM55620.2022.9995466
Peng Zhang, Shikui Tu
Predicting the synergistic effects of drug combinations can accelerate the identification of novel potential combination therapies for clinical studies. Although extensive efforts have been made in the field, the problem remains challenging due to the high sparsity of drug combination synergy data and the existence of false positive combinations resulting from experimental noise. In this paper, we develop a Knowledge Graph Embedding-based method for predicting the synergistic effects of Drug Combinations, namely KGE-DC, which fully extracts the features of drug combinations. First, a large-scale knowledge graph including drugs, targets, enzymes, and transporters is constructed, which reduces the sparsity of the drug combination data and increases its reliability. Then, knowledge graph embeddings, which are capable of capturing the complex semantic information of the entities in the knowledge graph, are adopted to learn low-dimensional representations of the drugs and cell lines. Finally, the synergy scores of drug combinations are predicted from the drug and cell line embeddings. Extensive experiments on a benchmark dataset with four different synergy types demonstrate that KGE-DC outperforms state-of-the-art methods on both the regression task (predicting the synergy scores of drug combinations) and the classification task (predicting whether drug combinations are synergistic). Our results indicate that KGE-DC is a valuable tool for facilitating the discovery of novel combination therapies for cancer treatment. The implemented code and experimental dataset are available online at https://github.com/yushenshashen/KGE-DC.
{"title":"A knowledge graph embedding-based method for predicting the synergistic effects of drug combinations","authors":"Peng Zhang, Shikui Tu","doi":"10.1109/BIBM55620.2022.9995466","DOIUrl":"https://doi.org/10.1109/BIBM55620.2022.9995466","url":null,"abstract":"Predicting the synergistic effects of drug combinations can accelerate the identification process of novel potential combination therapies for clinical studies. Although extensive efforts have been made in the field, the problem is still challenging due to the high sparsity of drug combinations’ synergy data and the existence of false positive combinations resulted from the noise in experiments. In this paper, we develop a Knowledge Graph Embedding-based method for predicting the synergistic effects of Drug Combinations, namely KGE-DC, which fully extracts the features of drug combinations. Firstly, a largescale knowledge graph including drugs, targets, enzymes and transporters is constructed, therefore, the sparsity of the drug combinations’ data is reduced and the reliability of the data is increased. Then, knowledge graph embedding, which are capable of capturing complex semantic information of various entities in the knowledge graph, is adopted for learning low-dimensional representations for the drugs and cell lines. Finally, the synergy scores of drug combinations are predicted based on the drug and cell line embeddings of the drug combinations’ synergy data. Extensive experiments on benchmark dataset with four different synergy types demonstrate that KGE-DC outperforms state-of the-art methods on both the regression and classification tasks, namely predicting the synergy scores of drug combinations and predicting whether the drug combinations are synergistic combinations. Our results indicate that KGE-DC is a valuable tool to facilitate the discovery of novel combination therapies for cancer treatment. 
The implemented code and experimental dataset are available online at https://github.com/yushenshashen/KGE-DC.","PeriodicalId":210337,"journal":{"name":"2022 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124729181","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
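The abstract does not specify which embedding model KGE-DC uses, so as a generic illustration of knowledge-graph-embedding scoring, here is a TransE-style triple scorer: entities and relations live in the same vector space, and a triple (head, relation, tail) is plausible when head + relation lands near tail.

```python
import numpy as np

def transe_score(h, r, t):
    """TransE-style plausibility score for a (head, relation, tail) triple:
    a smaller ||h + r - t|| (here, negated L1 norm) means a more plausible
    fact, e.g. (drug, targets, protein)."""
    return -np.linalg.norm(h + r - t, ord=1)

rng = np.random.default_rng(2)
dim = 32
drug = rng.standard_normal(dim)
targets_rel = rng.standard_normal(dim)
# A "true" tail sits almost exactly at drug + targets_rel; a random entity does not.
true_target = drug + targets_rel + 0.01 * rng.standard_normal(dim)
rand_entity = rng.standard_normal(dim)
print(transe_score(drug, targets_rel, true_target) >
      transe_score(drug, targets_rel, rand_entity))  # True
```

Embeddings trained this way give every drug and cell line a dense vector that a downstream regressor can consume to predict synergy scores.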
Pub Date: 2022-12-06 | DOI: 10.1109/BIBM55620.2022.9995528
Yupei Zhang, Yunan Xu, Rui An, Yuxin Li, Shuhui Liu, Xuequn Shang
This paper proposes a representation learning model that identifies task-state fMRIs for knowledge-concept recognition, which has the potential to model the human cognitive expression system. A CNN-LSTM is traditionally employed to learn deep features from fMRIs, where the CNN extracts the spatial structure and the LSTM accounts for the temporal structure. However, the manifold smoothness of the latent features induced by the fMRI sequence is often ignored, leading to unsteady data representations. In this paper, we model the latent features as a hidden Markov chain and introduce a Markov-guided Spatio-Temporal Network (MSTNet) for brain image representation. Concretely, MSTNet has three parts: a CNN that learns latent features from 3D fMRI frames, where a Markov regularization encourages neighboring frames to have similar features; an LSTM that integrates all frames of an fMRI sequence into a feature vector; and a fully connected network (FCN) that performs the brain image classification. The model is trained to minimize the cross-entropy (CE) loss. Our experiments are conducted on brain fMRI datasets acquired by scanning college students while they learned five concepts of computer science. The results show that the proposed MSTNet benefits from the introduced Markov regularization and thus improves brain activity classification. This study not only presents an effective fMRI classification model with Markov regularization but also offers the potential to understand brain intelligence and help patients with language disabilities.
{"title":"Markov Guided Spatio-Temporal Networks for Brain Image Classification*","authors":"Yupei Zhang, Yunan Xu, Rui An, Yuxin Li, Shuhui Liu, Xuequn Shang","doi":"10.1109/BIBM55620.2022.9995528","DOIUrl":"https://doi.org/10.1109/BIBM55620.2022.9995528","url":null,"abstract":"This paper proposes a representation learning model to identify task-state fMRIs for knowledge-concept recognition, which has the potential to model the human cognitive expression system. The traditional CNN-LSTM is usually employed to learn deep features from fMRIs, where CNN aims at extracting the spatial structure and LSTM accounts for the temporal structure. However, the manifold smoothness of the latent features caused by the fMRI sequence is often ignored, leading to unsteady data representation. In this paper, we model latent features as a hidden Markov chain and introduce a Markov-guided Spatio-Temporal Network (MSTNet) for brain image representation. Concretely, MSTNet has three parts: CNN that aims to learn latent features from 3D fMRI frames where a Markov Regularization enforces the neighborhood frames to have similar features, LSTM integrates all frames of an fMRI sequence into a feature vector and fully connected network (FCN) that is to implement the brain image classification. Our model is trained towards minimizing the cross entropy (CE) loss. Our experiment is conducted on the brain fMRI datasets achieved by scanning college students when they were learning five concepts of computer science. The results show that the proposed MSTNet can benefit from the introduced Markov regularization and thus result in improved performance on the brain activity classification. 
This study not only shows an effective fMRI classification model with Markov regularization but also provides the potential to understand brain intelligence and help patients with language disabilities.","PeriodicalId":210337,"journal":{"name":"2022 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124730893","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
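The Markov regularization described above amounts to a smoothness penalty on consecutive frame features, added to the CE loss. A minimal sketch, assuming a squared-difference penalty and a weight lambda (the abstract does not give the exact form):

```python
import numpy as np

def markov_regularizer(latent_seq):
    """Smoothness penalty on per-frame latent features: under the Markov
    chain view, neighboring fMRI frames should have similar features.
    latent_seq: (T, d) array of CNN features, one row per frame.
    Total training loss would be CE + lambda * this term."""
    diffs = latent_seq[1:] - latent_seq[:-1]
    return float((diffs ** 2).sum() / max(len(latent_seq) - 1, 1))

# A smoothly drifting sequence is penalized far less than a jumpy one.
smooth = np.linspace(0, 1, 10)[:, None] * np.ones((1, 8))
jumpy = np.random.default_rng(3).standard_normal((10, 8))
print(markov_regularizer(smooth) < markov_regularizer(jumpy))  # True
```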
Pub Date: 2022-12-06 | DOI: 10.1109/BIBM55620.2022.9995126
F. Milicchio, Marco Oliva, Mattia C. F. Prosperi
Advances in next-generation sequencing (NGS) have not only increased the overall throughput of genomic content (e.g., Illumina NovaSeq, up to 6,000 GB) but also miniaturized the technology (e.g., Oxford Nanopore MinION), enabling real-time, mobile experiments. Single Instruction/Multiple Data (SIMD) hardware acceleration is increasingly used to improve the performance of NGS data processing tools, while generic template programming libraries make it easier to adapt to rapid changes in sequencing and computing platforms. Here we present a novel k-mer parser written in ISO C++ that exploits an interleaved, non-sequential, hardware-accelerated SIMD implementation within a generic programming framework called libseq. We benchmarked our k-mer parser on different NGS experimental datasets, comparing it with two other popular k-mer counting tools (DSK and KMC3). On an Intel machine with AVX2 (quad-core Intel Core i5 CPU, 32 GB RAM), using simulated in-memory reads, DSK and KMC3 were on average 3.6x and 1.03x slower than our parser across k values of 35-63. On real sequencing experiments, DSK and KMC3 were on average 8.3x and 28.8x slower than ours in file/read parsing and k-mer building. Since our tool uses generic programming, other methods that rely on k-mers (e.g., de Bruijn graphs) can directly benefit from its SIMD acceleration. Our k-mer parser and libseq 2.0 are released under the BSD license and available at https://zenodo.org/record/7015294.
{"title":"An interleaved hardware-accelerated k-mer parser","authors":"F. Milicchio, Marco Oliva, Mattia C. F. Prosperi","doi":"10.1109/BIBM55620.2022.9995126","DOIUrl":"https://doi.org/10.1109/BIBM55620.2022.9995126","url":null,"abstract":"Advances in next-generation sequencing (NGS) have not only increased the overall throughput of genomic content (e.g. Illumina NovaSeq up to 6, 000GB), but also provided technology miniaturization (e.g. Oxford Nanopore MinION) enabling real-time, mobile experiments. Single Instruction/Multiple Data (SIMD) hardware acceleration is increasingly used to improve performance of NGS data processing tools, while generic template programming libraries are advantageous to adapt to the fast changes in sequencing and computing platforms. We here present a novel k-mer parser written in ISO C++ that exploits an interleaved, non-sequential, hardware accelerated SIMD implementation within a generic programming framework called libseq. We benchmarked our k-mer parser using different NGS experimental datasets comparing with other two popular k-mer counting tools (DSK and KMC3). On an Intel machine with AVX2 (Quad-Core Intel Core i5 CPU, 32 GB RAM), using simulated in-memory reads, DSK and KMC3 were on average 3. 6x and 1. 03x times slower than our parser across k value ranges of 35-63. On real sequencing experiments, DSK and KMC3 were on average 8. 3x and 28. 8x times slower in file/read parsing and k-mer building than ours. Since our tool uses generic programming, other methods that rely on k-mers (e.g. de Bruijn graphs) can directly benefit from its SIMD acceleration. 
Our k-mer parser and libseq 2.0 are released under the BSD license and available at https://zenodo.org/record/7015294.","PeriodicalId":210337,"journal":{"name":"2022 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129419395","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
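For readers unfamiliar with k-mer parsing, the scalar logic that such tools vectorize is a rolling 2-bit encoding: each base appends 2 bits and a mask drops the oldest base. The sketch below shows only this scalar logic in Python, not the paper's interleaved SIMD C++ implementation.

```python
from collections import Counter

ENC = {'A': 0, 'C': 1, 'G': 2, 'T': 3}

def count_kmers(read, k):
    """Rolling 2-bit k-mer parser: shift in 2 bits per base, mask off the
    oldest base, and count each k-mer's integer code once the window fills."""
    mask = (1 << (2 * k)) - 1
    counts, code = Counter(), 0
    for i, base in enumerate(read):
        code = ((code << 2) | ENC[base]) & mask
        if i >= k - 1:
            counts[code] += 1
    return counts

c = count_kmers("ACGTACGT", 4)
print(sum(c.values()))  # 5 k-mers in an 8-base read
acgt = (0 << 6) | (1 << 4) | (2 << 2) | 3  # integer code for "ACGT"
print(c[acgt])  # ACGT occurs twice -> 2
```

The SIMD version processes many such rolling windows in parallel lanes; the per-window arithmetic is exactly the shift-or-mask above.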
Pub Date: 2022-12-06 | DOI: 10.1109/BIBM55620.2022.10113358
Lin Xi, Xiangyang Yuan, Jing Liu, X. Tang
With the aim of screening prognostic genes for breast cancer (BRCA) and exploring the possible mechanisms and clinical value of these genes in the growth and regression stages of the disease, we study gene expression data from the public Gene Expression Omnibus (GEO) dataset GSE22820 and The Cancer Genome Atlas (TCGA). To obtain high-confidence gene candidates for BRCA, we present a hybrid gene and module analysis pipeline that strategically combines data mining on the different datasets. Ultimately, four gene candidates, i.e., PLIN1, GPD1, LIPE, and CHRDL1, are targeted for BRCA. Afterwards, Kaplan-Meier survival analysis performed on these genes for verification reveals that the overall survival time of patients with low expression of these genes is shorter than that of patients with high expression (P < 0.05). Moreover, to study the role of these genes in the mechanisms and functionality related to cytoplasmic lipid metabolism, functional enrichment and pathway analysis are implemented. The results indicate that the expression of the four discovered genes plays an adverse role in BRCA development, and that they could serve as effective biomarkers for predicting the formation and progression of BRCA.
{"title":"Four potential prognostic markers for breast cancer identified by hybrid gene and module expression analysis","authors":"Lin Xi, Xiangyang Yuan, Jing Liu, X. Tang","doi":"10.1109/BIBM55620.2022.10113358","DOIUrl":"https://doi.org/10.1109/BIBM55620.2022.10113358","url":null,"abstract":"With the aim of screening the prognostic genes for breast cancer (BRCA) and exploring the possible mechanism and clinical value of these genes in the growth and regression stage of disease, we study the genes in the public gene expression omnibus (GEO) GSE22820 and the cancer genome atlas (TCGA). To achieve high-confidence gene candidates for BRCA, we present a hybrid gene and module analysis pipeline that strategically considers data mining on different datasets. Ultimately, four gene candidates, i.e., PLIN1, GPD1, LIPE, and CHRDL1, are targeted for BRCA. Afterwards, Kaplan-Meier survival analysis is performed on these genes for verification, revealing that the overall survival time of patients with low expression of these genes was shorter than that of patients with high expression (with P<0.05). Moreover, in order to study the role of these genes in the mechanisms and functionality related to cytoplasmic lipid metabolism, functional enrichment and pathway analysis are implemented. 
The results indicate that the expression of the four discovered genes plays an adverse role in BRCA development and could serve as effective biomarkers for predicting the formation and progression of BRCA.","PeriodicalId":210337,"journal":{"name":"2022 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129872611","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
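The Kaplan-Meier verification step used above rests on a simple product-limit estimator: at each observed death time, the running survival probability is multiplied by (1 - d/n), where d is the deaths at that time among the n patients still at risk. A minimal sketch with toy data (real analyses would use a survival library and a log-rank test for the P-value):

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival curve (product-limit estimator).
    times: follow-up in months; events: 1 = death observed, 0 = censored.
    Censored patients leave the risk set without reducing survival."""
    order = sorted(range(len(times)), key=lambda i: times[i])
    n, surv, curve = len(times), 1.0, []
    for idx in order:
        if events[idx]:
            surv *= 1 - 1 / n
            curve.append((times[idx], round(surv, 4)))
        n -= 1
    return curve

curve = kaplan_meier([5, 8, 12, 20, 30], [1, 1, 0, 1, 0])
print(curve)  # [(5, 0.8), (8, 0.6), (20, 0.3)]
```

Comparing two such curves (low- vs. high-expression groups) with a log-rank test yields the P < 0.05 result reported above.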
Pub Date : 2022-12-06DOI: 10.1109/BIBM55620.2022.9995108
Thi Thuy Duong Vu, Jaehee Jung
The Gene Ontology (GO) database contains approximately 40,000 classes of terms arranged in a hierarchical relationship. These terms mainly define protein functions and are used in bioinformatics to automatically predict protein functions using their sequences. Recently, several models have been studied, such as ProtBert and ProteinBERT, which predict protein functions by fine-tuning a pretrained model of the nucleotide sequence using a self-supervised deep method. We proposed two models to predict GO using protein features extracted by the ProtBert model to annotate proteins with their GO terms. Additionally, we customized the ProteinBERT model and fine-tuned it to predict GO terms. The experiment showed that protein embeddings created using pretrained transformer models can be used as a source of data for tasks involving sequence prediction, with a focus on protein functions. The suggested models allow flexible sequence lengths and provide improved performance compared to other comparison methods.
{"title":"Gene Ontology based protein functional annotation using pretrained embeddings","authors":"Thi Thuy Duong Vu, Jaehee Jung","doi":"10.1109/BIBM55620.2022.9995108","DOIUrl":"https://doi.org/10.1109/BIBM55620.2022.9995108","url":null,"abstract":"The Gene Ontology (GO) database contains approximately 40,000 classes of terms arranged in a hierarchical relationship. These terms mainly define protein functions and are used in bioinformatics to automatically predict protein functions using their sequences. Recently, several models have been studied, such as ProtBert and ProteinBERT, which predict protein functions by fine-tuning a pretrained model of the nucleotide sequence using a self-supervised deep method. We proposed two models to predict GO using protein features extracted by the ProtBert model to annotate proteins with their GO terms. Additionally, we customized the ProteinBERT model and fine-tuned it to predict GO terms. The experiment showed that protein embeddings created using pretrained transformer models can be used as a source of data for tasks involving sequence prediction, with a focus on protein functions. The suggested models allow flexible sequence lengths and provide improved performance compared to other comparison methods.","PeriodicalId":210337,"journal":{"name":"2022 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128311954","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2022-12-06DOI: 10.1109/BIBM55620.2022.9995565
Guang Zheng, Ye Lv, Junmei Zhao, Hongtao Guo
In China, the integration of traditional Chinese medicine (TCM) and western medicine against rheumatoid arthritis (RA) delivers both low-level inflammation indexes and less side effects. Thus, exploring the mechanism of TCM against RA might help to explore the pathology of RA. As the diagnosis of TCM is based on syndrome differentiation and deficiency of liver and kidney is the leading one, then, explore associated Chinese medicines against RA may shed light on the therapeutic regulation network. In this study, network pharmacology analysis was carried out based on bioactive compounds, targeted proteins and protein interactions towards RA. As a result, the regulation network against RA delivered by the top five Chinese medicines was constructed. Further bioinformatics analysis on participant genes not only elucidate the relationships to RA and immune response, but also the reduced side effects e.g. osteoporosis. Validation of the therapeutic effect on RA patients was done via check indexes on C-reactive protein and erythrocyte sedimentation rate. Potential effects of the delivered regulation network were demonstrated with heatmap on the microarray data of RA synovium tissue.
{"title":"The Regulation Networks of Chinese Medicines Against Rheumatoid Arthritis with Syndrome of Deficiency of Liver and Kidney","authors":"Guang Zheng, Ye Lv, Junmei Zhao, Hongtao Guo","doi":"10.1109/BIBM55620.2022.9995565","DOIUrl":"https://doi.org/10.1109/BIBM55620.2022.9995565","url":null,"abstract":"In China, the integration of traditional Chinese medicine (TCM) and western medicine against rheumatoid arthritis (RA) delivers both low-level inflammation indexes and less side effects. Thus, exploring the mechanism of TCM against RA might help to explore the pathology of RA. As the diagnosis of TCM is based on syndrome differentiation and deficiency of liver and kidney is the leading one, then, explore associated Chinese medicines against RA may shed light on the therapeutic regulation network. In this study, network pharmacology analysis was carried out based on bioactive compounds, targeted proteins and protein interactions towards RA. As a result, the regulation network against RA delivered by the top five Chinese medicines was constructed. Further bioinformatics analysis on participant genes not only elucidate the relationships to RA and immune response, but also the reduced side effects e.g. osteoporosis. Validation of the therapeutic effect on RA patients was done via check indexes on C-reactive protein and erythrocyte sedimentation rate. 
Potential effects of the delivered regulation network were demonstrated with heatmap on the microarray data of RA synovium tissue.","PeriodicalId":210337,"journal":{"name":"2022 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128389814","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}