Objective: This study aimed to explore the mechanism of action of Wumei Pills (WMP) in treating gastric cancer (GC) using network pharmacology and molecular docking. Methods: The active ingredients of WMP were obtained from the Traditional Chinese Medicine Systems Pharmacology database, and their target sites were obtained from the PharmMapper database. GC target genes were identified through GeneCards, the Therapeutic Target Database, and other databases. The intersection of the two sets was used to determine the targets of WMP active ingredients that are related to GC. Cytoscape 3.7.0 was used to build the “compound-traditional Chinese medicine-ingredient-target” network map and screen the core components. The Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) database and Cytoscape 3.7.0 were used to analyze and visualize the potential target genes of WMP in treating GC. Gene Ontology and Kyoto Encyclopedia of Genes and Genomes pathway enrichment analyses were conducted with Metascape. The “target-critical path” network diagram was created by screening relevant pathways by enrichment score. KM plotter and the Gene Expression Profiling Interactive Analysis database were used to draw GC-related survival curves for the core genes online. AutoDock Vina and PyMOL were used for molecular docking and visualization. Results: The active ingredients of WMP and GC shared 99 intersection targets. Topological analysis of the protein-protein interaction network revealed ALB, EGFR, SRC, and other key targets. Molecular docking showed that the key active components bound well to the core targets, and the ALB and ESR1 genes were significant in survival analysis. Conclusion: WMP may treat GC via beta-sitosterol, stigmasterol, and other active ingredients acting on ALB, EGFR, SRC, and other targets. The mechanism may be related to the epithelial cell signaling pathway in Helicobacter pylori infection, suggesting a multi-target, multi-pathway therapeutic role.
{"title":"The Mechanism of Action of Network Pharmacology Integrated with Molecular Docking to Explore Wumei Pills in Treating Gastric Cancer","authors":"Zhongwen Lu, Shuang Zhang, Fei Teng, Xuanhe Tian, Xijian Liu, Xiaochun Han","doi":"10.1109/BIBM55620.2022.9995670","DOIUrl":"https://doi.org/10.1109/BIBM55620.2022.9995670","journal":{"name":"2022 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)"},"publicationDate":"2022-12-06"}
Pub Date: 2022-12-06 DOI: 10.1109/BIBM55620.2022.9995116
Yacong Li, Lei Ma, Qince Li, Henggui Zhang, Kuanquan Wang
A biological pacemaker is a therapy for cardiac rhythm disease. It can be created from ventricular myocytes (VMs) by overexpressing the HCN gene, which encodes the hyperpolarization-activated current ($\mathrm{I}_{\mathrm{f}}$), and suppressing the Kir2.1 gene, which encodes the inward-rectifier potassium current ($\mathrm{I}_{\mathrm{K1}}$). Our previous study built a single-cell biological pacemaker model and clarified how gene expression levels influence the pacemaking activity of a single pacemaker cell. However, the pacemaking ability of pacemaker tissue has not been studied systematically, and the factors that affect pacemaker synchronization and the propagation of spontaneous beating remain unclear. Biological research indicates that both sinoatrial node cells and pacemaker cells express less connexin than unexcitable cardiac cells, which suggests that the pacemaking ability of a pacemaker could be improved by decreasing its cell coupling. Another possible factor is the number of pacemaker cells. Intuitively, increasing the cell number should promote pacemaking, but an excessive number of pacemaker cells is impractical in the clinic. The balance between pacemaker cell number and cell coupling is therefore important when applying a biological pacemaker. In this study, we constructed a two-dimensional electrophysiological model of cardiac tissue to illustrate the relationship between gap junction coupling and cell number. Based on this model, we modified the coupling between pacemaker cells by adjusting the tissue diffusion coefficient for different numbers of pacemaker cells. Under each condition, synchronization, pacemaking cycle length, and electrical signal propagation were evaluated. We conclude that weakening the coupling among pacemaker cells can improve the efficiency of bio-pacemaker therapy. This study may contribute to producing effective pacemakers for clinical use.
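The coupling adjustment described above can be illustrated with a minimal sketch. It is not the authors' model: it applies only the diffusion term of a monodomain-style equation, dV/dt = div(D grad V), on a small grid with no-flux edges, and it omits all ionic currents. The function name `diffusion_step`, the boolean mask `is_pm`, and the parameters `d_tissue` and `d_pm` are assumptions introduced for illustration. Links between two pacemaker cells use `d_pm`; every other link uses `d_tissue`.

```python
def diffusion_step(v, is_pm, d_tissue, d_pm, dt, dx):
    """One explicit Euler step of dV/dt = div(D grad V) on a 2D grid.

    v        : list of rows of membrane potentials (hypothetical units: mV)
    is_pm    : same-shaped grid of booleans, True for pacemaker cells
    d_tissue : coupling used for any link that involves a non-pacemaker cell
    d_pm     : coupling used only between two pacemaker cells
    Edges of the grid are no-flux, so the summed potential is conserved.
    """
    rows, cols = len(v), len(v[0])
    new_v = [row[:] for row in v]

    def coupling(i, j, ni, nj):
        # Only links between two pacemaker cells use the pacemaker coupling.
        return d_pm if is_pm[i][j] and is_pm[ni][nj] else d_tissue

    for i in range(rows):
        for j in range(cols):
            # Visit each link once: the neighbour below and the neighbour to the right.
            for ni, nj in ((i + 1, j), (i, j + 1)):
                if ni < rows and nj < cols:
                    flux = coupling(i, j, ni, nj) * (v[ni][nj] - v[i][j]) * dt / dx ** 2
                    new_v[i][j] += flux
                    new_v[ni][nj] -= flux
    return new_v


# Usage: a 5x5 sheet at rest with a 3x3 pacemaker block and one depolarised cell.
rest = [[-80.0] * 5 for _ in range(5)]
rest[2][2] = 20.0
pacemaker = [[1 <= i <= 3 and 1 <= j <= 3 for j in range(5)] for i in range(5)]
strong = diffusion_step(rest, pacemaker, d_tissue=0.1, d_pm=0.1, dt=0.1, dx=1.0)
weak = diffusion_step(rest, pacemaker, d_tissue=0.1, d_pm=0.01, dt=0.1, dx=1.0)
print(strong[2][2], weak[2][2])
```

Lowering `d_pm` relative to `d_tissue` is one way to express weaker coupling among pacemaker cells in such a sketch. A full tissue model would add an ionic current term for every cell at each step.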
{"title":"Effect of cell coupling between pacemaker cells on the biological pacemaker in cardiac tissue model","authors":"Yacong Li, Lei Ma, Qince Li, Henggui Zhang, Kuanquan Wang","doi":"10.1109/BIBM55620.2022.9995116","DOIUrl":"https://doi.org/10.1109/BIBM55620.2022.9995116","journal":{"name":"2022 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)"},"publicationDate":"2022-12-06"}
Pub Date: 2022-12-06 DOI: 10.1109/BIBM55620.2022.9995387
Si-Jiu Wu, Tianyu Huang, Yihao Li
This paper proposes a shallow convolutional neural network (CNN) model to improve the efficiency and accuracy of real-time human activity recognition (HAR). A Mix-Patch-Layer (MPL) block based on the attention mechanism is added to a traditional convolutional network to enhance the expressiveness of the features the network extracts. This block lets the features attend to the information shared between their different parts, which compensates for the loss of global information in temporal data features. Experiments show that the block improves the accuracy and efficiency of real-time human activity recognition with a shallow network.
{"title":"A rehabilitation activity monitoring method based on Shallow-CNN","authors":"Si-Jiu Wu, Tianyu Huang, Yihao Li","doi":"10.1109/BIBM55620.2022.9995387","DOIUrl":"https://doi.org/10.1109/BIBM55620.2022.9995387","journal":{"name":"2022 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)"},"publicationDate":"2022-12-06"}
Single-cell RNA-sequencing (scRNA-seq) technology can reveal cellular heterogeneity with high throughput and resolution, facilitating the profiling of single-cell transcriptomes. However, experimental factors produce a large number of missing values in scRNA-seq data, called dropout events, which affect downstream analysis. Imputation is an effective denoising method, but existing imputation methods still face a major challenge: a lack of interpretability. In this study, we propose single-cell Self-Attention Generative Adversarial Networks (scSAGAN), a semi-supervised imputation method for scRNA-seq data. scSAGAN mainly uses Semi-Supervised Learning (SSL) and Probabilistic Latent Semantic Analysis (PLSA), which allow it not only to learn the latent characteristics of different cell types but also to explain its imputation behavior. In clustering experiments, scSAGAN outperforms all baselines on 7 datasets. We then interpret the imputation behavior of scSAGAN on datasets such as an Alzheimer’s disease dataset and find causative genes associated with the corresponding datasets. scSAGAN is open source and available at https://github.com/zehaoxiongl23/scSAGAN.
{"title":"scSAGAN: A scRNA-seq data imputation method based on Semi-Supervised Learning and Probabilistic Latent Semantic Analysis","authors":"Zehao Xiong, Xiangtao Chen, Jiawei Luo, Cong Shen, Zhongyuan Xu","doi":"10.1109/BIBM55620.2022.9995463","DOIUrl":"https://doi.org/10.1109/BIBM55620.2022.9995463","journal":{"name":"2022 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)"},"publicationDate":"2022-12-06"}
Pub Date: 2022-12-06 DOI: 10.1109/BIBM55620.2022.9994973
Bo Liu, Hong Song, Qiang Li, Yucong Lin, Jian Yang
Lung cancer has the highest morbidity and mortality among cancers, and early detection of cancerous changes is essential to reduce the risk of death. To achieve this, the false positive rate of detection must be reduced. In this paper, we propose a novel asymmetric residual network, called 3D ARCNN, to reduce the false positive rate of lung nodule detection. 3D ARCNN consists of asymmetric convolutional and multilayer cascaded residual network structures. Because deep neural networks have large numbers of parameters and poor reproducibility, the proposed model uses asymmetric convolution to reduce the number of parameters and improve generalization. In addition, the model uses internally cascaded multi-stage residuals to prevent the vanishing and exploding gradient problems of deep networks. Experiments are performed on the public LUNA16 dataset. Our method achieved detection sensitivities of 91.6%, 92.7%, 93.2% and 95.8% at 1, 2, 4 and 8 false positives per scan, respectively, with an average CPM score of 0.912. The experimental results show that the proposed 3D ARCNN is effective at reducing the false positive rate of lung nodule detection in the clinic.
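The parameter saving from asymmetric convolution can be made concrete with a small count. The abstract does not say how 3D ARCNN factorizes its kernels, so this sketch assumes one common form: a k×k×k convolution replaced by three consecutive convolutions of shape k×1×1, 1×k×1 and 1×1×k, with the intermediate channel count equal to the output channel count. The function names and the channel numbers are hypothetical.

```python
def conv3d_params(c_in, c_out, kernel):
    """Weights plus biases of one 3D convolution with the given kernel shape."""
    kd, kh, kw = kernel
    return kd * kh * kw * c_in * c_out + c_out


def full_vs_asymmetric(c_in, c_out, k=3):
    """Parameter counts of a k*k*k convolution and an assumed 1D-factorized stack."""
    full = conv3d_params(c_in, c_out, (k, k, k))
    # Assumed factorization: one 1D kernel per spatial axis, applied in sequence.
    asymmetric = (
        conv3d_params(c_in, c_out, (k, 1, 1))
        + conv3d_params(c_out, c_out, (1, k, 1))
        + conv3d_params(c_out, c_out, (1, 1, k))
    )
    return full, asymmetric


full, asymmetric = full_vs_asymmetric(c_in=32, c_out=32, k=3)
print(full, asymmetric)  # 27680 9312
```

Under these assumptions the factorized stack needs roughly a third of the parameters of the full kernel, and the saving grows with the kernel size k.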
{"title":"3D ARCNN: An Asymmetric Residual CNN for Decreasing False Positive Rate of Lung Nodules Detection","authors":"Bo Liu, Hong Song, Qiang Li, Yucong Lin, Jian Yang","doi":"10.1109/BIBM55620.2022.9994973","DOIUrl":"https://doi.org/10.1109/BIBM55620.2022.9994973","journal":{"name":"2022 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)"},"publicationDate":"2022-12-06"}
Pub Date: 2022-12-06 DOI: 10.1109/BIBM55620.2022.9995453
Xinlong Liu, Zepeng Sun, W. Liu, Feng Qiao, Li Cui, Jing Yang, Jingjie Sha, Jian Li, Li-Qun Xu
Solid-state nanopores have shown impressive performance in several sequencing research scenarios, such as biomolecule conformation detection, biomarker identification, and protein fingerprinting. In all these scenarios, accurate event detection is the fundamental step toward data analysis. Most existing event detection methods use either user-defined thresholds or adaptive thresholds determined automatically from the data. The former depend heavily on human expertise and are labor-intensive; the latter appear more advanced, but setting their threshold parameters is tricky. As a result, different methods usually give inconsistent results. In this paper, we develop a novel event detection method in which the selection threshold is computed from an analytical expression. Unlike other methods, it locates each event’s start and end points based on the slope rather than picking the first point whose current value crosses the baseline. Moreover, we add a method to determine whether multiple levels are present within each event. We evaluate the method on two groups of current traces generated by short ssDNA and 48.5 kb λ-DNA samples, respectively. The results show that our method performs well in detecting challenging translocation events with relatively low amplitudes and can accurately locate the start and end points of each level within an event.
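The idea of locating event boundaries by slope rather than by the first threshold crossing can be sketched as follows. This is an illustration under stated assumptions, not the authors' method: the abstract does not give the analytical threshold expression, so `threshold` is a plain parameter here, and `find_events`, `baseline` and `slope_tol` are hypothetical names. An event is first detected where the current drops more than `threshold` below `baseline`; its start is then moved back, and its end moved forward, for as long as the sample-to-sample change still exceeds `slope_tol`.

```python
def find_events(trace, baseline, threshold, slope_tol):
    """Return (start, end) index pairs of current blockades in a trace.

    Detection uses a depth threshold; boundaries are refined by slope so that
    they sit where the current leaves or rejoins the flat baseline.
    """
    events = []
    i, n = 0, len(trace)
    while i < n:
        if baseline - trace[i] > threshold:
            # Extend over all consecutive samples that stay below the threshold.
            j = i
            while j + 1 < n and baseline - trace[j + 1] > threshold:
                j += 1
            # Move the start back while the current is still falling steeply.
            start = i
            while start > 0 and trace[start - 1] - trace[start] > slope_tol:
                start -= 1
            # Move the end forward while the current is still rising steeply.
            end = j
            while end + 1 < n and trace[end + 1] - trace[end] > slope_tol:
                end += 1
            events.append((start, end))
            i = end + 1
        else:
            i += 1
    return events


# Usage on a toy trace with one blockade that has sloped edges.
toy_trace = [100, 100, 100, 90, 60, 60, 60, 85, 100, 100]
print(find_events(toy_trace, baseline=100, threshold=30, slope_tol=5))  # [(2, 8)]
```

In the toy trace the threshold alone would mark samples 4 to 6, while the slope refinement extends the event to the last flat baseline sample before the drop and the first one after the recovery.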
{"title":"Multi-level translocation events analysis in solid-state nanopore current traces","authors":"Xinlong Liu, Zepeng Sun, W. Liu, Feng Qiao, Li Cui, Jing Yang, Jingjie Sha, Jian Li, Li-Qun Xu","doi":"10.1109/BIBM55620.2022.9995453","DOIUrl":"https://doi.org/10.1109/BIBM55620.2022.9995453","journal":{"name":"2022 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)"},"publicationDate":"2022-12-06"}
Pub Date: 2022-12-06 DOI: 10.1109/BIBM55620.2022.9995199
Tiantian Li, Daming Zhu, Haitao Jiang, Haodi Feng, Xuefeng Cui
We study a new problem: finding a longest k-tuple of common substrings (abbr. k-CSS) of two or more strings. We present a suffix-tree-based algorithm for this problem that finds a longest k-CSS of m strings in $O(kmn^{k})$ time and $O(kmn)$ space, where n is the total length of the m strings. The algorithm can also approximate the longest k-CSS problem to a performance ratio of $\frac{1}{\epsilon}$ in $O(kmn^{\lceil\epsilon k\rceil})$ time for $\epsilon\in(0,1]$. Because its space complexity is linear in n, the algorithm is advantageous when comparing particularly long strings. The algorithm shows that finding a longest gapped pattern of a non-constant number of strings is solvable in polynomial time if the number of gaps is restricted to a constant, although the problem without any restriction on the number of gaps was proved NP-hard. Using a C++ tool built on the algorithm, we performed experiments to find longest 2-CSSs, 3-CSSs and 5-CSSs of 2 to 14 COVID-19 S-proteins. With the help of the longest 2-CSSs and 3-CSSs of COVID-19 S-proteins, we identified the mutation sites in the S-proteins of two COVID-19 variants, Delta and Omicron. The tool is available for download at https://github.com/lytt0/k-CSS.
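To make the problem statement concrete, here is a small dynamic-programming sketch for the special case of two strings. It is not the suffix-tree algorithm of the paper and does not reach its complexity bounds. It assumes that a k-CSS is a tuple of substrings that occur in the same order and without overlap in every input string, and it returns only the best total length using at most k substrings. The function name `longest_k_css_length` is hypothetical.

```python
def longest_k_css_length(a, b, k):
    """Largest total length of at most k ordered, non-overlapping substrings
    that are common to strings a and b (simple DP, cubic in the string length)."""
    n, m = len(a), len(b)
    # suffix[i][j]: length of the longest common suffix of a[:i] and b[:j].
    suffix = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                suffix[i][j] = suffix[i - 1][j - 1] + 1

    # prev[i][j]: best total length for a[:i], b[:j] with one substring fewer.
    prev = [[0] * (m + 1) for _ in range(n + 1)]
    for _ in range(k):
        cur = [[0] * (m + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                best = max(cur[i - 1][j], cur[i][j - 1])
                # Try every common substring that ends exactly at a[i-1], b[j-1].
                for length in range(1, suffix[i][j] + 1):
                    best = max(best, prev[i - length][j - length] + length)
                cur[i][j] = best
        prev = cur
    return prev[n][m]


print(longest_k_css_length("abXcd", "abYcd", 1))  # 2
print(longest_k_css_length("abXcd", "abYcd", 2))  # 4
```

With k = 1 this reduces to the classic longest common substring. With k = 2 the two strings above share the ordered pair ("ab", "cd"), which a single substring cannot capture because of the differing middle character.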
{"title":"Longest k-tuple Common Sub-Strings","authors":"Tiantian Li, Daming Zhu, Haitao Jiang, Haodi Feng, Xuefeng Cui","doi":"10.1109/BIBM55620.2022.9995199","DOIUrl":"https://doi.org/10.1109/BIBM55620.2022.9995199","journal":{"name":"2022 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)"},"publicationDate":"2022-12-06"}
Pub Date: 2022-12-06 DOI: 10.1109/BIBM55620.2022.9995370
Dylan Lebatteux, Hugo Soudeyns, I. Boucoiran, S. Gantt, Abdoulaye Baniré Diallo
Discriminative k-mers are unique genomic regions that characterize a given viral family, genus, species, or variant. Most existing algorithms for identifying discriminative k-mer sets are limited to returning raw subsequences. However, to explain the discriminative properties of a given k-mer for specific taxonomic groups of viruses, it is important to identify the variations of this k-mer (nucleotide sequences derived from an initial k-mer through one or more nucleotide changes) that occur in other groups of viruses. These variations, together with their frequencies of occurrence, their genomic locations, and their potential influence on biological functions, provide important insights into the classification process. In this article, we introduce KANALYZER, a novel algorithm that identifies variations of discriminative k-mers and associated information according to viral taxonomy. The algorithm was evaluated on identifying k-mer variations in both simulated and real viral sequence sets. In these evaluations, KANALYZER correctly and quickly identified over 95% of the variations and associated information. The KANALYZER algorithm is integrated directly into the pipeline of CASTOR-KRFE, a tool for identifying discriminative k-mers. The source code, detailed results, and data to reproduce the experiments are available at https://github.com/bioinfoUQAM/CASTOR_KRFE.
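The notion of a k-mer variation can be illustrated with a naive scan. This is not KANALYZER's algorithm, which the abstract does not describe. The sketch assumes that a variation differs from the reference k-mer only by substitutions (no insertions or deletions) and reports each variation with its frequency and its locations. The names `hamming`, `kmer_variations` and `max_mismatches` are hypothetical, as are the toy sequences.

```python
from collections import Counter


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def kmer_variations(kmer, sequences, max_mismatches=1):
    """Find windows that differ from `kmer` by 1..max_mismatches substitutions.

    sequences maps a sequence name to its nucleotide string.
    Returns (counts, positions): how often each variation occurs, and where.
    """
    k = len(kmer)
    counts = Counter()
    positions = {}
    for name, seq in sequences.items():
        for pos in range(len(seq) - k + 1):
            window = seq[pos:pos + k]
            distance = hamming(kmer, window)
            # Distance 0 is the k-mer itself, not a variation of it.
            if 0 < distance <= max_mismatches:
                counts[window] += 1
                positions.setdefault(window, []).append((name, pos))
    return counts, positions


toy_sequences = {"s1": "AAACGTAAA", "s2": "TTACCTTT"}
counts, positions = kmer_variations("ACGT", toy_sequences, max_mismatches=1)
print(dict(counts), positions)  # {'ACCT': 1} {'ACCT': [('s2', 2)]}
```

Here the reference k-mer ACGT appears unchanged in `s1`, while `s2` carries the single-substitution variation ACCT at position 2. Grouping the input sequences by taxonomic label would then show which groups carry which variation.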
{"title":"KANALYZER: a method to identify variations of discriminative k-mers in genomic sequences","authors":"Dylan Lebatteux, Hugo Soudeyns, I. Boucoiran, S. Gantt, Abdoulaye Baniré Diallo","doi":"10.1109/BIBM55620.2022.9995370","DOIUrl":"https://doi.org/10.1109/BIBM55620.2022.9995370","journal":{"name":"2022 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)"},"publicationDate":"2022-12-06"}
Pub Date: 2022-12-06 DOI: 10.1109/BIBM55620.2022.9994867
Pei-Chi Huang, Ejan Shakya, Myoungkyu Song, M. Subramaniam
As biofilm research grows rapidly, the corpus of bibliographic literature (i.e., documents) is increasing at an incredible rate. Researchers often need to inspect these large document collections, including (1) text, (2) images, and (3) captions, to understand underlying biological mechanisms and make critical decisions. However, exploring such ever-growing datasets manually is labor-intensive and difficult. Automating the identification or classification of large document collections is therefore urgently needed. To address this problem, we present a multimodal deep learning-based approach that automatically classifies documents for a specialized information retrieval technique based on biofilm images, captions, and texts, which are the major sources of information for document classification. Images, captions, and texts from biofilm documents are represented in a large vector space and then fed into convolutional neural networks (CNNs) to improve similarity matching and relevance. Our experiments and analysis use captions, texts, or images as inputs to unimodal models and concatenate all three in multimodal models. The trained classification models in turn help a search engine precisely identify relevant, domain-specific documents from large document collections to guide further research on biofilm development.
{"title":"BioMDSE: A Multimodal Deep Learning-Based Search Engine Framework for Biofilm Documents Classifications","authors":"Pei-Chi Huang, Ejan Shakya, Myoungkyu Song, M. Subramaniam","doi":"10.1109/BIBM55620.2022.9994867","DOIUrl":"https://doi.org/10.1109/BIBM55620.2022.9994867","PeriodicalId":210337,"journal":{"name":"2022 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)"},"publicationDate":"2022-12-06","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"131925908"}
Pub Date: 2022-12-06 DOI: 10.1109/BIBM55620.2022.9995601
Zeyu Gao, Anyu Mao, Jialun Wu, Yang Li, Chunbao Wang, C. Ding, Tieliang Gong, Chen Li
Computational Pathology (CPATH) offers the possibility of highly accurate and low-cost automated pathological diagnosis. However, the high time cost of model inference is one of the main issues limiting the application of CPATH methods. Because a Whole-Slide Image (WSI) is very large, commonly used CPATH methods divide a WSI into a large number of image patches at relatively high magnification and then predict each patch individually, which is time-consuming. In this paper, we propose a novel Uncertainty-based Model Acceleration (UMA) method that reduces the time cost of model inference, thereby relieving the deployment burden of CPATH applications. Inspired by the way pathologists view slides, UMA treats only a few high-uncertainty regions as “suspicious” regions that need to be predicted at high magnification, while most regions in a WSI are predicted at low magnification, which reduces the number of image patch extractions and predictions. Meanwhile, uncertainty estimation ensures prediction accuracy at low magnification. We take two fundamental CPATH classification tasks (i.e., cancer region detection and subtyping) as examples. Extensive experiments on two large-scale renal cell carcinoma classification datasets demonstrate that UMA can significantly reduce the time cost of model inference while maintaining competitive classification performance.
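The routing idea described above (predict at low magnification first, and escalate only uncertain regions to high magnification) can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' UMA implementation: the abstract does not specify the uncertainty measure, so predictive entropy is used here as a stand-in, and `classify_regions`, `low_mag_model`, `high_mag_model`, and `entropy_threshold` are hypothetical names.

```python
import math
from typing import Callable, List, Sequence, Tuple

Probs = Sequence[float]


def predictive_entropy(probs: Probs) -> float:
    """Shannon entropy (in nats) of a class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def classify_regions(
    regions: Sequence[str],
    low_mag_model: Callable[[str], Probs],
    high_mag_model: Callable[[str], Probs],
    entropy_threshold: float,
) -> Tuple[List[int], int]:
    """Label each region, calling the costly model only when needed.

    Returns the predicted labels and the number of regions that were
    escalated to the high-magnification model.
    """
    labels: List[int] = []
    escalated = 0
    for region in regions:
        probs = low_mag_model(region)
        # Assumption: high entropy marks a "suspicious" region.
        if predictive_entropy(probs) > entropy_threshold:
            probs = high_mag_model(region)
            escalated += 1
        labels.append(max(range(len(probs)), key=lambda i: probs[i]))
    return labels, escalated


# Toy stand-ins for the two models (illustrative probabilities only).
low_table = {"a": [0.95, 0.05], "b": [0.55, 0.45], "c": [0.10, 0.90]}
high_table = {"a": [0.99, 0.01], "b": [0.20, 0.80], "c": [0.05, 0.95]}

labels, escalated = classify_regions(
    ["a", "b", "c"], low_table.__getitem__, high_table.__getitem__, 0.5
)
print(labels, escalated)  # prints "[0, 1, 1] 1"
```

In this toy run only region "b" exceeds the entropy threshold, so the high-magnification model is called once instead of three times. The threshold controls the trade-off between inference time and how often the more expensive prediction is used.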
{"title":"Uncertainty-based Model Acceleration for Cancer Classification in Whole-Slide Images","authors":"Zeyu Gao, Anyu Mao, Jialun Wu, Yang Li, Chunbao Wang, C. Ding, Tieliang Gong, Chen Li","doi":"10.1109/BIBM55620.2022.9995601","DOIUrl":"https://doi.org/10.1109/BIBM55620.2022.9995601","PeriodicalId":210337,"journal":{"name":"2022 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)"},"publicationDate":"2022-12-06","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"134377290"}