Pub Date: 2024-10-03 DOI: 10.1109/TCBB.2024.3473899
Zhenhao Sun;Meng Wang;Shiqi Wang;Sam Kwong
In this paper, we propose a Learning-based gEnome Codec (LEC), designed for high efficiency and enhanced flexibility. LEC integrates several techniques, including Group of Bases (GoB) compression, multi-stride coding, and bidirectional prediction, all aimed at balancing coding complexity against performance in lossless compression. The codec's model is data-driven: deep neural networks infer a probability for each symbol, which enables fully parallel encoding and decoding with configurable complexity for diverse applications. Experimental results across a set of configurations that trade compression ratio against inference speed show that the proposed method is efficient in terms of compression performance and offers improved flexibility in real-world applications.
Title: LEC-Codec: Learning-Based Genome Data Compression. IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 21, no. 6, pp. 2447-2458.
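The link between per-symbol probabilities and lossless compression in the LEC abstract can be illustrated with a minimal sketch. It assumes only the standard result that an entropy coder spends about -log2(p) bits on a symbol predicted with probability p. The `CountModel` class is a hypothetical stand-in for the neural predictor, and the sketch is sequential; it does not reproduce GoB compression, multi-stride coding, bidirectional prediction, or the parallel coding described in the abstract.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"


def uniform_predict(context):
    """Baseline with no learning: every base is equally likely (2 bits per base)."""
    return {base: 1.0 / len(ALPHABET) for base in ALPHABET}


class CountModel:
    """Adaptive order-k frequency model (hypothetical stand-in for a neural predictor)."""

    def __init__(self, order=2):
        self.order = order
        # Start every count at 1 so no symbol ever receives probability zero.
        self.counts = defaultdict(lambda: {base: 1 for base in ALPHABET})

    def _key(self, context):
        # Only the last `order` bases of the context are used.
        return context[-self.order:] if self.order > 0 else ""

    def predict(self, context):
        counts = self.counts[self._key(context)]
        total = sum(counts.values())
        return {base: n / total for base, n in counts.items()}

    def update(self, context, symbol):
        self.counts[self._key(context)][symbol] += 1


def ideal_code_length_bits(sequence, predict, update=None):
    """Sum of -log2(p) over the sequence: the cost an ideal entropy coder would pay."""
    bits = 0.0
    for i, symbol in enumerate(sequence):
        context = sequence[:i]
        bits -= math.log2(predict(context)[symbol])
        if update is not None:
            # A decoder can apply the same update after decoding each symbol.
            update(context, symbol)
    return bits


if __name__ == "__main__":
    seq = "ACGT" * 50
    model = CountModel(order=2)
    print("uniform :", ideal_code_length_bits(seq, uniform_predict))
    print("adaptive:", ideal_code_length_bits(seq, model.predict, model.update))
```

The better the predictor matches the data, the fewer bits the coder needs, which is why a learned probability model can improve on the 2-bits-per-base baseline.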
Pub Date: 2024-09-27 DOI: 10.1109/TCBB.2024.3469164
Xuena Liang;Junliang Shang;Jin-Xing Liu;Chun-Hou Zheng;Juan Wang
Recent advancements in spatially resolved transcriptomics (ST) technologies have enabled the comprehensive measurement of gene expression profiles while preserving the spatial information of cells. Combining gene expression profiles with spatial information has been the most commonly used way to identify spatial functional domains and genes. However, most existing spatial domain identification methods focus on spatially neighboring structures and fail to balance the self-characteristics of spots against their spatial structure dependency. Therefore, we propose a novel model called SpaGCAC, which recognizes spatial domains with the help of an adaptive feature-spatial balanced graph convolutional network named AFSBGCN. The AFSBGCN dynamically learns the relationship between local spatial topology and the self-characteristics of spots by adaptively increasing or decreasing the weight on the self-characteristics during message aggregation. Moreover, to better capture the local structures of spots, SpaGCAC exploits a local topology structure contrastive learning strategy. SpaGCAC also uses a probability distribution contrastive learning strategy to increase the similarity of probability distributions for spots belonging to the same category. We validate the performance of SpaGCAC for spatial domain identification on four spatial transcriptomic datasets. Compared with seven spatial domain recognition methods, SpaGCAC achieved the highest median NMI of 0.683 and the second-highest median ARI of 0.559 on the multi-slice DLPFC dataset, and the best results on the three single-slice datasets. These results show that SpaGCAC outperforms most existing methods, providing enhanced insights into tissue heterogeneity.
Title: Enhancing Spatial Domain Identification in Spatially Resolved Transcriptomics Using Graph Convolutional Networks With Adaptively Feature-Spatial Balance and Contrastive Learning. IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 21, no. 6, pp. 2406-2417.
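The idea of weighting a spot's own features against its neighbors during message aggregation can be sketched as follows. The abstract does not give the AFSBGCN update rule, so the convex combination below, the function name `aggregate`, and the per-spot `self_weight` values (which a real model would learn rather than receive as input) are all assumptions made for illustration.

```python
def aggregate(features, neighbors, self_weight):
    """One aggregation step: blend each spot's features with its neighbor mean.

    features    : list of feature vectors, one per spot
    neighbors   : dict mapping spot index -> list of neighboring spot indices
    self_weight : list of values in [0, 1]; 1 keeps only the spot's own
                  features, 0 keeps only the neighbor mean (assumed form)
    """
    output = []
    for i, own in enumerate(features):
        nbrs = neighbors.get(i, [])
        if nbrs:
            # Mean of the neighbors' features, dimension by dimension.
            mean = [sum(features[j][d] for j in nbrs) / len(nbrs)
                    for d in range(len(own))]
        else:
            # An isolated spot has nothing to aggregate, so it keeps its features.
            mean = list(own)
        a = self_weight[i]
        output.append([a * x + (1.0 - a) * m for x, m in zip(own, mean)])
    return output


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    nbrs = {0: [1, 2], 1: [0], 2: []}
    print(aggregate(feats, nbrs, self_weight=[0.5, 1.0, 0.0]))
```

Raising a spot's self-weight preserves its individual expression profile, while lowering it smooths the spot toward its spatial neighborhood; an adaptive scheme chooses this balance per spot instead of fixing it globally.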
Pub Date: 2024-09-27 DOI: 10.1109/TCBB.2024.3468434
Yuyang Xu;Jingbo Zhou;Haochao Ying;Jintai Chen;Wei Chen;Danny Z. Chen;Jian Wu
Drug-target interaction (DTI) prediction plays a crucial role in in-silico drug discovery, particularly when performed with deep learning (DL) models. Existing methods usually first extract features from drugs and target proteins and then use drug-target pairs to train DL models. However, these DL-based methods essentially rely on similar structures and patterns shared by homologous proteins, learned from a large amount of data. When few drug-target interactions are known for a newly discovered protein and its homologous proteins, prediction performance can drop notably. In this paper, we propose a novel Protein-Context enhanced Master/Slave Framework (PCMS) for zero-shot DTI prediction. This framework facilitates the efficient discovery of ligands for newly discovered target proteins, addressing the challenge of predicting interactions without prior data. Specifically, the PCMS framework consists of two main components: a Master Learner and a Slave Learner. The Master Learner first learns the target protein context information and then adaptively generates the corresponding parameters for the Slave Learner. The Slave Learner then performs zero-shot DTI prediction in different protein contexts. Extensive experiments on two public datasets verify the effectiveness of PCMS compared with state-of-the-art methods across various metrics.
Title: A Protein-Context Enhanced Master Slave Framework for Zero-Shot Drug Target Interaction Prediction. IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 21, no. 6, pp. 2359-2370.
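The division of labor between the two learners, where one network generates the parameters of another, can be sketched in a few lines. The abstract does not describe the PCMS architecture, so everything concrete here is an assumption: the linear map in `master_learner`, the logistic scorer in `slave_learner`, and the names `mapping` and `offset` are hypothetical and chosen only to show the parameter-generation pattern.

```python
import math


def master_learner(protein_context, mapping, offset):
    """Turn a protein-context vector into parameters for the slave learner.

    mapping is a matrix with one row per generated parameter; the last
    generated value is used as the bias, the rest as weights (assumed layout).
    """
    params = [sum(m * p for m, p in zip(row, protein_context)) + b
              for row, b in zip(mapping, offset)]
    return params[:-1], params[-1]


def slave_learner(drug_features, weights, bias):
    """Score one drug with the generated parameters; returns a value in (0, 1)."""
    z = sum(w * x for w, x in zip(weights, drug_features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    mapping = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]  # 2 weights + 1 bias
    offset = [0.0, 0.0, 0.0]
    drug = [1.0, 2.0]
    for context in ([2.0, -1.0], [1.0, 1.0]):
        weights, bias = master_learner(context, mapping, offset)
        print(context, "->", slave_learner(drug, weights, bias))
```

Because the slave's parameters are computed from the protein context rather than trained per protein, the same drug can receive different scores for different targets, including targets with no known interactions.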
Pub Date: 2024-09-27 DOI: 10.1109/TCBB.2024.3470592
Garud Iyengar;Mitch Perry
Models for microbial interactions attempt to understand and predict the steady-state network of inter-species relationships in a community, e.g., competition for shared metabolites and cooperation through cross-feeding. Flux balance analysis (FBA) is an approach that was introduced to model the interaction of a particular microbial species with its environment. This approach has been extended to analyze interactions in a community of microbes; however, these extensions have two important drawbacks: first, one has to numerically solve a differential equation to identify the steady state, and second, there are no methods available to analyze the stability of the steady state. We propose a game-theory-based community FBA model in which species compete to maximize their individual growth rates, and the state of the community is given by the resulting Nash equilibrium. We develop a computationally efficient method for directly computing the steady-state biomasses and fluxes without solving a differential equation. We also develop a method to determine the stability of a steady state to perturbations in the biomasses and to invasion by new species. We report the results of applying our proposed framework to a small community of four E. coli