Motivation: Embryo selection is one of the critical factors in determining the success of pregnancy in in vitro fertilization (IVF) procedures. Using artificial intelligence to aid embryo selection could address the time-consuming, expensive, and subjective process of embryo assessment by trained embryologists. However, current deep learning-based methods mostly focus on blastocyst segmentation, grading, or predicting cell development from time-lapse videos, and they often overlook morphokinetic parameters or lack interpretability. Given the significance of both morphokinetic and morphological evaluation in predicting the implantation potential of cleavage-stage embryos, as emphasized by previous research, an automated method for segmenting cleavage-stage embryos is needed to improve this process.
Results: In this article, we introduce the SAM-based Dual Branch Segmentation Pipeline for automated segmentation of blastomeres in cleavage-stage embryos. Leveraging the powerful segmentation capability of SAM, the instance branch conducts instance segmentation of blastomeres, while the semantic branch performs semantic segmentation of fragments. Due to the lack of publicly available datasets, we construct the CleavageEmbryo dataset, the first dataset of human cleavage-stage embryos with pixel-level annotations containing fragment information. We train and test a series of state-of-the-art segmentation algorithms on CleavageEmbryo. Our experiments demonstrate that our method outperforms existing algorithms in terms of objective metrics (mAP 0.748 on blastomeres, Dice 0.694 on fragments) and visual quality, enabling more accurate segmentation of cleavage-stage embryos.
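The fragment result above is reported as a Dice score. As a reminder of what that metric measures, here is a minimal sketch of the standard Dice coefficient on binary masks. The function name, the nested-list mask format, and the toy masks are illustrative assumptions, not code or data from the authors' repository.

```python
def dice(pred, truth):
    """Dice coefficient between two binary masks given as nested lists of 0/1."""
    pred_flat = [p for row in pred for p in row]
    truth_flat = [t for row in truth for t in row]
    if len(pred_flat) != len(truth_flat):
        raise ValueError("masks must have the same number of pixels")
    # Pixels marked positive in both masks.
    overlap = sum(1 for p, t in zip(pred_flat, truth_flat) if p and t)
    # Total positive pixels across the two masks.
    total = sum(1 for p in pred_flat if p) + sum(1 for t in truth_flat if t)
    # Convention assumed here: two empty masks count as perfect agreement.
    return 1.0 if total == 0 else 2.0 * overlap / total


# Toy masks (hypothetical): 3 positive pixels each, 2 of them shared.
pred = [[0, 1, 1],
        [0, 1, 0]]
truth = [[0, 1, 0],
         [0, 1, 1]]
print(round(dice(pred, truth), 3))
```

A score of 1 means the predicted and annotated regions coincide exactly, and 0 means they do not overlap at all.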
Availability and implementation: The code and sample data in this study can be found at: https://github.com/12austincc/Cleavage-StageEmbryoSegmentation.
Supplementary information: Supplementary data are available at Bioinformatics online.
Title: Cleavage-Stage Embryo Segmentation Using SAM-Based Dual Branch Pipeline: Development and Evaluation with the CleavageEmbryo Dataset.
Authors: Chensheng Zhang, Xintong Shi, Xinyue Yin, Jiayi Sun, Jianhui Zhao, Yi Zhang
Journal: Bioinformatics (Oxford, England). Pub Date: 2024-10-18. DOI: 10.1093/bioinformatics/btae617
Pub Date: 2024-10-03. DOI: 10.1093/bioinformatics/btae579
Minghui Li, Yao Shi, Shengqing Hu, Shengshan Hu, Peijin Guo, Wei Wan, Leo Yu Zhang, Shirui Pan, Jizhou Li, Lichao Sun, Xiaoli Lan
Motivation: Accurately predicting the binding affinity between antigens and antibodies is crucial for assessing therapeutic antibody effectiveness and for improving antibody engineering and vaccine design. Traditional machine learning methods have been widely used for this purpose, relying on structural information about interfacial amino acids. Nevertheless, because of technological limitations and the high cost of acquiring structural data, the structures of most antigens and antibodies are unknown, and sequence-based methods have gained attention. Existing sequence-based approaches designed for protein-protein affinity prediction show a significant drop in performance when applied directly to antibody-antigen affinity prediction, because the training data are imbalanced and the model frameworks are not designed specifically for antibody-antigen pairs, which hinders the learning of key antibody and antigen features. Therefore, we propose MVSF-AB, a Multi-View Sequence Feature learning method for accurate Antibody-antigen Binding affinity prediction.
Results: MVSF-AB uses a multi-view method that fuses semantic features and residue features to fully exploit the sequence information of antibody-antigen pairs and predict their binding affinity. Experimental results demonstrate that MVSF-AB outperforms existing approaches in predicting unobserved natural antibody-antigen affinity and maintains its effectiveness when faced with mutant strains of antibodies.
Availability and implementation: The datasets and source code are available in our public GitHub repository: https://github.com/TAI-Medical-Lab/MVSF-AB.
Title: MVSF-AB: Accurate antibody-antigen binding affinity prediction via multi-view sequence feature learning.
Journal: Bioinformatics (Oxford, England).
Pub Date: 2024-10-01. DOI: 10.1093/bioinformatics/btae559
Jiaxing Huang, Yaoru Luo, Yuanhao Guo, Wenjing Li, Zichen Wang, Guole Liu, Ge Yang
Motivation: Intracellular organelle networks (IONs) such as the endoplasmic reticulum (ER) network and the mitochondrial (MITO) network serve crucial physiological functions. The morphology of these networks plays a critical role in mediating their functions. Accurate image segmentation is required for analyzing the morphology and topology of these networks for applications such as molecular mechanism analysis and drug target screening. So far, however, progress has been hindered by their structural complexity and density.
Results: In this study, we first establish a rigorous performance baseline for accurate segmentation of these organelle networks from fluorescence microscopy images by optimizing a baseline U-Net model. We then develop the multi-resolution encoder (MRE) and the hierarchical fusion loss (Lhf) based on two inductive components, namely low-level features and topological self-similarity, to assist the model in better adapting to the task of segmenting IONs. Empowered by MRE and Lhf, both U-Net and Pyramid Vision Transformer (PVT) outperform competing state-of-the-art models such as U-Net++, HR-Net, nnU-Net, and TransUNet on custom datasets of the ER network and the MITO network, as well as on public datasets of another biological network, the retinal blood vessel network. In addition, integrating MRE and Lhf with models such as HR-Net and TransUNet also enhances their segmentation performance. These experimental results confirm the generalization capability and potential of our approach. Furthermore, accurate segmentation of the ER network enables analysis that provides novel insights into its dynamic morphological and topological properties.
Availability and implementation: Code and data are openly accessible at https://github.com/cbmi-group/MRE.
Title: Accurate segmentation of intracellular organelle networks using low-level features and topological self-similarity.
Journal: Bioinformatics (Oxford, England). Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11467052/pdf/
Pub Date: 2024-10-01. DOI: 10.1093/bioinformatics/btae611
Yanfang Li, Shihua Zhang
Motivation: Spatial transcriptomics (ST) technologies provide richer insights into the molecular characteristics of cells by simultaneously measuring gene expression profiles and their relative locations. However, each slice can only contain limited biological variation, and since there are almost always non-negligible batch effects across different slices, integrating numerous slices to account for batch effects and locations is not straightforward. Performing multi-slice integration, dimensionality reduction, and other downstream analyses separately often results in suboptimal embeddings for technical artifacts and biological variations. Joint modeling integrating these steps can enhance our understanding of the complex interplay between technical artifacts and biological signals, leading to more accurate and insightful results.
Results: In this context, we propose a hierarchical hidden Markov random field model, STADIA, to reduce batch effects, extract common biological patterns across multiple ST slices, and simultaneously identify spatial domains. We demonstrate the effectiveness of STADIA using five datasets from different species (human and mouse), various organs (brain, skin, and liver), and diverse platforms (10x Visium, ST, and Slide-seqV2). STADIA can capture common tissue structures across multiple slices and preserve slice-specific biological signals. In addition, STADIA outperforms three competing methods (PRECAST, fastMNN, and Harmony) in terms of the balance between batch mixing and spatial domain identification, and it demonstrates the advantage of joint modeling when compared to STAGATE and GraphST.
Availability and implementation: The source code, implemented in R, is available at https://github.com/zhanglabtools/STADIA and archived as version 1.01 on Zenodo at https://zenodo.org/records/13637744.
Title: Statistical batch-aware embedded integration, dimension reduction, and alignment for spatial transcriptomics.
Journal: Bioinformatics (Oxford, England). Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11512591/pdf/
Pub Date: 2024-10-01. DOI: 10.1093/bioinformatics/btae563
Jipeng Huang, Chang Sun, Minglei Li, Rong Tang, Bin Xie, Shuqin Wang, Jin-Mao Wei
Motivation: Exploring the association between drugs and targets is essential for drug discovery and repurposing. Compared with traditional methods that treat this exploration as a binary classification task, predicting the drug-target binding affinity can provide more specific information. Many studies rest on the assumption that similar drugs may interact with the same target. These methods construct a symmetric graph from undirected drug similarity or target similarity. Although these similarities can measure the difference between two molecules, they cannot capture the inclusion relationship between their substructures. For example, if drug A contains all the substructures of drug B, then in the message-passing mechanism of the graph neural network, drug A should acquire all the properties of drug B, while drug B should only obtain some of the properties of A.
Results: To this end, we propose a structure-inclusive similarity (SIS), which measures the similarity of two drugs by considering the inclusion relationship of their substructures. Based on SIS, we construct a drug graph and a target graph and predict the binding affinities between drugs and targets with a graph convolutional network-based model. Experimental results show that considering the inclusion relationship between the substructures of two molecules effectively improves the accuracy of the prediction model. Our SIS-based prediction method outperforms several state-of-the-art methods for drug-target binding affinity prediction. The case studies demonstrate that our model is a practical tool for predicting the binding affinity between drugs and targets.
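The abstract does not state the SIS formula. To make the drug A / drug B example concrete, here is a minimal sketch of one directed, inclusion-based weight. It assumes each drug is represented as a set of substructure labels and defines the weight as the fraction of the source drug's substructures found in the receiving drug. The function name, the formula, and the substructure labels are illustrative assumptions, not the authors' definition.

```python
def message_weight(source_subs, receiver_subs):
    """Directed weight for messages flowing from source to receiver.

    Assumed definition: the share of the source's substructures that the
    receiver also contains, so a receiver that includes the whole source
    inherits with full weight.
    """
    source_subs, receiver_subs = set(source_subs), set(receiver_subs)
    if not source_subs:
        return 0.0
    return len(source_subs & receiver_subs) / len(source_subs)


# Hypothetical substructure sets: drug A contains every substructure of drug B.
drug_a = {"benzene", "carboxyl", "hydroxyl", "amine"}
drug_b = {"benzene", "carboxyl"}

print(message_weight(drug_b, drug_a))  # B -> A: 1.0, A acquires all of B's properties
print(message_weight(drug_a, drug_b))  # A -> B: 0.5, B obtains only part of A's
```

A symmetric measure such as the Jaccard index would give 0.5 in both directions here, which is the loss of direction the abstract argues against.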
Availability and implementation: Source codes and data are available at https://github.com/HuangStomach/SISDTA.
Supplementary information: Supplementary data are available at Bioinformatics online.
Title: Structure-inclusive similarity based directed GNN: a method that can control information flow to predict drug-target binding affinity.
Journal: Bioinformatics (Oxford, England). Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11474107/pdf/
Pub Date: 2024-10-01. DOI: 10.1093/bioinformatics/btae570
Guanyu Qiao, Guohua Wang, Yang Li
Motivation: The prediction of drug-target interaction is a vital task in the biomedical field, aiding in the discovery of potential molecular targets of drugs and the development of targeted therapy methods with higher efficacy and fewer side effects. Although there are various methods for drug-target interaction (DTI) prediction based on heterogeneous information networks, these methods face challenges in capturing the fundamental interaction between drugs and targets and in ensuring the interpretability of the model. Moreover, they require manually constructed meta-paths or extensive feature engineering (prior knowledge), whereas graph generation can fuse information more flexibly without meta-path selection.
Results: We propose a causal enhanced method for drug-target interaction (CE-DTI) prediction that integrates graph generation and multi-source information fusion. First, we represent drugs and targets by modeling the fusion of their multi-source information through automatic graph generation. Once drugs and targets are combined, a network of drug-target pairs (DTPs) is constructed, transforming the prediction of drug-target interactions into a node classification problem. Specifically, the influence of surrounding nodes on the central node is separated into two groups: causal and non-causal variable nodes. Causal variable nodes significantly impact the central node's classification, while non-causal variable nodes do not. Causal invariance is then used to enhance the contrastive learning of the DTP network. Our method demonstrates excellent performance compared with other competitive benchmark methods across multiple datasets. At the same time, the experimental results also show that the causal enhancement strategy can explore the potential causal effects between DTPs and discover new potential targets. Additionally, case studies demonstrate that this method can identify potential drug targets.
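To illustrate how interaction prediction becomes node classification, here is a minimal sketch in which each drug-target pair is a node labeled by whether the interaction is known. The abstract does not say how edges are formed. The rule used here, linking two pairs when they share a drug or a target, is purely an assumption for illustration, as are the function name and the toy identifiers.

```python
from itertools import combinations


def build_pair_network(drugs, targets, known_interactions):
    """Return (nodes, edges, labels) for a drug-target pair network."""
    # Every candidate (drug, target) pair becomes one node.
    nodes = [(d, t) for d in drugs for t in targets]
    # Node label: 1 if the interaction is known, else 0.
    labels = {node: int(node in known_interactions) for node in nodes}
    # Assumed edge rule: two pairs are linked if they share a drug or a target.
    edges = [(u, v) for u, v in combinations(nodes, 2)
             if u[0] == v[0] or u[1] == v[1]]
    return nodes, edges, labels


# Hypothetical identifiers, not taken from any dataset in the paper.
nodes, edges, labels = build_pair_network(
    drugs=["d1", "d2"],
    targets=["t1", "t2"],
    known_interactions={("d1", "t1")},
)
print(len(nodes), len(edges), sum(labels.values()))
```

A classifier trained on such a graph predicts the label of each unlabeled pair node, which corresponds to predicting whether that drug and target interact.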
Availability and implementation: The source code of CE-DTI is available at: https://github.com/catly/CE-DTI.
Title: Causal enhanced drug-target interaction prediction based on graph generation and multi-source information fusion.
Journal: Bioinformatics (Oxford, England).
Pub Date: 2024-10-01. DOI: 10.1093/bioinformatics/btae619
Maxwell Murphy, Bryan Greenhouse
Motivation: Malaria parasite genetic data can provide insight into parasite phenotypes, evolution, and transmission. However, estimating key parameters such as allele frequencies, multiplicity of infection (MOI), and within-host relatedness from genetic data is challenging, particularly in the presence of multiple related coinfecting strains. Existing methods often rely on single nucleotide polymorphism (SNP) data and do not account for within-host relatedness.
Results: We present Multiplicity Of Infection and allele frequency REcovery (MOIRE), a Bayesian approach to estimate allele frequencies, MOI, and within-host relatedness from genetic data subject to experimental error. MOIRE accommodates both polyallelic and SNP data, making it applicable to diverse genotyping panels. We also introduce a novel metric, the effective MOI (eMOI), which integrates MOI and within-host relatedness, providing a robust and interpretable measure of genetic diversity. Extensive simulations and real-world data from a malaria study in Namibia demonstrate the superior performance of MOIRE over naive estimation methods, accurately estimating MOI up to seven with moderate-sized panels of diverse loci (e.g. microhaplotypes). MOIRE also revealed substantial heterogeneity in population mean MOI and mean relatedness across health districts in Namibia, suggesting detectable differences in transmission dynamics. Notably, eMOI emerges as a portable metric of within-host diversity, facilitating meaningful comparisons across settings when allele frequencies or genotyping panels differ. Compared to existing software, MOIRE enables more comprehensive insights into within-host diversity and population structure.
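The abstract describes eMOI as integrating MOI and within-host relatedness but gives no formula. Here is a minimal sketch, assuming the combination eMOI = 1 + (1 - r)(MOI - 1), where r is within-host relatedness in [0, 1]. Both the function name and this formula are assumptions for illustration and are not taken from the MOIRE package.

```python
def effective_moi(moi, relatedness):
    """Assumed form: eMOI = 1 + (1 - r) * (MOI - 1)."""
    if moi < 1:
        raise ValueError("MOI must be at least 1")
    if not 0.0 <= relatedness <= 1.0:
        raise ValueError("relatedness must lie in [0, 1]")
    # Each strain beyond the first adds (1 - r) to the effective count.
    return 1.0 + (1.0 - relatedness) * (moi - 1)


print(effective_moi(3, 0.0))  # unrelated strains: eMOI equals MOI
print(effective_moi(3, 1.0))  # fully related strains: behaves like one strain
print(effective_moi(3, 0.5))  # partial relatedness: in between
```

Under this assumed form, eMOI ranges from 1 to the MOI, which is how a single number can summarize both the strain count and how related the strains are.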
Availability and implementation: MOIRE is available as an R package at https://eppicenter.github.io/moire/.
Supplementary information: Supplementary data are available at Bioinformatics online.
Title: MOIRE: a software package for the estimation of allele frequencies and effective multiplicity of infection from polyallelic data.
Journal: Bioinformatics (Oxford, England). Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11524891/pdf/
Pub Date: 2024-10-01 DOI: 10.1093/bioinformatics/btae578
Weijia Jin, Yi Xia, Javlon Nizomov, Yunlong Liu, Zhigang Li, Qing Lu, Li Chen
Summary: Massively parallel reporter assay (MPRA) is an important technology for evaluating the impact of genetic variants on gene regulation. Here, we present MPRAVarDB, an online database and web server for exploring regulatory effects of genetic variants. MPRAVarDB harbors 18 MPRA experiments designed to assess the regulatory effects of genetic variants associated with GWAS loci, eQTLs, and genomic features, totaling 242 818 variants tested across more than 30 cell lines and 30 human diseases or traits. MPRAVarDB enables users to query MPRA variants by genomic region, disease and cell line, or any combination of these parameters. Notably, MPRAVarDB offers a suite of pretrained machine-learning models tailored to the specific disease and cell line, facilitating the prediction of regulatory variants. The user-friendly interface allows users to receive query and prediction results with just a few clicks.
Availability and implementation: https://mpravardb.rc.ufl.edu.
{"title":"MPRAVarDB: an online database and web server for exploring regulatory effects of genetic variants.","authors":"Weijia Jin, Yi Xia, Javlon Nizomov, Yunlong Liu, Zhigang Li, Qing Lu, Li Chen","doi":"10.1093/bioinformatics/btae578","DOIUrl":"10.1093/bioinformatics/btae578","url":null,"abstract":"<p><strong>Summary: </strong>Massively parallel reporter assay (MPRA) is an important technology for evaluating the impact of genetic variants on gene regulation. Here, we present MPRAVarDB, an online database and web server for exploring regulatory effects of genetic variants. MPRAVarDB harbors 18 MPRA experiments designed to assess the regulatory effects of genetic variants associated with GWAS loci, eQTLs, and genomic features, totaling 242 818 variants tested more than 30 cell lines and 30 human diseases or traits. MPRAVarDB enables users to query MPRA variants by genomic region, disease and cell line, or any combination of these parameters. Notably, MPRAVarDB offers a suite of pretrained machine-learning models tailored to the specific disease and cell line, facilitating the prediction of regulatory variants. The user-friendly interface allows users to receive query and prediction results with just a few clicks.</p><p><strong>Availability and implementation: </strong>https://mpravardb.rc.ufl.edu.</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11464417/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142334304","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-10-01 DOI: 10.1093/bioinformatics/btae607
Xiao Liang, Pei Liu, Li Xue, Baiyun Chen, Wei Liu, Wanwan Shi, Yongwang Wang, Xiangtao Chen, Jiawei Luo
Motivation: Recent advances in spatial transcriptomics technologies have provided multi-modality data integrating gene expression, spatial context, and histological images. Accurately identifying spatial domains and spatially variable genes (SVGs) is crucial for understanding tissue structures and biological functions. However, effectively combining multi-modality data to identify spatial domains and determining SVGs closely related to these spatial domains remains a challenge.
Results: In this study, we propose spatial transcriptomics multi-modality and multi-granularity collaborative learning (spaMMCL). For detecting spatial domains, spaMMCL mitigates the adverse effects of modality bias by masking portions of gene expression data, integrates gene and image features using a shared graph convolutional network, and employs graph self-supervised learning to deal with noise from feature fusion. Simultaneously, based on the identified spatial domains, spaMMCL integrates various strategies to detect potential SVGs at different granularities, enhancing their reliability and biological significance. Experimental results demonstrate that spaMMCL substantially improves the identification of spatial domains and SVGs.
Availability and implementation: The code and data of spaMMCL are available on GitHub: https://github.com/liangxiao-cs/spaMMCL.
{"title":"A multi-modality and multi-granularity collaborative learning framework for identifying spatial domains and spatially variable genes.","authors":"Xiao Liang, Pei Liu, Li Xue, Baiyun Chen, Wei Liu, Wanwan Shi, Yongwang Wang, Xiangtao Chen, Jiawei Luo","doi":"10.1093/bioinformatics/btae607","DOIUrl":"10.1093/bioinformatics/btae607","url":null,"abstract":"<p><strong>Motivation: </strong>Recent advances in spatial transcriptomics technologies have provided multi-modality data integrating gene expression, spatial context, and histological images. Accurately identifying spatial domains and spatially variable genes is crucial for understanding tissue structures and biological functions. However, effectively combining multi-modality data to identify spatial domains and determining SVGs closely related to these spatial domains remains a challenge.</p><p><strong>Results: </strong>In this study, we propose spatial transcriptomics multi-modality and multi-granularity collaborative learning (spaMMCL). For detecting spatial domains, spaMMCL mitigates the adverse effects of modality bias by masking portions of gene expression data, integrates gene and image features using a shared graph convolutional network, and employs graph self-supervised learning to deal with noise from feature fusion. Simultaneously, based on the identified spatial domains, spaMMCL integrates various strategies to detect potential SVGs at different granularities, enhancing their reliability and biological significance. 
Experimental results demonstrate that spaMMCL substantially improves the identification of spatial domains and SVGs.</p><p><strong>Availability and implementation: </strong>The code and data of spaMMCL are available on Github: Https://github.com/liangxiao-cs/spaMMCL.</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11513014/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142482923","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-10-01 DOI: 10.1093/bioinformatics/btae635
{"title":"Correction to: Teaching bioinformatics through the analysis of SARS-CoV-2: project-based training for computer science students.","authors":"","doi":"10.1093/bioinformatics/btae635","DOIUrl":"10.1093/bioinformatics/btae635","url":null,"abstract":"","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":"40 10","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11513013/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142514580","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}