Bioinformatics: Latest Articles (Page 2)

PDBImages: A Command Line Tool for Automated Macromolecular Structure Visualization

IF 5.8, CAS Zone 3 (Biology), Q1 BIOCHEMICAL RESEARCH METHODS

Bioinformatics

Pub Date: 2023-12-12, DOI: 10.1093/bioinformatics/btad744

Adam Midlik, Sreenath Nair, Stephen Anyango, Mandar Deshpande, David Sehnal, Mihaly Varadi, Sameer Velankar

Summary: PDBImages is an innovative, open-source Node.js package that builds on the popular macromolecular structure visualization software Mol*. Designed for the scientific community, PDBImages generates high-quality images of PDB and AlphaFold DB models. Its ability to render images and save them directly to files in browserless mode sets it apart, offering users a streamlined, automated process for macromolecular structure visualization. Here, we detail the implementation of PDBImages, enumerate the image types it produces, and describe its user-friendly setup. This tool gives researchers a new way to visualize, analyse, and share their work, fostering a deeper understanding of bioinformatics. Availability and implementation: PDBImages is available as an npm package from https://www.npmjs.com/package/pdb-images. The source code is available from https://github.com/PDBeurope/pdb-images.

Citations: 0

Rarity: Discovering rare cell populations from single-cell imaging data

IF 5.8, CAS Zone 3 (Biology), Q1 BIOCHEMICAL RESEARCH METHODS

Bioinformatics

Pub Date: 2023-12-12, DOI: 10.1093/bioinformatics/btad750

Kaspar Märtens, Michele Bortolomeazzi, Lucia Montorsi, Jo Spencer, Francesca Ciccarelli, Christopher Yau

Motivation: Cell type identification plays an important role in the analysis and interpretation of single-cell data and can be carried out via supervised or unsupervised clustering approaches. Supervised methods are best suited to cases where all cell types and their respective marker genes can be listed a priori. Unsupervised clustering algorithms, by contrast, look for groups of cells with similar expression properties. This permits the identification of both known and unknown cell populations, making unsupervised methods suitable for discovery. Success depends on the relative strength of each group's expression signature as well as on the number of cells. Rare cell types therefore present a particular challenge, which is magnified when they are defined by the differential expression of a small number of genes. Results: Typical unsupervised approaches fail to identify such rare sub-populations, and these cells tend to be absorbed into more prevalent cell types. To balance these competing demands, we have developed a novel statistical framework for unsupervised clustering, named Rarity, that makes the discovery of rare cell types more robust, consistent and interpretable. We achieve this with a clustering method based on a Bayesian latent variable model in which cells are assigned to inferred latent binary on/off expression profiles. This increases sensitivity to rare cell populations while also allowing us to control and interpret potential false positive discoveries. We systematically study the challenges associated with rare cell type identification and demonstrate the utility of Rarity on various imaging mass cytometry (IMC) data sets. Availability: An implementation of Rarity, together with examples, is available from the GitHub repository (https://github.com/kasparmartens/rarity). Supplementary information: Supplementary data are available at Bioinformatics online.
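
To make the idea of assigning cells to binary on/off expression profiles concrete, here is a deliberately simplified, non-Bayesian sketch: each marker is binarized with a fixed threshold, cells are grouped by the resulting on/off signature, and signatures carried by only a small fraction of cells are flagged as candidate rare populations. The fixed `threshold`, the `min_fraction` cutoff and the toy data are assumptions for illustration only; Rarity itself infers the profiles with a Bayesian latent variable model and quantifies uncertainty, which this sketch does not attempt.

```python
from collections import Counter


def binarize(cells, threshold):
    """Map each cell's marker intensities to a tuple of 0/1 (off/on) calls."""
    return [tuple(int(x > threshold) for x in cell) for cell in cells]


def rare_profiles(cells, threshold=0.5, min_fraction=0.05):
    """Group cells by binary on/off profile; return the profiles held by
    fewer than min_fraction of all cells, mapped to their cell counts."""
    profiles = binarize(cells, threshold)
    counts = Counter(profiles)
    n = len(profiles)
    return {p: c for p, c in counts.items() if c / n < min_fraction}


# Toy data: rows are cells, columns are three hypothetical markers.
cells = [[0.9, 0.1, 0.0]] * 60 + [[0.1, 0.8, 0.0]] * 38 + [[0.1, 0.1, 0.9]] * 2
print(rare_profiles(cells))  # {(0, 0, 1): 2}
```

A hard threshold is exactly where such a naive approach breaks down on real imaging data, which is the motivation for inferring the on/off states probabilistically instead.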

Citations: 0

GDmicro: classifying host disease status with GCN and Deep adaptation network based on the human gut microbiome data

IF 5.8, CAS Zone 3 (Biology), Q1 BIOCHEMICAL RESEARCH METHODS

Bioinformatics

Pub Date: 2023-12-12, DOI: 10.1093/bioinformatics/btad747

Herui Liao, Jiayu Shang, Yanni Sun

Motivation: With advances in metagenomic sequencing technologies, a growing number of studies have revealed associations between the human gut microbiome and some human diseases. These associations suggest using gut microbiome data to distinguish case and control samples of a specific disease, a task also called host disease status classification. Importantly, learning-based models that distinguish disease and control samples are expected to identify important biomarkers more accurately than abundance-based statistical analysis. However, available tools have not fully addressed two challenges associated with this task: limited labeled microbiome data and decreased accuracy in cross-study settings. Confounding factors such as diet, together with technical biases in sample collection and sequencing across studies or cohorts, often jeopardize the generalization of learning models. Results: To address these challenges, we developed a new tool, GDmicro, which combines semi-supervised learning and domain adaptation to achieve a more generalizable model from limited labeled samples. We evaluated GDmicro on human gut microbiome data from 11 cohorts covering 5 different diseases. The results show that GDmicro has better performance and robustness than state-of-the-art tools. In particular, it improves the AUC from 0.783 to 0.949 in identifying inflammatory bowel disease. Furthermore, GDmicro can identify potential biomarkers with greater accuracy than abundance-based statistical analysis methods, and it reveals the contribution of these biomarkers to the host's disease status. Availability and implementation: https://github.com/liaoherui/GDmicro Supplementary information: Supplementary data are available at Bioinformatics online.
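
The headline result for inflammatory bowel disease is reported as the area under the ROC curve (AUC). As a reminder of what that number measures, the sketch below computes AUC in its rank (Mann–Whitney) form: the probability that a randomly chosen case sample receives a higher score than a randomly chosen control, with ties counted as one half. This is a generic metric implementation with made-up scores, not code from GDmicro.

```python
def roc_auc(scores, labels):
    """AUC as P(score_case > score_control), ties counted as 0.5.

    scores: predicted disease scores; labels: 1 for case, 0 for control.
    Compares every case/control pair, O(n_pos * n_neg), fine for illustration.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


scores = [0.9, 0.8, 0.4, 0.35, 0.2]
labels = [1, 1, 0, 1, 0]
print(roc_auc(scores, labels))  # 5 of 6 case/control pairs ranked correctly
```

Read this way, an AUC of 0.949 means that roughly 95% of case/control pairs are ordered correctly by the classifier's score.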

Citations: 0

MMCL-CDR: Enhancing Cancer Drug Response Prediction with Multi-Omics and Morphology Images Contrastive Representation Learning

IF 5.8, CAS Zone 3 (Biology), Q1 BIOCHEMICAL RESEARCH METHODS

Bioinformatics

Pub Date: 2023-12-09, DOI: 10.1093/bioinformatics/btad734

Yang Li, Zihou Guo, Xin Gao, Guohua Wang

Motivation: Cancer is a complex disease that causes a significant number of deaths worldwide. Treatment strategies can vary among patients, even those with the same type of cancer. Precision medicine shows promise for treating different types of cancer, reducing healthcare costs, and improving recovery rates. To achieve personalized cancer treatment, machine learning models have been developed to predict drug responses from tumor and drug characteristics. However, current studies focus either on constructing homogeneous networks from a single data source or on constructing heterogeneous networks from multi-omics data. While multi-omics data have shown potential for predicting drug responses in cancer cell lines, research that effectively uses insights from different modalities is still lacking. Furthermore, the heterogeneity inherent in these modalities makes it challenging to exploit the multi-modal knowledge of cancer cell lines effectively. Results: To address these challenges, we introduce MMCL-CDR, a multi-modal approach to cancer drug response (CDR) prediction that integrates copy number variation, gene expression, and morphology images of cell lines with the chemical structure of drugs. MMCL-CDR aligns cancer cell lines across data modalities by learning cell line representations from omics and image data, and combines these with structural drug representations to improve CDR prediction. Comprehensive experiments show that our model significantly outperforms other state-of-the-art methods in CDR prediction. The results also show that integrating multi-omics and morphological data lets the model learn more accurate cell line representations, thereby improving the accuracy of CDR prediction. In addition, an ablation study and qualitative analysis confirm the effectiveness of each component of the proposed model. Finally, MMCL-CDR opens up a new dimension for cancer drug response prediction through multi-modal contrastive learning, pioneering an approach that integrates multi-omics and multi-modal drug and cell line modeling. Availability and implementation: MMCL-CDR is available at https://github.com/catly/MMCL-CDR
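
Aligning cell lines across modalities is described as contrastive learning. The abstract does not give MMCL-CDR's exact loss, so the following is an assumption: a common way to express such an objective is an InfoNCE-style loss, in which the omics embedding of a cell line should be more similar to the image embedding of the same cell line than to the image embeddings of other cell lines in the batch. The pure-Python sketch uses cosine similarity, a temperature `tau` and two-dimensional toy embeddings, all chosen for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)


def info_nce(omics, images, tau=0.1):
    """Mean InfoNCE loss in the omics -> image direction.

    omics[i] and images[i] embed the same cell line (the positive pair);
    every images[j] with j != i acts as a negative for omics[i].
    """
    total = 0.0
    for i, z in enumerate(omics):
        logits = [cosine(z, w) / tau for w in images]
        m = max(logits)  # subtract the max for numerical stability
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denom - logits[i]  # -log softmax of the positive pair
    return total / len(omics)


omics = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]  # image embeddings matching their cell line
swapped = [[0.1, 0.9], [0.9, 0.1]]  # image embeddings paired with the wrong line
print(info_nce(omics, aligned) < info_nce(omics, swapped))  # True
```

Minimizing this loss pulls matching omics/image pairs together and pushes mismatched pairs apart; symmetric variants simply average it with the image -> omics direction.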

Citations: 0

OBMeta: a comprehensive web server to analyze and validate gut microbial features and biomarkers for obesity-associated metabolic diseases

IF 5.8, CAS Zone 3 (Biology), Q1 BIOCHEMICAL RESEARCH METHODS

Bioinformatics

Pub Date: 2023-12-09, DOI: 10.1093/bioinformatics/btad715

Cuifang Xu, Jiating Huang, Yongqiang Gao, Weixing Zhao, Yiqi Shen, Feihong Luo, Gang Yu, Feng Zhu, Yan Ni

Motivation: Gut dysbiosis is closely associated with obesity and related metabolic diseases, including type 2 diabetes (T2D) and non-alcoholic fatty liver disease (NAFLD). Gut microbial features and biomarkers have been investigated in a growing number of studies, but the findings require further validation because of limited sample sizes and the various confounding factors that may affect microbial composition in any single study. So far, no comprehensive bioinformatics pipeline provides automated statistical analysis while also integrating multiple independent studies for cross-validation. Results: OBMeta aims to streamline standard metagenomics data analysis, from diversity analysis, comparative analysis, and functional analysis to co-abundance network analysis. In addition, a curated database has been established with a total of 90 public research projects, covering three phenotypes (obesity, T2D, and NAFLD) and more than five intervention strategies (exercise, diet, probiotics, medication, and surgery). With OBMeta, users can not only analyze their own research projects but also search and match public datasets for cross-validation. Moreover, OBMeta provides advanced cross-phenotype and cross-intervention validation to maximally support preliminary findings from an individual study. In summary, OBMeta is a comprehensive web server for analyzing and validating gut microbial features and biomarkers for obesity-associated metabolic diseases. Availability: OBMeta is freely available at http://obmeta.met-bioinformatics.cn/. Supplementary information: Supplementary data are available at Bioinformatics online.
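
OBMeta's workflow begins with diversity analysis. The abstract does not list which metrics the server reports, so as an assumed example, here are two standard alpha-diversity indices computed from one sample's taxon counts: Shannon diversity, H' = -Σ p_i ln p_i, and the Gini–Simpson index, 1 - Σ p_i², where p_i is the relative abundance of taxon i.

```python
import math


def relative_abundance(counts):
    """Convert raw taxon counts to proportions, dropping absent taxa."""
    total = sum(counts)
    return [c / total for c in counts if c > 0]


def shannon(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i)."""
    return -sum(p * math.log(p) for p in relative_abundance(counts))


def gini_simpson(counts):
    """Gini-Simpson index 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in relative_abundance(counts))


sample = [25, 25, 25, 25, 0]  # read counts for five taxa (toy data)
print(round(shannon(sample), 4), gini_simpson(sample))  # 1.3863 0.75
```

Four equally abundant taxa give H' = ln 4, the maximum for four taxa; comparing such indices between case and control groups is the usual first step before differential-abundance analysis.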

Citations: 0

CytoCopasi: A Chemical Systems Biology Target and Drug Discovery Visual Data Analytics Platform

IF 5.8, CAS Zone 3 (Biology), Q1 BIOCHEMICAL RESEARCH METHODS

Bioinformatics

Pub Date: 2023-12-09, DOI: 10.1093/bioinformatics/btad745

Hikmet Emre Kaya, Kevin J Naidoo

Motivation: Target discovery and drug evaluation for diseases with complex mechanisms call for a streamlined chemical systems analysis platform. Currently available tools lack an emphasis on reaction kinetics, access to relevant databases, and algorithms that visualize perturbations on a chemical scale and provide quantitative detail, as well as streamlined visual data analytics functionality. Results: We have developed CytoCopasi, a Maven-based application for Cytoscape that combines the chemical systems analysis features of COPASI with the visualization and database access tools of Cytoscape and its plugin applications. We present the diverse functionality of CytoCopasi, including ab initio model construction and model construction via the pathway and parameter databases KEGG and BRENDA. The comparative systems biology visualization toolset is illustrated through a drug competence study on the cancer-associated RAF/MEK/ERK pathway. Availability: The COPASI files, simulation data, native libraries, and the manual are available at https://github.com/scientificomputing/CytoCopasi Supplementary information: Supplementary data are available at Bioinformatics online.
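
CytoCopasi's focus is on reaction kinetics and on how a drug perturbation changes a system's behavior. As a toy illustration of that kind of kinetic question (not CytoCopasi or COPASI code, and not the RAF/MEK/ERK model), the sketch integrates a single Michaelis–Menten reaction S → P with explicit Euler steps, with and without a competitive inhibitor. All parameter values (`vmax`, `km`, `ki`, the inhibitor concentration, step size) are arbitrary assumptions; in practice kinetic constants might come from a database such as BRENDA, and COPASI would use a proper ODE solver rather than fixed-step Euler.

```python
def simulate(s0, vmax, km, inhibitor=0.0, ki=1.0, t_end=10.0, dt=0.001):
    """Euler integration of S -> P with competitive-inhibition kinetics:
    v = vmax * S / (km * (1 + I / ki) + S). Returns (S, P) at t_end."""
    s, p = s0, 0.0
    km_app = km * (1.0 + inhibitor / ki)  # a competitive inhibitor raises apparent Km
    for _ in range(int(round(t_end / dt))):
        v = vmax * s / (km_app + s)
        s -= v * dt
        p += v * dt
    return s, p


_, p_ctrl = simulate(s0=10.0, vmax=1.0, km=2.0)
_, p_drug = simulate(s0=10.0, vmax=1.0, km=2.0, inhibitor=5.0, ki=1.0)
print(p_ctrl > p_drug)  # True: the inhibitor slows product formation
```

Extending this to a pathway means one rate law per reaction and one balance equation per species, which is the model structure COPASI manages and CytoCopasi visualizes.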

Citations: 0

MaxCLK: discovery of cancer driver genes via maximal clique and information entropy of modules

IF 5.8, CAS Zone 3 (Biology), Q1 BIOCHEMICAL RESEARCH METHODS

Bioinformatics

Pub Date: 2023-12-09, DOI: 10.1093/bioinformatics/btad737

Jian Liu, Fubin Ma, Yongdi Zhu, Naiqian Zhang, Lingming Kong, Jia Mi, Haiyan Cong, Rui Gao, Mingyi Wang, Yusen Zhang

Motivation: Cancer is caused by the accumulation of somatic mutations in multiple pathways, in which driver mutations typically show high coverage and high exclusivity across patients. Identifying cancer driver genes plays a pivotal role in understanding the mechanisms of oncogenesis and in treatment. Results: Here, we introduce MaxCLK, an algorithm for identifying cancer driver genes, developed through an integrated analysis of somatic mutation data and protein–protein interaction (PPI) networks and further improved by an information entropy (IE) index. Tested on pan-cancer and single-cancer data, MaxCLK achieved higher accuracy than other existing methods. For pan-cancer data, we predicted 154 driver genes and 787 driver modules. Analysis of co-occurrence and exclusivity between modules and pathways reveals correlations among their combinations. Overall, our study deepens the understanding of driver mechanisms in PPI topology and identifies novel driver genes. Availability: The source code for MaxCLK is freely available at https://github.com/ShandongUniversityMasterMa/MaxCLK-main. Supplementary information: Supplementary data are available at Bioinformatics online.
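
Two ingredients named in the abstract are maximal cliques in a PPI network and modules whose mutations show high coverage and high exclusivity across patients. The sketch below illustrates only those two ideas under stated assumptions: it enumerates maximal cliques with the classic Bron–Kerbosch algorithm (with pivoting), then scores each clique by coverage (fraction of patients with at least one mutated gene in the module) and exclusivity (fraction of covered patients with exactly one mutated gene in it). These scoring definitions, the gene names `G1`..`G5` and the patient data are hypothetical, and MaxCLK's information-entropy index is not reproduced.

```python
def bron_kerbosch(r, p, x, adj, out):
    """Append every maximal clique extending r to out (Bron-Kerbosch with pivot)."""
    if not p and not x:
        out.append(frozenset(r))
        return
    pivot = max(p | x, key=lambda u: len(adj[u] & p))
    for v in list(p - adj[pivot]):
        bron_kerbosch(r | {v}, p & adj[v], x & adj[v], adj, out)
        p = p - {v}
        x = x | {v}


def maximal_cliques(edges):
    """Build an adjacency map from undirected edges and list its maximal cliques."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    out = []
    bron_kerbosch(set(), set(adj), set(), adj, out)
    return out


def coverage_exclusivity(module, mutations):
    """mutations maps patient -> set of mutated genes."""
    hits = [len(module & genes) for genes in mutations.values()]
    covered = sum(1 for h in hits if h > 0)
    exclusive = sum(1 for h in hits if h == 1)
    coverage = covered / len(hits)
    exclusivity = exclusive / covered if covered else 0.0
    return coverage, exclusivity


# Hypothetical toy network and mutation profiles.
edges = [("G1", "G2"), ("G2", "G3"), ("G1", "G3"), ("G4", "G5")]
mutations = {"p1": {"G1"}, "p2": {"G2"}, "p3": {"G3"},
             "p4": {"G4", "G5"}, "p5": {"G1", "G4"}}
for clique in maximal_cliques(edges):
    print(sorted(clique), coverage_exclusivity(clique, mutations))
```

Here the triangle G1-G2-G3 covers 4 of 5 patients with perfectly exclusive mutations, which is the driver-like pattern the abstract describes.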

Citations: 0

EPIC-TRACE: predicting TCR binding to unseen epitopes using attention and contextualized embeddings

IF 5.8 | CAS Zone 3 (Biology) | Q1 BIOCHEMICAL RESEARCH METHODS

Bioinformatics

Pub Date : 2023-12-09 DOI: 10.1093/bioinformatics/btad743

Dani Korpela, Emmi Jokinen, Alexandru Dumitrescu, Jani Huuhtanen, Satu Mustjoki, Harri Lähdesmäki

Motivation: T cells play an essential role in the adaptive immune system's fight against pathogens and cancer, but they may also give rise to autoimmune diseases. Recognition of a peptide-MHC (pMHC) complex by a T cell receptor (TCR) is required to elicit an immune response. Many machine learning models have been developed to predict this binding, but generalizing predictions to pMHCs outside the training data remains challenging.

Results: We have developed a new machine learning model that uses information from both the TCR α and β chains, the epitope sequence, and the MHC. Our method uses ProtBERT embeddings for the amino acid sequences of both chains and the epitope, together with convolution and multi-head attention architectures. We show the importance of each input feature, as well as the benefit of including epitopes with only a few TCRs in the training data. We evaluate our model on existing databases and show that it compares favorably against other state-of-the-art models.

Code availability: https://github.com/DaniTheOrange/EPIC-TRACE

Supplementary information: Supplementary data are available at Bioinformatics online.
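
The EPIC-TRACE abstract names three building blocks: per-residue ProtBERT embeddings, convolution, and multi-head attention. The following minimal pure-Python sketch shows one way such pieces can be chained to score a TCR-epitope pair; it is not the published EPIC-TRACE architecture, whose layers and sizes are given in the paper and repository. Assumptions: random vectors stand in for ProtBERT embeddings (no model is loaded), only one TCR sequence and the epitope are modeled (no second chain, no MHC), TCR positions attend to epitope positions (cross-attention), attention heads are plain slices of each vector without learned query/key/value projections, and the helper names `conv1d`, `multi_head_attention`, and `binding_score` and all toy sizes are hypothetical.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def conv1d(seq: Matrix, kernel: List[Matrix]) -> Matrix:
    """'Same'-padded 1D convolution + ReLU over per-residue vectors.
    kernel[j][i][o] weights input dim i -> output dim o at window offset j."""
    k = len(kernel)
    pad = k // 2
    d_in, d_out = len(seq[0]), len(kernel[0][0])
    zero = [0.0] * d_in
    padded = [zero] * pad + seq + [zero] * (k - 1 - pad)
    out = []
    for t in range(len(seq)):
        vec = [0.0] * d_out
        for j in range(k):
            for i, x in enumerate(padded[t + j]):
                for o in range(d_out):
                    vec[o] += x * kernel[j][i][o]
        out.append([max(0.0, v) for v in vec])
    return out


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def multi_head_attention(queries: Matrix, keys: Matrix, values: Matrix,
                         n_heads: int) -> Matrix:
    """Scaled dot-product attention with each vector split into n_heads slices.
    Simplification: no learned Q/K/V or output projections."""
    d = len(queries[0])
    if d % n_heads:
        raise ValueError("embedding size must be divisible by n_heads")
    h = d // n_heads
    out = []
    for q in queries:
        row: List[float] = []
        for head in range(n_heads):
            lo, hi = head * h, (head + 1) * h
            scores = [sum(a * b for a, b in zip(q[lo:hi], k[lo:hi])) / math.sqrt(h)
                      for k in keys]
            weights = softmax(scores)
            for c in range(lo, hi):
                row.append(sum(w * v[c] for w, v in zip(weights, values)))
        out.append(row)
    return out


def binding_score(tcr_emb: Matrix, epi_emb: Matrix, kernel: List[Matrix],
                  w_out: List[float], n_heads: int = 2) -> float:
    """Toy pipeline: convolution over each sequence, TCR->epitope cross-attention,
    mean pooling over TCR positions, then a logistic output."""
    tcr = conv1d(tcr_emb, kernel)
    epi = conv1d(epi_emb, kernel)
    ctx = multi_head_attention(tcr, epi, epi, n_heads)
    pooled = [sum(col) / len(ctx) for col in zip(*ctx)]
    z = sum(p * w for p, w in zip(pooled, w_out))
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    rng = random.Random(0)
    d_emb, d_hid = 8, 4  # toy sizes; real ProtBERT vectors are far larger

    def rand_matrix(rows: int, cols: int) -> Matrix:
        return [[rng.gauss(0.0, 0.3) for _ in range(cols)] for _ in range(rows)]

    tcr = rand_matrix(12, d_emb)  # stand-in for 12 TCR residue embeddings
    epi = rand_matrix(9, d_emb)   # stand-in for a 9-residue epitope
    kernel = [rand_matrix(d_emb, d_hid) for _ in range(3)]
    w_out = [rng.gauss(0.0, 1.0) for _ in range(d_hid)]
    print(f"toy binding probability: {binding_score(tcr, epi, kernel, w_out):.3f}")
```

Because the weights are random, the printed probability carries no biological meaning; in a trained model the kernel, attention projections, and output weights would be learned from labeled TCR-pMHC pairs.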

Citations: 0

Drug repositioning with adaptive graph convolutional networks

IF 5.8 | CAS Zone 3 (Biology) | Q1 BIOCHEMICAL RESEARCH METHODS

Bioinformatics

Pub Date : 2023-12-08 DOI: 10.1093/bioinformatics/btad748

Xinliang Sun, Xiao Jia, Zhangli Lu, Jing Tang, Min Li

Motivation: Drug repositioning is an effective strategy for identifying new indications for existing drugs, providing the quickest possible transition from bench to bedside. With the rapid development of deep learning, graph convolutional networks (GCNs) have been widely adopted for drug repositioning tasks. However, prior GCN-based methods are limited in how deeply they integrate node features and topological structures, which may restrict the capability of GCNs.

Results: In this study, we propose AdaDR, an adaptive graph convolutional network approach to drug repositioning that deeply integrates node features and topological structures. Unlike conventional graph convolutional networks, AdaDR models the interactive information between features and topology with an adaptive graph convolution operation, which enhances the model's expressive power. Concretely, AdaDR simultaneously extracts embeddings from node features and from topological structures, and then uses an attention mechanism to learn adaptive importance weights for these embeddings. Experimental results show that AdaDR outperforms multiple baselines for drug repositioning. Moreover, a case study presents exploratory analyses aimed at finding novel drug-disease associations.

Availability and implementation: The implementation of AdaDR and the preprocessed data are available at https://github.com/xinliangSun/AdaDR.

Supplementary information: Supplementary data are available at Bioinformatics online.
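
The fusion step described in the AdaDR abstract (embeddings extracted from node features and from topology, then combined with learned attention weights) can be illustrated with a small pure-Python sketch. The exact AdaDR adaptive convolution operator is defined in the paper and repository and is not reproduced here. Assumptions: a standard symmetric-normalized GCN layer produces one embedding per channel, the feature channel runs on a hypothetical feature-similarity graph, attention scores take the common form q · tanh(W z), and all matrices, weights, and helper names (`gcn_layer`, `attention_fuse`) are hypothetical toy values.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def matmul(a: Sequence[Sequence[float]], b: Sequence[Sequence[float]]) -> Matrix:
    """Plain nested-list matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gcn_layer(adj: Sequence[Sequence[float]], x: Matrix, w: Matrix) -> Matrix:
    """One standard GCN layer: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    h = matmul(matmul(norm, x), w)
    return [[max(0.0, v) for v in row] for row in h]


def attention_fuse(channels: List[Matrix], w_att: Matrix,
                   q: List[float]) -> Tuple[Matrix, Matrix]:
    """Per-node attention over channel embeddings z_c:
    score_c = q . tanh(W_att z_c), alpha = softmax(scores), z = sum_c alpha_c z_c."""
    fused, weights = [], []
    for i in range(len(channels[0])):
        scores = []
        for ch in channels:
            hidden = [math.tanh(sum(wr * z for wr, z in zip(row, ch[i]))) for row in w_att]
            scores.append(sum(a * b for a, b in zip(q, hidden)))
        m = max(scores)  # stabilize the softmax
        exps = [math.exp(s - m) for s in scores]
        alpha = [e / sum(exps) for e in exps]
        dim = len(channels[0][i])
        fused.append([sum(a * ch[i][k] for a, ch in zip(alpha, channels)) for k in range(dim)])
        weights.append(alpha)
    return fused, weights


if __name__ == "__main__":
    # Hypothetical toy graph: nodes 0-1 are drugs, nodes 2-3 are diseases.
    a_topo = [[0, 0, 1, 0], [0, 0, 1, 1], [1, 1, 0, 0], [0, 1, 0, 0]]
    # Hypothetical feature-similarity graph (e.g. nearest neighbours in feature space).
    a_feat = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
    x = [[1.0, 0.0, 0.5], [0.8, 0.2, 0.4], [0.0, 1.0, 0.3], [0.1, 0.9, 0.6]]
    w_topo = [[0.5, -0.2], [0.1, 0.4], [0.3, 0.3]]
    w_feat = [[0.2, 0.1], [-0.3, 0.5], [0.4, 0.2]]
    z_topo = gcn_layer(a_topo, x, w_topo)
    z_feat = gcn_layer(a_feat, x, w_feat)
    fused, alpha = attention_fuse([z_topo, z_feat],
                                  w_att=[[0.6, -0.4], [0.2, 0.9]], q=[1.0, -0.5])
    for node, a in enumerate(alpha):
        print(f"node {node}: topology weight {a[0]:.2f}, feature weight {a[1]:.2f}")
    # Score an unobserved drug-disease pair (drug 0, disease 3) by inner product.
    print("score(0, 3) =", sum(p * r for p, r in zip(fused[0], fused[3])))
```

The per-node weights `alpha` make the "adaptive importance" explicit: a node whose topology embedding scores higher leans on the topology channel, and vice versa. In a trained model, `w_att`, `q`, and the GCN weights would be learned rather than fixed by hand.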

Citations: 0

Bibliometric analysis of neuroscience publications quantifies the impact of data sharing

IF 5.8 | CAS Zone 3 (Biology) | Q1 BIOCHEMICAL RESEARCH METHODS

Bioinformatics

Pub Date : 2023-12-08 DOI: 10.1093/bioinformatics/btad746

Herve Emissah, Bengt Ljungquist, Giorgio A Ascoli

Summary: Neural morphology, the branching geometry of brain cells, is an essential cellular substrate of nervous system function and pathology. Despite the accelerating production of digital reconstructions of neural morphology, public accessibility of the data remains a core issue in neuroscience. Gaps in the availability of existing data lead to redundant research efforts and limit synergy. We carried out a comprehensive bibliometric analysis of neural morphology publications to quantify the impact of data sharing in the neuroscience community. Our findings demonstrate that sharing digital reconstructions of neural morphology via NeuroMorpho.Org leads to a significant increase in citations to the original article, thus directly benefiting authors. The rate of data reuse remains constant for at least 16 years after sharing (the whole period analyzed), altogether nearly doubling the peer-reviewed discoveries in the field. Furthermore, the recent availability of larger and more numerous datasets has fostered integrative applications, which on average accrue twice as many citations as re-analyses of individual datasets. We also released an open-source citation-tracking web service that allows researchers to monitor reuse of their datasets in independent peer-reviewed reports. These results and tools can facilitate the recognition of shared data reuse in merit evaluations and funding decisions.

Availability and Implementation: The application is available at http://cng-nmo-dev3.orc.gmu.edu:8181/. The source code is available at https://github.com/HerveEmissah/nmo-authors-app and https://github.com/HerveEmissah/nmo-bibliometric-analysis.

Supplementary information: Supplementary data are available at Bioinformatics online.

Citations: 0