
Bioinformatics (Oxford, England): latest articles

A dual diffusion model-based representation learning framework for antimicrobial peptides classification.
IF 5.4 Pub Date : 2026-02-28 DOI: 10.1093/bioinformatics/btag077
Wen Kong, Lingling Fu, Xingpeng Jiang, Weizhong Zhao

Motivation: The increasing prevalence of antibiotic-resistant bacteria has intensified the demand for novel antimicrobial agents. Antimicrobial peptides (AMPs) have emerged as promising alternatives, yet their identification and classification remain challenging due to the lack of multi-perspective information, insufficient feature representation learning, and reliance on a single data modality.

Results: In this paper, we propose a dual diffusion model-based representation learning framework for classifying AMPs, which integrates both peptide sequence and structure information to address these issues. Specifically, our approach uses a multi-view feature construction module that encodes peptide sequences and structures from distinct perspectives, deriving initial feature representations with enriched biological semantics. To enhance representation learning, the proposed framework applies two diffusion models, one to sequence information and one to structure information, to capture complex semantics from both modalities. In addition, both single-modal and dual-modal contrastive learning are used to further improve the learned representations. Comprehensive experiments demonstrate that our model outperforms existing methods for AMP classification, providing a feasible way to accelerate the discovery of novel antimicrobial agents.

Availability of implementation: The data and source code are available on GitHub at https://github.com/kww567upup/DDM.

Supplementary information: Supplementary data are available at Bioinformatics online.
Citations: 0
Finding low-complexity DNA sequences with longdust.
IF 5.4 Pub Date : 2026-02-28 DOI: 10.1093/bioinformatics/btag112
Heng Li, Brian Li

Motivation: Low-complexity (LC) DNA sequences are compositionally repetitive sequences that are often associated with spurious homologous matches and variant calling artifacts. While algorithms for identifying LC sequences exist, they either lack a concise mathematical definition of complexity or are inefficient with long or variable context windows.

Results: Longdust is a new algorithm that efficiently identifies long LC sequences, including centromeric satellites and tandem repeats with moderately long motifs. It defines string complexity by statistically modeling the k-mer count distribution with three parameters: the k-mer length, the context window size, and a threshold on complexity. Longdust exhibits high performance on real data and high consistency with existing methods.
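
As a rough illustration of how the three parameters named above (k-mer length, context window size, complexity threshold) interact, the sketch below scores sliding windows with the classic DUST-style statistic: the sum of c(c-1)/2 over k-mer counts c, divided by the number of k-mers minus one. This statistic is an assumption chosen for brevity and is not longdust's statistical model; the function names and default values are hypothetical.

```python
from collections import Counter


def dust_score(seq: str, k: int = 3) -> float:
    """DUST-style score: sum over k-mers of c*(c-1)/2, divided by (n_kmers - 1)."""
    n = len(seq) - k + 1  # number of k-mers in the sequence
    if n < 2:
        return 0.0
    counts = Counter(seq[i:i + k] for i in range(n))
    return sum(c * (c - 1) / 2 for c in counts.values()) / (n - 1)


def low_complexity_windows(seq: str, k: int = 3, window: int = 32,
                           threshold: float = 2.0):
    """Return merged (start, end) intervals whose window score exceeds threshold."""
    intervals = []
    # Slide one base at a time; a sequence shorter than the window is scored once.
    for start in range(max(1, len(seq) - window + 1)):
        end = min(len(seq), start + window)
        if dust_score(seq[start:end], k) > threshold:
            if intervals and start <= intervals[-1][1]:
                # Overlaps or touches the previous interval: extend it.
                intervals[-1] = (intervals[-1][0], end)
            else:
                intervals.append((start, end))
    return intervals
```

A homopolymer run scores high because a single k-mer accounts for every count, whereas a sequence whose k-mers are all distinct scores zero. Rescoring every window from scratch, as done here, is exactly the kind of cost that becomes prohibitive with long context windows.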

Availability and implementation: https://github.com/lh3/longdust.

Citations: 0
stDyer-image improves clustering analysis of spatially resolved transcriptomics and proteomics with morphological images.
IF 5.4 Pub Date : 2026-02-28 DOI: 10.1093/bioinformatics/btag071
Ke Xu, Xin Maizie Zhou, Lu Zhang

Motivation: Spatially resolved transcriptomics (SRT) and spatially resolved proteomics (SRP) data enable the study of gene expression and protein abundances within their precise spatial and cellular contexts in tissues. Certain SRT and SRP technologies also capture corresponding morphology images, adding another layer of valuable information. However, few existing methods developed for SRT data effectively leverage these supplementary images to enhance clustering performance.

Results: Here, we introduce stDyer-image, an end-to-end deep learning framework designed for clustering SRT and SRP datasets with images. Unlike existing methods that use images to complement gene expression data, stDyer-image directly links image features to cluster labels. This approach draws inspiration from pathologists, who can visually identify specific cell types or tumor regions from morphological images without relying on gene expression or protein abundances. Benchmarks against state-of-the-art tools demonstrate that stDyer-image achieves superior clustering performance. Moreover, it can handle large-scale datasets across diverse technologies, making it a versatile tool for spatial omics analysis.

Availability and implementation: The source code of stDyer-image and detailed tutorials are available at https://github.com/ericcombiolab/stDyer-image.

Citations: 0
Statistical methods to harmonize electronic health record data across healthcare systems: case study and lessons learned.
IF 5.4 Pub Date : 2026-02-28 DOI: 10.1093/bioinformatics/btag107
Xu Shi, Yuqi Zhai, Xianshi Yu, Xiaoou Li, Brian L Hazlehurst, Denis B Nyongesa, Daniel S Sapp, Brian D Williamson, David S Carrell, Luesa Healy, Kara L Cushing-Haugen, Jenna Wong, Shirley V Wang, James S Floyd, Kathleen Shattuck, Samuel McGown, Sarah Alam, José J Hernández-Muñoz, Jie Li, Yong Ma, Danijela Stojanovic, Sudha R Raman, Sharon E Davis, Tianxi Cai, Jennifer C Nelson, Patrick J Heagerty

Motivation: Although common data models for electronic health record (EHR) data can facilitate multi-site data organization and querying, the same medical event may still be coded differently between healthcare systems. In this paper, we present statistical methods to identify and mitigate coding discrepancies using summary-level data, and demonstrate these methods using data from two FDA Sentinel data partners: Kaiser Permanente Washington and Kaiser Permanente Northwest.

Results: We first characterize differences in coding patterns, then compute a code mapping matrix to harmonize data between systems. Our findings reveal significant heterogeneity in coded EHR data, even after adopting a common data model with the same coding system, highlighting the importance of data harmonization before downstream analyses. Our study also demonstrates the effectiveness of the data harmonization approaches, which provide a foundational data quality step to promote semantic interoperability, enhance data integration, and improve the integrity of study conclusions.

Availability and implementation: Computation prototypes, including R/Python code and examples, are included in Section 7, available as supplementary data at Bioinformatics online, and will be posted on GitHub upon publication.

Citations: 0
CIRCE: a scalable Python package to predict cis-regulatory DNA interactions from single-cell chromatin accessibility data.
IF 5.4 Pub Date : 2026-02-28 DOI: 10.1093/bioinformatics/btag092
Rémi Trimbour, Julio Saez-Rodriguez, Laura Cantini

Motivation: Chromatin 3D folding creates numerous DNA interactions, participating in gene expression regulation. Single-cell chromatin-accessibility assays now profile hundreds of thousands of cells, challenging existing methods for mapping cis-regulatory interactions.

Results: We present CIRCE, a fast and scalable Python package to predict cis-regulatory DNA interactions from single-cell chromatin accessibility data. CIRCE re-implements the Cicero workflow to analyse single-cell atlases, cutting runtime and memory use by several orders of magnitude. We also provide new options to compute metacells, grouping similar cells to reduce data sparsity. We benchmarked CIRCE against Cicero on two datasets of different sizes and demonstrated the improvement from CIRCE's metacell strategy using promoter capture Hi-C data. We also evaluated how DNA interaction predictions are affected by different pre-processing choices. We observed a negative impact of Cicero's count normalization, and the best performance was obtained by using the single-cell count matrix directly. Finally, we demonstrated the scalability of CIRCE by processing a dataset of more than 700 000 cells and 1 million DNA regions in less than an hour. CIRCE should greatly facilitate the prediction of DNA region interactions for scverse and Python users, while providing new and up-to-date pre-processing insights.
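
The idea of metacells, grouping similar cells to reduce data sparsity, can be sketched in plain Python. The example below is only an illustration under stated assumptions: it takes the group assignment as given (how CIRCE actually selects similar cells is not described in the abstract), and the function names `aggregate_metacells` and `zero_fraction` are hypothetical, not part of the CIRCE API.

```python
def aggregate_metacells(counts, groups):
    """Sum per-cell region counts within each group of similar cells.

    counts : list of per-cell lists (cells x regions)
    groups : one group label per cell (assumed to be precomputed)
    """
    n_regions = len(counts[0]) if counts else 0
    metacells = {}
    for row, group in zip(counts, groups):
        acc = metacells.setdefault(group, [0] * n_regions)
        for j, value in enumerate(row):
            acc[j] += value
    return metacells


def zero_fraction(rows):
    """Fraction of zero entries across all rows (a simple sparsity measure)."""
    values = [v for row in rows for v in row]
    return sum(1 for v in values if v == 0) / len(values) if values else 0.0
```

With four toy cells over three regions, pooling two cells per metacell lowers the zero fraction from 8/12 to 3/6, which is the sense in which aggregation reduces sparsity.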

Availability and implementation: CIRCE is released as open-source software under the AGPL-3.0 licence. The package source code is available on GitHub at https://github.com/cantinilab/CIRCE, and its documentation is accessible at https://circe.readthedocs.io. The code to reproduce the presented results is available as a Snakemake pipeline at https://github.com/cantinilab/circe_reproducibility.s.

Supplementary information: Supplementary data are available at Bioinformatics online.
Citations: 0
Dogme: a nextflow pipeline for reprocessing nanopore RNA and DNA modifications.
IF 5.4 Pub Date : 2026-02-28 DOI: 10.1093/bioinformatics/btag066
Elnaz Abdollahzadeh, Ali Mortazavi

Motivation: Oxford Nanopore (ONT) sequencing allows for the direct detection of RNA and DNA modifications from unamplified nucleic acids, which is a significant advantage over other platforms. However, the rapid updates to ONT basecalling models and the evolving landscape of computational tools for modification detection pose challenges for reproducible and standardized analyses. To address these challenges, we developed Dogme to automate basecalling, alignment, modification detection, and transcript quantification. Dogme automates the reprocessing of ONT POD5 files by integrating basecalling with Dorado, read mapping with minimap2, and subsequent analysis steps such as running modkit. The pipeline supports three major types of sequencing data: direct RNA (dRNA), complementary DNA (cDNA), and genomic DNA (gDNA). Dogme facilitates detection of the diverse RNA modifications supported by Dorado, such as N6-methyladenosine (m6A), 5-methylcytosine (m5C), inosine, pseudouridine, and 2'-O-methylation (Nm), as well as DNA methylation, while concurrently quantifying full-length transcript isoforms with LR-Kallisto for dRNA and cDNA.

Results: We applied Dogme to three separate mouse C2C12 myoblast replicates using direct RNA sequencing on MinION flow cells. We detected 96 603 m6A, 43 476 m5C, 8829 inosine, 10 055 pseudouridine, and 30 320 Nm sites in three biological replicates. The pipeline produced reproducible modification profiles and transcript expression levels across replicates, demonstrating its utility for integrative long-read transcriptomic and epigenomic analyses.

Availability and implementation: Dogme is implemented in Nextflow and is freely available under the MIT license at https://github.com/mortazavilab/dogme, with documentation provided for installation and usage.

Citations: 0
Prediction of bacterial protein-compound interactions with only positive samples.
IF 5.4 Pub Date : 2026-02-28 DOI: 10.1093/bioinformatics/btag067
Ki-Hwa Kim, Avinash Yaganapu, Sai Kosaraju, Aashish Bhatt, Yun Lyna Luo, Sai Phani Parsa, Juyeon Park, Hyun Lee, Jun Hyuck Lee, Tae-Jin Oh, Mingon Kang

Motivation: Prediction of Compound-Protein Interactions (CPI) in bacteria is crucial to advancing various pharmaceutical and chemical engineering fields, including biocatalysis, drug discovery, and industrial processing. However, current CPI models cannot be applied to bacterial CPI prediction due to the lack of curated negative interaction samples.

Results: We propose a novel Positive-Unlabeled (PU) learning framework, named BIN-PU, to address this limitation. BIN-PU generates pseudo-positive and pseudo-negative labels from known positive interaction data, enabling effective training of deep learning models for CPI prediction. We also propose a weighted positive loss function that weights truly positive samples. We validated BIN-PU coupled with multiple CPI backbone models, comparing its performance with existing PU models on bacterial cytochrome P450 (CYP) data. Extensive experiments demonstrate the superiority of BIN-PU over the benchmark models in predicting CPIs with only truly positive samples. Furthermore, to assess reproducibility, we validated BIN-PU on additional bacterial proteins obtained from a literature review, on human CYP datasets, and on uncurated data. We also validated the CPI predictions for the uncurated CYP data with biological and biophysical experiments. BIN-PU represents a significant advancement in CPI prediction for bacterial proteins, opening new possibilities for improving predictive models in related biological interaction tasks.
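
The abstract does not give the form of the weighted positive loss, so the following is only a minimal sketch of one plausible reading: a binary cross-entropy over pseudo-labeled samples in which known (truly positive) pairs receive a larger weight than pseudo-labeled ones. The function name, the `pos_weight` parameter, and its default value are assumptions for illustration, not BIN-PU's actual loss.

```python
import math


def weighted_pu_bce(probs, labels, is_true_positive, pos_weight=2.0, eps=1e-7):
    """Weighted binary cross-entropy for positive-unlabeled training.

    probs            : predicted interaction probabilities
    labels           : 1 for (pseudo-)positive, 0 for pseudo-negative
    is_true_positive : True where the label comes from a known positive pair
    pos_weight       : hypothetical multiplier applied to known positives
    """
    total, weight_sum = 0.0, 0.0
    for p, y, known in zip(probs, labels, is_true_positive):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        w = pos_weight if known else 1.0
        total += w * -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
        weight_sum += w
    return total / weight_sum if weight_sum else 0.0
```

Under this weighting, a confident mistake on a known positive costs more than the same mistake on a pseudo-labeled sample, which reflects the greater trust placed in curated positives.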

Availability and implementation: The source code and data are available at https://github.com/datax-lab/CYP.

Supplementary information: Supplementary data are available at Bioinformatics online.
Citations: 0
AutoFlow: an interactive Shiny app for supervised and unsupervised flow cytometry analysis.
IF 5.4 Pub Date : 2026-02-28 DOI: 10.1093/bioinformatics/btag078
Freya E R Woods, Emilyanne Leonard, Timothy Ebbels, Jonathan Cairns, Rhiannon David

Motivation: Flow cytometry (FC) is a widely used technique for analysing cells or particles based on the fluorescence of specific markers. Thresholds for fluorescence are typically set manually, a laborious, subjective process that scales poorly as FC technology advances. Machine learning (ML) methods can address these issues but often require technical expertise many bench scientists do not possess. Thus, accessible, open-source, and cross-domain ML-based FC tools are needed.

Results: We present AutoFlow, an easy-to-use, adaptable R Shiny application for automated FC analysis. AutoFlow supports two workflows: supervised and unsupervised learning. The application automates key preprocessing steps, including fluorescence compensation, debris exclusion, single-cell identification, viability marker gating, and downstream classification or clustering. AutoFlow demonstrated robust performance across three datasets: two publicly available (Mosmann Rare and Nilsson Rare) and a novel bone marrow microphysiological system (BM-MPS) dataset. In the supervised workflow, multiclass classification on BM-MPS achieved 97.2% accuracy under a single-timepoint training and multi-timepoint testing scheme, with high sensitivity and specificity across major lineages. Performance on rare populations was strong: Mosmann Rare (0.03% prevalence) achieved 87.5% sensitivity and 100% specificity, while Nilsson Rare (0.08% prevalence) achieved 87.9% sensitivity and 99.9% specificity. The unsupervised workflow accurately grouped cells into biologically meaningful clusters, recovering known populations and identifying additional candidate populations with marker profiles consistent with true biology. AutoFlow offers a fast, reproducible, and scalable solution for FC analysis, enabling high-throughput studies and improving the discovery of rare or unexpected cell types.
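The rare-population figures above are reported as sensitivity and specificity rather than accuracy, and a short calculation shows why. The sketch below uses made-up confusion counts (32 rare cells in 100,000, not taken from the paper) chosen only to illustrate the arithmetic. At such low prevalence, a classifier that never calls a rare cell still scores near-perfect accuracy.

```python
def sensitivity(tp, fn):
    """Fraction of truly rare cells that were detected."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of non-rare cells correctly left unflagged."""
    return tn / (tn + fp)


def accuracy(tp, tn, fp, fn):
    """Fraction of all cells classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical counts: 32 rare cells among 100,000 events.
useful = {"tp": 28, "fn": 4, "fp": 0, "tn": 99968}
# A degenerate classifier that labels every event as non-rare.
trivial = {"tp": 0, "fn": 32, "fp": 0, "tn": 99968}

for name, c in (("useful", useful), ("trivial", trivial)):
    print(
        name,
        sensitivity(c["tp"], c["fn"]),
        specificity(c["tn"], c["fp"]),
        accuracy(c["tp"], c["tn"], c["fp"], c["fn"]),
    )
```

Both classifiers exceed 99.9% accuracy on these counts, but only sensitivity separates them.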

Availability and implementation: The application is available at https://github.com/FERWoods/AutoFlow for download using R. An archived version is available at DOI: 10.5281/zenodo.18235796.

Supplementary information: Supplementary data are available at Bioinformatics online.
Citations: 0
Differential cell signaling testing for cell-cell communication inference from single-cell data by dominoSignal.
IF 5.4 Pub Date : 2026-02-28 DOI: 10.1093/bioinformatics/btag089
Jacob T Mitchell, Orian Stapleton, Kavita Krishnan, Sushma Nagaraj, Dmitrijs Lvovs, Christopher Cherry, Amanda Poissonnier, Wesley Horton, Andrew Adey, Varun Rao, Amanda Huff, Jacquelyn W Zimmerman, Luciane T Kagohara, Neeha Zaidi, Lisa M Coussens, Elizabeth M Jaffee, Jennifer H Elisseeff, Elana J Fertig

Motivation: Algorithms for ligand-receptor network inference have emerged as commonly used tools to estimate cell-cell communication from reference single-cell data. Many studies employ these algorithms to compare signaling between conditions but lack methods to statistically identify signals that differ significantly. We previously developed the cell communication inference algorithm Domino, which considers ligand and receptor gene expression in association with downstream transcription factor activity scoring. We developed the dominoSignal software to build upon Domino and extend its functionality to statistical testing of differential cellular signaling.

Results: This new functionality includes the compilation of active signals as linkages from multiple subjects in a single-cell data set and testing condition-dependent signaling linkage. The software is applicable for analysis of single-cell data sets with multiple subjects as biological replicates as well as with bootstrapped replicates from data sets with few or pooled subjects. We use simulation studies to benchmark the number of subjects in compared groups and cells within an annotated cell type sufficient to accurately identify differential linkages. We demonstrate the application of the Differential Cell Signaling Test (DCST) in the dominoSignal software to investigate consequences of cancer cell phenotypes and immunotherapy on cell-cell communication in tumor microenvironments. These applications in cancer studies demonstrate the ability of differential cell signaling analysis to infer changes to cell communication networks from therapeutic or experimental perturbations, which is broadly applicable across biological systems.
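To make the idea of testing a condition-dependent linkage concrete, the sketch below counts the subjects in each condition for whom a given linkage is active and compares the two groups. The abstract does not say which statistical test DCST uses, so the two-sided Fisher's exact test here is an illustrative stand-in, and the function name and table layout are assumptions rather than the dominoSignal API.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Assumed layout: rows are conditions, columns are the numbers of
    subjects with the linkage active and inactive.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x active subjects in condition 1.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table at least as unlikely as the observed one.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


# Hypothetical example: linkage active in 5 of 5 treated subjects
# and in 0 of 5 control subjects.
p_value = fisher_exact_two_sided(5, 0, 0, 5)
print(p_value)
```

With only a few subjects per group the achievable p-values are coarse, which is one reason benchmarking the required number of subjects, as the abstract describes, matters.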

Availability: dominoSignal is available through Bioconductor at https://www.bioconductor.org/packages/release/bioc/html/dominoSignal.html.

Supplementary information: Analysis code and supplementary information are available through Zenodo at https://zenodo.org/records/18329130 and at Bioinformatics online.
Citations: 0
scDock: streamlining drug discovery targeting cell-cell communication via scRNA-seq analysis and molecular docking.
IF 5.4 Pub Date : 2026-02-28 DOI: 10.1093/bioinformatics/btag103
Chen-Hao Huang, Yen-Jen Oyang, Hsuan-Cheng Huang, Hsueh-Fen Juan

Summary: Identifying drugs that target intercellular communication networks represents a promising therapeutic strategy, yet linking single-cell RNA sequencing (scRNA-seq) analysis to structure-based drug screening remains technically challenging and requires substantial bioinformatics expertise. We present scDock, an integrated and user-friendly pipeline that seamlessly connects scRNA-seq data processing, cell-cell communication inference, and molecular docking-based drug discovery. Through a single configuration file, users can execute the complete workflow, from raw scRNA-seq data to ranked drug candidates, without programming skills. scDock automates the identification of disease-relevant ligand-receptor interactions from scRNA-seq data and performs structure-based virtual screening against these communication targets using Protein Data Bank (PDB) or AlphaFold-predicted protein structures. The pipeline generates comprehensive outputs at each stage, enabling users to explore intercellular signaling alterations and discover therapeutic compounds targeting specific cell-cell communications. scDock addresses a critical gap by providing an accessible end-to-end solution for communication-targeted drug discovery from single-cell data.
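The last step described above, turning docking results into ranked drug candidates, can be sketched as a simple sort. This is a minimal illustration that assumes each docking result is a (target, compound, binding energy) tuple and that a lower energy means a better predicted binder. The function name, record layout, and example identifiers are hypothetical and do not reflect scDock's actual output format.

```python
from collections import defaultdict


def rank_candidates(docking_results, top_n=3):
    """Group docking results by target and keep the top_n best compounds.

    docking_results -- iterable of (target, compound, energy) tuples,
                       with energy in kcal/mol (lower is assumed better)
    """
    by_target = defaultdict(list)
    for target, compound, energy in docking_results:
        by_target[target].append((compound, energy))
    return {
        target: sorted(hits, key=lambda hit: hit[1])[:top_n]
        for target, hits in by_target.items()
    }


# Placeholder identifiers and scores, for illustration only.
results = [
    ("receptor_1", "compound_A", -7.2),
    ("receptor_1", "compound_B", -9.1),
    ("receptor_2", "compound_A", -6.0),
]
ranked = rank_candidates(results, top_n=1)
print(ranked)
```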

Availability and implementation: scDock is freely available at https://doi.org/10.6084/m9.figshare.31370368 and https://github.com/Andrewneteye4343/scDock. It is implemented in R, Python, and shell scripts and supports Linux systems, including Ubuntu and Debian.

Supplementary information: Supplementary data are available at Bioinformatics online.
Citations: 0