Briefings in Bioinformatics: latest articles, page 9

Precision DNA methylation typing via hierarchical clustering of Nanopore current signals and attention-based neural network.

IF 6.8, Zone 2 (Biology), Q1 BIOCHEMICAL RESEARCH METHODS

Briefings in Bioinformatics

Pub Date: 2024-09-23 DOI: 10.1093/bib/bbae596

Qi Dai, Hu Chen, Wen-Jing Yi, Jia-Ning Zhao, Wei Zhang, Ping-An He, Xiao-Qing Liu, Ying-Feng Zheng, Zhuo-Xing Shi

Decoding DNA methylation sites through nanopore sequencing has emerged as a cutting-edge technology in the field of DNA methylation research, as it enables direct sequencing of native DNA molecules without the need for prior enzymatic or chemical treatments. During nanopore sequencing, methylation modifications on DNA bases cause changes in electrical current intensity. Therefore, constructing deep neural network models to decode the electrical signals of nanopore sequencing has become a crucial step in methylation site identification. In this study, we utilized nanopore sequencing data containing diverse DNA methylation types and motif sequence diversity. We proposed a feature encoding method based on current signal clustering and leveraged the powerful attention mechanism in the Transformer framework to construct the PoreFormer model for identifying DNA methylation sites in nanopore sequencing. The model demonstrated excellent performance under conditions of multi-class methylation and motif sequence diversity, offering new insights into related research fields.
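
To make the clustering-based encoding idea concrete, here is a minimal Python sketch for illustration only. It assumes each nanopore event has already been reduced to a single mean current level (in pA) and groups those levels with average-linkage agglomerative clustering, so that each event can be represented by a discrete cluster code. The function name `agglomerative_1d`, the one-dimensional input, and the linkage choice are assumptions; this is not PoreFormer's published feature-encoding procedure.

```python
from itertools import combinations


def agglomerative_1d(values, n_clusters):
    """Average-linkage agglomerative clustering of scalar current levels.

    Returns one integer label per input value; labels are numbered in order
    of increasing cluster mean so that codes are reproducible.
    """
    # Start with every event in its own cluster (lists of indices).
    clusters = [[i] for i in range(len(values))]
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Average linkage: mean absolute difference over all member pairs.
            d = sum(abs(values[i] - values[j]) for i in clusters[a] for j in clusters[b])
            d /= len(clusters[a]) * len(clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])  # merge the closest pair (a < b)
        del clusters[b]
    # Relabel clusters by mean current level.
    clusters.sort(key=lambda c: sum(values[i] for i in c) / len(c))
    labels = [0] * len(values)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Hypothetical mean current levels (pA) for seven events.
currents = [80.1, 80.4, 95.2, 95.0, 110.3, 109.8, 80.2]
print(agglomerative_1d(currents, 3))  # [0, 0, 1, 1, 2, 2, 0]
```

The naive merge loop is cubic in the number of events; it is meant to show the encoding step, not to process real sequencing runs.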

Volume 25, Issue 6. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11562827/pdf/

Citations: 0

DrugDoctor: enhancing drug recommendation in cold-start scenario via visit-level representation learning and training.

IF 6.8, Zone 2 (Biology), Q1 BIOCHEMICAL RESEARCH METHODS

Briefings in Bioinformatics

Pub Date: 2024-09-23 DOI: 10.1093/bib/bbae464

Yabin Kuang, Minzhu Xie

Medication recommendation is a crucial application of artificial intelligence in healthcare. Current methodologies mostly depend on patient-level longitudinal representation, which utilizes the entirety of historical electronic health records for making predictions. However, they tend to overlook a few key elements: (1) The need to analyze the impact of past medications on previous conditions. (2) Similarity in patient visits is more common than similarity in the complete medical histories of patients. (3) It is difficult to accurately represent patient-level longitudinal data due to the varying numbers of visits. To our knowledge, current models face difficulties in dealing with initial patient visits (i.e. in cold-start scenarios) which are common in clinical practice. This paper introduces DrugDoctor, an innovative drug recommendation model crafted to emulate the decision-making mechanics of human doctors. Unlike previous methods, DrugDoctor explores the visit-level relationship between prescriptions and diseases while considering the impact of past prescriptions on the patient's condition to provide more accurate recommendations. We design a plug-and-play block to effectively capture drug substructure-aware disease information and effectiveness-aware medication information, employing cross-attention and multi-head self-attention mechanisms. Furthermore, DrugDoctor adopts a fundamentally new visit-level training strategy, aligning more closely with the practices of doctors. Extensive experiments conducted on the MIMIC-III and MIMIC-IV datasets demonstrate that DrugDoctor outperforms 10 other state-of-the-art methods in terms of Jaccard, F1-score, and PRAUC. Moreover, DrugDoctor exhibits strong robustness in handling patients with varying numbers of visits and effectively tackles "cold-start" issues in medication combination recommendations.
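
The abstract reports Jaccard and F1-score for predicted medication combinations. Below is a minimal Python sketch of how these set-based metrics are commonly computed per visit and then averaged, assuming each visit's prediction and ground truth are sets of drug codes. The codes are hypothetical, the empty-set convention is an assumption, and PRAUC is omitted because it needs per-drug probability scores rather than sets.

```python
def jaccard(pred, true):
    """Jaccard similarity |pred & true| / |pred | true| between two drug sets."""
    pred, true = set(pred), set(true)
    union = pred | true
    # Convention assumed here: two empty sets score 0.0.
    return len(pred & true) / len(union) if union else 0.0


def f1(pred, true):
    """Harmonic mean of precision and recall for one visit's drug set."""
    pred, true = set(pred), set(true)
    tp = len(pred & true)
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(true)
    return 2 * precision * recall / (precision + recall)


def mean_over_visits(metric, visits):
    """Average a set metric over (predicted, true) pairs, one pair per visit."""
    return sum(metric(p, t) for p, t in visits) / len(visits)


# Hypothetical drug codes for two visits: (predicted set, prescribed set).
visits = [
    ({"A01", "B02", "C03"}, {"B02", "C03", "D04"}),
    ({"A01"}, {"A01"}),
]
print(mean_over_visits(jaccard, visits))  # 0.75
print(mean_over_visits(f1, visits))
```

Scoring each visit independently matches the visit-level framing of the abstract, so a first (cold-start) visit contributes to the average exactly like any later one.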

Volume 25, Issue 6. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11418268/pdf/

Citations: 0

AntigenBoost: enhanced mRNA-based antigen expression through rational amino acid substitution.

IF 6.8, Zone 2 (Biology), Q1 BIOCHEMICAL RESEARCH METHODS

Briefings in Bioinformatics

Pub Date: 2024-09-23 DOI: 10.1093/bib/bbae468

Yumiao Gao, Siran Zhu, Huichun Li, Xueting Hao, Wen Chen, Deng Pan, Zhikang Qian

Messenger RNA (mRNA) vaccines represent a groundbreaking advancement in immunology and public health, particularly highlighted by their role in combating the COVID-19 pandemic. Optimizing mRNA-based antigen expression is a crucial focus in this emerging industry. We have developed a bioinformatics tool named AntigenBoost to address the challenge posed by destabilizing dipeptides that hinder ribosomal translation. AntigenBoost identifies these dipeptides within specific antigens and provides a range of potential amino acid substitution strategies using a two-dimensional scoring system. Through a combination of bioinformatics analysis and experimental validation, we significantly enhanced the in vitro expression of mRNA-derived Respiratory Syncytial Virus fusion glycoprotein and Influenza A Hemagglutinin antigen. Notably, a single amino acid substitution improved the immune response in mice, underscoring the effectiveness of AntigenBoost in mRNA vaccine design.
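
To make the dipeptide-scanning step concrete, here is a minimal Python sketch. It assumes a user-supplied set of destabilizing dipeptides and a table of allowed replacement residues. Both `FLAGGED` and `OPTIONS` are hypothetical placeholders, and candidates are ranked only by how many flagged sites remain, so AntigenBoost's actual two-dimensional scoring system is not reproduced.

```python
# Hypothetical placeholder set of "destabilizing" dipeptides; the list used
# by AntigenBoost is not given in the abstract.
FLAGGED = {"PP", "PG", "DP"}
# Hypothetical replacement residues per original residue (illustrative only).
OPTIONS = {"P": "AS", "G": "AS", "D": "EN"}


def find_flagged(seq, flagged=FLAGGED):
    """Return start indices of every flagged dipeptide in a protein sequence."""
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] in flagged]


def candidate_substitutions(seq, flagged=FLAGGED, options=OPTIONS):
    """List single substitutions that reduce the number of flagged dipeptides.

    Each result is (position, old_residue, new_residue, flags_remaining),
    sorted so that substitutions leaving the fewest flagged sites come first.
    """
    starts = find_flagged(seq, flagged)
    baseline = len(starts)
    # Both residues of each flagged dipeptide are candidate positions.
    positions = sorted({p for i in starts for p in (i, i + 1)})
    results = []
    for pos in positions:
        old = seq[pos]
        for new in options.get(old, ""):
            mutant = seq[:pos] + new + seq[pos + 1:]
            # Re-scan the whole mutant so newly created flagged pairs count too.
            remaining = len(find_flagged(mutant, flagged))
            if remaining < baseline:
                results.append((pos, old, new, remaining))
    results.sort(key=lambda r: (r[3], r[0], r[2]))
    return results


seq = "MKPPLDPA"  # hypothetical toy sequence
print(find_flagged(seq))  # [2, 5]
print(candidate_substitutions(seq)[:2])
```

Re-scanning the full mutant, rather than only the edited pair, guards against a substitution that removes one flagged dipeptide while creating another with a neighboring residue.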

Citations: 0

Building multiscale models with PhysiBoSS, an agent-based modeling tool.

IF 6.8, Zone 2 (Biology), Q1 BIOCHEMICAL RESEARCH METHODS

Briefings in Bioinformatics

Pub Date: 2024-09-23 DOI: 10.1093/bib/bbae509

Marco Ruscone, Andrea Checcoli, Randy Heiland, Emmanuel Barillot, Paul Macklin, Laurence Calzone, Vincent Noël

Multiscale models provide a unique tool for analyzing complex processes involving events that occur at different spatial and temporal scales. In the context of biological systems, such models can simulate mechanisms happening at the intracellular level, such as signaling, and at the extracellular level, where cells communicate and coordinate with other cells. These models aim to understand the impact of genetic or environmental deregulation observed in complex diseases, describe the interplay between a pathological tissue and the immune system, and suggest strategies to revert the diseased phenotypes. The construction of these multiscale models remains a very complex task, including the choice of the components to consider, the level of detail of the processes to simulate, and the fitting of the parameters to the data. One additional difficulty is the expert knowledge needed to program these models in languages such as C++ or Python, which may discourage the participation of non-experts. Simplifying this process through structured description formalisms, coupled with a graphical interface, is crucial in making modeling more accessible to the broader scientific community, as well as streamlining the process for advanced users. This article introduces three examples of multiscale models that rely on the PhysiBoSS framework, an add-on to PhysiCell that adds intracellular descriptions, in the form of continuous-time Boolean models, to the agent-based approach. The article demonstrates how to construct these models more easily, relying on PhysiCell Studio, the PhysiCell Graphical User Interface. A step-by-step tutorial is provided as Supplementary Material and all models are provided at https://physiboss.github.io/tutorial/.
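
PhysiBoSS's intracellular models are continuous-time Boolean models, simulated by the MaBoSS engine as stochastic (Gillespie-style) jumps between Boolean states. The Python sketch below illustrates that idea for a single cell. The three-node network (`Drug`, `Survival`, `Apoptosis`), its rules, and its rates are hypothetical, and this is not PhysiBoSS or MaBoSS code.

```python
import random


def simulate_boolean(state, rules, rates, t_max, rng):
    """Simulate one trajectory of an asynchronous continuous-time Boolean network.

    state: dict node -> bool (initial values)
    rules: dict node -> callable(state) -> bool, the node's logical target
    rates: dict node -> (rate_up, rate_down) transition rates
    Returns a list of (time, state) pairs, one per transition.
    """
    t = 0.0
    state = dict(state)
    trajectory = [(t, dict(state))]
    while True:
        # A node may flip only when its current value disagrees with its rule.
        propensities = {}
        for node, rule in rules.items():
            target = rule(state)
            if target != state[node]:
                rate = rates[node][0] if target else rates[node][1]
                if rate > 0:
                    propensities[node] = rate
        total = sum(propensities.values())
        if total == 0:
            break  # stable state: no transition is possible
        t += rng.expovariate(total)  # exponential waiting time
        if t > t_max:
            break
        # Choose the flipping node with probability proportional to its rate.
        r = rng.random() * total
        for node, rate in propensities.items():
            if r < rate:
                break
            r -= rate
        state[node] = not state[node]
        trajectory.append((t, dict(state)))
    return trajectory


rules = {
    "Drug": lambda s: s["Drug"],              # constant external input
    "Survival": lambda s: not s["Drug"],      # drug inhibits survival signaling
    "Apoptosis": lambda s: not s["Survival"],
}
rates = {node: (1.0, 1.0) for node in rules}
start = {"Drug": True, "Survival": True, "Apoptosis": False}
for time, st in simulate_boolean(start, rules, rates, t_max=100.0, rng=random.Random(0)):
    print(round(time, 3), st)
```

In an agent-based setting, one such trajectory would run inside each cell agent, with the Boolean outputs mapped to cell behaviors such as death or division.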

Volume 25, Issue 6. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11489466/pdf/

Citations: 0

Cancerous time estimation for interpreting the evolution of lung adenocarcinoma.

IF 6.8, Zone 2 (Biology), Q1 BIOCHEMICAL RESEARCH METHODS

Briefings in Bioinformatics

Pub Date: 2024-09-23 DOI: 10.1093/bib/bbae520

Yourui Han, Bolin Chen, Jun Bian, Ruiming Kang, Xuequn Shang

The evolution of lung adenocarcinoma is accompanied by a multitude of gene mutations and dysfunctions, rendering its phenotypic state and evolutionary direction highly complex. To interpret the evolution of lung adenocarcinoma, various methods have been developed to elucidate the molecular pathogenesis and functional evolution processes. However, most of these methods are constrained by the absence of cancerous temporal information and by the challenges of heterogeneous characteristics. To handle these problems, in this study, a patient quasi-potential landscape method was proposed to estimate the cancerous time at which phenotypic states emerge during the evolutionary process. Subsequently, a total of 39 different oncogenetic paths were identified based on cancerous time and mutations, reflecting the molecular pathogenesis of the evolutionary process of lung adenocarcinoma. To interpret the evolution patterns of lung adenocarcinoma, three oncogenetic graphs were obtained as the common evolutionary patterns by merging the oncogenetic paths. Moreover, patients were evenly re-divided into early, middle, and late evolutionary stages according to cancerous time, and a feasible framework was developed to construct the functional evolution network of lung adenocarcinoma. A total of six significant functional evolution processes were identified from the functional evolution network based on pathway enrichment analysis, which play critical roles in understanding the development of lung adenocarcinoma.
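
The abstract re-divides patients evenly into early, middle, and late stages by estimated cancerous time. A minimal Python sketch of such an even, rank-based split is shown below. It assumes cancerous times have already been estimated (the quasi-potential landscape step is not reproduced), and the patient IDs and time values are hypothetical.

```python
def split_into_stages(cancerous_time, labels=("early", "middle", "late")):
    """Assign each patient to one of len(labels) near-equal groups by rank.

    cancerous_time: dict patient_id -> estimated cancerous time.
    Ties keep the dict's insertion order because sorted() is stable.
    """
    order = sorted(cancerous_time, key=cancerous_time.get)
    n, k = len(order), len(labels)
    # rank * k // n maps ranks 0..n-1 onto groups 0..k-1; group sizes
    # differ by at most one when n is not divisible by k.
    return {pid: labels[rank * k // n] for rank, pid in enumerate(order)}


# Hypothetical estimated cancerous times for six patients.
times = {"P1": 0.9, "P2": 0.1, "P3": 0.5, "P4": 0.3, "P5": 0.7, "P6": 0.2}
print(split_into_stages(times))
```

Splitting by rank rather than by fixed time thresholds is what makes the groups even in size, regardless of how the estimated times are distributed.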

Volume 25, Issue 6. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11483137/pdf/

Citations: 0

AIGen: an artificial intelligence software for complex genetic data analysis.

IF 6.8, Zone 2 (Biology), Q1 BIOCHEMICAL RESEARCH METHODS

Briefings in Bioinformatics

Pub Date: 2024-09-23 DOI: 10.1093/bib/bbae566

Tingting Hou, Xiaoxi Shen, Shan Zhang, Muxuan Liang, Li Chen, Qing Lu

The recent development of artificial intelligence (AI) technology, especially the advance of deep neural network (DNN) technology, has revolutionized many fields. While DNN plays a central role in modern AI technology, it has rarely been used in genetic data analysis due to analytical and computational challenges brought by high-dimensional genetic data and an increasing number of samples. To facilitate the use of AI in genetic data analysis, we developed a C++ package, AIGen, based on two newly developed neural networks (i.e. kernel neural networks and functional neural networks) that are capable of modeling complex genotype-phenotype relationships (e.g. interactions) while providing robust performance against high-dimensional genetic data. Moreover, computationally efficient algorithms (e.g. a minimum norm quadratic unbiased estimation approach and batch training) are implemented in the package to accelerate the computation, making them computationally efficient for analyzing large-scale datasets with thousands or even millions of samples. By applying AIGen to the UK Biobank dataset, we demonstrate that it can efficiently analyze large-scale genetic data, attain improved accuracy, and maintain robust performance. Availability: AIGen is developed in C++ and its source code, along with reference libraries, is publicly accessible on GitHub at https://github.com/TingtHou/AIGen.
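
Kernel neural networks, as we understand them, are built from similarity kernels computed between individuals. As an illustration only, the Python sketch below computes an identity-by-state (IBS) kernel, a common choice in kernel-based genetic association testing, from genotypes coded as minor-allele counts (0, 1, 2). Which kernels AIGen actually uses is an assumption here, and a pure-Python double loop like this would not scale to biobank-sized samples.

```python
def ibs_kernel(genotypes):
    """Identity-by-state similarity between every pair of individuals.

    genotypes: list of equal-length lists, one per individual, with each SNP
    coded as a minor-allele count in {0, 1, 2}.
    K[i][j] = sum_k (2 - |g_ik - g_jk|) / (2 * p), so K[i][i] == 1.0.
    """
    n = len(genotypes)
    p = len(genotypes[0])
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            shared = sum(2 - abs(a - b) for a, b in zip(genotypes[i], genotypes[j]))
            K[i][j] = K[j][i] = shared / (2 * p)
    return K


# Hypothetical genotypes: three individuals, three SNPs.
G = [[0, 1, 2], [0, 1, 2], [2, 1, 0]]
for row in ibs_kernel(G):
    print([round(x, 3) for x in row])
```

The resulting n-by-n matrix is symmetric with ones on the diagonal; identical genotypes score 1.0 and opposite homozygotes at a SNP contribute nothing.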

Volume 25, Issue 6. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11568876/pdf/

Citations: 0

STAVER: a standardized benchmark dataset-based algorithm for effective variation reduction in large-scale DIA-MS data.

IF 6.8, Zone 2 (Biology), Q1 BIOCHEMICAL RESEARCH METHODS

Briefings in Bioinformatics

Pub Date: 2024-09-23 DOI: 10.1093/bib/bbae553

Peng Ran, Yunzhi Wang, Kai Li, Shiman He, Subei Tan, Jiacheng Lv, Jiajun Zhu, Shaoshuai Tang, Jinwen Feng, Zhaoyu Qin, Yan Li, Lin Huang, Yanan Yin, Lingli Zhu, Wenjun Yang, Chen Ding

Mass spectrometry (MS)-based proteomics has become instrumental in comprehensively investigating complex biological systems. Data-independent acquisition (DIA)-MS, utilizing hybrid spectral library search strategies, allows for the simultaneous quantification of thousands of proteins, showing promise in enhancing protein identification and quantification precision. However, low-quality profiles can considerably undermine quantitative precision, resulting in inaccurate protein quantification. To tackle this challenge, we introduced STAVER, a novel algorithm that leverages standardized benchmark datasets to reduce non-biological variation in large-scale DIA-MS analyses. By eliminating unwanted noise in MS signals, STAVER significantly improved protein quantification precision, especially in hybrid spectral library searches. Moreover, we validated STAVER's robustness and applicability across multiple large-scale DIA datasets, demonstrating significantly enhanced precision and reproducibility of protein quantification. STAVER offers an innovative and effective approach for enhancing the quality of large-scale DIA proteomic data, facilitating cross-platform and cross-laboratory comparative analyses. This advancement significantly enhances the consistency and reliability of findings in clinical research. The complete package is available at https://github.com/Ran485/STAVER.
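
STAVER uses standardized benchmark datasets to suppress non-biological variation, but the abstract does not detail its algorithm. As a generic, swapped-in illustration only, the Python sketch below computes each feature's coefficient of variation (CV) across repeated runs of a benchmark sample and keeps features whose CV falls at or under a threshold. The feature IDs, intensities, and the 0.2 cutoff are hypothetical, and CV filtering is not STAVER's method.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (inf if the mean is 0)."""
    m = mean(values)
    return stdev(values) / m if m else float("inf")


def stable_features(benchmark, max_cv=0.2):
    """Keep features measured reproducibly across benchmark replicate runs.

    benchmark: dict feature_id -> intensities from repeated runs of the
    same standard sample; features with fewer than two runs are dropped.
    """
    return sorted(
        f for f, vals in benchmark.items()
        if len(vals) >= 2 and coefficient_of_variation(vals) <= max_cv
    )


# Hypothetical precursor intensities from three benchmark runs.
bench = {"PEP_A": [100, 102, 98], "PEP_B": [50, 90, 20], "PEP_C": [10, 10.5, 9.5]}
print(stable_features(bench))  # ['PEP_A', 'PEP_C']
```

Because a benchmark sample has no biological variation between runs, any spread it shows can be attributed to technical noise, which is the general rationale for using such datasets as a reference.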

Volume 25, Issue 6. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11540132/pdf/

Citations: 0

MultiFeatVotPIP: a voting-based ensemble learning framework for predicting proinflammatory peptides.

IF 6.8, Zone 2 (Biology), Q1 BIOCHEMICAL RESEARCH METHODS

Briefings in Bioinformatics

Pub Date: 2024-09-23 DOI: 10.1093/bib/bbae505

Chaorui Yan, Aoyun Geng, Zhuoyu Pan, Zilong Zhang, Feifei Cui

Inflammatory responses may lead to tissue or organ damage, and proinflammatory peptides (PIPs) are signaling peptides that can induce such responses. Many diseases have been redefined as inflammatory diseases. To identify PIPs more efficiently, we expanded the dataset and designed an ensemble learning model with manually encoded features. Specifically, we adopted a more comprehensive feature encoding method and considered the actual impact of certain features to filter them. Identification and prediction of PIPs were performed using an ensemble learning model based on five different classifiers. The results show that the model's sensitivity, specificity, accuracy, and Matthews correlation coefficient are all higher than those of the state-of-the-art models. We named this model MultiFeatVotPIP, and both the model and the data can be accessed publicly at https://github.com/ChaoruiYan019/MultiFeatVotPIP. Additionally, we have developed a user-friendly web interface for users, which can be accessed at http://www.bioai-lab.com/MultiFeatVotPIP.
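
The abstract combines five classifiers by voting and reports sensitivity, specificity, accuracy, and Matthews correlation coefficient (MCC). A minimal Python sketch of hard majority voting plus those four metrics follows. It assumes each base classifier has already produced 0/1 labels; the classifiers and feature encoding are not reproduced, MultiFeatVotPIP may use a different scheme (for example, soft voting over probabilities), and the labels are toy values.

```python
from math import sqrt


def majority_vote(predictions):
    """Hard majority vote over classifiers.

    predictions: one list of 0/1 labels per classifier, all the same length.
    An odd number of classifiers (five, as in the abstract) avoids ties.
    """
    n_clf = len(predictions)
    return [1 if 2 * sum(col) > n_clf else 0 for col in zip(*predictions)]


def binary_metrics(y_true, y_pred):
    """Sensitivity, specificity, accuracy and Matthews correlation coefficient."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    sn = tp / (tp + fn) if tp + fn else 0.0
    sp = tn / (tn + fp) if tn + fp else 0.0
    acc = (tp + tn) / len(y_true)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "ACC": acc, "MCC": mcc}


# Toy 0/1 predictions from five hypothetical classifiers on six peptides.
preds = [
    [1, 1, 0, 0, 1, 0],
    [1, 0, 0, 0, 1, 1],
    [1, 1, 0, 1, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [1, 1, 1, 0, 0, 0],
]
vote = majority_vote(preds)
print(vote)  # [1, 1, 0, 0, 1, 0]
print(binary_metrics([1, 1, 0, 0, 0, 0], vote))
```

MCC is included alongside accuracy because it stays informative when positive and negative peptides are imbalanced.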

Volume 25, Issue 6. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11479713/pdf/

Citations: 0

SELF-Former: multi-scale gene filtration transformer for single-cell spatial reconstruction.

IF 6.8 | Zone 2, Biology | Q1 BIOCHEMICAL RESEARCH METHODS

Briefings in bioinformatics

Pub Date : 2024-09-23 DOI: 10.1093/bib/bbae523

Tianyi Chen, Xindian Wei, Lianxin Xie, Yunfei Zhang, Cheng Liu, Wenjun Shen, Si Wu, Hau-San Wong

The spatial reconstruction of single-cell RNA sequencing (scRNA-seq) data into spatial transcriptomics (ST) is a rapidly evolving field that addresses the significant challenge of aligning gene expression profiles to their spatial origins within tissues. This task is complicated by the inherent batch effects and the need for precise gene expression characterization to accurately reflect spatial information. To address these challenges, we developed SELF-Former, a transformer-based framework that utilizes multi-scale structures to learn gene representations, while designing spatial correlation constraints for the reconstruction of corresponding ST data. SELF-Former excels in recovering the spatial information of ST data and effectively mitigates batch effects between scRNA-seq and ST data. A novel aspect of SELF-Former is the introduction of a gene filtration module, which significantly enhances the spatial reconstruction task by selecting genes that are crucial for accurate spatial positioning and reconstruction. The superior performance and effectiveness of SELF-Former's modules have been validated across four benchmark datasets, establishing it as a robust and effective method for spatial reconstruction tasks. SELF-Former demonstrates its capability to extract meaningful gene expression information from scRNA-seq data and accurately map it to the spatial context of real ST data. Our method represents a significant advancement in the field, offering a reliable approach for spatial reconstruction.
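SELF-Former's gene filtration module is learned inside the transformer, and the abstract does not give its scoring rule. The sketch below is a rough, classical stand-in for the idea of keeping the genes that carry spatial information. It scores each gene by Moran's I, I = (N/W) Σᵢⱼ wᵢⱼ zᵢ zⱼ / Σᵢ zᵢ², computed over a k-nearest-neighbour graph of ST spot coordinates, and keeps the top k. This is an illustrative assumption, not the published module. `knn_weights`, `filter_genes`, the neighbour count and the toy grid are all hypothetical.

```python
import math
import random


def knn_weights(coords, k=4):
    """Symmetric binary weights linking each spot to its k nearest spots."""
    n = len(coords)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        nearest = sorted(
            (math.dist(coords[i], coords[j]), j) for j in range(n) if j != i
        )[:k]
        for _, j in nearest:
            w[i][j] = w[j][i] = 1.0
    return w


def morans_i(values, w):
    """Moran's I: positive when neighbouring spots have similar expression."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    denom = sum(v * v for v in z)
    total_w = sum(map(sum, w))
    if denom == 0 or total_w == 0:
        return 0.0  # constant gene: no spatial signal to measure
    num = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / total_w) * num / denom


def filter_genes(expr, coords, top_k, k=4):
    """Keep the top_k genes with the strongest spatial autocorrelation.

    expr maps gene name -> expression values, one per spot in coords.
    """
    w = knn_weights(coords, k)
    scores = {gene: morans_i(vals, w) for gene, vals in expr.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_k], scores


# Hypothetical 8 x 8 grid of ST spots with three toy genes.
rng = random.Random(0)
coords = [(x, y) for x in range(8) for y in range(8)]
expr = {
    "gradient": [float(x) for x, _ in coords],  # varies smoothly in space
    "noise": [rng.random() for _ in coords],     # no spatial structure
    "flat": [1.0] * len(coords),                 # constant everywhere
}
kept, scores = filter_genes(expr, coords, top_k=1)
print(kept, {g: round(s, 3) for g, s in scores.items()})
```

On the toy grid, the smoothly varying gene is kept, while the noise and constant genes score near zero. A fixed statistic like this ignores the reconstruction objective, whereas a learned filter can weigh genes by how much they help the downstream spatial mapping.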

Volume 25, Issue 6 | PMC full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11483138/pdf/

Citations: 0

Constructing the dynamic transcriptional regulatory networks to identify phenotype-specific transcription regulators.

IF 6.8 | Zone 2, Biology | Q1 BIOCHEMICAL RESEARCH METHODS

Briefings in bioinformatics

Pub Date : 2024-09-23 DOI: 10.1093/bib/bbae542

Yang Guo, Zhiqiang Xiao

The transcriptional regulatory network (TRN) is a graph framework that helps explain the complex regulatory mechanisms underlying transcription. Identifying phenotype-specific transcription regulators is vital for revealing the functional roles that transcription elements play in specific phenotypes. Although many methods have been developed over the past decade to detect phenotype-specific transcription elements from a static TRN, most fall short of elucidating the phenotype-related functional roles of transcription regulators at multiple levels, because static models usually ignore the dynamic characteristics of transcription regulators. In this study, we introduce a novel framework called DTGN that identifies phenotype-specific transcription factors (TFs) and pathways by constructing dynamic TRNs. We first design a graph autoencoder model that integrates phenotype-oriented time-series gene expression data with the static TRN to learn temporal representations of genes. Then, based on these learned temporal representations, we develop a statistical method to construct a series of dynamic TRNs associated with the development of specific phenotypes. Finally, we identify phenotype-specific TFs and pathways from the constructed dynamic TRNs. Results on multiple phenotypic datasets show that DTGN outperforms most existing methods in identifying phenotype-specific TFs and pathways. Our framework offers a new, dynamic-model approach to exploring the functional roles of transcription regulators associated with specific phenotypes.
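The DTGN abstract outlines three steps. A graph autoencoder learns time-specific gene representations from time-series expression and the static TRN. A statistical method then turns those representations into dynamic TRNs, and phenotype-specific TFs are read off the resulting networks. The sketch below illustrates only the last two steps, under stated assumptions:

- the per-time-point embeddings are taken as given;
- a fixed cosine-similarity threshold stands in for the paper's statistical method, which the abstract does not specify;
- TFs are ranked by how many of their edges are rewired over time, an illustrative heuristic rather than DTGN's scoring.

All names, the threshold and the toy data are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two embedding vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def dynamic_trns(static_edges, embeddings_by_time, threshold=0.5):
    """Build one network per time point from the static TRN.

    A TF -> target edge from the static network is kept at time t only if
    the two genes' embeddings at t are at least `threshold` similar.
    """
    return [
        {(tf, tg) for tf, tg in static_edges if cosine(emb[tf], emb[tg]) >= threshold}
        for emb in embeddings_by_time
    ]


def rank_tfs_by_rewiring(networks):
    """Count, per TF, the edges gained or lost between consecutive time points."""
    scores = {}
    for prev, curr in zip(networks, networks[1:]):
        for tf, _target in prev ^ curr:  # symmetric difference = rewired edges
            scores[tf] = scores.get(tf, 0) + 1
    # Highest rewiring count first; ties broken alphabetically by TF name
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical static TRN and 2-D gene embeddings at two time points.
static_edges = [("TF1", "G1"), ("TF1", "G2"), ("TF2", "G3")]
embeddings_by_time = [
    {"TF1": [1, 0], "G1": [1, 0], "G2": [0, 1], "TF2": [1, 1], "G3": [1, 1]},
    {"TF1": [1, 0], "G1": [0, 1], "G2": [1, 0.1], "TF2": [1, 1], "G3": [1, 0.9]},
]
networks = dynamic_trns(static_edges, embeddings_by_time)
print(networks)
print(rank_tfs_by_rewiring(networks))
```

In this toy case TF1 loses its edge to G1 and gains one to G2, so it is ranked as the most rewired regulator, while TF2's edge stays stable. A significance test against a null distribution of similarities could replace the fixed threshold.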

Volume 25, Issue 6 | PMC full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11503644/pdf/

Citations: 0