Quantitative Biology: Latest Articles

Bioinformatics and biomedical informatics with ChatGPT: Year one review.
IF 0.6; Zone 4 (Biology); Q4, Mathematical & Computational Biology; Pub Date: 2024-12-01; Epub Date: 2024-06-27; DOI: 10.1002/qub2.67
Jinge Wang, Zien Cheng, Qiuming Yao, Li Liu, Dong Xu, Gangqing Hu

The year 2023 marked a significant surge in the exploration of applying large language model chatbots, notably Chat Generative Pre-trained Transformer (ChatGPT), across various disciplines. We surveyed the application of ChatGPT in bioinformatics and biomedical informatics throughout the year, covering omics, genetics, biomedical text mining, drug discovery, biomedical image understanding, bioinformatics programming, and bioinformatics education. Our survey delineates the current strengths and limitations of this chatbot in bioinformatics and offers insights into potential avenues for future developments.

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11446534/pdf/
Citations: 0
A comprehensive evaluation of large language models in mining gene relations and pathway knowledge.
IF 0.6; Zone 4 (Biology); Q4, Mathematical & Computational Biology; Pub Date: 2024-12-01; Epub Date: 2024-06-21; DOI: 10.1002/qub2.57
Muhammad Azam, Yibo Chen, Micheal Olaolu Arowolo, Haowang Liu, Mihail Popescu, Dong Xu

Understanding complex biological pathways, including gene-gene interactions and gene regulatory networks, is critical for exploring disease mechanisms and drug development. Manual literature curation of biological pathways cannot keep up with the exponential growth of new discoveries in the literature. Large-scale language models (LLMs) trained on extensive text corpora contain rich biological information, and they can be mined as a biological knowledge graph. This study assesses 21 LLMs, including both application programming interface (API)-based models and open-source models, on their capacity to retrieve biological knowledge. The evaluation focuses on predicting gene regulatory relations (activation, inhibition, and phosphorylation) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway components. Results indicated a significant disparity in model performance. API-based models GPT-4 and Claude-Pro showed superior performance, with F1 scores of 0.4448 and 0.4386 for the gene regulatory relation prediction, and Jaccard similarity indices of 0.2778 and 0.2657 for the KEGG pathway prediction, respectively. Open-source models lagged behind their API-based counterparts; among them, Falcon-180b and llama2-7b had the highest F1 scores in gene regulatory relations, 0.2787 and 0.1923, respectively. The KEGG pathway recognition had a Jaccard similarity index of 0.2237 for Falcon-180b and 0.2207 for llama2-7b. Our study suggests that LLMs are informative in gene network analysis and pathway mapping, but their effectiveness varies, necessitating careful model selection. This work also provides a case study and insight into using LLMs as knowledge graphs. Our code is publicly available on GitHub (Muh-aza).
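The two metrics named in this abstract can be illustrated with a minimal sketch. It assumes that predictions and references are compared as plain sets (gene symbols for a pathway, or relation items for regulatory relations); the function names and the example gene lists are hypothetical, and the study's actual scoring protocol is not described in the abstract.

```python
def jaccard(predicted, reference):
    """Jaccard similarity index |A ∩ B| / |A ∪ B| between two sets."""
    a, b = set(predicted), set(reference)
    if not a and not b:
        return 1.0  # convention chosen for this sketch: two empty sets match
    return len(a & b) / len(a | b)


def f1_score(predicted, reference):
    """F1 = harmonic mean of precision and recall over set membership."""
    a, b = set(predicted), set(reference)
    true_positives = len(a & b)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(a)
    recall = true_positives / len(b)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: a curated pathway versus a model's answer.
reference_genes = {"TP53", "MDM2", "CDKN1A", "ATM"}
predicted_genes = {"TP53", "MDM2", "BAX"}

print(jaccard(predicted_genes, reference_genes))   # 2 shared / 5 total = 0.4
print(f1_score(predicted_genes, reference_genes))  # 4/7, about 0.571
```

Both scores reward overlap, but the Jaccard index penalizes every extra or missing gene against the union, so it is always less than or equal to F1 for the same pair of sets.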

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11446478/pdf/
Citations: 0
Foundation models for bioinformatics
IF 0.6; Zone 4 (Biology); Q4, Mathematical & Computational Biology; Pub Date: 2024-07-24; DOI: 10.1002/qub2.69
Ziyu Chen, Lin Wei, Ge Gao
Transformer‐based foundation models such as ChatGPT have revolutionized our daily life and affected many fields including bioinformatics. In this perspective, we first discuss the direct application of textual foundation models to bioinformatics tasks, focusing on how to make the most of canonical large language models and mitigate their inherent flaws. We also go through the transformer‐based, bioinformatics‐tailored foundation models for both sequence and non‐sequence data. In particular, we envision further development directions as well as challenges for bioinformatics foundation models.
Citations: 0
A penalized integrative deep neural network for variable selection among multiple omics datasets
IF 3.1; Zone 4 (Biology); Q1, Mathematics; Pub Date: 2024-06-07; DOI: 10.1002/qub2.51
Yang Li, Xiaonan Ren, Haochen Yu, Tao Sun, Shuangge Ma
Deep learning has become increasingly popular in omics data analysis. Recent works incorporating variable selection into deep learning have greatly enhanced model interpretability. However, because deep learning requires a large sample size, the existing methods may yield uncertain findings when the dataset is small, as is common in omics data analysis. With the explosion and availability of omics data from multiple populations/studies, the existing methods naively pool them into one dataset to increase the sample size while ignoring that variable structures can differ across datasets, which might lead to inaccurate variable selection results. We propose a penalized integrative deep neural network (PIN) to simultaneously select important variables from multiple datasets. PIN directly aggregates multiple datasets as input and considers both homogeneity and heterogeneity among multiple datasets in an integrative analysis framework. Results from extensive simulation studies and applications of PIN to gene expression datasets from elders with different cognitive statuses or ovarian cancer patients at different stages demonstrate that PIN outperforms existing methods with considerably improved performance among multiple datasets. The source code is freely available on GitHub (rucliyang/PINFunc). We speculate that the proposed PIN method will promote the identification of disease‐related important variables based on multiple studies/datasets from diverse origins.
Citations: 0
Comprehensive cross cancer analyses reveal mutational signature cancer specificity
IF 3.1; Zone 4 (Biology); Q1, Mathematics; Pub Date: 2024-06-05; DOI: 10.1002/qub2.49
Rui Xin, Limin Jiang, Hui Yu, Fengyao Yan, Jijun Tang, Yan Guo
Mutational signatures are distinct patterns of DNA mutations that occur in a specific context or under certain conditions. They are a powerful tool for describing cancer etiology. We conducted a study to show cancer heterogeneity and cancer specificity from the aspect of mutational signatures through collinearity analysis and machine learning techniques. Through thorough training and independent validation, our results show that while the majority of the mutational signatures are distinct, similarities between certain mutational signature pairs can be observed through both mutation patterns and mutational signature abundance. This observation can potentially help determine the etiology of as-yet elusive mutational signatures. Further analysis using machine learning approaches demonstrated moderate mutational signature cancer specificity. Among all cancer types, skin cancer demonstrated the strongest mutational signature specificity.
Citations: 0
SimHOEPI: A resampling simulator for generating single nucleotide polymorphism data with a high‐order epistasis model
IF 3.1; Zone 4 (Biology); Q1, Mathematics; Pub Date: 2024-04-16; DOI: 10.1002/qub2.42
Yahan Li, Xinrui Cai, J. Shang, Yuanyuan Zhang, Jinxing Liu
Epistasis is a ubiquitous phenomenon in genetics, and is considered to be one of the main factors in current efforts to unveil the missing heritability of complex diseases. Simulation data is crucial for evaluating epistasis detection tools in genome‐wide association studies (GWAS). Existing simulators normally suffer from two limitations: absence of support for high‐order epistasis models containing multiple single nucleotide polymorphisms (SNPs), and inability to generate simulation SNP data independently. In this study, we propose a simulator, SimHOEPI, which is capable of calculating penetrance tables of high‐order epistasis models depending on either prevalence or heritability, and uses a resampling strategy to generate simulation data independently. Highlights of SimHOEPI are the preservation of realistic minor allele frequencies in sampling data, the accurate calculation and embedding of high‐order epistasis models, and acceptable simulation time. A series of experiments were carried out to verify these properties from different aspects. Experimental results show that SimHOEPI can generate simulation SNP data independently with high‐order epistasis models, implying that it might be an alternative simulator for GWAS.
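To make the link between a penetrance table, prevalence, and heritability concrete, here is a minimal sketch of the forward calculation. It is not SimHOEPI's code: it assumes independent SNPs in Hardy-Weinberg equilibrium and uses a commonly cited observed-scale definition of heritability (variance of penetrance across genotypes divided by P(D)(1 - P(D))); whether the tool uses exactly these definitions is an assumption, and all names are hypothetical.

```python
from itertools import product


def genotype_freqs(maf):
    """Hardy-Weinberg frequencies for 0, 1, 2 copies of the minor allele."""
    q = maf
    p = 1.0 - q
    return (p * p, 2 * p * q, q * q)


def prevalence_and_heritability(mafs, penetrance):
    """Summarize a k-SNP epistasis model.

    mafs: minor allele frequency of each SNP (assumed independent).
    penetrance: dict mapping a genotype tuple such as (0, 2, 1) to
                P(disease | genotype); it must cover all 3**k genotypes.
    """
    per_snp = [genotype_freqs(m) for m in mafs]
    combos = list(product(range(3), repeat=len(mafs)))

    # Joint genotype probability is the product of per-SNP frequencies.
    prob = {}
    for genotype in combos:
        value = 1.0
        for snp_index, copies in enumerate(genotype):
            value *= per_snp[snp_index][copies]
        prob[genotype] = value

    prevalence = sum(prob[g] * penetrance[g] for g in combos)
    variance = sum(prob[g] * (penetrance[g] - prevalence) ** 2 for g in combos)
    denominator = prevalence * (1.0 - prevalence)
    heritability = variance / denominator if denominator > 0 else 0.0
    return prevalence, heritability


# Hypothetical one-SNP recessive model: disease only with two minor alleles.
recessive = {(0,): 0.0, (1,): 0.0, (2,): 1.0}
print(prevalence_and_heritability([0.5], recessive))  # (0.25, 1.0)
```

A simulator that works "depending on either prevalence or heritability" has to solve the inverse problem: scale or search the penetrance values until this forward calculation returns the requested target.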
Citations: 0
Functional predictability of universal gene circuits in diverse microbial hosts
IF 3.1; Zone 4 (Biology); Q1, Mathematics; Pub Date: 2024-04-14; DOI: 10.1002/qub2.41
Chenrui Qin, Tong Xu, Xuejin Zhao, Yeqing Zong, Haoqian M. Zhang, Chunbo Lou, Ouyang Qi, Long Qian
Although the principles of synthetic biology were initially established in model bacteria, microbial producers, extremophiles and gut microbes have now emerged as valuable prokaryotic chassis for biological engineering. Extending the host range in which designed circuits can function reliably and predictably presents a major challenge for the concept of synthetic biology to materialize. In this work, we systematically characterized the cross‐species universality of two transcriptional regulatory modules—the T7 RNA polymerase activator module and the repressors module—in three non‐model microbes. We found striking linear relationships in circuit activities among different organisms for both modules. Parametrized model fitting revealed host non‐specific parameters defining the universality of both modules. Lastly, a genetic NOT gate and a band‐pass filter circuit were constructed from these modules and tested in non‐model organisms. Combined models employing host non‐specific parameters were successful in quantitatively predicting circuit behaviors, underscoring the potential of universal biological parts and predictive modeling in synthetic bioengineering.
Citations: 0
A clinical trial termination prediction model based on denoising autoencoder and deep survival regression
IF 3.1; Zone 4 (Biology); Q1, Mathematics; Pub Date: 2024-04-12; DOI: 10.1002/qub2.43
Huamei Qi, Wenhui Yang, Wenqin Zou, Yuxuan Hu
Effective clinical trials are necessary for understanding medical advances, but early termination of trials can result in unnecessary waste of resources. Survival models can be used to predict survival probabilities in such trials. However, survival data from clinical trials are sparse, and DeepSurv cannot accurately capture their effective features, making the models weak in generalization and decreasing their prediction accuracy. In this paper, we propose a survival prediction model for clinical trial completion based on the combination of denoising autoencoder (DAE) and DeepSurv models. The DAE is used to obtain a robust representation of features by breaking the loop of raw features after autoencoder training, and then the robust features are provided to DeepSurv as input for training. The clinical trial dataset for training the model was obtained from the ClinicalTrials.gov dataset. A study of clinical trial completion in pregnant women was conducted in response to the fact that many current clinical trials exclude pregnant women. The experimental results showed that the denoising autoencoder and deep survival regression (DAE‐DSR) model was able to extract meaningful and robust features for survival analysis; the C‐indices on the training and test datasets were 0.74 and 0.75, respectively. Compared with the Cox proportional hazards model and the DeepSurv model, the survival analysis curves obtained with the DAE‐DSR model had more prominent features, and the model was more robust and performed better in actual prediction.
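The C‐index quoted in this abstract can be illustrated with a minimal sketch of Harrell's concordance index. The pairwise definition below is the standard one; the function name, the tie handling (0.5 credit for tied scores), and the toy data are assumptions made for illustration and are not taken from the paper.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: share of comparable pairs that the risk score orders correctly.

    times: follow-up time per subject.
    events: 1 if the event (here, trial termination) was observed, 0 if censored.
    risk_scores: model output where a higher score means earlier expected event.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when subject i has an observed event
            # strictly before subject j's recorded time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied scores get half credit
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: four trials, the third one is censored.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_risk = [0.9, 0.4, 0.5, 0.1]
print(concordance_index(toy_times, toy_events, toy_risk))  # 4 of 5 pairs = 0.8
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which gives some context for the reported 0.74 and 0.75.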
Citations: 0
A feature extraction framework for discovering pan‐cancer driver genes based on multi‐omics data
IF 3.1; Zone 4 (Biology); Q1, Mathematics; Pub Date: 2024-04-05; DOI: 10.1002/qub2.40
Xiaomeng Xue, Feng Li, J. Shang, Lingyun Dai, Daohui Ge, Qianqian Ren
The identification of tumor driver genes facilitates accurate cancer diagnosis and treatment, playing a key role in precision oncology, along with gene signaling, regulation, and their interaction with protein complexes. To tackle the challenge of distinguishing driver genes from a large amount of genomic data, we construct a feature extraction framework for discovering pan‐cancer driver genes based on multi‐omics data (mutations, gene expression, copy number variants, and DNA methylation) combined with protein–protein interaction (PPI) networks. Using a network propagation algorithm, we mine functional information among nodes in the PPI network, focusing on genes with weak node information to represent specific cancer information. From these functional features, we extract distribution features of pan‐cancer data, pan‐cancer TOPSIS features of functional features using the ideal solution method, and SetExpan features of pan‐cancer data from the gene functional features, a method to rank pan‐cancer data based on the average inverse rank. These features represent the common message of pan‐cancer. Finally, we use the lightGBM classification algorithm for gene prediction. Experimental results show that our method outperforms existing methods in terms of the area under the precision‐recall curve (AUPRC) and demonstrates better performance across different PPI networks. This indicates our framework’s effectiveness in predicting potential cancer genes, offering valuable insights for the diagnosis and treatment of tumors.
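The "ideal solution method" mentioned in this abstract is TOPSIS, and a minimal sketch of the textbook procedure is given below. It assumes each row is a gene and each column is a benefit-type feature (larger is better), for example one functional feature value per cancer type; how the paper builds its matrix and weights is not stated in the abstract, so the layout, equal default weights, and names here are hypothetical.

```python
import math


def topsis(matrix, weights=None):
    """Return a closeness score in [0, 1] for each row (1 = ideal best).

    matrix: list of rows, each a list of non-negative feature values.
    weights: optional per-column weights; equal weights are assumed if omitted.
    """
    n_cols = len(matrix[0])
    if weights is None:
        weights = [1.0 / n_cols] * n_cols

    # Step 1: vector-normalize each column, then apply the column weight.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) or 1.0
             for j in range(n_cols)]
    scaled = [[weights[j] * row[j] / norms[j] for j in range(n_cols)]
              for row in matrix]

    # Step 2: ideal best and ideal worst points (all columns are benefits).
    best = [max(column) for column in zip(*scaled)]
    worst = [min(column) for column in zip(*scaled)]

    # Step 3: relative closeness to the ideal best point.
    scores = []
    for row in scaled:
        d_best = math.dist(row, best)
        d_worst = math.dist(row, worst)
        total = d_best + d_worst
        scores.append(d_worst / total if total > 0 else 0.0)
    return scores


# Hypothetical feature matrix: three genes scored in two cancer types.
gene_features = [[1.0, 2.0], [0.0, 0.0], [0.5, 1.0]]
print(topsis(gene_features))  # highest score for the first gene, lowest for the second
```

The resulting score collapses several per-cancer values into one ranking feature per gene, which is one plausible way such a feature could feed a downstream classifier.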
Citations: 0
GCARDTI: Drug–target interaction prediction based on a hybrid mechanism in drug SELFIES
IF 3.1 Zone 4, Biology Q1 Mathematics Pub Date: 2024-04-01 DOI: 10.1002/qub2.39
Yinfei Feng, Yuanyuan Zhang, Zengqian Deng, Mimi Xiong
Predicting the interaction between a drug and a target is a critical problem in drug development and repurposing. However, current deep learning research still faces two challenges: (i) most drug–target studies do not fully exploit the structural information of drug molecules, and the SMILES strings used previously do not correspond well to valid drug molecules; and (ii) the potential relationship between drugs and targets remains insufficiently explored. In this work, we use SELFIES, a newer representation that better reflects valid molecular graph structure. We propose a hybrid framework based on a convolutional neural network and a graph attention network to capture multi‐view feature information from drug and target molecular structures, with the aim of better capturing the interaction sites between a drug and a target. Experiments on two different datasets show that the GCARDTI model outperforms a range of competing models on several metrics. We also demonstrate the accuracy of our model through two case studies.
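The abstract does not describe the GCARDTI architecture in detail, so the following is not the authors' model. It is a minimal single-head graph attention layer in plain Python, written as an assumption-laden sketch of the general technique; the toy graph, the weight matrix `W`, the attention vector `a`, and all function names are hypothetical.

```python
import math


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def matvec(W, h):
    """Multiply matrix W (list of rows) by vector h."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def gat_layer(features, neighbours, W, a):
    """One single-head graph attention layer without an output nonlinearity.

    features   : dict node -> feature vector
    neighbours : dict node -> list of adjacent nodes
    W          : shared linear transform applied to every node
    a          : attention vector of length 2 * (output dimension)
    """
    z = {i: matvec(W, h) for i, h in features.items()}
    out = {}
    for i in features:
        nbrs = [i] + list(neighbours[i])  # attend over self and neighbours
        # Unnormalised attention logit for each pair (i, j).
        logits = [
            leaky_relu(sum(ak * v for ak, v in zip(a, z[i] + z[j])))
            for j in nbrs
        ]
        # Numerically stable softmax over the neighbourhood.
        top = max(logits)
        weights = [math.exp(value - top) for value in logits]
        total = sum(weights)
        alpha = [w / total for w in weights]
        # Attention-weighted sum of transformed neighbour features.
        out[i] = [
            sum(al * z[j][k] for al, j in zip(alpha, nbrs))
            for k in range(len(z[i]))
        ]
    return out


# Hypothetical three-atom chain 0-1-2 with two-dimensional atom features.
atom_features = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
bonds = {0: [1], 1: [0, 2], 2: [1]}
identity = [[1.0, 0.0], [0.0, 1.0]]
updated = gat_layer(atom_features, bonds, identity, a=[1.0, -1.0, 0.5, 2.0])
```

Each atom's updated vector is a weighted average of its own and its neighbours' transformed features, with weights learned through `a` in a trained model. A convolutional branch over the target sequence would supply the second "view" in a hybrid design of this kind.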
Citations: 0