Journal of Computational Biology: Latest Articles

Using Traditional and Deep Machine Learning to Predict Emergency Room Triage Levels.
IF 1.4; Region 4 (Biology); Q4, Biochemical Research Methods; Pub Date: 2025-06-01; Epub Date: 2025-05-22; DOI: 10.1089/cmb.2024.0632
Mehmet Yıldırım, Savaş Sezik, Ayşe Başar

Accurate triage in emergency rooms is crucial for efficient patient care and resource allocation. We developed methods to predict triage levels using several traditional machine learning methods (logistic regression, random forest, XGBoost) and deep learning approaches based on neural networks. These models were tested on a dataset of emergency department visits at a local Turkish hospital; the dataset contains both structured and unstructured data. Compared with previous work, our challenge was to build a predictive model that uses documents written in Turkish and that handles specific aspects of the Turkish medical system. Text embedding techniques such as Bag of Words, Word2Vec, and BERT-based embeddings were used to process the unstructured patient complaints. We used a comprehensive set of features, including patient history data and disease diagnosis, in our predictive models, which included advanced neural network architectures such as convolutional neural networks, attention mechanisms, and long short-term memory networks. Our results revealed that BERT embeddings significantly enhanced the performance of neural network models, while Word2Vec embeddings showed slightly better results in traditional machine learning models. The most effective model was XGBoost combined with Word2Vec embeddings, achieving 86.7% AUC, 81.5% accuracy, and 68.7% weighted F1 score. We conclude that text embedding methods and machine learning methods are effective tools for predicting emergency room triage levels. The integration of patient history into the models, alongside the strategic use of text embeddings, significantly improves predictive accuracy.
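
The gap between the reported accuracy (81.5%) and weighted F1 score (68.7%) is typical of imbalanced multi-class problems. The following is a minimal sketch of how the two metrics can diverge, assuming the usual support-weighted definition of F1; the triage labels and predictions are made-up illustration data, not taken from the study.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_f1(y_true, y_pred):
    """Mean of per-class F1 scores, weighted by each class's true count."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label in sorted(support):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += f1 * support[label] / total
    return score


# Hypothetical labels: a common class, a less common one, and a rare one.
y_true = ["green"] * 6 + ["yellow"] * 3 + ["red"] * 1
y_pred = ["green"] * 6 + ["green", "green", "yellow"] + ["yellow"]

print(f"accuracy    = {accuracy(y_true, y_pred):.3f}")     # 0.700
print(f"weighted F1 = {weighted_f1(y_true, y_pred):.3f}")  # 0.634
```

In this toy case the classifier does well on the majority class and poorly on the rarer ones, so accuracy stays high while the weighted F1 score drops.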

Journal of Computational Biology, pp. 584-600.
Citations: 0
On the Relation Between Linear Autoencoders and Non-Negative Matrix Factorization for Mutational Signature Extraction.
IF 1.4; Region 4 (Biology); Q4, Biochemical Research Methods; Pub Date: 2025-05-01; Epub Date: 2025-03-21; DOI: 10.1089/cmb.2024.0784
Ida Egendal, Rasmus Froberg Brøndum, Marta Pelizzola, Asger Hobolth, Martin Bøgsted

Since its introduction, non-negative matrix factorization (NMF) has been a popular tool for extracting interpretable, low-dimensional representations of high-dimensional data. However, several recent studies have proposed replacing NMF with autoencoders. The increasing popularity of autoencoders warrants an investigation of whether this replacement is in general valid and reasonable. Moreover, the exact relationship between non-negative autoencoders and NMF has not been thoroughly explored. Thus, a main aim of this study is to investigate in detail the relationship between autoencoders and NMF. We define a non-negative linear autoencoder, AE-NMF, which is mathematically equivalent to convex NMF, a constrained version of NMF. The performance of NMF and the non-negative linear autoencoder is compared in the context of mutational signature extraction from simulated and real-world cancer genomics data. We find that the reconstructions based on NMF are more accurate than those based on AE-NMF, while the signatures extracted using both methods exhibit comparable consistency and performance when externally validated. These findings suggest that AE-NMF, the linear non-negative autoencoder investigated in this article, does not improve on NMF in the field of mutational signature extraction. Our study serves as a foundation for understanding the theoretical implications of replacing NMF with non-negative autoencoders.
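
The claimed equivalence can be made concrete with a short sketch. The notation is an assumption chosen here, not taken from the abstract: $V$ is the non-negative mutation count matrix with $N$ samples and $M$ mutation types, $K$ is the number of signatures, and the autoencoder has no bias terms or nonlinear activations.

```latex
\begin{align*}
\text{NMF:} \quad & V \approx E S,
  & E \in \mathbb{R}_{\ge 0}^{N \times K},\; S \in \mathbb{R}_{\ge 0}^{K \times M} \\
\text{Linear autoencoder:} \quad & \hat{V} = V W_{\mathrm{enc}} W_{\mathrm{dec}},
  & W_{\mathrm{enc}} \in \mathbb{R}_{\ge 0}^{M \times K},\; W_{\mathrm{dec}} \in \mathbb{R}_{\ge 0}^{K \times M} \\
\text{Convex NMF form:} \quad & V \approx (V W)\, S,
  & W = W_{\mathrm{enc}},\; S = W_{\mathrm{dec}}
\end{align*}
```

Under these assumptions the autoencoder is NMF with the extra constraint $E = V W_{\mathrm{enc}}$. Because every autoencoder solution is also a feasible NMF solution, the best NMF reconstruction error can be no worse than the best autoencoder error, which is consistent with the reported accuracy gap.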

Journal of Computational Biology, pp. 461-472.
Citations: 0
Compressed Representation of Extreme Learning Machine with Self-Diffusion Graph Denoising Applied for Dissecting Molecular Heterogeneity.
IF 1.4; Region 4 (Biology); Q4, Biochemical Research Methods; Pub Date: 2025-05-01; Epub Date: 2025-03-19; DOI: 10.1089/cmb.2024.0729
Xin Duan, Xinnan Ding, Yuelin Lu

Molecular heterogeneity exists in many biological systems, such as major malignancies or diverse cell populations. Clustering of gene expression profiles has been widely used to dissect molecular heterogeneity. One drawback common to most clustering methods is that they often suffer from high dimensionality and noise, as well as feature redundancy. To address these challenges, we propose extreme learning machine self-diffusion (ELMSD), an autoencoder extreme learning machine feature representation method that incorporates a self-diffusion graph denoising framework to effectively dissect molecular heterogeneity. ELMSD first learns a compressed representation of gene expression profiles from the hidden layer of the autoencoder extreme learning machine, then applies an iterative graph diffusion process to enhance the sample-to-sample similarity. The enhanced graph greatly facilitates downstream clustering analysis, making it more efficient to analyze molecular properties. To demonstrate the utility of ELMSD, we applied it to one simulated dataset, five single-cell datasets, and 20 cancer datasets. Experimental results show that ELMSD outperforms several state-of-the-art clustering methods, and that the cancer subtypes and cell types identified by ELMSD show strong clinical relevance and biological interpretability. The ELMSD code is available at: https://github.com/DXCODEE/ELMSD.

Journal of Computational Biology, pp. 486-497.
Citations: 0
On the Dynamics of HIV-Tuberculosis Coinfection Model with Temporal Recovery from Tuberculosis: An Analysis.
IF 1.4; Region 4 (Biology); Q4, Biochemical Research Methods; Pub Date: 2025-05-01; Epub Date: 2025-04-25; DOI: 10.1089/cmb.2024.0763
Pankaj Singh Rana, Nitin Sharma, Sunil Singh Negi, Haci Mehmet Baskonus

The current study attempts to frame a deterministic compartmental model for HIV-TB coinfection that accounts for temporary recovery from tuberculosis (TB) after treatment, that is, the possibility of reinfection with TB even after recovery. The proposed HIV-TB coinfection model is a composite of a susceptible-infected (SI) type HIV/AIDS model and a susceptible-exposed-infected-recovered type TB model. First, the HIV-TB model is constructed, followed by a qualitative investigation of the model. The equilibrium points of the model are obtained and examined in detail. Further, the basic reproduction number for the HIV-TB coinfection model is computed, and the proposed model is simulated numerically to investigate the effect of treatment on HIV-TB coinfection. Analysis of the model shows that an interior equilibrium exists when both the HIV and TB reproduction numbers are greater than unity. The results show that TB treatment is most effective at eliminating HIV-TB coinfection whenever the basic reproduction number of HIV-TB is less than one. In addition, our results suggest that reinfection with TB after recovery affects HIV-TB transmission. Reinfection makes disease eradication more challenging because, in the presence of reinfection, the total number of infected cases is always higher than in its absence.
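
The final claim, that reinfection keeps infection levels higher, can be illustrated with a toy simulation. The sketch below is not the authors' HIV-TB model: it is a hypothetical TB-only SEIR model without demography or HIV compartments, in which recovered individuals return to the susceptible class at an assumed rate `rho`. All parameter values are made up for illustration, and the integration uses a plain forward Euler step.

```python
def simulate_tb(rho, beta=0.5, sigma=0.2, gamma=0.1, days=365, dt=0.1):
    """Integrate a toy SEIR model with loss of recovery (rate rho).

    Returns the final population fractions (S, E, I, R).
    All rates are per day and purely illustrative.
    """
    s, e, i, r = 0.99, 0.0, 0.01, 0.0
    for _ in range(round(days / dt)):
        new_exposed = beta * s * i   # S -> E (infection)
        progress = sigma * e         # E -> I (disease progression)
        recover = gamma * i          # I -> R (treatment/recovery)
        relapse = rho * r            # R -> S (recovery is only temporary)
        s += dt * (relapse - new_exposed)
        e += dt * (new_exposed - progress)
        i += dt * (progress - recover)
        r += dt * (recover - relapse)
    return s, e, i, r


permanent = simulate_tb(rho=0.0)    # recovery is permanent
temporary = simulate_tb(rho=0.05)   # recovered individuals can be reinfected

print(f"infected after one year, no reinfection:   {permanent[2]:.4f}")
print(f"infected after one year, with reinfection: {temporary[2]:.4f}")
```

With permanent recovery the outbreak burns out, whereas with temporary recovery the infected fraction settles at a persistent positive level. This qualitative difference is what makes eradication harder.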

Journal of Computational Biology, 32(5), pp. 537-555.
Citations: 0
Disambiguating a Soft Metagenomic Clustering.
IF 1.4; Region 4 (Biology); Q4, Biochemical Research Methods; Pub Date: 2025-05-01; Epub Date: 2025-03-07; DOI: 10.1089/cmb.2024.0825
Rahul Nihalani, Jaroslaw Zola, Srinivas Aluru

Clustering is a popular technique for analyzing amplicon sequencing data in metagenomics. Specifically, it is used to assign sequences (reads) to clusters, each cluster representing a species or a higher-level taxonomic unit. Reads from multiple species often share subsequences, which, combined with the lack of a perfect similarity measure, makes it difficult to correctly assign reads to clusters. Thus, metagenomic clustering methods must either resort to ambiguity or make the best available choice at each read assignment stage, which could lead to incorrect clusters and potentially cascading errors. In this article, we argue for first generating an ambiguous clustering and then resolving the ambiguities collectively by analyzing the ambiguous clusters. We propose a rigorous formulation of this problem and show that it is NP-hard. We then propose an efficient heuristic to solve it in practice. We validate our approach on several synthetically generated datasets and two datasets consisting of 16S rDNA sequences from the microbiome of rat guts.

Journal of Computational Biology, pp. 473-485.
Citations: 0
Faster and More Accurate Estimation of Protein Hinges Based on Information Criteria.
IF 1.4; Region 4 (Biology); Q4, Biochemical Research Methods; Pub Date: 2025-05-01; Epub Date: 2025-04-28; DOI: 10.1089/cmb.2024.0731
Bunsho Koyano, Tetsuo Shibuya

Protein hinges are flexible parts that connect several rigid substructures of proteins and are crucial in determining protein function. As the number of known protein structures grows, various methods have been developed for efficiently and accurately estimating protein hinge positions by comparing two different conformations of the same protein. However, few studies have focused on accurately estimating the number of hinges, although both the number and the positions of hinges must be estimated accurately. We propose faster and more accurate algorithms that estimate the number and positions of hinges using information criteria and run in O(n^2) time, where n is the protein length. Our algorithms use the Bayesian Information Criterion (BIC) or Akaike's Information Criterion (AIC), based on a newly proposed k-hinge structure generation model that models the hinge motions between two protein conformations. On our simulation dataset, our exact algorithm based on BIC outperformed the most accurate previous method in both hinge number and position accuracy, and was approximately as fast as the fastest previous method, DynDom. We also evaluated the hinge number and position accuracy of our exact algorithm and previous methods on one hinge-annotated dataset, where our exact algorithm was comparable to the most accurate previous method. We further propose even faster O(n)-time heuristic algorithms. Our heuristic algorithm achieved almost the same hinge number and position accuracy as our exact algorithm, and was over 18 times faster than both our exact algorithm and DynDom.
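
For reference, the two information criteria named above have the following standard definitions. The symbols are notation chosen here: $p$ is the number of free model parameters, $m$ the number of observations, and $\hat{L}$ the maximized likelihood. How the k-hinge model counts its parameters as a function of the hinge number $k$ is not stated in the abstract.

```latex
\begin{align*}
\mathrm{BIC} &= p \ln m - 2 \ln \hat{L} \\
\mathrm{AIC} &= 2p - 2 \ln \hat{L}
\end{align*}
```

In both cases the model with the lowest value is preferred. BIC penalizes each extra parameter more heavily than AIC whenever $\ln m > 2$, so it tends to select fewer hinges on all but the smallest inputs.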

Journal of Computational Biology, 32(5), pp. 498-519.
Citations: 0
A New Structure Feature Introduced to Predict Protein-Protein Interaction Sites.
IF 1.4; Region 4 (Biology); Q4, Biochemical Research Methods; Pub Date: 2025-05-01; Epub Date: 2025-02-26; DOI: 10.1089/cmb.2024.0804
Lingwei Lai, Jing Geng, Haochen Duan, Siyuan Chen, Lvwen Huang, Jiantao Yu

Interaction between proteins often depends on both the sequence features and the structure features of the proteins. Both kinds of feature help machine learning methods predict protein-protein interaction (PPI) sites. In this study, we introduce a new structure feature: a concave-convex feature of the protein surface, computed from the structural data of proteins in the Protein Data Bank. We then construct a prediction model that combines protein sequence features and structure features, named SSPPI_Ensemble (Sequence and Structure geometric feature-based PPI site prediction). Three sequence features were used: PSSMs (Position-Specific Scoring Matrices), HMMs (Hidden Markov Models), and the raw protein sequence. The Dictionary of Secondary Structure in Proteins and the concave-convex feature were used as the structure features. Compared with other prediction methods, our method achieved better performance or showed clear advantages on the same test datasets, confirming that the proposed concave-convex feature is useful for predicting PPI sites.

Journal of Computational Biology, pp. 520-536.
Citations: 0
Subject-Specific Dosage Estimation for Primary Hypothyroidism Using Sparse Data.
IF 1.4; Region 4 (Biology); Q4, Biochemical Research Methods; Pub Date: 2025-04-01; Epub Date: 2025-02-17; DOI: 10.1089/cmb.2024.0752
Devleena Ghosh, Chittaranjan Mandal

This work presents subject-specific dosage estimation for primary hypothyroidism using subject-specific parameters of the thyrotropic regulation system. The data needed for such personalized modeling are usually sparse. We address this by combining the available data with domain knowledge to estimate model parameters, albeit with some uncertainty. Optimization-based dosage estimation approaches may not be applicable in the presence of such uncertainty. In this work, the optimal drug dosage range for primary hypothyroidism is estimated from the estimated parameter ranges using the mathematical model and satisfiability modulo theory (SMT)-based analysis. The salient features of this work are as follows: (1) estimation of subject-specific model parameters with uncertainty using subject-specific pre-treatment and post-treatment observations, (2) modeling of periodic drug administration as part of the ordinary differential equation model of the thyrotropic regulation pathway through Fourier series approximation, (3) application of SMT-based analysis to determine the optimal dosage range using this model and the estimated parameter ranges, and (4) an initial dosage estimation method using a regression model. Results are presented that support the developed computational procedures.
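
Feature (2) can be illustrated with a standard identity. The sketch assumes an idealized input in which a dose $D$ is delivered as an impulse every $T$ time units; the abstract does not say whether the authors use impulses or finite-width pulses, nor at which order $K$ they truncate the series.

```latex
\begin{align*}
u(t) &= D \sum_{n=-\infty}^{\infty} \delta(t - nT)
      = \frac{D}{T} \left[ 1 + 2 \sum_{k=1}^{\infty} \cos\!\left( \frac{2 \pi k t}{T} \right) \right] \\
     &\approx \frac{D}{T} \left[ 1 + 2 \sum_{k=1}^{K} \cos\!\left( \frac{2 \pi k t}{T} \right) \right]
\end{align*}
```

The truncated sum is a smooth function of $t$, so the dosing term can enter the differential equation as an ordinary continuous input. Its time average $D/T$ equals the mean dosing rate.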

Citations: 0
De Novo Antibody Design with SE(3) Diffusion.
IF 1.4 Zone 4 (Biology) Q4 BIOCHEMICAL RESEARCH METHODS Pub Date: 2025-04-01 Epub Date: 2024-12-27 DOI: 10.1089/cmb.2024.0768
Daniel Cutting, Frédéric A Dreyer, David Errington, Constantin Schneider, Charlotte M Deane

We introduce IgDiff, an antibody variable domain diffusion model based on a general protein backbone diffusion framework, which was extended to handle multiple chains. Assessing the designability and novelty of the structures generated with our model, we find that IgDiff produces highly designable antibodies that can contain novel binding regions. The backbone dihedral angles of sampled structures show good agreement with a reference antibody distribution. We verify these designed antibodies experimentally and find that all express with high yield. Finally, we compare our model with a state-of-the-art generative backbone diffusion model on a range of antibody design tasks, such as the design of the complementarity determining regions or the pairing of a light chain to an existing heavy chain, and show improved properties and designability.

Citations: 0
Disease Spread Model in Structurally Complex Spaces: An Open Markov Chain Approach.
IF 1.4 Zone 4 (Biology) Q4 BIOCHEMICAL RESEARCH METHODS Pub Date: 2025-04-01 Epub Date: 2025-02-11 DOI: 10.1089/cmb.2024.0630
Brenda Ivette García-Maya, Yehtli Morales-Huerta, Raúl Salgado-García

Understanding the dynamical behavior of infectious disease propagation within enclosed spaces is crucial for effectively establishing control measures. In this article, we present a modeling approach to analyze the dynamics of individuals in enclosed spaces, where such spaces are comprised of different chambers. Our focus is on capturing the movement of individuals and their infection status using an open Markov chain framework. Unlike ordinary Markov chains, an open Markov chain accounts for individuals entering and leaving the system. We categorize individuals within the system into three different groups: susceptible, carrier, and infected. A discrete-time process is employed to model the behavior of individuals throughout the system. To quantify the risk of infection, we derive a probability function that takes into account the total number of individuals inside the system and the distribution among the different groups. Furthermore, we calculate mathematical expressions for the average number of susceptible, carrier, and infected individuals at each time step. Additionally, we determine mathematical expressions for the mean number and stationary mean populations of these groups. To validate our modeling approach, we compare the theoretical and numerical models proposed in this work.
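
The idea of an open Markov chain with a mean population per time step can be illustrated with a short sketch. It is a simplified illustration rather than the authors' model: it tracks only the expected number of individuals per chamber and ignores the susceptible, carrier, and infected groups. It assumes that individuals move independently, that row i of the hypothetical matrix `transition` gives the probabilities of moving from chamber i to each chamber (with the missing probability mass being the chance of leaving the system), and that `arrivals` holds the mean number of newcomers per chamber per step. Under these assumptions the expected occupancy follows m(t+1) = m(t) P + λ by linearity of expectation.

```python
def step_mean(mean, transition, arrivals):
    """One discrete-time update of the expected occupancy per chamber."""
    n = len(mean)
    return [
        sum(mean[i] * transition[i][j] for i in range(n)) + arrivals[j]
        for j in range(n)
    ]


def stationary_mean(transition, arrivals, tol=1e-12, max_steps=100_000):
    """Iterate the mean update until it stops changing.

    Converges when every individual eventually leaves the system,
    i.e. the substochastic matrix has spectral radius below 1.
    """
    mean = [0.0] * len(arrivals)
    for _ in range(max_steps):
        nxt = step_mean(mean, transition, arrivals)
        if max(abs(a - b) for a, b in zip(nxt, mean)) < tol:
            return nxt
        mean = nxt
    raise RuntimeError("mean occupancy did not converge")


if __name__ == "__main__":
    # Hypothetical two-chamber space; each row sums to 0.8, so an
    # individual leaves the system with probability 0.2 per step.
    transition = [[0.5, 0.3],
                  [0.2, 0.6]]
    arrivals = [1.0, 0.0]  # on average one newcomer per step, into chamber 0
    print(stationary_mean(transition, arrivals))
```

In this toy setting the stationary total equals the arrival rate divided by the per-step exit probability, which gives a quick sanity check on the iteration.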

Citations: 0