Pub Date: 2025-06-01 | Epub Date: 2025-05-22 | DOI: 10.1089/cmb.2024.0632
Mehmet Yıldırım, Savaş Sezik, Ayşe Başar
Accurate triage in emergency rooms is crucial for efficient patient care and resource allocation. We developed methods to predict triage levels using several traditional machine learning methods (logistic regression, random forest, XGBoost) and neural-network-based deep learning approaches. These models were tested on a dataset of emergency department visits at a local Turkish hospital; the dataset consists of both structured and unstructured data. Compared with previous work, our challenge was to build a predictive model that uses documents written in Turkish and that handles specific aspects of the Turkish medical system. Text embedding techniques such as Bag of Words, Word2Vec, and BERT-based embeddings were used to process the unstructured patient complaints. Our predictive models used a comprehensive set of features, including patient history data and disease diagnosis, and included advanced neural network architectures such as convolutional neural networks, attention mechanisms, and long short-term memory networks. Our results revealed that BERT embeddings significantly enhanced the performance of neural network models, while Word2Vec embeddings showed slightly better results in traditional machine learning models. The most effective model was XGBoost combined with Word2Vec embeddings, achieving 86.7% AUC, 81.5% accuracy, and 68.7% weighted F1 score. We conclude that text embedding methods and machine learning methods are effective tools to predict emergency room triage levels. The integration of patient history into the models, alongside the strategic use of text embeddings, significantly improves predictive accuracy.
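The abstract reports both accuracy and a weighted F1 score, and the two can diverge noticeably on imbalanced labels. The sketch below shows how a support-weighted F1 is commonly computed. The label names and toy predictions are hypothetical and are not taken from the study; the functions `weighted_f1` and `accuracy` are illustrative helpers, not the authors' code.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_f1(y_true, y_pred):
    """Mean of per-class F1 scores, weighted by each class's share of y_true."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n_true in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        n_pred = sum(1 for p in y_pred if p == label)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        score += (n_true / total) * f1
    return score


# Hypothetical, imbalanced toy labels: a model that always predicts the majority class.
y_true = ["green"] * 8 + ["yellow"] * 2
y_pred = ["green"] * 10

print(accuracy(y_true, y_pred))     # 0.8
print(weighted_f1(y_true, y_pred))  # about 0.711
```

In this toy case the minority class contributes an F1 of zero, which pulls the weighted F1 below the accuracy even though most predictions are correct.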
{"title":"Using Traditional and Deep Machine Learning to Predict Emergency Room Triage Levels.","authors":"Mehmet Yıldırım, Savaş Sezik, Ayşe Başar","doi":"10.1089/cmb.2024.0632","DOIUrl":"10.1089/cmb.2024.0632","PeriodicalId":15526,"journal":{"name":"Journal of Computational Biology","volume":" ","pages":"584-600"},"PeriodicalIF":1.4,"publicationDate":"2025-06-01","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"144119823","RegionNum":4,"RegionCategory":"Biology"}
Pub Date: 2025-05-01 | Epub Date: 2025-03-21 | DOI: 10.1089/cmb.2024.0784
Ida Egendal, Rasmus Froberg Brøndum, Marta Pelizzola, Asger Hobolth, Martin Bøgsted
Since its introduction, non-negative matrix factorization (NMF) has been a popular tool for extracting interpretable, low-dimensional representations of high-dimensional data. However, several recent studies have proposed replacing NMF with autoencoders. The increasing popularity of autoencoders warrants an investigation into whether this replacement is, in general, valid and reasonable. Moreover, the exact relationship between non-negative autoencoders and NMF has not been thoroughly explored. Thus, a main aim of this study is to investigate in detail the relationship between autoencoders and NMF. We define a non-negative linear autoencoder, AE-NMF, which is mathematically equivalent to convex NMF, a constrained version of NMF. The performance of NMF and the non-negative linear autoencoder is compared in the context of mutational signature extraction from simulated and real-world cancer genomics data. We find that the reconstructions based on NMF are more accurate than those based on AE-NMF, while the signatures extracted using both methods exhibit comparable consistency and performance when externally validated. These findings suggest that AE-NMF, the linear non-negative autoencoder investigated in this article, does not improve on NMF in the field of mutational signature extraction. Our study serves as a foundation for understanding the theoretical implications of replacing NMF with non-negative autoencoders.
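The relationship described above can be written out in a few lines. The notation below is an assumption made for illustration (a non-negative data matrix $V$ with samples as rows, and weight matrices named $W_{\mathrm{enc}}$ and $W_{\mathrm{dec}}$); the article itself may use a different orientation or different symbols.

```latex
% Assumed notation: V is a non-negative n x m data matrix, k is the number of signatures.
\begin{align}
\text{NMF:} \quad & V \approx W H,
  && W \in \mathbb{R}_{\ge 0}^{n \times k},\; H \in \mathbb{R}_{\ge 0}^{k \times m} \\
\text{Linear non-negative autoencoder:} \quad & \hat{V} = (V W_{\mathrm{enc}})\, W_{\mathrm{dec}},
  && W_{\mathrm{enc}} \in \mathbb{R}_{\ge 0}^{m \times k},\; W_{\mathrm{dec}} \in \mathbb{R}_{\ge 0}^{k \times m}
\end{align}
```

Under this notation the autoencoder is an NMF in which the first factor is restricted to the form $W = V W_{\mathrm{enc}}$, which is the kind of constraint convex NMF imposes. Every autoencoder solution is therefore also a feasible NMF solution, so at the respective optima the NMF reconstruction error cannot exceed that of the autoencoder. This is consistent with the reported finding that NMF reconstructions are more accurate.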
{"title":"On the Relation Between Linear Autoencoders and Non-Negative Matrix Factorization for Mutational Signature Extraction.","authors":"Ida Egendal, Rasmus Froberg Brøndum, Marta Pelizzola, Asger Hobolth, Martin Bøgsted","doi":"10.1089/cmb.2024.0784","DOIUrl":"10.1089/cmb.2024.0784","PeriodicalId":15526,"journal":{"name":"Journal of Computational Biology","volume":" ","pages":"461-472"},"PeriodicalIF":1.4,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"143669971","RegionNum":4,"RegionCategory":"Biology"}
Pub Date: 2025-05-01 | Epub Date: 2025-03-19 | DOI: 10.1089/cmb.2024.0729
Xin Duan, Xinnan Ding, Yuelin Lu
Molecular heterogeneity exists in many biological systems, such as major malignancies or diverse cell populations. Clustering of gene expression profiles has been widely used to dissect molecular heterogeneity. One drawback common to most clustering methods is that they often suffer from high dimensionality and noise, as well as feature redundancy. To address these challenges, we propose extreme learning machine self-diffusion (ELMSD), an autoencoder extreme learning machine feature representation method that incorporates a self-diffusion graph denoising framework to effectively dissect molecular heterogeneity. ELMSD first learns a compressed representation of gene expression profiles from the hidden layer of the autoencoder extreme learning machine, and then applies an iterative graph diffusion process to enhance the sample-to-sample similarity. The enhanced graph can greatly facilitate the downstream clustering analysis, making it more efficient to analyze molecular properties. To demonstrate the utility of ELMSD, we applied it to one simulation dataset, five single-cell datasets, and 20 cancer datasets. Experimental results show that ELMSD outperforms several state-of-the-art clustering methods, and that the cancer subtypes and cell types identified by ELMSD show strong clinical relevance and biological interpretability. The ELMSD code is available at: https://github.com/DXCODEE/ELMSD.
{"title":"Compressed Representation of Extreme Learning Machine with Self-Diffusion Graph Denoising Applied for Dissecting Molecular Heterogeneity.","authors":"Xin Duan, Xinnan Ding, Yuelin Lu","doi":"10.1089/cmb.2024.0729","DOIUrl":"10.1089/cmb.2024.0729","PeriodicalId":15526,"journal":{"name":"Journal of Computational Biology","volume":" ","pages":"486-497"},"PeriodicalIF":1.4,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"143657382","RegionNum":4,"RegionCategory":"Biology"}
The current study frames a deterministic compartmental model for HIV-TB coinfection that considers temporary recovery from tuberculosis (TB) after treatment, that is, the possibility of reinfection with TB even after recovery. The proposed HIV-TB coinfection model is a composite of a susceptible-infected (SI) type HIV/AIDS model and a susceptible-exposed-infected-recovered type TB model. The HIV-TB model is first constructed and then investigated qualitatively. The equilibrium points of the model are obtained and examined in detail. Further, the basic reproduction number for the HIV-TB coinfection model is computed, and the proposed model is simulated numerically to investigate the effect of treatment on HIV-TB coinfection. Analysis of the model establishes the existence of an interior equilibrium when both the HIV and TB reproduction numbers are greater than unity. The results show that TB treatment will be most efficient in eliminating HIV-TB coinfection whenever the basic reproduction number of HIV-TB is less than one. In addition, our results suggest that reinfection with TB after recovery affects HIV-TB transmission. Reinfection makes disease eradication more challenging because, in the presence of reinfection, the total number of infected cases is always higher than in its absence.
{"title":"On the Dynamics of HIV-Tuberculosis Coinfection Model with Temporal Recovery from Tuberculosis: An Analysis.","authors":"Pankaj Singh Rana, Nitin Sharma, Sunil Singh Negi, Haci Mehmet Baskonus","doi":"10.1089/cmb.2024.0763","DOIUrl":"https://doi.org/10.1089/cmb.2024.0763","PeriodicalId":15526,"journal":{"name":"Journal of Computational Biology","volume":"32 5","pages":"537-555"},"PeriodicalIF":1.4,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"143981581","RegionNum":4,"RegionCategory":"Biology"}
Pub Date: 2025-05-01 | Epub Date: 2025-03-07 | DOI: 10.1089/cmb.2024.0825
Rahul Nihalani, Jaroslaw Zola, Srinivas Aluru
Clustering is a popular technique used for analyzing amplicon sequencing data in metagenomics. Specifically, it is used to assign sequences (reads) to clusters, each cluster representing a species or a higher-level taxonomic unit. Reads from multiple species often share subsequences, which, combined with the lack of a perfect similarity measure, makes it difficult to correctly assign reads to clusters. Thus, metagenomic clustering methods must either resort to ambiguity or make the best available choice at each read assignment stage, which could lead to incorrect clusters and potentially cascading errors. In this article, we argue for first generating an ambiguous clustering and then resolving the ambiguities collectively by analyzing the ambiguous clusters. We propose a rigorous formulation of this problem and show that it is NP-hard. We then propose an efficient heuristic to solve it in practice. We validate our approach on several synthetically generated datasets and two datasets consisting of 16S rDNA sequences from the microbiome of rat guts.
{"title":"Disambiguating a Soft Metagenomic Clustering.","authors":"Rahul Nihalani, Jaroslaw Zola, Srinivas Aluru","doi":"10.1089/cmb.2024.0825","DOIUrl":"10.1089/cmb.2024.0825","PeriodicalId":15526,"journal":{"name":"Journal of Computational Biology","volume":" ","pages":"473-485"},"PeriodicalIF":1.4,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"143573107","RegionNum":4,"RegionCategory":"Biology"}
Pub Date: 2025-05-01 | Epub Date: 2025-04-28 | DOI: 10.1089/cmb.2024.0731
Bunsho Koyano, Tetsuo Shibuya
Protein hinges are flexible parts connecting several rigid substructures of proteins and are crucial in determining protein function. Various methods have been developed for efficiently and accurately estimating protein hinge positions by comparing two different conformations of the same protein for a growing number of protein structures. However, few studies have focused on accurately estimating the number of hinges, even though both the number and the positions of hinges must be estimated accurately. We propose faster and more accurate algorithms, running in O(n^2) time where n is the protein length, that estimate the number and positions of hinges using information criteria. Our algorithms use the Bayesian Information Criterion (BIC) or Akaike's Information Criterion with a newly proposed k-hinge structure generation model that models the hinge motions between two protein conformations. Our exact algorithm based on BIC outperformed the most accurate previous method in terms of both hinge number and position accuracy on our simulation dataset, and was approximately as fast as the previous fastest method, DynDom, on the same dataset. We also evaluated the hinge number and position accuracy of our exact algorithm and previous methods on one hinge-annotated dataset, where our exact algorithm was comparable to the most accurate previous method. We further propose even faster O(n)-time heuristic algorithms. Our heuristic algorithm achieved almost the same hinge number and position accuracy as our exact algorithm, and was over 18 times faster than both our exact algorithm and DynDom.
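To illustrate how an information criterion can select the number of hinges, the sketch below scores a few candidate hinge counts with the standard BIC and AIC formulas and picks the minimum. The log-likelihoods, parameter counts, and observation count are hypothetical placeholders, and `best_hinge_count` is an illustrative helper; none of this reproduces the k-hinge structure generation model from the article.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion: k * ln(n) - 2 * ln(L)."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def aic(log_likelihood, n_params):
    """Akaike's Information Criterion: 2 * k - 2 * ln(L)."""
    return 2.0 * n_params - 2.0 * log_likelihood


def best_hinge_count(fits, n_obs, criterion="bic"):
    """Return (hinge count with the lowest score, all scores).

    fits maps a candidate hinge count to (max log-likelihood, free parameters).
    """
    scores = {}
    for k, (loglik, n_params) in fits.items():
        if criterion == "bic":
            scores[k] = bic(loglik, n_params, n_obs)
        else:
            scores[k] = aic(loglik, n_params)
    return min(scores, key=scores.get), scores


# Hypothetical fits: more hinges always fit better, but cost more parameters.
fits = {0: (-520.0, 6), 1: (-480.0, 12), 2: (-473.0, 18), 3: (-472.0, 24)}
n_obs = 200  # hypothetical number of observations (e.g., residues)

best_bic, bic_scores = best_hinge_count(fits, n_obs, "bic")
best_aic, aic_scores = best_hinge_count(fits, n_obs, "aic")
print(best_bic, best_aic)  # 1 2
```

With these made-up numbers BIC selects one hinge while AIC selects two, because the BIC penalty per parameter, ln(n), exceeds the AIC penalty of 2 once n is larger than about 7.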
{"title":"Faster and More Accurate Estimation of Protein Hinges Based on Information Criteria.","authors":"Bunsho Koyano, Tetsuo Shibuya","doi":"10.1089/cmb.2024.0731","DOIUrl":"https://doi.org/10.1089/cmb.2024.0731","PeriodicalId":15526,"journal":{"name":"Journal of Computational Biology","volume":"32 5","pages":"498-519"},"PeriodicalIF":1.4,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"144004036","RegionNum":4,"RegionCategory":"Biology"}
Interaction between proteins often depends on both the sequence features and the structure features of the proteins. Both kinds of features help machine learning methods predict protein-protein interaction (PPI) sites. In this study, we introduce a new structure feature, the concave-convex feature on the protein surface, which is computed from the structural data of proteins in the Protein Data Bank database. We then construct a prediction model that combines protein sequence features and structure features, named SSPPI_Ensemble (Sequence and Structure geometric feature-based PPI site prediction). Three sequence features were used: PSSMs (Position-Specific Scoring Matrices), HMM (Hidden Markov Models), and the raw protein sequence. The Dictionary of Secondary Structure in Proteins and the concave-convex feature were used as the structure features. Compared with other prediction methods, our method achieved better performance or showed clear advantages on the same test datasets, confirming that the proposed concave-convex feature is useful in predicting PPI sites.
{"title":"A New Structure Feature Introduced to Predict Protein-Protein Interaction Sites.","authors":"Lingwei Lai, Jing Geng, Haochen Duan, Siyuan Chen, Lvwen Huang, Jiantao Yu","doi":"10.1089/cmb.2024.0804","DOIUrl":"10.1089/cmb.2024.0804","PeriodicalId":15526,"journal":{"name":"Journal of Computational Biology","volume":" ","pages":"520-536"},"PeriodicalIF":1.4,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"143501476","RegionNum":4,"RegionCategory":"Biology"}
Pub Date: 2025-04-01 | Epub Date: 2025-02-17 | DOI: 10.1089/cmb.2024.0752
Devleena Ghosh, Chittaranjan Mandal
Subject-specific dosage estimation for primary hypothyroidism using subject-specific parameters of the thyrotropic regulation system is presented in this work. The data needed for such personalized modeling are usually sparse. This is addressed by utilizing available data along with domain knowledge for estimation of model parameters but with some uncertainty. Optimization-based dosage estimation approaches may not be applicable in the presence of such uncertainty. In this work, the optimal drug dosage range based on estimated parameter ranges for primary hypothyroid condition is estimated using the mathematical model through satisfiability modulo theory (SMT)-based analysis. The salient features of this work are as follows: (1) estimation of subject-specific model parameters with uncertainty using subject-specific pre-treatment and post-treatment observations, (2) modeling periodic drug administration as part of the ordinary differential equation model of thyrotropic regulation pathway through Fourier series approximation, (3) application of SMT-based analysis for determining optimal dosage range using this model and estimated parameter ranges, and (4) an initial dosage estimation method using the regression model. Results have been obtained to support the working of the developed computational procedures.
{"title":"Subject-Specific Dosage Estimation for Primary Hypothyroidism Using Sparse Data.","authors":"Devleena Ghosh, Chittaranjan Mandal","doi":"10.1089/cmb.2024.0752","DOIUrl":"10.1089/cmb.2024.0752","PeriodicalId":15526,"journal":{"name":"Journal of Computational Biology","volume":" ","pages":"417-443"},"PeriodicalIF":1.4,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"143433256","RegionNum":4,"RegionCategory":"Biology"}
Pub Date: 2025-04-01 | Epub Date: 2024-12-27 | DOI: 10.1089/cmb.2024.0768
Daniel Cutting, Frédéric A Dreyer, David Errington, Constantin Schneider, Charlotte M Deane
We introduce IgDiff, an antibody variable domain diffusion model based on a general protein backbone diffusion framework, which was extended to handle multiple chains. Assessing the designability and novelty of the structures generated with our model, we find that IgDiff produces highly designable antibodies that can contain novel binding regions. The backbone dihedral angles of sampled structures show good agreement with a reference antibody distribution. We verify these designed antibodies experimentally and find that all express with high yield. Finally, we compare our model with a state-of-the-art generative backbone diffusion model on a range of antibody design tasks, such as the design of the complementarity determining regions or the pairing of a light chain to an existing heavy chain, and show improved properties and designability.
{"title":"<i>De Novo</i> Antibody Design with SE(3) Diffusion.","authors":"Daniel Cutting, Frédéric A Dreyer, David Errington, Constantin Schneider, Charlotte M Deane","doi":"10.1089/cmb.2024.0768","DOIUrl":"10.1089/cmb.2024.0768","PeriodicalId":15526,"journal":{"name":"Journal of Computational Biology","volume":" ","pages":"351-361"},"PeriodicalIF":1.4,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"142894855","RegionNum":4,"RegionCategory":"Biology"}
Pub Date: 2025-04-01. Epub Date: 2025-02-11. DOI: 10.1089/cmb.2024.0630
Brenda Ivette García-Maya, Yehtli Morales-Huerta, Raúl Salgado-García
Understanding how infectious diseases propagate within enclosed spaces is crucial for establishing effective control measures. In this article, we present a modeling approach to analyze the dynamics of individuals in enclosed spaces composed of different chambers. Our focus is on capturing the movement of individuals and their infection status using an open Markov chain framework. Unlike ordinary Markov chains, an open Markov chain accounts for individuals entering and leaving the system. We categorize individuals within the system into three groups: susceptible, carrier, and infected. A discrete-time process models the behavior of individuals throughout the system. To quantify the risk of infection, we derive a probability function that takes into account the total number of individuals inside the system and their distribution among the groups. Furthermore, we derive mathematical expressions for the average number of susceptible, carrier, and infected individuals at each time step, as well as for the stationary mean populations of these groups. To validate our modeling approach, we compare the theoretical and numerical models proposed in this work.
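To make the open Markov chain idea concrete, the following is a minimal sketch of how mean occupancies evolve in a generic open chain. It is not the authors' model: it ignores infection status entirely, and the two-chamber layout, the transition matrix `P`, the `arrivals` vector, and the function names are all hypothetical values chosen for illustration. It assumes the standard mean recursion m(t+1) = m(t) P + a, where each row of `P` sums to less than 1 and the remainder is the probability of leaving the system.

```python
# Illustrative sketch only: the chamber layout, probabilities, and arrival
# rates below are made up and do not come from the article.

def step_mean(mean, P, arrivals):
    """One discrete time step of the mean occupancy: m(t+1) = m(t) P + a."""
    k = len(mean)
    return [sum(mean[i] * P[i][j] for i in range(k)) + arrivals[j]
            for j in range(k)]

def stationary_mean(P, arrivals, tol=1e-12, max_iter=100_000):
    """Iterate the mean recursion from an empty system until it stops changing."""
    mean = [0.0] * len(arrivals)
    for _ in range(max_iter):
        nxt = step_mean(mean, P, arrivals)
        if max(abs(a - b) for a, b in zip(nxt, mean)) < tol:
            return nxt
        mean = nxt
    raise RuntimeError("mean occupancy did not converge")

# Two chambers. Each row sums to less than 1; the shortfall is the
# probability that an individual leaves the system during that step.
P = [[0.5, 0.3],   # from chamber 0: stay 0.5, move to chamber 1 0.3, leave 0.2
     [0.2, 0.6]]   # from chamber 1: move to chamber 0 0.2, stay 0.6, leave 0.2
arrivals = [1.0, 0.0]  # mean number of newcomers per step, by chamber

m_star = stationary_mean(P, arrivals)
```

With these made-up numbers the stationary means settle at roughly 2.86 and 2.14 individuals, for a total of 5, which is the point where the expected outflow (0.2 per individual per step) balances the expected inflow of one newcomer per step. Extending the state space from chambers to (chamber, infection status) pairs would be one way to approach the setting the abstract describes, but that extension is an assumption here, not something the abstract specifies.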
Disease Spread Model in Structurally Complex Spaces: An Open Markov Chain Approach. Journal of Computational Biology, pp. 394-416, published 2025-04-01. DOI: 10.1089/cmb.2024.0630