Pub Date: 2025-11-22; eCollection Date: 2025-01-01; DOI: 10.1016/j.csbj.2025.11.051
Jiao Wang, Hui Zong, Yingbo Zhang, Xingyun Liu, Ke Shen, Xiaoyu Li, Rongrong Wu, Min Jiang, Daniel Rivero Cebrián, Juan Ramón Rabuñal Dopico, Bairong Shen
Circadian rhythms regulate numerous physiological and biochemical processes in humans, and their disruption is linked to elevated cancer risk and progression. Although substantial research has elucidated interactions between circadian mechanisms and cancer pathways, these findings remain fragmented and poorly integrated, impeding a holistic understanding. To address this gap, we developed the Circadian-Related Risk Factor Knowledgebase for Cancer (CirRFKB), a manually curated repository documenting validated associations between the circadian clock and cancer. CirRFKB curates data from 471 articles, encompassing 46 cancer types and 4052 records, categorizing risk factors into 1449 single factors and 340 combinations. Single factors were categorized into 681 genetic factors, 106 environmental factors, 244 physiological factors, and 418 behavioral factors. These factors were further classified as 254 protective factors, 323 risk factors, 291 no-influencing factors, and 921 unclear factors. The user-friendly interface enables researchers to explore, visualize, and retrieve data through comprehensive browsing and query tools. CirRFKB provides a foundational resource that structures circadian-cancer interactions, offering systematic evidence to advance clinical applications in deep phenotyping for precision oncology and the optimization of chronotherapy. CirRFKB is publicly accessible at: http://bioinf.org.cn:9876/.
Title: CirRFKB: A knowledgebase of circadian-related risk factors for cancer pathogenesis and personalized medicine. Journal: Computational and Structural Biotechnology Journal, volume 27, pages 5326-5334. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12686628/pdf/
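The counts reported in the CirRFKB abstract above can be cross-checked with simple arithmetic. The short Python sketch below uses only the figures quoted in the abstract; the variable names are illustrative and are not part of CirRFKB.

```python
# Counts quoted in the CirRFKB abstract; variable names are illustrative only.
single_by_type = {
    "genetic": 681,
    "environmental": 106,
    "physiological": 244,
    "behavioral": 418,
}
by_effect = {
    "protective": 254,
    "risk": 323,
    "no-influencing": 291,
    "unclear": 921,
}
single_factors = 1449
combinations = 340

type_total = sum(single_by_type.values())
effect_total = sum(by_effect.values())

# The four factor types add up to the reported number of single factors.
print(type_total == single_factors)
# The effect classes add up to single factors plus combinations, which suggests
# (an inference, not stated in the abstract) that combinations were classified too.
print(effect_total == single_factors + combinations)
```

Both checks hold for the published figures, so the two breakdowns are internally consistent.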
Pub Date: 2025-11-21; eCollection Date: 2025-01-01; DOI: 10.1016/j.csbj.2025.11.036
Beatrice Ruth, Bashar Ibrahim, Peter Dittrich
Microbial communities typically consist of numerous species that coexist through intricate mutual dependencies. Understanding the structure of these communities and the interactions among their species is essential for explaining their functions and predicting their behavior. In this study, we follow the idea that a community organizes itself into a hierarchy of potentially persistent sub-communities. Previously, this hierarchy was described using Chemical Organization Theory (COT). However, that approach did not account for negative interactions. Here, we enhance the theory by incorporating negative interactions through an inhibitory resource called a toxin. For simplicity, we assume that a taxon sensitive to a toxin cannot coexist with a taxon that produces that toxin. Our results demonstrate that introducing a toxin reduces the number of organizations, with the extent of this reduction depending on various modeling parameters. Further, we show that the usage of essential resources leads to a computationally NP-hard transformation problem into direct taxa interactions. Additionally, we demonstrate that the number of measurements required to infer all persistent subspaces increases. We determine which groups of species are mutually excluded due to toxin interactions. Besides toxic interactions, it is also possible to infer cross-feeding aspects of the microbial community, for which a potential algorithm is outlined and illustrated by an example.
Title: A formal approach to the hierarchical structures of microbial communities with negative interactions. Journal: Computational and Structural Biotechnology Journal, volume 27, pages 5561-5574. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12731272/pdf/
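The coexistence rule stated in the abstract above (a taxon sensitive to a toxin cannot coexist with a taxon that produces it) can be illustrated with a toy enumeration. The sketch below is not the authors' algorithm and does not implement the closure or self-maintenance conditions of Chemical Organization Theory. The taxa, the toxin name `tox1`, and the helper functions are hypothetical. It only shows how the exclusion rule prunes candidate sub-communities.

```python
from itertools import combinations

# Hypothetical toy community: toxins each taxon produces and is sensitive to.
taxa = {
    "A": {"produces": {"tox1"}, "sensitive": set()},
    "B": {"produces": set(), "sensitive": {"tox1"}},
    "C": {"produces": set(), "sensitive": set()},
}

def is_compatible(subset):
    """True if no member is sensitive to a toxin produced within the subset."""
    produced = set().union(*(taxa[t]["produces"] for t in subset))
    return all(not (taxa[t]["sensitive"] & produced) for t in subset)

def compatible_subsets(names):
    """Enumerate all subsets of taxa that satisfy the exclusion rule."""
    names = sorted(names)
    result = []
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            if is_compatible(subset):
                result.append(subset)
    return result

all_count = 2 ** len(taxa)
kept = compatible_subsets(taxa)
print(f"{len(kept)} of {all_count} subsets remain after applying the toxin rule")
```

In this toy case the two subsets that contain both A and B are removed. The enumeration is exponential in the number of taxa, so it is only suitable for small illustrations.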
Pub Date: 2025-11-21; eCollection Date: 2025-01-01; DOI: 10.1016/j.csbj.2025.11.037
Harshita Sahni, Sarah Michelle Crotzer, Juston Moore, Steven S Branda, Trilce Estrada, S Gnanakaran
Understanding protein-protein interactions (PPIs) between viruses and host organisms is crucial for uncovering infection mechanisms and identifying potential therapeutic targets. The ability to generalize PPI predictive models across understudied viruses presents a significant challenge. In this work, we use arenavirus-human PPIs to illustrate the difficulties associated with model generalization, which are compounded by a lack of both positive and negative data. We employ a Transfer Learning approach to investigate arenavirus-human PPIs by utilizing models trained on better-studied virus-human and human-human PPIs. Additionally, we curate and assess four types of negative sampling datasets to evaluate their impact on model performance. Despite the overall high accuracies (93-99 %) and AUPRC scores (0.8-0.9) appearing promising, further analysis indicates that these performance metrics can be misleading due to data leakage, data bias, and overfitting, especially concerning under-represented viral proteins. We reveal these gaps and assess the impact of data imbalance using standard k-fold cross-validation and Independent Blind Testing with a Balanced Dataset, resulting in a drop in accuracy below 50 %. We propose a viral protein-specific evaluation framework that categorizes viral proteins into majority and minority classes based on their representation in the dataset, enabling comparison of model performance across these groups using balanced accuracies. This framework offers a more robust evaluation of model generalizability, addressing biases inherent in standard evaluation techniques and paving the way for more reliable PPI prediction models for understudied viruses.
Title: Challenges in predicting protein-protein interactions of understudied viruses: Arenavirus-human interactions. Journal: Computational and Structural Biotechnology Journal, volume 27, pages 5401-5412. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12703866/pdf/
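The evaluation idea described in the abstract above is to split viral proteins into majority and minority groups by how often they appear in the dataset, then compare balanced accuracy between the groups. It can be sketched as follows. This is a minimal illustration, not the authors' code: the records, the protein labels, the `threshold` parameter, and the helper names are all invented for the example.

```python
from collections import Counter

def balanced_accuracy(pairs):
    """Mean per-class recall over the true labels present in `pairs`.

    `pairs` is a list of (y_true, y_pred) tuples.
    """
    recalls = []
    for label in {t for t, _ in pairs}:
        relevant = [(t, p) for t, p in pairs if t == label]
        recalls.append(sum(t == p for t, p in relevant) / len(relevant))
    return sum(recalls) / len(recalls)

def evaluate_by_representation(records, threshold):
    """Group records by how often their viral protein appears, then score each group.

    `records` is a list of (viral_protein, y_true, y_pred). Proteins with at least
    `threshold` records count as 'majority', the rest as 'minority'.
    """
    counts = Counter(protein for protein, _, _ in records)
    groups = {"majority": [], "minority": []}
    for protein, y_true, y_pred in records:
        key = "majority" if counts[protein] >= threshold else "minority"
        groups[key].append((y_true, y_pred))
    return {name: balanced_accuracy(pairs) for name, pairs in groups.items() if pairs}

# Hypothetical predictions: protein "NP" is well represented, "Z" is not.
records = [
    ("NP", 1, 1), ("NP", 1, 1), ("NP", 0, 0), ("NP", 0, 0),
    ("Z", 1, 0), ("Z", 0, 0),
]
scores = evaluate_by_representation(records, threshold=3)
print(scores)
```

With these invented records the pooled accuracy is 5/6, while the minority group scores only 0.5 in balanced accuracy. This is the kind of gap that a pooled metric can hide.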
There is growing interest in developing predictive models of enzyme catalytic properties that leverage activity data spanning diverse enzyme families. A fundamental challenge lies in the inherent biases of public biochemical databases. These databases predominantly catalog valid enzyme activities, rarely include negative instances, and report quantitative catalytic parameters for only a relatively small subset of enzymes. Such limitations pose a major obstacle to supervised learning of enzyme catalytic properties. One existing approach for model training involves generating synthetic negative enzyme-activity pairs by recombining existing enzymes and their activity information, particularly substrates or chemical reactions, that were not originally associated within datasets. However, it remains unclear whether the generated negative examples are truly inactive or merely unobserved active instances. To build a model that captures functional properties across diverse enzyme families while avoiding reliance on negative examples, this paper introduces a self-supervised domain adaptation methodology for pre-trained protein language models, solely based on positive enzyme-reaction pairs. The enzyme representations obtained from the adapted protein language model achieved superior or at least competitive performance compared to those from an existing method that relies on synthetic negatives, in both the turnover number prediction task for natural reactions of wild-type enzymes and the activity prediction task for family-wide enzyme-substrate specificity screening datasets. Overall, our approach represents a methodological advancement that eliminates the need for synthetic negatives and provides a scalable framework for leveraging the growing enzyme activity data in biochemical databases.
Title: Self-supervised domain adaptation of protein language model based solely on positive enzyme-reaction pairs. Authors: Tomoya Okuno, Naoaki Ono, Md Altaf-Ul-Amin, Shigehiko Kanaya. Pub Date: 2025-11-21; DOI: 10.1016/j.csbj.2025.11.045. Journal: Computational and Structural Biotechnology Journal, volume 27, pages 5441-5449. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12712682/pdf/
Pub Date: 2025-11-21; eCollection Date: 2025-01-01; DOI: 10.1016/j.csbj.2025.11.046
Yu-Cheng Chen, Ming-Ren Yang, Yu-Wei Wu
Identifying antimicrobial resistance (AMR)-related biomarkers from large-scale genomic datasets is often akin to finding a needle in a haystack. With pan-genomic data containing more than 100,000 gene sequences, isolating features that truly drive resistance remains a major challenge in computational biology. Here we present PanARGMiner, a machine learning-based feature selection framework designed to robustly extract highly relevant and informative biomarkers from high-dimensional biological data. PanARGMiner uses an ensemble-based feature selection strategy to select highly informative and compact feature subsets. It then utilizes repeated iterations to ensure the stability and reliability of the proposed framework, enabling PanARGMiner to generate significantly reduced features with comparable prediction performance compared to those obtained with other feature selection algorithms. Applying PanARGMiner to bacterial pan-genomic antimicrobial resistance datasets successfully extracted as few as one to ten candidate AMR biomarkers from datasets with more than 100,000 genes for five common pathogens. Although many of the extracted candidate AMR biomarkers are well-known resistance genes, proteins not known to be associated with AMR mechanisms, including functionally uncharacterized hypothetical proteins, were also extracted. This indicates the potential of PanARGMiner in revealing both established and novel mechanisms of antibiotic resistance, thus providing actionable insights for biomarker discovery, functional genomics, and precision medicine based on complex data. Its ability to uncover both known and uncharacterized resistance-related features offers new opportunities for research and clinical applications in combating AMR.
Title: PanARGMiner (Pan-Genomic Antimicrobial Resistance Gene Miner): An advanced feature selection framework for extracting key resistance genes from pan-genomic datasets. Journal: Computational and Structural Biotechnology Journal, volume 27, pages 5363-5374. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12699266/pdf/
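The abstract above describes ensemble-based feature selection repeated over several iterations to obtain a small, stable feature set, but it does not name the selectors or the resampling scheme. The sketch below is therefore only a generic illustration of that idea under stated assumptions and is not PanARGMiner itself. The two scoring functions, the leave-one-out iterations, the `min_fraction` cutoff, and the toy gene presence/absence data are all hypothetical.

```python
def mean(xs):
    return sum(xs) / len(xs)

def score_mean_difference(column, labels):
    """Absolute difference in gene presence rate between resistant (1) and susceptible (0) isolates."""
    resistant = [x for x, y in zip(column, labels) if y == 1]
    susceptible = [x for x, y in zip(column, labels) if y == 0]
    return abs(mean(resistant) - mean(susceptible))

def score_agreement(column, labels):
    """How well presence/absence alone predicts the label, ignoring direction."""
    acc = mean([1.0 if x == y else 0.0 for x, y in zip(column, labels)])
    return max(acc, 1.0 - acc)

def top_k(features, labels, scorer, k):
    """Names of the k genes with the highest score under `scorer`."""
    ranked = sorted(features, key=lambda g: scorer(features[g], labels), reverse=True)
    return set(ranked[:k])

def stable_features(features, labels, k=1, min_fraction=0.8):
    """Keep genes that both scorers rank in their top k in at least `min_fraction` of runs.

    Each run leaves one isolate out (a deterministic stand-in for repeated resampling).
    """
    n = len(labels)
    hits = {g: 0 for g in features}
    for left_out in range(n):
        keep = [i for i in range(n) if i != left_out]
        sub_labels = [labels[i] for i in keep]
        sub_features = {g: [col[i] for i in keep] for g, col in features.items()}
        selected = (top_k(sub_features, sub_labels, score_mean_difference, k)
                    & top_k(sub_features, sub_labels, score_agreement, k))
        for g in selected:
            hits[g] += 1
    return sorted(g for g, count in hits.items() if count / n >= min_fraction)

# Toy data: six isolates, three genes; gene_1 tracks the resistance label exactly.
labels = [1, 1, 1, 0, 0, 0]
features = {
    "gene_1": [1, 1, 1, 0, 0, 0],
    "gene_2": [1, 0, 1, 0, 1, 0],
    "gene_3": [1, 1, 1, 1, 1, 1],
}
selected = stable_features(features, labels)
print(selected)
```

Requiring agreement between selectors and stability across runs is one common way to shrink a large candidate list to a few features. Real pan-genomic data would need proper resampling and a held-out evaluation of prediction performance.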
Background: Dietary interventions can modulate the gut bacterial community (microbiota) and offer a complementary strategy for improving metabolic control in type 2 diabetes (T2D). This pilot study evaluated clinical outcomes and gut microbiota changes following a structured low-calorie diet (LCD) intervention in obese T2D individuals under standard care.
Methods: Twenty obese T2D patients were randomized to an intervention group (n = 15), which received a 6-week LCD of 1000-1200 kcal/day designed for glycemic and metabolic control, or a matched control group (n = 5). Clinical parameters and fecal microbiota profiles were assessed at baseline, week 6, and week 12.
Results: The intervention group showed non-significant clinical trends toward improved glycemic and metabolic parameters, including reductions in fasting plasma glucose (FPG), hemoglobin A1c (HbA1c), and lipid levels (i.e., cholesterol) (P > 0.05), accompanied by significant reductions in body weight, body mass index (BMI), and body fat (P < 0.05). Four intervention participants (26.7 %) achieved normoglycemia without glucose-lowering medication. Gut microbiota analyses revealed significant alterations in alpha and beta diversity over time in the intervention group (AMOVA: P(control baseline, intervention 12-week) = 0.025 and P(intervention baseline, intervention 12-week) = 0.002), with increased abundance of beneficial genera (i.e., Streptococcus, Bifidobacterium, and Lactobacillus) and enrichment of Actinobacteria, Candidatus Saccharibacteria (TM7), and Firmicutes at week 12. Linear discriminant analysis effect size (LEfSe) analysis identified distinct microbial biomarkers differentiating the groups. Microbial functional predictions revealed significantly decreased inferred activity in pathways related to adipocytokine signaling, D-glutamine and D-glutamate metabolism, and type I diabetes mellitus (P < 0.05); however, these predictions were computational inferences and were not experimentally validated.
Conclusion: A structured LCD combined with standard care led to metabolic improvement and a trend toward gut microbiota remodeling in obese Thai individuals with T2D. The findings support the use of dietary interventions to beneficially modulate the gut microbiome and metabolic health, while highlighting the need for larger studies and functional validation.
Title: Low-calorie diet intervention ameliorates gut microbiota dysbiosis and metabolic changes in obese patients with type 2 diabetes under standard care. Authors: Mongkontida Umphonsathien, Pornsawan Prutanopajai, Thanya Cheibchalard, Naraporn Somboonna. Pub Date: 2025-11-20; DOI: 10.1016/j.csbj.2025.11.043. Journal: Computational and Structural Biotechnology Journal, volume 27, pages 5307-5317. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12686632/pdf/
Pub Date: 2025-11-19; eCollection Date: 2025-01-01; DOI: 10.1016/j.csbj.2025.11.038
Touseef Ur Rehman, Muhammad Rameez Ur Rahman, Weihua Tang, Sebastiano Vascon, Pei Jiang, Yu Liu, Senyi Gong, Xun Wan, Ali Mohsin, Meijin Guo
Extracellular vesicles (EVs) are naturally secreted nanoscale mediators of intercellular communication, showing potential for therapeutic and functional food applications. Although many EVs are being isolated with claims of therapeutic benefits, evaluating them requires extensive resources and time and often yields futile outcomes. This work addresses this gap by developing a visual and quantitative system, using monk fruit cell-derived EVs (MFEVs) as a model, to efficiently select the most suitable therapeutic EVs by analyzing their characterization parameters. This approach saves valuable resources and time. To generate variation, MFEVs were isolated using eight different techniques: ultracentrifugation, ultrafiltration, polyethylene glycol (PEG) precipitation (8 %, 10 %, 15 %, and 20 %), anion-exchange chromatography, and a novel combined ultrafiltration-precipitation method. Following isolation, their physicochemical properties, biochemical composition, and bioactivity were characterized, and their dose-dependent anticancer effects were evaluated across multiple cancer cell lines. Next, "ExoOrb" was developed using the correlation statistics between anticancer activity and the characterization parameters. It is an analytical multicriteria decision-making system that objectively ranks the therapeutic potential of EVs by employing factor normalization, weighted scoring, and multidimensional visualizations. The system was validated using both the original dataset and synthetic datasets. On the original dataset, ExoOrb identified PEG 10 %-MFEVs as more therapeutically effective, and the synthetic dataset confirmed ExoOrb's ability to compute metrics for EVs across multiple EV types. To our knowledge, ExoOrb is the first potentially universal framework for evaluating the therapeutic potential of EVs based on characterization parameters, providing a reliable tool for scientific and therapeutic research through standardized, data-driven optimization.
Title: ExoOrb: A novel visual and analytical system for therapeutic extracellular vesicles metrics. Journal: Computational and Structural Biotechnology Journal, volume 27, pages 5289-5306. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12681852/pdf/
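The abstract above says ExoOrb ranks EV preparations through factor normalization and weighted scoring. A generic version of that kind of multicriteria ranking can be sketched as below. This is an illustration under assumptions, not the ExoOrb implementation: min-max normalization, the weighted sum, the factor names, the weights, the direction of each factor, and all measurement values are hypothetical.

```python
def min_max_normalize(values, higher_is_better=True):
    """Scale values to [0, 1]; flip the scale when lower raw values are preferable."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5 for _ in values]  # no spread: treat all candidates as equal
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled if higher_is_better else [1.0 - s for s in scaled]

def rank_candidates(table, weights, higher_is_better):
    """Return (name, score) pairs sorted by weighted sum of normalized factors, best first."""
    names = list(table)
    scores = dict.fromkeys(names, 0.0)
    for factor, weight in weights.items():
        column = [table[name][factor] for name in names]
        normalized = min_max_normalize(column, higher_is_better[factor])
        for name, value in zip(names, normalized):
            scores[name] += weight * value
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical measurements for three isolation methods (values are invented).
table = {
    "method_A": {"yield": 10.0, "size_nm": 180.0, "purity": 0.5},
    "method_B": {"yield": 20.0, "size_nm": 120.0, "purity": 0.9},
    "method_C": {"yield": 15.0, "size_nm": 150.0, "purity": 0.7},
}
weights = {"yield": 0.5, "size_nm": 0.2, "purity": 0.3}
# Assumption for the example only: smaller particle size is treated as preferable.
higher_is_better = {"yield": True, "size_nm": False, "purity": True}

ranking = rank_candidates(table, weights, higher_is_better)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

Normalizing each factor before weighting keeps a factor with large raw units (such as size in nanometres) from dominating the score. The weights then express how much each factor should matter.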
Pub Date : 2025-11-19 eCollection Date: 2025-01-01 DOI: 10.1016/j.csbj.2025.11.041
Yijun Zhang, Zimeng Chen, Zhuxuan Wan, Qianhui Jiang, Xiaoling Lu, Bin Yan, Jing Qin, Yong Liu, Junwen Wang
Cyclic peptides are attractive candidates for drug discovery because of their inherent stability and structural diversity. However, their therapeutic potential is limited by poor permeability across the cell membrane. A growing number of computational models and tools have been developed to predict cyclic peptide membrane permeability (CPMP), but existing approaches do not adequately capture the feature diversity of cyclic peptides. In this study, we introduce a novel multi-source feature fusion model, MSF-CPMP, which aims to improve the accuracy of CPMP prediction. MSF-CPMP incorporates three types of features, extracted from SMILES sequences, graph-based molecular structures, and physicochemical properties of cyclic peptides. In benchmarks against both non-deep-learning and deep-learning-based methods, MSF-CPMP achieved the highest scores on the evaluation metrics, including an accuracy of 0.9062 and an AUROC of 0.9546; further experiments validated the robustness of its learning capability and the efficacy of its multi-source fusion. Our results demonstrate that MSF-CPMP outperforms other methods in predicting CPMP and exemplify the power of advanced deep learning methods in tackling complex biological challenges, offering contributions to computational biology and clinical treatment. Code is available at https://github.com/wanglabhku/MSF-CPMP.
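The two headline metrics reported above, accuracy and AUROC, can be computed as in the following minimal sketch. It assumes that permeability is framed as a binary label (1 = permeable, 0 = not permeable) and that the model outputs a score per peptide; the 0.5 threshold, the function names, and the toy values are illustrative assumptions and are not taken from the MSF-CPMP code.

```python
from typing import List


def accuracy(y_true: List[int], y_pred: List[int]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def auroc(y_true: List[int], scores: List[float]) -> float:
    """AUROC as the probability that a random positive outscores a
    random negative (ties count as half), i.e. the rank-based form."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy, made-up labels and scores for four hypothetical peptides.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.4, 0.35, 0.2]
toy_preds = [1 if s >= 0.5 else 0 for s in toy_scores]  # assumed threshold

print(accuracy(toy_labels, toy_preds))
print(auroc(toy_labels, toy_scores))
```

Accuracy depends on the chosen threshold, whereas AUROC summarizes ranking quality across all thresholds, which is one reason the two are commonly reported together.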
{"title":"MSF-CPMP: A novel multi-source feature fusion model for prediction of cyclic peptide membrane permeability.","authors":"Yijun Zhang, Zimeng Chen, Zhuxuan Wan, Qianhui Jiang, Xiaoling Lu, Bin Yan, Jing Qin, Yong Liu, Junwen Wang","doi":"10.1016/j.csbj.2025.11.041","DOIUrl":"10.1016/j.csbj.2025.11.041","url":null,"abstract":"<p><p>Cyclic peptides are attractive candidates for drug discovery because of their inherent stability and structural diversity. However, their therapeutic potential is limited by poor permeability across the cell membrane. A growing number of computational models and tools have been developed to predict cyclic peptide membrane permeability (CPMP), but existing approaches do not adequately capture the feature diversity of cyclic peptides. In this study, we introduce a novel multi-source feature fusion model, MSF-CPMP, which aims to improve the accuracy of CPMP prediction. MSF-CPMP incorporates three types of features, extracted from SMILES sequences, graph-based molecular structures, and physicochemical properties of cyclic peptides. In benchmarks against both non-deep-learning and deep-learning-based methods, MSF-CPMP achieved the highest scores on the evaluation metrics, including an accuracy of 0.9062 and an AUROC of 0.9546; further experiments validated the robustness of its learning capability and the efficacy of its multi-source fusion. Our results demonstrate that MSF-CPMP outperforms other methods in predicting CPMP and exemplify the power of advanced deep learning methods in tackling complex biological challenges, offering contributions to computational biology and clinical treatment. 
Code is available at https://github.com/wanglabhku/MSF-CPMP.</p>","PeriodicalId":10715,"journal":{"name":"Computational and structural biotechnology journal","volume":"27 ","pages":"5413-5424"},"PeriodicalIF":4.1,"publicationDate":"2025-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12703865/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145767393","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We developed a robust LC-MS/MS method that simultaneously quantifies 16 uremic toxins (UTs) and 14 bile acids (BAs) in plasma and fecal samples. The method demonstrated high sensitivity, broad metabolite coverage, and excellent accuracy, precision, and throughput. Using this platform, targeted metabolites were quantified in peritoneal dialysis (PD) patients (n = 31) and healthy controls (HC; n = 60). Of the 30 targeted metabolites included in the validated method, 20 were detected in fecal samples and 12 in plasma in this study. Fecal samples exhibited greater BA diversity, whereas UTs were evenly distributed across both matrices. Fecal profiles showed minimal differences between PD and HC, suggesting limited gut-level alteration. In contrast, plasma analysis revealed nine metabolites significantly elevated in PD: indoxyl sulfate, phenyl sulfate, hippuric acid, imidazole propionate (ImP), lithocholic acid, cinnamoylglycine, m-hydroxyhippuric acid, phenylacetylglutamine, and phenylacetylglycine. Notably, plasma ImP, an underexplored metabolite, was elevated independently of diabetes or cardiovascular disease, implicating impaired renal clearance as its primary driver. These results highlight the systemic impact of gut-derived metabolites in kidney failure and position targeted UT-BA profiling as a powerful complementary tool for clinical metabolomics in chronic kidney disease and PD.
{"title":"LC-MS/MS identifies elevated imidazole propionate and gut-derived metabolite alterations in peritoneal dialysis patients.","authors":"Weerawan Manokasemsan, Narumol Jariyasopit, Kwanjeera Wanichthanarak, Patcha Poungsombat, Alongkorn Kurilung, Suphitcha Limjiasahapong, Kajol Thapa, Yongyut Sirivatanauksorn, Sukit Raksasuk, Thatsaphan Srithongkul, Chagriya Kitiyakara, Sakda Khoomrung","doi":"10.1016/j.csbj.2025.11.039","DOIUrl":"10.1016/j.csbj.2025.11.039","url":null,"abstract":"<p><p>We developed a robust LC-MS/MS method that simultaneously quantifies 16 uremic toxins (UTs) and 14 bile acids (BAs) in plasma and fecal samples. The method demonstrated high sensitivity, broad metabolite coverage, and excellent accuracy, precision, and throughput. Using this platform, targeted metabolites were quantified in peritoneal dialysis (PD) patients (n = 31) and healthy controls (HC; n = 60). Of the 30 targeted metabolites included in the validated method, 20 were detected in fecal samples and 12 in plasma in this study. Fecal samples exhibited greater BA diversity, whereas UTs were evenly distributed across both matrices. Fecal profiles showed minimal differences between PD and HC, suggesting limited gut-level alteration. In contrast, plasma analysis revealed nine metabolites significantly elevated in PD: indoxyl sulfate, phenyl sulfate, hippuric acid, imidazole propionate (ImP), lithocholic acid, cinnamoylglycine, m-hydroxyhippuric acid, phenylacetylglutamine, and phenylacetylglycine. Notably, plasma ImP, an underexplored metabolite, was elevated independently of diabetes or cardiovascular disease, implicating impaired renal clearance as its primary driver. 
These results highlight the systemic impact of gut-derived metabolites in kidney failure and position targeted UT-BA profiling as a powerful complementary tool for clinical metabolomics in chronic kidney disease and PD.</p>","PeriodicalId":10715,"journal":{"name":"Computational and structural biotechnology journal","volume":"27 ","pages":"5271-5280"},"PeriodicalIF":4.1,"publicationDate":"2025-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12681520/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145707355","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-11-16 eCollection Date: 2025-01-01 DOI: 10.1016/j.csbj.2025.11.035
Fionn Daire Keogh, Jonas Marx, Alicia Hiemisch, Rainer Koenig
Omics data analysis often yields extensive lists of genes or enriched gene sets, making it difficult to interpret the underlying cellular mechanisms. Existing gene set categorization methods typically rely on the Gene Ontology hierarchy, neglecting semantic similarity encoded in textual descriptions. We developed Slimformer, an embedding-based Natural Language Processing model that learns contextual relationships between gene sets based on their names, descriptions, and associated genes. A supervised classifier, trained on a manually curated gold standard, then assigns these embeddings to process categories. Applied to 2856 annotated gene sets, Slimformer achieved 82.4 % balanced accuracy and an F1-score of 0.867. Applied to gene expression data from human cells infected with Respiratory Syncytial Virus, Slimformer revealed strong downregulation of major cell cycle processes, a finding that is highly relevant to the viral pathomechanism and was overlooked by the other tools we tested. By integrating linguistic and functional information, Slimformer enhances the interpretability of omics data and provides a flexible framework for systematic gene set categorization.
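Balanced accuracy and F1-score, the two figures quoted above, are useful when category sizes are uneven, as is typical for process categories. The sketch below computes both for a multi-class labelling. The abstract does not state how the F1-score was averaged, so the macro average used here is an assumption, and the category names and predictions are made-up toy values rather than Slimformer output.

```python
from typing import Dict, List


def per_class_stats(y_true: List[str], y_pred: List[str]) -> Dict[str, Dict[str, int]]:
    """Count true positives, false positives and false negatives per class."""
    classes = sorted(set(y_true) | set(y_pred))
    stats = {c: {"tp": 0, "fp": 0, "fn": 0} for c in classes}
    for t, p in zip(y_true, y_pred):
        if t == p:
            stats[t]["tp"] += 1
        else:
            stats[p]["fp"] += 1
            stats[t]["fn"] += 1
    return stats


def balanced_accuracy(y_true: List[str], y_pred: List[str]) -> float:
    """Mean recall over the classes that occur in y_true."""
    stats = per_class_stats(y_true, y_pred)
    recalls = [
        s["tp"] / (s["tp"] + s["fn"])
        for c, s in stats.items()
        if c in set(y_true)
    ]
    return sum(recalls) / len(recalls)


def macro_f1(y_true: List[str], y_pred: List[str]) -> float:
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    stats = per_class_stats(y_true, y_pred)
    f1_values = []
    for s in stats.values():
        denom = 2 * s["tp"] + s["fp"] + s["fn"]
        f1_values.append(2 * s["tp"] / denom if denom else 0.0)
    return sum(f1_values) / len(f1_values)


# Toy, made-up gold labels and predictions for six hypothetical gene sets.
gold = ["cell cycle", "cell cycle", "cell cycle", "immune", "immune", "metabolism"]
pred = ["cell cycle", "cell cycle", "immune", "immune", "immune", "metabolism"]

print(balanced_accuracy(gold, pred))
print(macro_f1(gold, pred))
```

Because balanced accuracy averages recall per class, a small category such as the single "metabolism" example counts as much as the largest one, which plain accuracy would not reflect.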
{"title":"Slimformer: An NLP-based web server for semantic categorization of gene sets.","authors":"Fionn Daire Keogh, Jonas Marx, Alicia Hiemisch, Rainer Koenig","doi":"10.1016/j.csbj.2025.11.035","DOIUrl":"10.1016/j.csbj.2025.11.035","url":null,"abstract":"<p><p>Omics data analysis often yields extensive lists of genes or enriched gene sets, making it difficult to interpret the underlying cellular mechanisms. Existing gene set categorization methods typically rely on the Gene Ontology hierarchy, neglecting semantic similarity encoded in textual descriptions. We developed Slimformer, an embedding-based Natural Language Processing model that learns contextual relationships between gene sets based on their names, descriptions, and associated genes. A supervised classifier, trained on a manually curated gold standard, then assigns these embeddings to process categories. Applied to 2856 annotated gene sets, Slimformer achieved 82.4 % balanced accuracy and an F1-score of 0.867. Applied to gene expression data from human cells infected with Respiratory Syncytial Virus, Slimformer revealed strong downregulation of major cell cycle processes, a finding that is highly relevant to the viral pathomechanism and was overlooked by the other tools we tested. 
By integrating linguistic and functional information, Slimformer enhances the interpretability of omics data and provides a flexible framework for systematic gene set categorization.</p>","PeriodicalId":10715,"journal":{"name":"Computational and structural biotechnology journal","volume":"27 ","pages":"5252-5262"},"PeriodicalIF":4.1,"publicationDate":"2025-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12671349/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145667639","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}