Pub Date: 2025-12-08; DOI: 10.1186/s12859-025-06334-7
Szabolcs Makai, Diána Makai, Erika Chonata-Jiménez, Ildikó Karsai, Péter Mikó, Adél Sepsi, András Cseh
Background: Crossovers are essential for genome stability and genetic diversity, yet in plants they occur infrequently, typically restricted to only one to three per chromosome pair. Genotyping approaches such as SNP arrays or genotyping-by-sequencing (GBS) enable high-resolution detection of crossover frequency, a critical step for elucidating the mechanisms that regulate meiotic recombination and for exploiting it in plant breeding. Despite their widespread use and the availability of highly reproducible marker sets, user-friendly tools for reliable recombination analysis remain scarce.
Results: Here we present X-cross/over, a web-based platform that applies a graph-theoretical algorithm to estimate crossover frequencies from SNP datasets in HapMap format. The platform was evaluated using publicly available barley backcross inbred populations and newly developed wheat doubled haploid lines. Across both datasets, X-cross/over detected crossover events with high accuracy and sensitivity, yielding results consistent with published genotyping and cytological analyses. Importantly, the tool produces outcomes comparable to expert analyses while remaining accessible to users without bioinformatics expertise.
Conclusions: X-cross/over provides a consistent and transparent framework for detecting crossover sites and quantifying their frequency. Implemented in a platform-independent environment, the application is freely available at https://insilicolabdesk.atk.kinin.hu, making it a versatile resource for exploring the genetic and epigenetic regulation of meiotic recombination across plant species.
Title: X-cross/over: a web tool for graph-based estimation of meiotic crossover events in plants. Journal: BMC Bioinformatics, pages 21. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12849401/pdf/
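As a rough illustration of the crossover counting described in the X-cross/over abstract above, the Python sketch below counts parental-genotype switches along one chromosome of a homozygous line (for example a doubled haploid). It is a simplified run-based heuristic, not the graph-theoretical algorithm that X-cross/over applies, whose details the abstract does not give. The function name `count_crossovers`, the single-letter parental codes, and the `min_run` filter are all assumptions made for this example.

```python
from typing import List, Tuple

# Codes treated as missing calls (an assumption; real HapMap files vary).
MISSING = {"N", "NN", "-", ""}


def count_crossovers(calls: List[str], min_run: int = 2) -> Tuple[int, List[int]]:
    """Count parental-genotype switches along one chromosome of one line.

    calls   : parental assignment per marker in map order, e.g. "A", "B",
              or "N" for a missing call (hypothetical encoding)
    min_run : runs of identical calls shorter than this are ignored as
              probable genotyping errors (hypothetical filter)

    Returns the number of switches and the marker index at which each
    new parental segment starts.
    """
    # Keep only informative markers, remembering their original index.
    informative = [(i, c) for i, c in enumerate(calls) if c not in MISSING]

    # Collapse consecutive identical calls into runs: [allele, start, length].
    runs: List[list] = []
    for index, call in informative:
        if runs and runs[-1][0] == call:
            runs[-1][2] += 1
        else:
            runs.append([call, index, 1])

    # Discard runs too short to be trusted as real segments.
    kept = [run for run in runs if run[2] >= min_run]

    # A crossover is a change of parental allele between adjacent kept runs.
    breakpoints = [cur[1] for prev, cur in zip(kept, kept[1:]) if prev[0] != cur[0]]
    return len(breakpoints), breakpoints


if __name__ == "__main__":
    n, sites = count_crossovers(list("AAAABAAABBBBNBBAAAA"))
    print(n, sites)
```

For the example string `AAAABAAABBBBNBBAAAA`, the isolated `B` at index 4 is ignored as a probable genotyping error, the missing call at index 12 does not interrupt the `B` segment, and two crossovers are reported, starting at marker indices 8 and 15.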
Pub Date: 2025-12-08; DOI: 10.1186/s12859-025-06341-8
Martin Engst, Martin Brokeš, Tereza Čalounová, Raman Samusevich, Roman Bushuiev, Anton Bushuiev, Ratthachat Chatpatanasiri, Adéla Tajovská, Safa Mert Akmeşe, Milana Perković, Matouš Soldát, Josef Sivic, Tomáš Pluskal
Background: Terpene synthases (TPSs) are enzymes that catalyze some of the most complex reactions in nature: the cyclizations of terpenes, which form the carbon backbones of the largest group of natural products, the terpenoids. On average, more than half of the carbon atoms in a terpene scaffold undergo a change in connectivity or configuration during these enzymatic cascades. Understanding TPS reaction mechanisms remains challenging, often requiring intricate computational modeling and isotopic labelling studies. Moreover, the relationship between TPS sequence and catalytic function is difficult to decipher, while data-driven approaches remain limited due to a lack of comprehensive, high-quality data sources.
Main: We introduce the Mechanisms And Reactions of Terpene Synthases DataBase (MARTS-DB), a manually curated, structured, and searchable database that integrates TPS enzymes, the terpenes they produce, and their detailed reaction mechanisms. MARTS-DB includes over 2850 reactions catalyzed by 1432 annotated enzymes from across all domains of life, with reaction mechanisms mapped as stepwise cascades for more than 500 terpenes. Accessible at https://www.marts-db.org, the database provides advanced search functionality and supports full dataset downloads in machine-readable formats. It also encourages community contributions to promote continuous growth.
Conclusion: User-friendly and comprehensive, MARTS-DB enables the systematic exploration of TPS catalysis, opening new avenues for computational analysis and machine learning, as recently demonstrated in the prediction of novel TPSs.
Title: MARTS-DB: a database of mechanisms and reactions of terpene synthases. Journal: BMC Bioinformatics, pages 10. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12797696/pdf/
Pub Date: 2025-12-07; DOI: 10.1186/s12859-025-06342-7
Chun Hing She, Sophelia Hoi-Shan Chan, Wanling Yang
Background: Accurate structural variant detection from short-read sequencing data remains challenged by false positives, particularly for heterozygous deletions, where allelic support is reduced and coverage-based signals are ambiguous. Existing SV genotyping and filtering approaches suffer from significant recall reductions, depend on additional pre-computed resources, or are restricted to depth-based signals that overlook read-level evidence.
Results: Here we present SVhet, a novel computational framework that leverages the heterozygosity patterns detected from different types of read evidence to identify false heterozygous deletions. Comprehensive benchmarking using 31 Human Genome Structural Variation Consortium Phase 3 samples demonstrated SVhet's ability to further reduce false positives while maintaining baseline recall. A hybrid approach combining duphold and SVhet achieved up to a 60% reduction in false positive counts while preserving recall. We also showed that SVhet is computationally efficient: it can process a whole-genome structural variant callset in under 5 min using 4 CPU cores. SVhet is available under a permissive MIT license via https://github.com/snakesch/SVhet.
Conclusion: SVhet provides an accurate and efficient solution for evaluating heterozygous deletions derived from short-read sequencing data. SVhet can be used as a standalone tool or in conjunction with other filtering tools such as duphold. Importantly, it does not require additional variant sets and can operate with minimal compute. Altogether, SVhet adds to the current effort to achieve accurate structural variant detection using short reads.
Title: SVhet: towards accurate detection of germline heterozygous deletions using short reads. Journal: BMC Bioinformatics, pages 9. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12798059/pdf/
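One way to read the heterozygosity idea in the SVhet abstract above: a true heterozygous deletion leaves a single copy of the region, so small variants inside it should not look heterozygous. The Python sketch below applies that reasoning to read counts at single-nucleotide variants. It is an illustrative assumption, not SVhet's implementation; the class `SnvObservation`, the function names, and every threshold are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class SnvObservation:
    """Read support at one single-nucleotide variant (hypothetical record)."""
    pos: int        # 1-based genomic position
    ref_reads: int  # reads supporting the reference allele
    alt_reads: int  # reads supporting the alternate allele


def looks_heterozygous(snv: SnvObservation, min_depth: int = 8,
                       low: float = 0.25, high: float = 0.75) -> bool:
    """Return True when the alternate-allele fraction is near one half.

    Sites with too little coverage are treated as uninformative.
    All thresholds are assumptions chosen for illustration.
    """
    depth = snv.ref_reads + snv.alt_reads
    if depth < min_depth:
        return False
    alt_fraction = snv.alt_reads / depth
    return low <= alt_fraction <= high


def flag_false_het_deletion(start: int, end: int,
                            snvs: Iterable[SnvObservation],
                            min_het_sites: int = 2) -> bool:
    """Flag a heterozygous deletion call as a likely false positive.

    Under a true heterozygous deletion only one haplotype remains, so
    several heterozygous-looking SNVs inside [start, end] contradict it.
    """
    het_sites = sum(
        1 for snv in snvs
        if start <= snv.pos <= end and looks_heterozygous(snv)
    )
    return het_sites >= min_het_sites


if __name__ == "__main__":
    observed = [
        SnvObservation(pos=105, ref_reads=10, alt_reads=9),
        SnvObservation(pos=150, ref_reads=12, alt_reads=11),
        SnvObservation(pos=300, ref_reads=0, alt_reads=20),
    ]
    print(flag_false_het_deletion(100, 200, observed))  # two balanced sites inside
    print(flag_false_het_deletion(250, 400, observed))  # only a homozygous-looking site
```

In this toy input, the call spanning positions 100 to 200 contains two balanced sites and is flagged, while the call spanning 250 to 400 contains only a homozygous-looking site and is kept.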
Pub Date: 2025-12-07; DOI: 10.1186/s12859-025-06335-6
Annekathrin Silvia Nedwed, Arsenij Ustjanzew, Najla Abassi, Leon Dammer, Alicia Schulze, Sara Salome Helbich, Michael Delacher, Konstantin Strauch, Federico Marini
Title: GeDi: simplifying gene set distances for enhanced omics interpretation in R/Bioconductor. Journal: BMC Bioinformatics, pages 14. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12809992/pdf/
Background: Transcription factors and their target genes form regulatory modules known as regulons, which exhibit significant specificity across various cell types. The integration of single-cell transcriptome data, transcription factor motif data, and ChIP-seq data presents a challenging task in identifying cell-type-specific regulons and examining their activities.
Results: In response, this study presents a Deep Structured Semantic Model for inferring and prioritizing cell-type-specific Regulons (DSSMReg). This approach utilizes single-cell transcriptome and transcription factor motif data to map transcription factors and target genes into a low-dimensional semantic space, resulting in the generation of feature vectors. The model then computes the cosine similarity between transcription factors and target genes to evaluate their regulatory strength and subsequently infers cell-type-specific regulons based on this assessment. Moreover, DSSMReg employs the AUCell algorithm to rank the importance of regulons for each cell type.
Conclusions: We compared DSSMReg against five representative gene regulatory inference algorithms using scRNA-seq data from five cell lines, with DSSMReg achieving the highest evaluation metrics for both AUROC and AUPRC. Furthermore, we applied DSSMReg to infer cell-type-specific regulons from scRNA-seq data of triple-negative breast cancer and human bone marrow hematopoietic stem cells. Our results indicated that regulons with high AUCell scores possess significant biological relevance. The source code of DSSMReg is freely available at https://github.com/YaxinF/DSSMReg.
Title: A DSSM network for inferring and prioritizing cell-type-specific regulons using single-cell RNA-seq data. Authors: Yaxin Fan, Yichao Mei, Shengbao Bao, Jianyong Wang, Junxiang Gao. Pub Date: 2025-12-07; DOI: 10.1186/s12859-025-06329-4. Journal: BMC Bioinformatics, pages 8. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12798040/pdf/
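The scoring step described in the DSSMReg abstract above, cosine similarity between a transcription factor vector and target gene vectors, can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: in DSSMReg the vectors are produced by the trained network, whereas here they are toy values, and the function `infer_regulon` and its `threshold` parameter are hypothetical names introduced for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def infer_regulon(tf_vector: Sequence[float],
                  gene_vectors: Dict[str, Sequence[float]],
                  threshold: float = 0.7) -> List[Tuple[str, float]]:
    """Rank candidate targets of one transcription factor.

    Genes whose similarity to the TF vector reaches `threshold`
    (a hypothetical cutoff) are returned, strongest first.
    """
    scored = [(gene, cosine_similarity(tf_vector, vec))
              for gene, vec in gene_vectors.items()]
    selected = [(gene, score) for gene, score in scored if score >= threshold]
    return sorted(selected, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy two-dimensional vectors standing in for learned feature vectors.
    tf = [1.0, 0.0]
    genes = {
        "geneA": [2.0, 0.0],  # same direction as the TF vector
        "geneB": [0.0, 3.0],  # orthogonal to the TF vector
        "geneC": [1.0, 1.0],  # 45 degrees away
    }
    for name, score in infer_regulon(tf, genes):
        print(f"{name}\t{score:.3f}")
```

Cosine similarity depends only on direction, so `geneA` scores 1.0 despite its larger magnitude, `geneC` scores about 0.707, and the orthogonal `geneB` is excluded by the cutoff.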
Pub Date: 2025-12-05; DOI: 10.1186/s12859-025-06340-9
Binbin Wu, William W Ja
Circadian rhythms regulate a wide range of biological processes, and their precise characterization is essential for understanding behavioral and physiological fluctuations. However, existing tools to analyze circadian data often require coding expertise or rely on specific data acquisition software, limiting their general applicability. Here, we present easyClock, an intuitive and interactive application designed to streamline circadian rhythm analysis and visualization. The easyClock application enables simultaneous processing of multiple files, allowing users to batch-analyze and visualize diverse sets of time series data. To enhance data analysis efficiency and provide comparable results, this application integrates comprehensive methods for handling data with various waveforms and types of noise. Additionally, easyClock can assess inter-individual variability and group differences using linear mixed-effects modeling. All statistical results and graphs are easily viewed and exported for any selected range of data. As a demonstration, we present a re-analysis of a time-series transcriptomic dataset, highlighting the value of easyClock as an accessible, open-source tool. This easy-to-use application requires no programming expertise and can be directly installed on Windows and macOS machines in a single step.
Title: easyClock: a user-friendly desktop application for circadian rhythm analysis and visualization. Journal: BMC Bioinformatics, pages 7. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12797594/pdf/
Lung cancer is one of the most prevalent malignant tumors with high morbidity and mortality rates worldwide. Extensive multi-omics analyses have revealed significant intratumoral heterogeneity even within the same histopathological subtype. However, a database that systematically integrates multi-omics data for lung cancer research has long been lacking. Here, we developed LOSTdb, a molecular subtype annotation system for lung cancer that integrates multi-omics data and metadata. LOSTdb comprises 295 multi-omics datasets, including bulk RNA-seq, genomic, proteomic, methylation, and scRNA-seq data, with over 10,000 manually curated metadata entries. This resource encompasses high-quality clinical specimens, mouse models, and cell lines, totaling 34,393 samples and more than 1.2 million single cells. Each omics sample was annotated with both literature-based classical subtypes and NMF-derived meta-program (MP) subtypes. The platform supports cross-searching of omics and metadata at the gene and dataset levels, offers multiple visualization and analysis methods, and includes five tool modules. These modules enable functions such as integrated analysis, significance analysis between metadata as well as between genes and metadata, and target prediction for lung cancer molecular subtypes. Together, these features make LOSTdb an essential tool for lung cancer precision medicine. LOSTdb is a user-friendly interactive database freely accessible at http://lostdbcancer.com:8080.
Title: LOSTdb: a manually curated multi-omics database for lung cancer research. Authors: Hao Luo, Yunhao Yang, Zhipeng Gong, Lunxu Liu, Yaohui Chen. Pub Date: 2025-12-03; DOI: 10.1186/s12859-025-06319-6. Journal: BMC Bioinformatics, vol. 26(1), pages 290. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12676782/pdf/
Pub Date: 2025-12-02; DOI: 10.1186/s12859-025-06323-w
Philipp Georg Heilmann, Emanuel Grosch, Matthias Frisch, Matthias Herrmann, Steffen Beuch, Vivek Kurra, Martin Mascher, Raz Avni, Klaus Oldach, Ina Röhrs, Anja Hanemann, Raja Ram Mehta, Carsten Reinbrecht, Albrecht Serfling, Andreas Stahl, Marco Stucke, Amine Abbadi, Tobias Kox, Carola Zenke-Philippi
Title: Haplotype-based autoencoders can reduce the dataset dimension and estimate haplotype block effects in different crop species. Journal: BMC Bioinformatics, vol. 26(1), pages 289. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12670737/pdf/
Pub Date: 2025-11-29; DOI: 10.1186/s12859-025-06338-3
Jiadong Chu, Yu Wang, Na Sun, Qiang Han, Ziqing Sun, Mengtong Sun, Yuheng Yuan, Qida He, Yueping Shen
Background: Multi-omics integration may provide additional information about the development of tumors and improve the performance of predictive models. The key challenge lies in integrating several omics sources, especially to capture their biological relationships. Previous studies proposed a structural equation model framework to combine two data platforms for predicting survival; however, several limitations remain.
Results: In this study, we introduce an extended Bayesian survival model combined with a structural equation model for adaptation to broader applications. The No-U-Turn Sampler (NUTS) algorithm was used to efficiently sample the posterior distribution of model parameters. Through a series of simulation studies, our model showed excellent goodness-of-fit and predictive performance. To validate the efficiency of our model, we utilized a gastric cancer dataset with three omics types (mRNA, microRNA, and methylation) obtained from The Cancer Genome Atlas. After bioinformatic processing, we incorporated six mRNA, microRNA, and methylation loci datasets into the framework and found that our model exhibited greater predictive performance than non-integrated and Integrative Bayesian Analysis of Genomics (iBAG) models.
Conclusions: Our extended Bayesian structural equation model for multi-omics survival analysis provides a robust framework that significantly enhances predictive accuracy by effectively capturing complex biological relationships across diverse omics data sources, demonstrating clear advantages over both non-integrated approaches and existing integrative methods such as iBAG.
Title: A parametric survival model with bayesian structural equation based on multi-omics integration. Journal: BMC Bioinformatics, pages 3. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12771807/pdf/
Pub Date: 2025-11-28; DOI: 10.1186/s12859-025-06307-w
Ramu Gautam, Yang Jiao, Yasong Pang, Mo Weng, Mei Yang
Title: ProTrack3D: a comprehensive tool for segmentation and tracking of proteins with split and fusion. Journal: BMC Bioinformatics, pages 4. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12777108/pdf/