Pub Date: 2025-01-23; eCollection Date: 2024-01-01; DOI: 10.3389/fbinf.2024.1489704
Anas Al-Okaily, Abdelghani Tbakhi
Data compression is a challenging and increasingly important problem. As the amount of data generated daily continues to increase, efficient transmission and storage have never been more critical. In this study, a novel encoding algorithm is proposed, motivated by the compression of DNA data and its associated characteristics. The proposed algorithm follows a divide-and-conquer approach: it scans the whole genome, classifies subsequences based on similarities in their content, and bins similar subsequences together. The data in each bin are then compressed independently. This approach differs from the currently known approaches: entropy-, dictionary-, predictive-, and transform-based methods. Proof-of-concept performance was evaluated on a benchmark dataset of seventeen genomes ranging in size from kilobytes to gigabytes. The results showed a considerable improvement in the compression of each genome, saving several megabytes compared with state-of-the-art tools. Moreover, the algorithm can be applied to the compression of other data types, mainly text, numbers, images, audio, and video, which are being generated daily in unprecedentedly massive volumes.
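The abstract describes the core idea: classify subsequences by content, bin similar ones, and compress each bin on its own. It does not specify the classifier or the per-bin coder. The minimal Python sketch below illustrates only the general bin-then-compress pattern, under stated assumptions:

- fixed-length blocks;
- a hypothetical `signature` classifier, which returns the sorted set of distinct bases in a block;
- `zlib` as a stand-in back end.

It is not the authors' algorithm. It shows how recording each block's bin and length keeps such a scheme lossless.

```python
import zlib
from collections import defaultdict

BLOCK_LEN = 8  # hypothetical fixed subsequence length


def signature(block: str) -> str:
    """Hypothetical content classifier: the sorted set of distinct bases in the block."""
    return "".join(sorted(set(block)))


def compress(seq: str, block_len: int = BLOCK_LEN):
    """Split seq into blocks, bin blocks by signature, and compress each bin independently."""
    bins = defaultdict(list)
    order = []  # (bin key, block length) for every block, in original order
    for i in range(0, len(seq), block_len):
        block = seq[i:i + block_len]
        key = signature(block)
        bins[key].append(block)
        order.append((key, len(block)))
    # Each bin is compressed on its own, so similar blocks share one zlib stream.
    packed = {key: zlib.compress("".join(blocks).encode("ascii")) for key, blocks in bins.items()}
    return packed, order


def decompress(packed, order) -> str:
    """Rebuild the original sequence by reading blocks back from each bin in order."""
    streams = {key: zlib.decompress(data).decode("ascii") for key, data in packed.items()}
    offsets = dict.fromkeys(streams, 0)  # read position inside each bin's stream
    out = []
    for key, length in order:
        start = offsets[key]
        out.append(streams[key][start:start + length])
        offsets[key] = start + length
    return "".join(out)


if __name__ == "__main__":
    genome = "ACGT" * 10 + "A" * 40 + "GGC"  # toy sequence, not real data
    packed, order = compress(genome)
    print(sorted(packed))  # ['A', 'ACGT', 'CG']
    assert decompress(packed, order) == genome
```

In a real encoder the `order` list is side information that would itself need compact encoding. How the published method handles this is not stated in the abstract.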
Title: A novel lossless encoding algorithm for data compression-genomics data as an exemplar; Journal: Frontiers in Bioinformatics, vol. 4, article 1489704; PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11799261/pdf/
Introduction: The development of nanobodies targeting Programmed Cell Death Protein-1 (PD-1) offers a promising approach in cancer immunotherapy. This study aims to design and characterize a PD-1-specific nanobody using an integrated computational and experimental approach.
Methods: An in silico design strategy was employed, involving Complementarity-Determining Region (CDR) grafting to construct the nanobody sequence. The three-dimensional structure of the nanobody was predicted using AlphaFold2, and molecular docking simulations via ClusPro were conducted to evaluate binding interactions with PD-1. Physicochemical properties, including stability and solubility, were analyzed using web-based tools, while molecular dynamics (MD) simulations assessed stability under physiological conditions. The nanobody was produced and purified using Ni-NTA chromatography, and experimental validation was performed through Western blotting, ELISA, and dot blot analysis.
Results: Computational findings demonstrated favorable binding interactions, stability, and physicochemical properties of the nanobody. Experimental results confirmed the nanobody's specific binding affinity to PD-1, with ELISA and dot blot analyses providing evidence of robust interaction.
Discussion: This study highlights the potential of combining computational and experimental approaches for engineering nanobodies. The engineered PD-1 nanobody exhibits promising characteristics, making it a strong candidate for further testing in cancer immunotherapy applications.
Title: Innovative CDR grafting and computational methods for PD-1 specific nanobody design; Authors: Jagadeeswara Reddy Devasani, Girijasankar Guntuku, Nalini Panatula, Murali Krishna Kumar Muthyala, Mary Sulakshana Palla, Teruna J Siahaan; Pub Date: 2025-01-17; DOI: 10.3389/fbinf.2024.1488331; Journal: Frontiers in Bioinformatics, vol. 4, article 1488331; PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11782559/pdf/
Pub Date: 2025-01-15; eCollection Date: 2024-01-01; DOI: 10.3389/fbinf.2024.1494717
Wang Wenlun, Yu Chaohang, Huang Yan, Li Wenbin, Zhou Nanqing, Hu Qianmin, Wu Shengcai, Yuan Qing, Yu Shirui, Zhang Feng, Zhu Lingyun
The precise role of lncRNAs in skeletal muscle development and atrophy remains elusive. We conducted a bioinformatic analysis of 26 GEO datasets from mouse studies, encompassing embryonic development, postnatal growth, regeneration, cell proliferation, and differentiation, using R and relevant packages (limma, among others). LncRNA-miRNA relationships were predicted using miRcode and lncBaseV2, and miRNA-mRNA pairs were identified via miRcode, miRDB, and TargetScan7. Based on the ceRNA theory, we constructed the lncRNA-miRNA-mRNA regulatory network and visualized it using ggalluvial, among other R packages. GO, Reactome, KEGG, and GSEA analyses were used to explore interactions in muscle development and regeneration. We identified five candidate lncRNAs (Xist, Gas5, Pvt1, Airn, and Meg3) as potential mediators in these processes and in microgravity-induced muscle wasting. Additionally, we built a detailed lncRNA-miRNA-mRNA regulatory network, including interactions such as lncRNA Xist/miR-126/IRS1, lncRNA Xist/miR-486-5p/GAB2, lncRNA Pvt1/miR-148/RAB34, and lncRNA Gas5/miR-455-5p/SOCS3. Significant changes in signaling pathways (PI3K/Akt, MAPK, NF-κB, cell cycle, AMPK, Hippo, and cAMP) were observed during muscle development, regeneration, and atrophy. Despite the limitations of bioinformatic analysis, our research underscores the significant roles of lncRNAs in muscle protein synthesis and degradation, cell proliferation, differentiation, function, and metabolism under both normal and microgravity conditions. This study offers new insights into the molecular mechanisms governing skeletal muscle development and regeneration.
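Building a ceRNA network, as described above, essentially joins two sets of predicted pairs on their shared miRNA. The minimal Python sketch below assumes the predictions have already been exported as (lncRNA, miRNA) and (miRNA, mRNA) tuples. The optional positive co-expression filter and its threshold are illustrative assumptions, not the study's documented criteria. The example pairs are the four interactions named in the abstract plus one hypothetical decoy miRNA (`miR-X`) with no predicted targets.

```python
from collections import defaultdict


def build_cerna_triplets(lnc_mirna, mirna_mrna, coexpr=None, min_r=0.0):
    """Join predicted lncRNA-miRNA and miRNA-mRNA pairs on the shared miRNA.

    coexpr: optional dict mapping (lncRNA, mRNA) -> expression correlation.
    When given, only triplets whose lncRNA and mRNA are positively
    co-expressed (r > min_r) are kept, as the ceRNA hypothesis expects.
    """
    targets = defaultdict(set)  # miRNA -> set of predicted mRNA targets
    for mirna, mrna in mirna_mrna:
        targets[mirna].add(mrna)

    triplets = set()
    for lnc, mirna in lnc_mirna:
        for mrna in targets.get(mirna, ()):
            if coexpr is not None and coexpr.get((lnc, mrna), float("-inf")) <= min_r:
                continue  # missing or non-positive co-expression: drop the triplet
            triplets.add((lnc, mirna, mrna))
    return sorted(triplets)


if __name__ == "__main__":
    lnc_mirna = [("Xist", "miR-126"), ("Xist", "miR-486-5p"), ("Pvt1", "miR-148"),
                 ("Gas5", "miR-455-5p"), ("Meg3", "miR-X")]  # miR-X: hypothetical decoy
    mirna_mrna = [("miR-126", "IRS1"), ("miR-486-5p", "GAB2"),
                  ("miR-148", "RAB34"), ("miR-455-5p", "SOCS3")]
    for triplet in build_cerna_triplets(lnc_mirna, mirna_mrna):
        print("/".join(triplet))
```

The decoy lncRNA drops out because its miRNA has no predicted mRNA partner. A network is only as good as the two prediction tables fed into the join.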
Title: Developing a ceRNA-based lncRNA-miRNA-mRNA regulatory network to uncover roles in skeletal muscle development; Journal: Frontiers in Bioinformatics, vol. 4, article 1494717; PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11774864/pdf/
Pub Date: 2025-01-14; eCollection Date: 2024-01-01; DOI: 10.3389/fbinf.2024.1487292
H W Cayatineto, S T Hakim
Introduction: Flaviviridae comprise a group of enveloped, positive-stranded RNA viruses that are mainly transmitted through mosquito or tick bites and/or through contaminated blood, blood products, or other body secretions. These viruses cause diseases ranging from mild to severe and are considered important human pathogens. MicroRNAs (miRNAs) are non-coding molecules involved in growth, development, cell proliferation, protein synthesis, apoptosis, and pathogenesis. These small molecules are even being used as gene suppressors in antiviral therapeutics to inhibit viral replication. In the current study, we used bioinformatic tools to predict a possible miRNA sequence that could be complementary to the nucleocapsid (NP) and/or capsid (CP) genes of the Flaviviridae family and thereby provide an inhibitory solution.
Methods: Bioinformatics is a field of science that involves extensive computational analysis, algorithms, and sequence alignments. To predict the correct alignments between miRNAs and viral mRNA genomes, we used computational databases and tools such as miRBase, NCBI, and the Basic Local Alignment Search Tool for nucleotides (BLASTn).
Results: Of the 2,600 mature miRNAs, hsa-miR-548d-3p showed sequence complementarity to the flavivirus capsid gene and the bovine viral diarrhea virus (BVDV) capsid gene and was selected as a possible candidate to inhibit flaviviruses.
Conclusion: Although more detailed in vitro and in vivo studies are required to test the possible inhibitory effects of hsa-miR-548d-3p against flaviviruses, this computational study may serve as a first step toward developing a novel therapeutic, based on the suggested candidate miRNAs, for lethal viruses within the Flaviviridae family.
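The abstract reports using BLASTn to find complementarity between mature miRNAs and viral capsid genes. To illustrate what "complementary" means here, the sketch below swaps BLAST for a brute-force ungapped scan. It builds the DNA site that would pair perfectly with a miRNA (the reverse complement, with U read as T). It then reports every window of a target gene that falls within a mismatch budget.

The miRNA and gene strings are hypothetical toy sequences, not hsa-miR-548d-3p or a real flavivirus gene. The scan deliberately ignores gaps, G:U wobble pairs, and seed-region rules.

```python
# Maps each miRNA base to the DNA base it pairs with (U and T pair with A; A pairs with T).
PAIR = str.maketrans("ACGTU", "TGCAA")


def target_site(mirna: str) -> str:
    """DNA sequence (5'->3') of a target site fully complementary to the miRNA."""
    return mirna.upper().translate(PAIR)[::-1]


def find_sites(mirna: str, target: str, max_mismatches: int = 2):
    """Return (0-based position, mismatches) for each target window within max_mismatches."""
    site = target_site(mirna)
    target = target.upper().replace("U", "T")  # accept RNA or DNA target input
    hits = []
    for i in range(len(target) - len(site) + 1):
        window = target[i:i + len(site)]
        mismatches = sum(a != b for a, b in zip(site, window))  # ungapped Hamming distance
        if mismatches <= max_mismatches:
            hits.append((i, mismatches))
    return hits


if __name__ == "__main__":
    toy_mirna = "UAGCUUAUCAGAC"              # hypothetical 13-nt miRNA
    toy_gene = "AAAA" + "GTCTGATAAGCTA" + "CCCC"  # hypothetical gene carrying one perfect site
    print(target_site(toy_mirna))
    print(find_sites(toy_mirna, toy_gene, max_mismatches=0))
```

BLASTn adds gapped alignment, statistical scoring, and database-scale search on top of this basic idea. The brute-force scan is only practical for short targets.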
Title: hsa-miR-548d-3p: a potential microRNA to target nucleocapsid and/or capsid genes in multiple members of the Flaviviridae family; Journal: Frontiers in Bioinformatics, vol. 4, article 1487292; PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11772435/pdf/
Pub Date: 2025-01-09; eCollection Date: 2024-01-01; DOI: 10.3389/fbinf.2024.1332782
Aram Safrastyan, Damian Wollny
Cell-cell communication mediated by ligand-receptor interactions (LRI) is critical to coordinating diverse biological processes in homeostasis and disease. Lately, our understanding of these processes has greatly expanded through the inference of cellular communication, utilizing RNA extracted from bulk tissue or individual cells. Considering the challenge of obtaining tissue biopsies for these approaches, we considered the potential of studying cell-free RNA obtained from blood. To test the feasibility of this approach, we used the BulkSignalR algorithm across 295 cell-free RNA samples and compared the LRI profiles across multiple cancer types and healthy donors. Interestingly, we detected specific and reproducible LRIs particularly in the blood of liver cancer patients compared to healthy donors. We found an increase in the magnitude of hepatocyte interactions, notably hepatocyte autocrine interactions in liver cancer patients. Additionally, a robust panel of 30 liver cancer-specific LRIs presents a bridge linking liver cancer pathogenesis to discernible blood markers. In summary, our approach shows the plausibility of detecting liver LRIs in blood and builds upon the biological understanding of cell-free transcriptomes.
Title: Detection of reproducible liver cancer specific ligand-receptor signaling in blood; Journal: Frontiers in Bioinformatics, vol. 4, article 1332782; PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11754192/pdf/
Pub Date: 2025-01-07; eCollection Date: 2024-01-01; DOI: 10.3389/fbinf.2024.1546680
Dapeng Wang, Giuseppe Agapito
Title: Editorial: Multi-omics approaches in the study of human disease mechanisms; Journal: Frontiers in Bioinformatics, vol. 4, article 1546680; PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11747011/pdf/
Pub Date: 2025-01-06; eCollection Date: 2024-01-01; DOI: 10.3389/fbinf.2024.1493712
Vineeta Pandey, Aarshi Srivastava, Ramwant Gupta, Haitham E M Zaki, Muhammad Shafiq Shahid, Rajarshi K Gaur
Phytoplasma, a potentially hazardous pathogen associated with witches' broom, is an economically harmful, disease-causing bacterium that damages chilli cultivation. Phytoplasma-infected plants display various symptoms that indicate significant disruptions in normal plant physiology and behaviour. Diseases caused by phytoplasma are widespread and have a major economic impact on crop quality and yield. This work focuses on identifying and examining chilli microRNAs (miRNAs) that could target the 16S rRNA and secA genes of "Candidatus Phytoplasma trifolii" ("Ca. P. trifolii"), using plant miRNA prediction algorithms. Mature chilli miRNAs (CA-miRNAs) were collected and hybridised in silico to the 16S rRNA and secA genes. A total of four common CA-miRNAs were selected according to genetic consensus. Three algorithms applied in the present study suggested that the physiologically relevant, top-ranked miR169b_2 has a potentially specific target site at nucleotide position 1,006 of the "Ca. P. trifolii" 16S rRNA gene. Circos was then used to create the miRNA-mRNA regulatory network. The free energy of each miRNA:mRNA duplex was also computed, and the lowest (most favourable) value, -17.46 kcal/mol, was obtained for CA-miR166c_2. Currently, there are no suitable commercial "Ca. P. trifolii"-resistant chilli crops. The predicted biological data therefore provide useful evidence for developing "Ca. P. trifolii"-resistant chilli plants.
Title: In silico identification of chilli genome encoded MicroRNAs targeting the 16S rRNA and secA genes of "Candidatus phytoplasma trifolii"; Journal: Frontiers in Bioinformatics, vol. 4, article 1493712; PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11743513/pdf/
Pub Date: 2024-12-20; eCollection Date: 2024-01-01; DOI: 10.3389/fbinf.2024.1483255
Benjamin Dubois, Mathieu Delitte, Salomé Lengrand, Claude Bragard, Anne Legrève, Frédéric Debode
Background: The study of sample taxonomic composition has evolved from direct observations and labor-intensive morphological studies to different DNA sequencing methodologies. Most of these studies leverage the metabarcoding approach, which involves the amplification of a small taxonomically-informative portion of the genome and its subsequent high-throughput sequencing. Recent advances in sequencing technology brought by Oxford Nanopore Technologies have revolutionized the field, enabling portability, affordable cost and long-read sequencing, therefore leading to a significant increase in taxonomic resolution. However, Nanopore sequencing data exhibit a particular profile, with a higher error rate compared with Illumina sequencing, and existing bioinformatics pipelines for the analysis of such data are scarce and often insufficient, requiring specialized tools to accurately process long-read sequences.
Results: We present PRONAME (PROcessing NAnopore MEtabarcoding data), an open-source, user-friendly pipeline optimized for processing raw Nanopore sequencing data. PRONAME includes precompiled databases for complete 16S sequences (Silva138 and Greengenes2) and a newly developed and curated database dedicated to bacterial 16S-ITS-23S operon sequences. The user can also provide a custom database if desired, therefore enabling the analysis of metabarcoding data for any domain of life. The pipeline significantly improves sequence accuracy, implementing innovative error-correction strategies and taking advantage of the new sequencing chemistry to produce high-quality duplex reads. Evaluations using a mock community have shown that PRONAME delivers consensus sequences demonstrating at least 99.5% accuracy with standard settings (and up to 99.7%), making it a robust tool for genomic analysis of complex multi-species communities.
Conclusion: PRONAME meets the challenges of long-read Nanopore data processing, offering greater accuracy and versatility than existing pipelines. By integrating Nanopore-specific quality filtering, clustering and error correction, PRONAME produces high-precision consensus sequences. This brings the accuracy of Nanopore sequencing close to that of Illumina sequencing, while taking advantage of the benefits of long-read technologies.
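The abstract does not detail PRONAME's clustering, error-correction, or duplex steps. To illustrate only the general principle that a consensus of many noisy reads can be more accurate than any single read, here is a minimal Python sketch of column-wise majority voting. It rests on two assumptions, neither of which is PRONAME's implementation:

- The reads are already aligned to equal length. Real Nanopore reads carry indels and need a multiple sequence alignment first.
- "Accuracy" is treated as percent identity to a known reference sequence, as one might measure against a mock community.

```python
from collections import Counter


def majority_consensus(aligned_reads):
    """Column-wise majority vote over reads already aligned to a common length.

    '-' marks an alignment gap; columns whose majority symbol is a gap are dropped.
    Ties go to the symbol seen first in that column (Counter.most_common ordering).
    """
    if not aligned_reads or len({len(r) for r in aligned_reads}) != 1:
        raise ValueError("need at least one read, all aligned to the same length")
    consensus = []
    for column in zip(*aligned_reads):
        symbol, _count = Counter(column).most_common(1)[0]
        if symbol != "-":
            consensus.append(symbol)
    return "".join(consensus)


def percent_identity(seq: str, ref: str) -> float:
    """Percentage of positions at which two equal-length sequences agree."""
    if len(seq) != len(ref) or not ref:
        raise ValueError("sequences must be non-empty and of equal length")
    return 100.0 * sum(a == b for a, b in zip(seq, ref)) / len(ref)


if __name__ == "__main__":
    reference = "ACGTACGTAC"  # hypothetical mock-community reference
    reads = ["ACGTACGTAC", "ACGAACGTAC", "ACGTACCTAC", "TCGTACGTAC", "ACGTACGTAC"]
    consensus = majority_consensus(reads)
    print(consensus, percent_identity(consensus, reference))
```

Here three of the five toy reads each carry one error (90% identity), yet the errors fall in different columns, so the vote recovers the reference exactly.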
Title: PRONAME: a user-friendly pipeline to process long-read nanopore metabarcoding data by generating high-quality consensus sequences; Journal: Frontiers in Bioinformatics, vol. 4, article 1483255; PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11695402/pdf/
Pub Date: 2024-12-16; eCollection Date: 2024-01-01; DOI: 10.3389/fbinf.2024.1495417
Jack M Craig, S Blair Hedges, Sudhir Kumar
Primates, consisting of apes, monkeys, tarsiers, and lemurs, are among the most charismatic and well-studied animals on Earth, yet there is no taxonomically complete molecular timetree for the group. Combining the latest large-scale genomic primate phylogeny of 205 recognized species with the 400-species literature consensus tree available from TimeTree.org yields a phylogeny of just 405 primates, with 50 species still missing despite having molecular sequence data in NCBI GenBank. In this study, we assemble a timetree of 455 primates, incorporating every species for which molecular data are available. We use a synthetic approach consisting of a literature review for published timetrees, de novo dating of untimed trees, and assembly of timetrees from novel alignments. The resulting near-complete molecular timetree of primates allows testing of two long-standing alternative hypotheses for the origins of primate biodiversity. The first is that species richness arises at a constant rate, in which case older clades have more species. The second is that some clades exhibit faster rates of speciation than others, in which case these fast clades would be more species-rich. Consistent with other large-scale macroevolutionary analyses, we found that the speciation rate is similar across the primate tree of life, albeit with some variation in smaller clades.
Title: Completing a molecular timetree of primates. Frontiers in Bioinformatics, vol. 4, article 1495417. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11683086/pdf/
Pub Date: 2024-12-16; eCollection Date: 2024-01-01; DOI: 10.3389/fbinf.2024.1510352
Benson R Kidenya, Gerald Mboowa
Title: Unlocking the future of complex human diseases prediction: multi-omics risk score breakthrough. Frontiers in Bioinformatics, vol. 4, article 1510352. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11682975/pdf/