Transcriptomics is the study of RNA transcripts, the portion of the genome that is transcribed, in a specific cell, tissue, or organism. Transcriptomics provides insight into gene expression patterns, regulation, and the underlying mechanisms of cellular processes. Community transcriptomics takes this a step further by studying the RNA transcripts from environmental assemblies of organisms, with the intention of better understanding the interactions between members of the community. Community transcriptomics requires successful extraction of RNA from a diverse set of organisms and subsequent analysis, either by mapping the resulting reads to a reference genome or by de novo assembly of the reads. Both the extraction protocols and the analysis steps can pose hurdles for community transcriptomics. This review covers advances in transcriptomic techniques and assesses the viability of applying them to community transcriptomics.
Citation: Jeanet Mante, Kyra E Groover, Randi M Pullen. Environmental community transcriptomics: strategies and struggles. Briefings in Functional Genomics. Publication date: 2025-01-15. DOI: 10.1093/bfgp/elae033. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11735753/pdf/
Tyler Kolisnik, Faeze Keshavarz-Rahaghi, Rachel V Purcell, Adam N H Smith, Olin K Silander
Random Forest models are widely used in genomic data analysis and can offer insights into complex biological mechanisms, particularly when features influence the target in interactive, nonlinear, or nonadditive ways. Currently, some of the most efficient Random Forest methods in terms of computational speed are implemented in Python. However, many biologists use R for genomic data analysis, as R offers a unified platform for performing additional statistical analysis and visualization. Here, we present an R package, pyRforest, which integrates Python scikit-learn "RandomForestClassifier" algorithms into the R environment. pyRforest inherits the efficient memory management and parallelization of Python, and is optimized for classification tasks on large genomic datasets, such as those from RNA-seq. pyRforest offers several additional capabilities, including a novel rank-based permutation method for biomarker identification. This method can be used to estimate and visualize P-values for individual features, allowing the researcher to identify a subset of features for which there is robust statistical evidence of an effect. In addition, pyRforest includes methods for the calculation and visualization of SHapley Additive exPlanations values. Finally, pyRforest includes support for comprehensive downstream analysis for gene ontology and pathway enrichment. pyRforest thus improves the implementation and interpretability of Random Forest models for genomic data analysis by merging the strengths of Python with R. pyRforest can be downloaded at: https://www.github.com/tkolisnik/pyRforest with an associated vignette at https://github.com/tkolisnik/pyRforest/blob/main/vignettes/pyRforest-vignette.pdf.
Citation: Tyler Kolisnik, Faeze Keshavarz-Rahaghi, Rachel V Purcell, Adam N H Smith, Olin K Silander. pyRforest: a comprehensive R package for genomic data analysis featuring scikit-learn Random Forests in R. Briefings in Functional Genomics. Publication date: 2025-01-15. DOI: 10.1093/bfgp/elae038. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11735746/pdf/
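The pyRforest abstract above mentions permutation-based P-values for individual features but does not spell out the procedure. As a rough illustration of the general idea only, the sketch below runs an exact label-permutation test on two toy features. It is not pyRforest's rank-based method: the score (`mean_gap`, an absolute difference of class means) is a hypothetical stand-in for a Random Forest importance, and the function names and data are invented for this example.

```python
from itertools import combinations


def mean_gap(values, group1):
    """Absolute difference between the mean of group1 and the mean of the rest.

    Stand-in score (assumption): a real analysis would use a Random Forest
    feature importance here instead.
    """
    in_group = [values[i] for i in group1]
    out_group = [v for i, v in enumerate(values) if i not in group1]
    return abs(sum(in_group) / len(in_group) - sum(out_group) / len(out_group))


def exact_permutation_p(values, labels):
    """Share of all possible relabelings whose score is >= the observed score."""
    positives = {i for i, y in enumerate(labels) if y == 1}
    observed = mean_gap(values, positives)
    relabelings = list(combinations(range(len(values)), len(positives)))
    extreme = sum(mean_gap(values, set(g)) >= observed for g in relabelings)
    return extreme / len(relabelings)


# Toy data (illustrative values only): 4 samples per class.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
gene_informative = [1, 2, 3, 4, 6, 7, 8, 9]  # classes fully separated
gene_noise = [1, 2, 3, 4, 1, 2, 3, 4]        # identical in both classes

p_informative = exact_permutation_p(gene_informative, labels)
p_noise = exact_permutation_p(gene_noise, labels)
```

With 8 samples there are 70 ways to choose the 4 "positive" labels. Only the true labeling and its mirror image reach the observed gap for the informative gene, so its P-value is 2/70, while the noise gene has an observed gap of zero and therefore a P-value of 1. For realistic sample sizes, exhaustive enumeration is infeasible and random permutations would be drawn instead.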
When the traditional random forest (RF) algorithm is used to select features in biostatistical data, large amounts of noisy data and many parameters can distort the importance of the selected features, making feature selection difficult to control. It is therefore a challenge for the traditional RF algorithm to preserve the accuracy of its results in the presence of noisy data. Generally, directly removing noisy data can result in significant bias in the results. In this study, we develop a new algorithm, standardized-threshold and loops-based random forest (STLBRF), and apply it to feature gene selection in gene expression data. This algorithm, based on the traditional RF algorithm, combines backward elimination and K-fold cross-validation to construct a cyclic system and sets a standardized threshold, the error increment. The algorithm overcomes the shortcomings of existing gene selection methods. We compare ridge regression, lasso regression, elastic net regression, the traditional RF algorithm, and our improved RF algorithm on three real gene expression datasets and conduct a quantitative analysis. To ensure the reliability of the results, we validate the effectiveness of the genes selected by these methods using the Random Forest classifier. The results indicate that, compared to other methods, the STLBRF algorithm achieves not only higher effectiveness in feature gene selection but also better control over the number of selected genes. Our method offers reliable technical support for feature expression analysis and research on biomarker selection.
Citation: Huini Feng, Ying Ju, Xiaofeng Yin, Wenshi Qiu, Xu Zhang. STLBRF: an improved random forest algorithm based on standardized-threshold for feature screening of gene expression data. Briefings in Functional Genomics. Publication date: 2025-01-15. DOI: 10.1093/bfgp/elae048. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11735748/pdf/
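The STLBRF abstract above names three ingredients: backward elimination, K-fold cross-validation, and a stopping threshold on the error increment. The sketch below shows how such a loop could be wired together; it is not the authors' implementation. Several parts are assumptions made for brevity: a nearest-centroid classifier stands in for the random forest, an absolute difference of class means stands in for RF feature importance, and the threshold of 0.05, the fold assignment, and the toy data are arbitrary.

```python
def sq_dist(row, centroid, features):
    """Squared Euclidean distance restricted to the chosen features."""
    return sum((row[f] - c) ** 2 for f, c in zip(features, centroid))


def kfold_error(rows, labels, features, k=2):
    """K-fold CV error of a nearest-centroid classifier (stand-in for an RF)."""
    errors = 0
    for fold in range(k):
        train = [i for i in range(len(rows)) if i % k != fold]
        test = [i for i in range(len(rows)) if i % k == fold]
        centroids = {}
        for cls in sorted(set(labels)):
            members = [rows[i] for i in train if labels[i] == cls]
            centroids[cls] = [sum(r[f] for r in members) / len(members)
                              for f in features]
        for i in test:
            predicted = min(centroids,
                            key=lambda c: sq_dist(rows[i], centroids[c], features))
            errors += predicted != labels[i]
    return errors / len(rows)


def importance(rows, labels, f):
    """Stand-in importance for a binary problem: gap between the class means."""
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(y, []).append(row[f])
    means = [sum(v) / len(v) for v in groups.values()]
    return abs(means[0] - means[1])


def backward_select(rows, labels, threshold=0.05, k=2):
    """Drop the weakest feature until the CV error rises by more than threshold."""
    features = list(range(len(rows[0])))
    error = kfold_error(rows, labels, features, k)
    while len(features) > 1:
        weakest = min(features, key=lambda f: importance(rows, labels, f))
        trial = [f for f in features if f != weakest]
        trial_error = kfold_error(rows, labels, trial, k)
        if trial_error - error > threshold:
            break  # error increment too large: keep the current feature set
        features, error = trial, trial_error
    return features


# Toy data (illustrative): features 0 and 1 are jointly informative,
# feature 2 is constant and carries no information.
rows = [[0, 2, 7], [2, 0, 7], [2, 4, 7], [4, 2, 7],
        [4, -2, 7], [-2, 4, 7], [6, 0, 7], [0, 6, 7]]
labels = [0, 0, 1, 1, 0, 0, 1, 1]

selected = backward_select(rows, labels)
```

On this toy set the constant feature is removed at no cost, but removing either remaining feature raises the cross-validated error from 0 to 0.25, which exceeds the threshold, so the loop stops with features 0 and 1. The threshold is what gives explicit control over how many features survive.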
Hwisoo Choi, Hyeonkyu Kim, Hoebin Chung, Dong-Sung Lee, Junil Kim
Recent advancements in single-cell technologies, including single-cell RNA sequencing (scRNA-seq) and single-cell Assay for Transposase-Accessible Chromatin using sequencing (scATAC-seq), have greatly improved our insight into the epigenomic landscapes across various biological contexts and diseases. This paper reviews key computational tools and machine learning approaches that integrate scRNA-seq and scATAC-seq data to facilitate the alignment of transcriptomic data with chromatin accessibility profiles. Applying these integrated single-cell technologies to neurodegenerative diseases, such as Alzheimer's disease and Parkinson's disease, reveals how changes in chromatin accessibility and gene expression can illuminate pathogenic mechanisms and identify potential therapeutic targets. Despite challenges such as data sparsity and computational demands, ongoing enhancements in scATAC-seq and scRNA-seq technologies, along with better analytical methods, continue to expand their applications. These advancements promise to revolutionize our approach to medical research and clinical diagnostics, offering a comprehensive view of cellular function and disease pathology.
Citation: Hwisoo Choi, Hyeonkyu Kim, Hoebin Chung, Dong-Sung Lee, Junil Kim. Application of computational algorithms for single-cell RNA-seq and ATAC-seq in neurodegenerative diseases. Briefings in Functional Genomics. Publication date: 2025-01-15. DOI: 10.1093/bfgp/elae044. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11735751/pdf/
Xiangfei Huang, Zilu Yu, Juan Tian, Tao Chen, Aiping Wei, Chao Mei, Shibiao Chen, Yong Li
Organ fibrosis, a common consequence of chronic tissue injury, presents a significant health challenge. Recent research has revealed the regulatory role of N6-methyladenosine (m6A) RNA modification in fibrosis of various organs, including the lung, liver, kidney, and heart. In this comprehensive review, we summarize the latest findings on the mechanisms and functions of m6A modification in organ fibrosis. By highlighting the potential of m6A modification as a therapeutic target, our goal is to encourage further research in this emerging field and support advancements in the clinical treatment of organ fibrosis.
Citation: Xiangfei Huang, Zilu Yu, Juan Tian, Tao Chen, Aiping Wei, Chao Mei, Shibiao Chen, Yong Li. m6A RNA modification pathway: orchestrating fibrotic mechanisms across multiple organs. Briefings in Functional Genomics. Publication date: 2025-01-15. DOI: 10.1093/bfgp/elae051. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11735750/pdf/
High-throughput gene expression data have been extensively generated and utilized in biological mechanism investigations, biomarker detection, disease diagnosis and prognosis. These applications encompass not only bulk transcriptome data but also single-cell RNA-seq data. However, extracting reliable biological information from transcriptome data remains challenging due to the constraints of Compositional Data Analysis. Current data preprocessing methods, including dataset normalization and batch effect correction, are insufficient to address these issues and improve data quality for downstream analysis. Alternatively, qualitative methods focusing on the relative order of gene expression (ROGER) are more informative than quantitative methods that rely on gene expression abundance. The Pairwise Analysis of Gene expression method is an enhancement of ROGER, designed for data integration in either sample space or feature space. In this review, we summarize the methods applied to transcriptome data analysis and discuss their potential in predicting clinical outcomes.
Citation: Xubin Zheng, Nana Jin, Qiong Wu, Ning Zhang, Haonan Wu, Yuanhao Wang, Rui Luo, Tao Liu, Wanfu Ding, Qingshan Geng, Lixin Cheng. Less is more: relative rank is more informative than absolute abundance for compositional NGS data. Briefings in Functional Genomics. Publication date: 2025-01-15. DOI: 10.1093/bfgp/elae045. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11735744/pdf/
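The abstract above argues that the relative order of gene expression is more robust than abundance for compositional data. A small demonstration of one reason why: the pairwise ordering of genes within a sample does not change when the whole sample is rescaled, for example by deeper sequencing. This is a toy illustration, not the ROGER or Pairwise Analysis of Gene expression implementation; the gene names, counts, and the `relative_order` helper are hypothetical.

```python
from itertools import combinations


def relative_order(expr):
    """Map a {gene: abundance} profile to pairwise 'a is higher than b' flags."""
    genes = sorted(expr)
    return {(a, b): expr[a] > expr[b] for a, b in combinations(genes, 2)}


# Hypothetical abundances for one sample (illustrative values only).
sample = {"GENE_A": 120.0, "GENE_B": 30.0, "GENE_C": 450.0}

# The same sample sequenced 3.5x deeper: every abundance changes.
deeper = {gene: value * 3.5 for gene, value in sample.items()}

order_raw = relative_order(sample)
order_deep = relative_order(deeper)

# The abundances differ, yet the within-sample ordering is identical.
same_order = order_raw == order_deep
```

Any strictly increasing transformation applied to a whole sample (scaling, log transform) leaves these pairwise flags untouched, which is why order-based features can be compared across samples without first agreeing on a normalization.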
Adam Wojtulewski, Aleksandra Sikora, Sean Dineen, Mustafa Raoof, Aleksandra Karolak
Objective: The primary objective of this study is to investigate various applications of artificial intelligence (AI) and statistical methodologies for analyzing and managing peritoneal metastases (PM) caused by gastrointestinal cancers.
Methods: PubMed and Google Scholar were searched comprehensively, using relevant keywords and search criteria, to identify articles and reviews related to the topic. The AI approaches considered were conventional machine learning (ML) and deep learning (DL) models, and the relevant statistical approaches included biostatistics and logistic models.
Results: The systematic literature review yielded nearly 30 articles meeting the predefined criteria. Analyses of these studies showed that AI methodologies consistently outperformed traditional statistical approaches. In the AI approaches, DL consistently produced the most precise results, while classical ML demonstrated varied performance but maintained high predictive accuracy. The sample size was the recurring factor that increased the accuracy of the predictions for models of the same type.
Conclusions: AI and statistical approaches can detect PM developing among patients with gastrointestinal cancers. Therefore, if clinicians integrated these approaches into diagnostics and prognostics, they could better analyze and manage PM, enhancing clinical decision-making and patient outcomes. Collaboration across multiple institutions would also help in standardizing methods for data collection and in obtaining consistent results.
Citation: Adam Wojtulewski, Aleksandra Sikora, Sean Dineen, Mustafa Raoof, Aleksandra Karolak. Using artificial intelligence and statistics for managing peritoneal metastases from gastrointestinal cancers. Briefings in Functional Genomics. Publication date: 2025-01-15. DOI: 10.1093/bfgp/elae049. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11735730/pdf/
Tomas Klingström, Emelie Zonabend König, Avhashoni Agnes Zwane
Phenotyping of animals is a routine task in agriculture that can provide large datasets for the functional annotation of genomes. Using the livestock farming sector to study complex traits enables genetics researchers to fully benefit from the digital transformation of society, as economies of scale substantially reduce the cost of phenotyping animals on farms. In the agricultural sector, genomics has transitioned towards a model of 'Genomics without the genes', as a large proportion of the genetic variation in animals can be modelled using the infinitesimal model for genomic breeding valuations. Combined with third-generation sequencing creating pan-genomes for livestock, the digital infrastructure for trait collection and precision farming provides a unique opportunity for high-throughput phenotyping and the study of complex traits in a controlled environment. The emphasis on cost-efficient data collection means that mobile phones and computers have become ubiquitous for large-scale data collection, but the majority of the recorded traits can still be recorded manually with limited training or tools. This is especially valuable in low- and middle-income countries and in settings where indigenous breeds are kept on farms preserving more traditional farming methods. Digitalization is therefore an important enabler of high-throughput phenotyping for smaller livestock herds with limited technology investments as well as for large-scale commercial operations. It is demanding for individual researchers to keep up with the opportunities created by the rapid advances in digitalization for livestock farming and with how these can be used by researchers with or without a specialization in livestock. This review provides an overview of the current status of key enabling technologies for precision livestock farming applicable to the functional annotation of genomes.
Citation: Tomas Klingström, Emelie Zonabend König, Avhashoni Agnes Zwane. Beyond the hype: using AI, big data, wearable devices, and the internet of things for high-throughput livestock phenotyping. Briefings in Functional Genomics. Publication date: 2025-01-15. DOI: 10.1093/bfgp/elae032. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11735752/pdf/
Single-cell technology opened up a new avenue to delineate cellular status at a single-cell resolution and has become an essential tool for studying human diseases. Multiplexing allows cost-effective experiments by combining multiple samples and effectively mitigates batch effects. It starts by giving each sample a unique tag and then pooling them together for library preparation and sequencing. After sequencing, sample demultiplexing is performed based on tag detection, where cells belonging to one sample are expected to have a higher amount of the corresponding tag than cells from other samples. However, in reality, demultiplexing is not straightforward due to the noise and contamination from various sources. Successful demultiplexing depends on the efficient removal of such contamination. Here, we perform a systematic benchmark combining different normalization methods and demultiplexing approaches using real-world data and simulated datasets. We show that accounting for sequencing depth variability increases the separability between tagged and untagged cells, and the clustering-based approach outperforms existing tools. The clustering-based workflow is available as an R package from https://github.com/hwlim/hashDemux.
Citation: Mohammed Sayed, Yue Julia Wang, Hee-Woong Lim. Systematic benchmark of single-cell hashtag demultiplexing approaches reveals robust performance of a clustering-based method. Briefings in Functional Genomics. Publication date: 2025-01-15. DOI: 10.1093/bfgp/elae039. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11735735/pdf/
Brad Balderson, Mitchell Fane, Tracey J Harvey, Michael Piper, Aaron Smith, Mikael Bodén
Metastatic melanoma originates from melanocytes of the skin. Melanoma metastasis results in poor treatment prognosis for patients and is associated with epigenetic and transcriptional changes that reflect the developmental program of melanocyte differentiation from neural crest stem cells. Several studies have explored melanoma transcriptional heterogeneity using microarray, bulk and single-cell RNA-sequencing technologies to derive data-driven models of the transcriptional-state change which occurs during melanoma progression. No study has systematically examined how different models of melanoma progression derived from different data types, technologies and biological conditions compare. Here, we perform a cross-sectional study to identify averaging effects of bulk-based studies that mask and distort apparent melanoma transcriptional heterogeneity; we describe new transcriptionally distinct melanoma cell states, identify differential co-expression of genes between studies and examine the effects of predicted drug susceptibilities of different cell states between studies. Importantly, we observe considerable variability in drug-target gene expression between studies, indicating potential transcriptional plasticity of melanoma to down-regulate these drug targets and thereby circumvent treatment. Overall, observed differences in gene co-expression and predicted drug susceptibility between studies suggest bulk-based transcriptional measurements do not reliably gauge heterogeneity and that melanoma transcriptional plasticity is greater than described when studies are considered in isolation.
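The averaging effect mentioned in the abstract above can be made concrete with a small numeric sketch. The expression values and the two unnamed genes are hypothetical and are not taken from the study. The sketch only shows how a bulk-style mean over two distinct cell states yields a profile that matches neither state, while the single-cell values keep the anti-correlation between the genes.

```python
import numpy as np

# Hypothetical single-cell expression: rows are cells, columns are two genes.
# The first three cells are in one state, the last three in another.
cells = np.array([
    [10.0, 0.0],
    [9.0, 1.0],
    [11.0, 0.0],
    [0.0, 10.0],
    [1.0, 9.0],
    [0.0, 11.0],
])


def bulk_profile(expr):
    """Bulk-style measurement: average expression over all cells."""
    return expr.mean(axis=0)


def coexpression(expr):
    """Pearson correlation between the two genes across cells."""
    return float(np.corrcoef(expr[:, 0], expr[:, 1])[0, 1])


bulk = bulk_profile(cells)
# Distance from the bulk profile to the closest individual cell.
nearest = float(np.linalg.norm(cells - bulk, axis=1).min())

print("bulk profile:", bulk)
print("gene-gene correlation across cells:", coexpression(cells))
print("distance from bulk to nearest cell:", nearest)
```

The bulk profile shows both genes at an intermediate level, which no individual cell exhibits, so a model fitted to bulk values alone could not recover the two states in this toy case.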
{"title":"Systematic analysis of the transcriptional landscape of melanoma reveals drug-target expression plasticity.","authors":"Brad Balderson, Mitchell Fane, Tracey J Harvey, Michael Piper, Aaron Smith, Mikael Bodén","doi":"10.1093/bfgp/elad055","DOIUrl":"10.1093/bfgp/elad055","url":null,"abstract":"<p><p>Metastatic melanoma originates from melanocytes of the skin. Melanoma metastasis results in poor treatment prognosis for patients and is associated with epigenetic and transcriptional changes that reflect the developmental program of melanocyte differentiation from neural crest stem cells. Several studies have explored melanoma transcriptional heterogeneity using microarray, bulk and single-cell RNA-sequencing technologies to derive data-driven models of the transcriptional-state change which occurs during melanoma progression. No study has systematically examined how different models of melanoma progression derived from different data types, technologies and biological conditions compare. Here, we perform a cross-sectional study to identify averaging effects of bulk-based studies that mask and distort apparent melanoma transcriptional heterogeneity; we describe new transcriptionally distinct melanoma cell states, identify differential co-expression of genes between studies and examine the effects of predicted drug susceptibilities of different cell states between studies. Importantly, we observe considerable variability in drug-target gene expression between studies, indicating potential transcriptional plasticity of melanoma to down-regulate these drug targets and thereby circumvent treatment. 
Overall, observed differences in gene co-expression and predicted drug susceptibility between studies suggest bulk-based transcriptional measurements do not reliably gauge heterogeneity and that melanoma transcriptional plasticity is greater than described when studies are considered in isolation.</p>","PeriodicalId":55323,"journal":{"name":"Briefings in Functional Genomics","volume":" ","pages":""},"PeriodicalIF":2.5,"publicationDate":"2025-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139106948","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}