A review on advancements in feature selection and feature extraction for high-dimensional NGS data analysis

Functional & Integrative Genomics | IF 3.9 | Zone 4 (Biology) | JCR Q1 (Genetics & Heredity) | Pub Date: 2024-08-19 | DOI: 10.1007/s10142-024-01415-x
Kasmika Borah, Himanish Shekhar Das, Soumita Seth, Koushik Mallick, Zubair Rahaman, Saurav Mallik
Citations: 0

Abstract



Recent advances in biomedical technologies and the proliferation of high-dimensional Next Generation Sequencing (NGS) datasets have led to significant growth in the volume and density of data. High-dimensional NGS data, characterized by a large number of genomic, transcriptomic, proteomic, and metagenomic features relative to the number of biological samples, pose significant challenges for data analysis, including increased computational burden, potential overfitting, and difficulty in interpreting results. Feature selection and feature extraction are two pivotal techniques for addressing these challenges: by reducing the dimensionality of the data, they enhance model performance, interpretability, and computational efficiency. Both can be categorized into statistical and machine learning methods. The present study conducts a comprehensive and comparative review of statistical, machine learning, and deep learning-based feature selection and extraction techniques tailored to the interpretation of human NGS and microarray data. A thorough literature search was performed to gather information on these techniques, focusing on array-based and NGS data analysis. The techniques surveyed, including deep learning architectures, machine learning algorithms, and statistical methods, have been explored for microarray, bulk RNA-Seq, and single-cell RNA-Seq (scRNA-Seq) datasets. The study provides an overview of these techniques, highlighting their applications, advantages, and limitations in the context of high-dimensional NGS data. This review offers readers insights into applying feature selection and feature extraction techniques to enhance the performance of predictive models, uncover underlying biological patterns, and gain a deeper understanding of massive and complex NGS and microarray data.
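
As a rough illustration of the distinction the abstract draws, the sketch below contrasts feature selection (keeping a subset of the original features) with feature extraction (building new features as combinations of the originals). It is not taken from the review: the synthetic matrix, the variance filter, the SVD-based PCA, and the helper names select_top_variance and extract_pca are all assumptions chosen for illustration, with far more features than samples to mimic the NGS setting.

```python
import numpy as np


def select_top_variance(X, k):
    """Feature selection: keep the k original columns with the highest variance."""
    variances = X.var(axis=0)
    # Indices of the k largest variances, returned in ascending column order.
    keep = np.sort(np.argsort(variances)[::-1][:k])
    return X[:, keep], keep


def extract_pca(X, n_components):
    """Feature extraction: project samples onto the top principal components."""
    Xc = X - X.mean(axis=0)  # center each feature
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


# Hypothetical data: 20 samples, 500 features (features >> samples).
rng = np.random.default_rng(0)
n_samples, n_features = 20, 500
X = rng.normal(size=(n_samples, n_features))
X[:, :5] *= 10.0  # assumption: the first five features carry most of the variance

X_sel, kept = select_top_variance(X, k=5)   # columns are original features
X_pca = extract_pca(X, n_components=3)      # columns are new, derived features

print("selected shape:", X_sel.shape, "kept columns:", kept.tolist())
print("extracted shape:", X_pca.shape)
```

The selected columns remain directly interpretable as original features (for example, individual genes), whereas each extracted component mixes all 500 inputs. That difference is the interpretability trade-off the abstract refers to.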

Source journal
CiteScore: 3.50
Self-citation rate: 3.40%
Articles published: 92
Review time: 2 months
About the journal: Functional & Integrative Genomics is devoted to large-scale studies of genomes and their functions, including systems analyses of biological processes. The journal provides the research community with an integrated platform where researchers can share, review, and discuss their findings on important biological questions that will ultimately help answer the fundamental question: How do genomes work?