基于自动机器学习的重组热点预测。

IF 4.7 2区 生物学 Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY Journal of Molecular Biology Pub Date : 2024-06-12 DOI:10.1016/j.jmb.2024.168653
Dong-Xin Ye, Jun-Wen Yu, Rui Li, Yu-Duo Hao, Tian-Yu Wang, Hui Yang, Hui Ding
{"title":"基于自动机器学习的重组热点预测。","authors":"Dong-Xin Ye, Jun-Wen Yu, Rui Li, Yu-Duo Hao, Tian-Yu Wang, Hui Yang, Hui Ding","doi":"10.1016/j.jmb.2024.168653","DOIUrl":null,"url":null,"abstract":"<p><p>Meiotic recombination plays a pivotal role in genetic evolution. Genetic variation induced by recombination is a crucial factor in generating biodiversity and a driving force for evolution. At present, the development of recombination hotspot prediction methods has encountered challenges related to insufficient feature extraction and limited generalization capabilities. This paper focused on the research of recombination hotspot prediction methods. We explored deep learning-based recombination hotspot prediction and scrutinized the shortcomings of prevalent models in addressing the challenge of recombination hotspot prediction. To addressing these deficiencies, an automated machine learning approach was utilized to construct recombination hotspot prediction model. The model combined sequence information with physicochemical properties by employing TF-IDF-Kmer and DNA composition components to acquire more effective feature data. Experimental results validate the effectiveness of the feature extraction method and automated machine learning technology used in this study. The final model was validated on three distinct datasets and yielded accuracy rates of 97.14%, 79.71%, and 98.73%, surpassing the current leading models by 2%, 2.56%, and 4%, respectively. In addition, we incorporated tools such as SHAP and AutoGluon to analyze the interpretability of black-box models, delved into the impact of individual features on the results, and investigated the reasons behind misclassification of samples. Finally, an application of recombination hotspot prediction was established to facilitate easy access to necessary information and tools for researchers. The research outcomes of this paper underscore the enormous potential of automated machine learning methods in gene sequence prediction.</p>","PeriodicalId":369,"journal":{"name":"Journal of Molecular Biology","volume":null,"pages":null},"PeriodicalIF":4.7000,"publicationDate":"2024-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"The Prediction of Recombination Hotspot Based on Automated Machine Learning.\",\"authors\":\"Dong-Xin Ye, Jun-Wen Yu, Rui Li, Yu-Duo Hao, Tian-Yu Wang, Hui Yang, Hui Ding\",\"doi\":\"10.1016/j.jmb.2024.168653\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>Meiotic recombination plays a pivotal role in genetic evolution. Genetic variation induced by recombination is a crucial factor in generating biodiversity and a driving force for evolution. At present, the development of recombination hotspot prediction methods has encountered challenges related to insufficient feature extraction and limited generalization capabilities. This paper focused on the research of recombination hotspot prediction methods. We explored deep learning-based recombination hotspot prediction and scrutinized the shortcomings of prevalent models in addressing the challenge of recombination hotspot prediction. To addressing these deficiencies, an automated machine learning approach was utilized to construct recombination hotspot prediction model. The model combined sequence information with physicochemical properties by employing TF-IDF-Kmer and DNA composition components to acquire more effective feature data. Experimental results validate the effectiveness of the feature extraction method and automated machine learning technology used in this study. The final model was validated on three distinct datasets and yielded accuracy rates of 97.14%, 79.71%, and 98.73%, surpassing the current leading models by 2%, 2.56%, and 4%, respectively. In addition, we incorporated tools such as SHAP and AutoGluon to analyze the interpretability of black-box models, delved into the impact of individual features on the results, and investigated the reasons behind misclassification of samples. Finally, an application of recombination hotspot prediction was established to facilitate easy access to necessary information and tools for researchers. The research outcomes of this paper underscore the enormous potential of automated machine learning methods in gene sequence prediction.</p>\",\"PeriodicalId\":369,\"journal\":{\"name\":\"Journal of Molecular Biology\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":4.7000,\"publicationDate\":\"2024-06-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Molecular Biology\",\"FirstCategoryId\":\"99\",\"ListUrlMain\":\"https://doi.org/10.1016/j.jmb.2024.168653\",\"RegionNum\":2,\"RegionCategory\":\"生物学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"BIOCHEMISTRY & MOLECULAR BIOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Molecular Biology","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1016/j.jmb.2024.168653","RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"BIOCHEMISTRY & MOLECULAR BIOLOGY","Score":null,"Total":0}
引用次数: 0

摘要

减数分裂重组在基因进化中起着举足轻重的作用。重组诱导的遗传变异是产生生物多样性的关键因素,也是进化的驱动力。目前,重组热点预测方法的发展遇到了特征提取不足、泛化能力有限等挑战。本文聚焦于重组热点预测方法的研究。我们探索了基于深度学习的重组热点预测方法,并仔细研究了现有模型在应对重组热点预测挑战时存在的不足。针对这些不足,我们利用自动机器学习方法构建了重组热点预测模型。该模型通过使用 TF-IDF-Kmer 和 DNA 组成成分,将序列信息与理化特性相结合,以获取更有效的特征数据。实验结果验证了本研究中使用的特征提取方法和自动机器学习技术的有效性。最终模型在三个不同的数据集上进行了验证,准确率分别为 97.14%、79.71% 和 98.73%,分别比目前的领先模型高出 2%、2.56% 和 4%。此外,我们还利用 SHAP 和 AutoGluon 等工具分析了黑盒模型的可解释性,深入研究了单个特征对结果的影响,并调查了样本分类错误背后的原因。最后,建立了重组热点预测网站,方便研究人员获取必要的信息和工具。本文的研究成果凸显了自动机器学习方法在基因序列预测中的巨大潜力。
本文章由计算机程序翻译,如有差异,请以英文原文为准。

摘要图片

查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
The Prediction of Recombination Hotspot Based on Automated Machine Learning.

Meiotic recombination plays a pivotal role in genetic evolution. Genetic variation induced by recombination is a crucial factor in generating biodiversity and a driving force for evolution. At present, the development of recombination hotspot prediction methods has encountered challenges related to insufficient feature extraction and limited generalization capabilities. This paper focused on the research of recombination hotspot prediction methods. We explored deep learning-based recombination hotspot prediction and scrutinized the shortcomings of prevalent models in addressing the challenge of recombination hotspot prediction. To addressing these deficiencies, an automated machine learning approach was utilized to construct recombination hotspot prediction model. The model combined sequence information with physicochemical properties by employing TF-IDF-Kmer and DNA composition components to acquire more effective feature data. Experimental results validate the effectiveness of the feature extraction method and automated machine learning technology used in this study. The final model was validated on three distinct datasets and yielded accuracy rates of 97.14%, 79.71%, and 98.73%, surpassing the current leading models by 2%, 2.56%, and 4%, respectively. In addition, we incorporated tools such as SHAP and AutoGluon to analyze the interpretability of black-box models, delved into the impact of individual features on the results, and investigated the reasons behind misclassification of samples. Finally, an application of recombination hotspot prediction was established to facilitate easy access to necessary information and tools for researchers. The research outcomes of this paper underscore the enormous potential of automated machine learning methods in gene sequence prediction.

求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Journal of Molecular Biology
Journal of Molecular Biology 生物-生化与分子生物学
CiteScore
11.30
自引率
1.80%
发文量
412
审稿时长
28 days
期刊介绍: Journal of Molecular Biology (JMB) provides high quality, comprehensive and broad coverage in all areas of molecular biology. The journal publishes original scientific research papers that provide mechanistic and functional insights and report a significant advance to the field. The journal encourages the submission of multidisciplinary studies that use complementary experimental and computational approaches to address challenging biological questions. Research areas include but are not limited to: Biomolecular interactions, signaling networks, systems biology; Cell cycle, cell growth, cell differentiation; Cell death, autophagy; Cell signaling and regulation; Chemical biology; Computational biology, in combination with experimental studies; DNA replication, repair, and recombination; Development, regenerative biology, mechanistic and functional studies of stem cells; Epigenetics, chromatin structure and function; Gene expression; Membrane processes, cell surface proteins and cell-cell interactions; Methodological advances, both experimental and theoretical, including databases; Microbiology, virology, and interactions with the host or environment; Microbiota mechanistic and functional studies; Nuclear organization; Post-translational modifications, proteomics; Processing and function of biologically important macromolecules and complexes; Molecular basis of disease; RNA processing, structure and functions of non-coding RNAs, transcription; Sorting, spatiotemporal organization, trafficking; Structural biology; Synthetic biology; Translation, protein folding, chaperones, protein degradation and quality control.
期刊最新文献
Determinants in the HTLV-1 capsid major homology region that are critical for virus particle assembly. Pim1 is critical in T-cell-independent B-cell response and MAPK activation in B cells. Translation complex profile sequencing allows discrimination of leaky scanning and reinitiation in upstream open reading frame-controlled translation. Chromatin Transcription Elongation - A Structural Perspective. A nanobody toolbox for recognizing distinct epitopes on Cas9.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1