Predictive biomarkers for embryotoxicity: a machine learning approach to mitigating multicollinearity in RNA-Seq
Yixian Quah, Soontag Jung, Jireh Yi-Le Chan, Onju Ham, Ji-Seong Jeong, Sangyun Kim, Woojin Kim, Seung-Chun Park, Seung-Jin Lee, Wook-Joon Yu
Archives of Toxicology 98(12): 4093-4105. Published 2024-09-06. Journal Article.
DOI: 10.1007/s00204-024-03852-w
https://link.springer.com/article/10.1007/s00204-024-03852-w
Abstract
Multicollinearity, characterized by strong co-expression patterns among genes, often occurs in high-throughput expression data and can undermine the reliability of predictive models. This study examined multicollinearity among closely related genes in RNA-Seq data from embryoid bodies (EBs) perturbed with 5-fluorouracil, with the aim of identifying genes associated with embryotoxicity. Six genes (Dppa5a, Gdf3, Zfp42, Meis1, Hoxa2, and Hoxb1) emerged as candidates based on domain knowledge and were validated by qPCR in EBs perturbed with 39 test substances. We used correlation analysis and the variance inflation factor (VIF) to assess multicollinearity among these genes. Recursive feature elimination with cross-validation (RFECV) ranked Zfp42 and Hoxb1 as the top two of the seven features considered, identifying them as potential biomarkers for early embryotoxicity assessment. A t test assessing the statistical significance of this two-feature prediction model yielded a p value of 0.0044, confirming that RFECV successfully reduced redundancy and multicollinearity. Our study presents a systematic methodology for applying machine learning to transcriptomic data analysis that facilitates the discovery of candidate reporter genes for embryotoxicity screening and improves the accuracy and feasibility of the predictive model while reducing cost and time.
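The abstract pairs correlation analysis with the variance inflation factor (VIF) to detect multicollinearity. The following is a minimal pure-Python sketch of both diagnostics, not the authors' code. It assumes expression values are supplied as one list per gene (for example, qPCR measurements across test substances), and all function names and the toy data are hypothetical. It relies on the standard identity that VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing feature j on all other features, equals the j-th diagonal entry of the inverse of the features' correlation matrix.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a feature has zero variance")
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Pairwise Pearson correlations; columns is a list of per-gene value lists."""
    k = len(columns)
    return [[pearson(columns[i], columns[j]) for j in range(k)] for i in range(k)]


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment the matrix with the identity: [A | I].
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Pick the row with the largest absolute value in this column as pivot.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix (perfect multicollinearity)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    # The right half of the augmented matrix is now the inverse.
    return [row[n:] for row in aug]


def vif(columns):
    """VIF for each feature: diagonal of the inverse correlation matrix."""
    inv = invert(correlation_matrix(columns))
    return [inv[j][j] for j in range(len(columns))]


if __name__ == "__main__":
    # Hypothetical toy values, NOT data from the study.
    gene_a = [1.0, 2.0, 3.0, 4.0, 5.0]
    gene_b = [2.1, 3.9, 6.2, 7.8, 10.1]  # nearly proportional to gene_a
    gene_c = [5.0, 3.0, 4.0, 1.0, 2.0]
    cols = [gene_a, gene_b, gene_c]
    for row in correlation_matrix(cols):
        print([round(v, 3) for v in row])
    print([round(v, 1) for v in vif(cols)])
```

In this toy example, gene_a and gene_b receive very large VIFs because each is almost a linear function of the other. A VIF above roughly 5 to 10 is a commonly used rule of thumb for problematic multicollinearity. For the subsequent ranking step, scikit-learn provides `sklearn.feature_selection.RFECV`, which performs recursive feature elimination with cross-validation; this sketch covers only the diagnostic step.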
About the journal:
Archives of Toxicology provides up-to-date information on the latest advances in toxicology. The journal places particular emphasis on studies relating to defined effects of chemicals and mechanisms of toxicity, including toxic activities at the molecular level, in humans and experimental animals. Coverage includes new insights into analysis and toxicokinetics and into forensic toxicology. Review articles of general interest to toxicologists are an additional important feature of the journal.