Are gene-by-environment interactions leveraged in multi-modality neural networks for breast cancer prediction?
Monica Isgut, Andrew Hornback, Yunan Luo, Asma Khimani, Neha Jain, May D. Wang
arXiv - QuanBio - Genomics, published 2024-07-30. DOI: arxiv-2407.20978 (https://doi.org/arxiv-2407.20978)
Citations: 0
Abstract
Polygenic risk scores (PRSs) can significantly enhance breast cancer risk
prediction when combined with clinical risk factor data. While many studies
have explored the added value of PRSs, little is known about whether
gene-by-gene or gene-by-environment interactions can improve the risk
discrimination of multi-modal models that combine PRSs with clinical
data. In this study, we integrated data on 318 individual genotype variants
along with clinical data in a neural network to explore whether gene-by-gene
(i.e., between individual variants) and/or gene-by-environment (between
clinical risk factors and variants) interactions could be leveraged jointly
during training to improve breast cancer risk prediction performance. We
benchmarked our approach against a baseline logistic regression model that
combines traditional univariate PRSs with clinical data, and ran an
interpretability analysis to identify feature interactions. While our model
did not outperform the baseline, we discovered 248 (<1%) statistically
significant gene-by-gene and gene-by-environment interactions out of the
~53.6k possible feature pairs. The most contributory of these involved
rs6001930 (MKL1) and rs889312 (MAP3K1), with age and menopause being the
most heavily interacting non-genetic risk factors. We also modeled the
significant interactions as a network of highly
connected features, suggesting that potential higher-order interactions are
captured by the model. Although gene-by-environment (or gene-by-gene)
interactions did not enhance breast cancer risk prediction performance in
neural networks, our study provides evidence that these models can leverage
such interactions to inform their predictions. This study represents
the first application of neural networks to screen for interactions impacting
breast cancer risk using real-world data.
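
The "~53.6k possible feature pairs" figure can be sanity-checked with a short calculation. The sketch below is illustrative only: the abstract states the 318 variants and the 248 significant interactions, but it does not state the number of clinical features. The value of 10 used here is an assumption, chosen because it is the count that reproduces roughly 53.6k pairs; the function name `pair_counts` is likewise hypothetical.

```python
from math import comb

N_VARIANTS = 318     # stated in the abstract
N_SIGNIFICANT = 248  # stated in the abstract
N_CLINICAL = 10      # ASSUMPTION: not stated in the abstract; picked to match ~53.6k


def pair_counts(n_variants: int, n_clinical: int) -> dict:
    """Count unordered feature pairs, split by the kind of features paired."""
    gene_by_gene = comb(n_variants, 2)          # variant paired with variant
    gene_by_env = n_variants * n_clinical       # variant paired with clinical factor
    clinical_pairs = comb(n_clinical, 2)        # clinical factor paired with clinical factor
    return {
        "gene_by_gene": gene_by_gene,
        "gene_by_environment": gene_by_env,
        "clinical_by_clinical": clinical_pairs,
        "all_pairs": gene_by_gene + gene_by_env + clinical_pairs,
    }


counts = pair_counts(N_VARIANTS, N_CLINICAL)
fraction = N_SIGNIFICANT / counts["all_pairs"]

for name, value in counts.items():
    print(f"{name}: {value}")
print(f"significant fraction: {fraction:.2%}")
```

Under this assumption the total is 53,628 pairs, of which the gene-by-gene pairs make up the large majority, and 248 significant interactions correspond to under 0.5% of all pairs, consistent with the "<1%" reported above.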