R400: A novel gene signature for dose prediction in radiation exposure studies in humans
Frederick St. Peter, Srinivas Mukund Vadrev, O. Soufan
Frontiers in Systems Biology, published 2022-10-26. DOI: 10.3389/fsysb.2022.1022486 (https://doi.org/10.3389/fsysb.2022.1022486)
Abstract
The harmful effects of radiation on biological organisms have long been studied mainly by evaluating pathological changes in cells, tissues, or organs. Gene expression datasets from radiation exposure studies have recently become more accessible, which provides an opportunity to analyze responses at the molecular level and to reveal phenotypic differences. Biomarkers in toxicogenomics have been suggested as indicators of radiation exposure and seem to react differently to different radiation doses. This study proposes a predictive gene signature that is specific to radiation exposure and can be used to automatically diagnose the exposure dose. Finding a reliable gene set that correctly identifies the exposure dose requires attention to the size of the set, so we experimented with the number of genes used for training and testing. Gene set sizes of 28, 100, 200, 300, 400, 500, 600, 700, 800, 900, and 1,000 were tested to find the size that provided the best accuracy across three datasets. Models were then trained and tested using multiple datasets in various ways, including an external validation. The dissimilarities between these datasets are analogous to real-world conditions, where data from multiple sources are likely to vary in format, settings, time parameters, participants, processes, and machine tolerances, so a robust training dataset built from many heterogeneous samples should provide better predictability. All three datasets showed positive results, with correct classification of the radiation exposure dose. The average accuracy of all three models was 88% for gene sets of both 400 and 1,000 genes. R400 provided the best results when tested on the three datasets used in this study. A literature validation of the top selected genes shows that their perturbations are highly relevant to adverse effects reported during cancer radiotherapy.
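The gene-set-size experiment described in the abstract can be illustrated with a small sketch. This is not the authors' pipeline: the data are synthetic, and the gene ranking score (spread of per-dose mean expression), the nearest-centroid classifier, and every function name below are assumptions chosen only to keep the example self-contained. It shows the general shape of the procedure: rank genes on a training set, then measure average dose-classification accuracy over three test sets for each candidate size.

```python
import random

# Candidate gene set sizes listed in the abstract.
SIZES = [28, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]


def make_dataset(n_samples, n_genes, n_informative, n_doses, shift, seed):
    """Synthetic expression data (assumption): the first n_informative genes
    shift with the dose class, the remaining genes are pure noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        dose = i % n_doses
        X.append([rng.gauss(dose * shift if g < n_informative else 0.0, 1.0)
                  for g in range(n_genes)])
        y.append(dose)
    return X, y


def class_means(X, y, genes):
    """Mean expression of each selected gene within each dose class."""
    means = {}
    for dose in sorted(set(y)):
        rows = [row for row, label in zip(X, y) if label == dose]
        means[dose] = [sum(r[g] for r in rows) / len(rows) for g in genes]
    return means


def rank_genes(X, y):
    """Rank genes by the spread of their per-dose means, largest first.
    A hypothetical stand-in for whatever selection method the study used."""
    all_genes = range(len(X[0]))
    means = class_means(X, y, all_genes)

    def spread(g):
        vals = [m[g] for m in means.values()]
        centre = sum(vals) / len(vals)
        return sum((v - centre) ** 2 for v in vals)

    return sorted(all_genes, key=spread, reverse=True)


def accuracy(centroids, X, y, genes):
    """Fraction of samples whose nearest class centroid matches their dose."""
    hits = 0
    for row, label in zip(X, y):
        values = [row[g] for g in genes]
        nearest = min(
            centroids,
            key=lambda d: sum((v - m) ** 2 for v, m in zip(values, centroids[d])),
        )
        hits += nearest == label
    return hits / len(y)


def sweep(train, tests, sizes=SIZES):
    """Average test accuracy over all test sets for each gene set size."""
    X_train, y_train = train
    ranking = rank_genes(X_train, y_train)
    results = {}
    for k in sizes:
        genes = ranking[:k]
        centroids = class_means(X_train, y_train, genes)
        results[k] = sum(accuracy(centroids, X, y, genes) for X, y in tests) / len(tests)
    return results


# One synthetic training set and three synthetic test sets (all parameters assumed).
train = make_dataset(60, 1200, 50, 3, 1.0, seed=0)
tests = [make_dataset(30, 1200, 50, 3, 1.0, seed=s) for s in (1, 2, 3)]
results = sweep(train, tests)
best_size = max(results, key=results.get)  # ties resolve to the smallest size

for k, acc in results.items():
    print(f"{k:>5} genes: mean accuracy {acc:.2f}")
print("best size:", best_size)
```

On real data, the three test sets would be independent studies rather than resampled synthetic matrices, which is what makes the comparison across sizes informative about robustness to differences between sources.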