A demonstration of the enviromics approach to integrating environmental ‘big data’ problems
Andrew F. Bowerman
New Phytologist, published 30 August 2024. doi: 10.1111/nph.20079
Fig. 1 Remote sensing technologies: revolutionizing environmental data collection across spatial and temporal scales. The integration of these diverse, high-volume datasets into crop breeding and prediction models presents both opportunities and challenges for modern agriculture.
Abstract
With the expansion of technologies available to biological science has come an enormous rise in the volume and diversity of data. How we interrogate and combine ‘big data’ in different biological contexts, be it at the genetic, phenotypic or environmental level, has become the new challenge for crop biologists (Pal et al., 2020). Vast amounts of environmental data, ranging from rainfall and temperature to light intensity and soil characteristics, are now being collected globally by instruments such as remote-sensing satellites and automated weather stations. The challenge plant breeders now face is how to use these data effectively when evaluating the differential responses of genotypes to environments (Resende et al., 2021). Resende et al. (2024b; doi: 10.1111/nph.19951) begin to address this challenge in an article recently published in New Phytologist. The study applies machine learning to develop enviromic markers, providing a more precise and efficient method for predicting maize hybrid performance across diverse environments. The research aims to increase maize yield and genetic gains from selection, with a focus on Brazil's four southernmost states.
‘The ability to accurately predict crop performance in untested environments holds substantial promise for addressing the challenges of global food security…’
Resende et al.'s research is particularly significant given the wide geographical range and environmental variability over which maize is cultivated. Indeed, the scope of environments and environmental measures is ambitious: 183 field trials conducted across four Brazilian states, involving 79 phenotyped maize hybrids and their 85 nonphenotyped parents. Data were collected from 2017 to 2021 from weather, soil, sensor and satellite sources, totalling more than 1300 envirotypic covariates. By focusing on precise environmental characterization, the study aimed to optimize the breeding of high-yielding, stable maize hybrids.
The concepts of envirotypes and enviromics are relatively new, particularly as applied to plant breeding (Costa-Neto & Fritsche-Neto, 2021; Resende et al., 2021). Enviromics is a field that integrates environmental data with genomics to better understand and predict the interactions between an organism's genetic makeup and its environment. This interdisciplinary approach leverages the variety of environmental data mentioned above to study how these factors collectively influence phenotypic traits. Resende et al.'s most recent study (2024b) extends their previously proposed methodology (Resende et al., 2024a) to use a Geographic Information System (GIS) platform for high-density envirotyping. The GIS environment for this study was designed around a geoprocessing polygon encompassing all experimental points, with a 50 km buffer zone to ensure comprehensive coverage. This setup resulted in a prediction grid comprising 14 966 geographical bins covering the Brazilian states of São Paulo, Paraná, Santa Catarina and Rio Grande do Sul. These states span considerable environmental variation: São Paulo's climate ranges from tropical to temperate, Paraná's is subtropical, Santa Catarina has a mix of coastal and highland climates (alongside strong industrial and tourism sectors), and Rio Grande do Sul's is temperate.
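To make the grid construction more concrete, the following is a minimal sketch of building square prediction bins inside a buffered polygon around trial sites. It is not the authors' GIS workflow: it assumes the polygon is the convex hull of the trial points (the text does not say how the polygon was drawn), uses a rough equirectangular projection to kilometres, and the trial coordinates, the 10 km bin size and all function names are hypothetical. Only the 50 km buffer comes from the study.

```python
import math

KM_PER_DEG = 111.32  # approximate km per degree of latitude


def project_km(lon, lat, lat0):
    """Rough equirectangular projection (km) about reference latitude lat0."""
    return (lon * KM_PER_DEG * math.cos(math.radians(lat0)), lat * KM_PER_DEG)


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def dist_to_segment(p, a, b):
    """Euclidean distance from point p to segment ab."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg2 = dx * dx + dy * dy
    t = 0.0 if seg2 == 0 else max(0.0, min(1.0, ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg2))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def in_buffered_hull(p, hull, buffer_km):
    """True if p lies inside the CCW hull or within buffer_km of its boundary."""
    n = len(hull)
    edges = [(hull[i], hull[(i + 1) % n]) for i in range(n)]
    if n >= 3 and all(
        (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0 for a, b in edges
    ):
        return True
    return min(dist_to_segment(p, a, b) for a, b in edges) <= buffer_km


def prediction_grid(trials, cell_km=10.0, buffer_km=50.0):
    """Centres (projected km) of square bins that fall inside the buffered trial polygon."""
    lat0 = sum(lat for _, lat in trials) / len(trials)
    pts = [project_km(lon, lat, lat0) for lon, lat in trials]
    hull = convex_hull(pts)
    x0 = min(x for x, _ in pts) - buffer_km
    y0 = min(y for _, y in pts) - buffer_km
    nx = math.ceil((max(x for x, _ in pts) + buffer_km - x0) / cell_km)
    ny = math.ceil((max(y for _, y in pts) + buffer_km - y0) / cell_km)
    centres = ((x0 + (i + 0.5) * cell_km, y0 + (j + 0.5) * cell_km)
               for i in range(nx) for j in range(ny))
    return [c for c in centres if in_buffered_hull(c, hull, buffer_km)], hull


# Hypothetical trial locations (lon, lat); not the study's actual sites.
trials = [(-49.0, -22.5), (-51.0, -24.0), (-50.0, -27.0), (-53.0, -29.5)]
bins, hull = prediction_grid(trials, cell_km=10.0, buffer_km=50.0)
print(len(hull), len(bins))
```

A dedicated GIS library would normally handle projection and buffering more accurately; the pure-Python version is only meant to show the geometry of "polygon plus buffer, then bin".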
A key contribution of this research is the development of Engineered Enviromic Markers (EEMs), which provide a novel approach to understanding and predicting hybrid performance. By aggregating predictors into Random Forests and using hierarchical clustering, the researchers created a robust model capable of handling the complexities of genotype × environment (G × E) interactions. The study also introduced the Reaction to Engineered Enviromic Markers (REEM) model, an ensemble modelling technique that combines predictions from multiple models to enhance overall predictive accuracy. The genetic correlations derived from this approach allowed the researchers to define Breeding Zones within the environments studied, and then to project the predicted yield of a given genotype onto the map. Yield stability could also be predicted from the strength of the relationship between the yield of a given genotype and the EEMs, with weaker relationships indicating greater yield stability.
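Two ideas in this paragraph can be sketched in a few lines, under explicit assumptions. The ensemble step is illustrated by an inverse-error weighted average of model predictions; this weighting scheme and the model names are hypothetical, as the text says only that REEM combines predictions from multiple models. The stability idea is illustrated by the ordinary least-squares slope and R² of one genotype's yields regressed on a single EEM score across environments, where a weaker relationship indicates greater stability. All numbers are invented for illustration.

```python
from statistics import mean


def ensemble_predict(preds_by_model, val_rmse):
    """Combine per-environment predictions from several models.

    Hypothetical scheme: each model is weighted by 1 / validation RMSE,
    so models with lower error contribute more to the combined prediction.
    """
    weights = {m: 1.0 / val_rmse[m] for m in preds_by_model}
    total = sum(weights.values())
    n_env = len(next(iter(preds_by_model.values())))
    return [
        sum(weights[m] * preds_by_model[m][i] for m in preds_by_model) / total
        for i in range(n_env)
    ]


def reaction_to_marker(marker, yields):
    """OLS slope and R^2 of a genotype's yields regressed on one EEM score.

    A small |slope| and low R^2 mean yield responds weakly to the marker,
    which the text interprets as greater yield stability.
    """
    mx, my = mean(marker), mean(yields)
    sxx = sum((x - mx) ** 2 for x in marker)
    sxy = sum((x - mx) * (y - my) for x, y in zip(marker, yields))
    syy = sum((y - my) ** 2 for y in yields)
    slope = sxy / sxx
    r2 = 0.0 if syy == 0 else sxy ** 2 / (sxx * syy)
    return slope, r2


# Hypothetical EEM scores for four environments and two genotypes' yields (t/ha).
marker = [-1.0, 0.0, 1.0, 2.0]
responsive = [8.0, 9.0, 10.0, 11.0]  # tracks the marker closely
stable = [9.5, 9.5, 9.5, 9.5]        # unaffected by the marker
print(reaction_to_marker(marker, responsive))  # (1.0, 1.0)
print(reaction_to_marker(marker, stable))      # (0.0, 0.0)
```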
The Random Forest methodologies employed by this study to define the EEMs are interesting because they are largely data-agnostic (i.e. the nature of the individual datasets included is largely irrelevant), and the modelling system allows new types of data to be incorporated in future. These could include data from radar, thermal or LiDAR sensors (Newman & Furbank, 2021; Resende et al., 2024a), or indeed from technologies not yet deployed. This study has already drawn on MODIS, WorldClim, SoilGrids and NASA POWER to build its collection of environmental covariates.
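The data-agnostic point, together with the hierarchical clustering mentioned above, can be illustrated with a deliberately simplified stand-in. Covariates from any source are namespaced and merged into one site-by-covariate table, grouped by average-linkage agglomerative clustering on the distance 1 - |r|, and each group is summarized as a mean of sign-aligned z-scores. This sketch does not use Random Forests; the distance, the 0.3 cut-off, the summary score, the source, site and covariate names, and all values are assumptions for illustration only.

```python
import math
from statistics import mean, pstdev


def merge_sources(sources):
    """Merge {source: {site: {covariate: value}}} into {site: {"source:covariate": value}}.

    Only sites present in every source are kept. A new source (e.g. radar or
    LiDAR) simply adds namespaced columns; downstream code is unchanged.
    """
    sites = sorted(set.intersection(*(set(tbl) for tbl in sources.values())))
    return {
        s: {f"{src}:{cov}": v for src, tbl in sources.items() for cov, v in tbl[s].items()}
        for s in sites
    }


def corr(a, b):
    """Pearson correlation; returns 0.0 if either column is constant."""
    ma, mb = mean(a), mean(b)
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    if sa == 0 or sb == 0:
        return 0.0
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (sa * sb)


def cluster_covariates(table, max_dist=0.3):
    """Average-linkage agglomerative clustering of covariates with distance 1 - |r|."""
    sites = list(table)
    names = sorted(table[sites[0]])
    cols = {n: [table[s][n] for s in sites] for n in names}
    d = {(a, b): 1 - abs(corr(cols[a], cols[b])) for a in names for b in names if a < b}
    clusters = [[n] for n in names]

    def linkage(c1, c2):
        return mean(d[min(a, b), max(a, b)] for a in c1 for b in c2)

    while len(clusters) > 1:
        best, i, j = min((linkage(clusters[i], clusters[j]), i, j)
                         for i in range(len(clusters)) for j in range(i + 1, len(clusters)))
        if best > max_dist:
            break  # remaining clusters are too dissimilar to merge
        merged = clusters.pop(j)  # j > i, so index i is unaffected
        clusters[i] = clusters[i] + merged
    return clusters, cols


def marker_score(cols, cluster):
    """Per-site mean of sign-aligned z-scores for one cluster (a hypothetical 'marker')."""
    ref = cols[cluster[0]]
    zs = []
    for name in cluster:
        v = cols[name]
        m, s = mean(v), pstdev(v)
        sign = 1.0 if corr(ref, v) >= 0 else -1.0
        zs.append([sign * (x - m) / s if s else 0.0 for x in v])
    return [mean(site_vals) for site_vals in zip(*zs)]


sources = {  # hypothetical values
    "weather": {
        "s1": {"tmax": 30.0, "rain": 100.0}, "s2": {"tmax": 32.0, "rain": 80.0},
        "s3": {"tmax": 34.0, "rain": 60.0}, "s4": {"tmax": 36.0, "rain": 40.0},
    },
    "soil": {"s1": {"clay": 20.0}, "s2": {"clay": 30.0}, "s3": {"clay": 30.0}, "s4": {"clay": 20.0}},
}
table = merge_sources(sources)
clusters, cols = cluster_covariates(table, max_dist=0.3)
print(clusters)  # [['soil:clay'], ['weather:rain', 'weather:tmax']]
marker = marker_score(cols, clusters[1])
```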
The integration of enviromics into precision breeding represents a significant advance in agricultural science. By providing detailed environmental characterizations and leveraging machine learning techniques, Resende et al. (2024b) offer a pathway to more efficient and effective crop breeding strategies. Current approaches to understanding genotype–environment interactions, especially across diverse agronomic conditions, require crop trials in a broad range of environments. Such trials can be prohibitively expensive and are often not feasible; however, the way this study collates data from a range of trials demonstrates the value of small trials or farm-based records for crop yield modelling. The ability to accurately predict crop performance in untested environments holds substantial promise for addressing the challenges of global food security, particularly in the face of climate change (Fig. 1).
Particularly interesting among the findings is the ability to predict which parental lines should be crossed to generate novel hybrids that maximize yield in a given area, including genotypes that have not been trialled in that area. These predictions have yet to be tested or published.
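As a sketch of how such a recommendation could be read off a map of predictions, the code below enumerates female × male parent pairs in each bin and keeps the pair with the highest predicted hybrid yield. It assumes a purely additive model (a bin mean plus a bin-specific predicted effect for each parent, with no specific combining ability term); the study's own prediction model is not reproduced, and the parents, bins and effect values are hypothetical. Because the effects are assumed to be predicted per bin, parents never trialled in a bin can still be ranked there.

```python
from itertools import product


def best_cross(bin_mean, female_effects, male_effects):
    """Pick the female x male pair with the highest predicted hybrid yield in one bin.

    Hypothetical additive model: yield = bin_mean + effect(female) + effect(male).
    """
    (f, fe), (m, me) = max(
        product(female_effects.items(), male_effects.items()),
        key=lambda pair: pair[0][1] + pair[1][1],
    )
    return f, m, bin_mean + fe + me


def best_cross_per_bin(bins):
    """Apply best_cross to {bin_id: (bin_mean, female_effects, male_effects)}."""
    return {b: best_cross(*args) for b, args in bins.items()}


# Hypothetical bin means (t/ha) and bin-specific parental effects.
bins = {
    "bin_A": (9.0, {"F1": 0.5, "F2": -0.2}, {"M1": 0.1, "M2": 0.4}),
    "bin_B": (7.5, {"F1": -0.3, "F2": 0.6}, {"M1": 0.2, "M2": -0.1}),
}
for b, (f, m, y) in best_cross_per_bin(bins).items():
    print(b, f, m, round(y, 2))
# bin_A F1 M2 9.9
# bin_B F2 M1 8.3
```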
Detailed environmental data enable breeders to make more informed decisions, ultimately leading to better-performing crops tailored to specific environmental conditions. It will be especially interesting to see how Resende et al.'s (2024b) modelling system can be developed to incorporate real-time data, or data collected throughout the cropping cycle, enabling model updates and better prediction of end-of-season yields. Extending the methodology to a wider range of crop species, with their own environmental preferences and engineered enviromic markers, will be important for evaluating its broader value, especially given the differences in genetic variability among crop species (Swarup et al., 2021; Khoury et al., 2022).
The future deployment of enviromics in crop breeding, whose potential is amply demonstrated by Resende et al. (2024b), will allow far better estimation of phenotypic change due to environmental variance and will strengthen predictions of crop productivity.
About the journal:
New Phytologist is an international electronic journal published 24 times a year. It is owned by the New Phytologist Foundation, a non-profit-making charitable organization dedicated to promoting plant science. The journal publishes excellent, novel, rigorous, and timely research and scholarship in plant science and its applications. The articles cover topics in five sections: Physiology & Development, Environment, Interaction, Evolution, and Transformative Plant Biotechnology. These sections encompass intracellular processes, global environmental change, and encourage cross-disciplinary approaches. The journal recognizes the use of techniques from molecular and cell biology, functional genomics, modeling, and system-based approaches in plant science. Abstracting and Indexing Information for New Phytologist includes Academic Search, AgBiotech News & Information, Agroforestry Abstracts, Biochemistry & Biophysics Citation Index, Botanical Pesticides, CAB Abstracts®, Environment Index, Global Health, and Plant Breeding Abstracts, and others.