A demonstration of the enviromics approach to integrating environmental ‘big data’ problems

New Phytologist (IF 8.3; Zone 1, Biology; Q1, Plant Sciences) | Pub Date: 2024-08-30 | DOI: 10.1111/nph.20079

Andrew F. Bowerman



Abstract

With the expansion of technologies available to biological science has come an enormous rise in the amount and diversity of data. How we interrogate and combine ‘big data’ in different biological contexts has become the new challenge for crop biologists, be it at the genetic, phenotypic or environmental level (Pal et al., 2020). An enormous amount of environmental data, ranging from rainfall and temperature to light intensity and soil characteristics, is now being collected globally using instruments such as remote-sensing satellites and automated weather stations. The challenge plant breeders now face is how to use these data effectively when evaluating the typical differential responses of genotypes to environments (Resende et al., 2021). Resende et al. (2024b; doi: 10.1111/nph.19951) begin to address the challenge of how to use ‘big data’ in an article recently published in New Phytologist. The study focuses on the application of machine learning to develop enviromic markers, providing a more precise and efficient method for predicting maize hybrid performance across diverse environments. The research aims to enhance maize crop yield and genetic selection gains, particularly in Brazil's four southernmost states.


Resende et al.'s research is particularly significant given the extensive geographical range and environmental variability within which maize is cultivated. Indeed, the range of environments and environmental measures is ambitious: 183 field trials conducted across four Brazilian states, involving 79 phenotyped maize hybrids and their 85 nonphenotyped parents. Data collection was carried out from 2017 to 2021, encompassing various environmental covariates sourced from weather, soil, sensors, and satellites, adding up to over 1300 envirotypic covariates. By focusing on precise environmental characterization, the study aimed to optimize the breeding of high-yielding, stable maize hybrids.

The concepts of envirotypes and enviromics are relatively new, certainly as applied to plant breeding efforts (Costa-Neto & Fritsche-Neto, 2021; Resende et al., 2021). Enviromics is a field that integrates environmental data with genomics to better understand and predict the interactions between an organism's genetic makeup and its environment. This interdisciplinary approach leverages the variety of environmental data mentioned above to study how these factors collectively influence phenotypic traits. Resende et al.'s most recent study (2024b) extends their proposed methodology (Resende et al., 2024a) to use a Geographic Information Systems (GIS) platform for the purpose of high-density envirotyping. The GIS environment for this study was meticulously designed to include a geoprocessing polygon encompassing all experimental points, with a 50 km buffer zone to ensure comprehensive coverage. This setup resulted in a prediction grid comprising 14 966 geographical bins covering the Brazilian states of São Paulo, Paraná, Santa Catarina and Rio Grande do Sul. These states represent considerable environmental variation, given São Paulo's diverse tropical to temperate climate, Paraná's subtropical climate, Santa Catarina's mix of coastal and highland climates with strong industrial and tourism sectors, and Rio Grande do Sul's temperate climate.
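As a rough illustration of how a buffered prediction grid of this kind can be laid out, the following minimal Python sketch builds square bins over a buffered extent. It is not the authors' GIS workflow: it buffers the bounding box of the trial points rather than a fitted polygon, it assumes coordinates are already projected to kilometres, and the function name `buffered_grid`, the 10 km cell size and the two toy trial locations are all hypothetical.

```python
import math


def buffered_grid(trial_points_km, buffer_km=50.0, cell_km=10.0):
    """Return the centre coordinates of square bins covering a buffered extent.

    Simplification (assumption): the buffer is applied to the bounding box of
    the trial points, not to a polygon fitted around them.
    """
    xs = [x for x, _ in trial_points_km]
    ys = [y for _, y in trial_points_km]

    # Expand the extent of the trials by the buffer distance on every side.
    x_min, x_max = min(xs) - buffer_km, max(xs) + buffer_km
    y_min, y_max = min(ys) - buffer_km, max(ys) + buffer_km

    # Number of bins needed to cover the extent in each direction.
    n_cols = math.ceil((x_max - x_min) / cell_km)
    n_rows = math.ceil((y_max - y_min) / cell_km)

    # One (x, y) centre per bin, row by row.
    return [
        (x_min + (col + 0.5) * cell_km, y_min + (row + 0.5) * cell_km)
        for row in range(n_rows)
        for col in range(n_cols)
    ]


# Two hypothetical trial sites, 100 km apart east-west and 50 km north-south.
trials = [(0.0, 0.0), (100.0, 50.0)]
bins = buffered_grid(trials)
print(len(bins))  # 20 columns x 15 rows = 300 bins for this toy extent
```

In a real analysis each bin would then be populated with its environmental covariates; a GIS library would normally handle the projection and polygon buffering that this sketch leaves out.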

A key contribution of this research is the development of Engineered Enviromic Markers (EEM), which provide a novel approach to understanding and predicting hybrid performance. By aggregating predictors into Random Forests and using hierarchical clustering, the researchers created a robust model capable of handling the complexities of G × E interactions. The study also introduced the Reaction to Engineered Enviromic Markers (REEM) model, an ensemble modelling technique that combines predictions from multiple models to enhance overall predictive accuracy. The genetic correlations derived from this approach allowed the researchers to define Breeding Zones in the environments studied, and then to project the predicted yield of a given genotype onto the map. Yield stability could also be predicted from the strength of the relationship between the yield of a given genotype and the EEMs, with weaker relationships indicating greater yield stability.
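The stability idea in the last sentence can be made concrete with a toy calculation. The sketch below is not the REEM model; it assumes, purely for illustration, that the "strength of the relationship" is measured as the squared Pearson correlation between a single marker and one genotype's yields across environments. The function name `relationship_strength` and all numbers are hypothetical.

```python
def relationship_strength(marker_values, yields):
    """Squared Pearson correlation between one marker and one genotype's yields.

    Assumption: used here as a stand-in for the 'strength of the relationship';
    the original study may quantify it differently.
    """
    n = len(marker_values)
    mean_m = sum(marker_values) / n
    mean_y = sum(yields) / n

    # Sums of cross-products and squares around the means.
    s_my = sum((m - mean_m) * (y - mean_y) for m, y in zip(marker_values, yields))
    s_mm = sum((m - mean_m) ** 2 for m in marker_values)
    s_yy = sum((y - mean_y) ** 2 for y in yields)

    if s_mm == 0 or s_yy == 0:
        return 0.0  # a constant series shows no measurable relationship
    return s_my ** 2 / (s_mm * s_yy)


# Hypothetical marker values in five environments and yields of two genotypes.
marker = [1.0, 2.0, 3.0, 4.0, 5.0]
responsive_yield = [2.0, 4.0, 6.0, 8.0, 10.0]  # tracks the marker closely
steady_yield = [5.0, 5.1, 5.0, 4.9, 5.0]       # barely responds to the marker

r2_responsive = relationship_strength(marker, responsive_yield)
r2_steady = relationship_strength(marker, steady_yield)

# Under the reading above, the weaker relationship marks the more stable genotype.
more_stable = "steady" if r2_steady < r2_responsive else "responsive"
print(more_stable)
```

A squared correlation keeps the example short and ignores the direction of the response; a slope-based measure would be an equally reasonable choice for a sketch like this.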

The Random Forest methodologies employed by this study to define the EEMs are interesting as they are largely data agnostic (i.e. the nature of the different data sets included is largely irrelevant) and the modelling system allows for any new type of data to be included in future. Data types could include radar, thermal or LiDAR sensors (Newman & Furbank, 2021; Resende et al., 2024a), or, indeed, new technologies not yet employed. Already this study has used data from MODIS, WorldClim, SoilGrids and NASA POWER to build its collection of environmental covariates.

The integration of enviromics into precision breeding represents a significant advancement in agricultural science. By providing detailed environmental characterizations and leveraging machine learning techniques, Resende et al. (2024b) offer a pathway to more efficient and effective crop breeding strategies. Current approaches to understanding genotype–environment interactions, especially in diverse agronomic conditions, require crop trials in a broad range of environments. Trials can be prohibitively expensive and often aren't feasible, but the ability to collate data from the range of trials used here demonstrates the value of small trials or farm-based records for crop yield modelling. The ability to accurately predict crop performance in untested environments holds substantial promise for addressing the challenges of global food security, particularly in the face of climate change (Fig. 1).

Fig. 1. Remote sensing technologies: revolutionizing environmental data collection across spatial and temporal scales. The integration of these diverse, high-volume datasets into crop breeding and prediction models presents both opportunities and challenges for modern agriculture.

What is particularly interesting from the findings is the ability to predict which parental lines should be used to generate novel hybrids to maximize yield for a given area; this can include genotypes which have not been trialled in the area in question. The results of these predictions have yet to be tested or published.

Detailed environmental data enable breeders to make more informed decisions, ultimately leading to better-performing crops tailored to specific environmental conditions. In future, it will be most compelling to see how Resende et al.'s (2024b) modelling system can be developed to include real-time data, or data collected through the cropping cycle, to enable model updates and better predict end-of-season yields. Expanding the methodology to more diverse crop species with different engineered enviromic markers and environmental preferences will be important to evaluate its wider value, especially given the differences in genetic variability among crop species (Swarup et al., 2021; Khoury et al., 2022).

The future deployment of enviromics in crop breeding methodologies, amply demonstrated by Resende et al. (2024b), will allow far better estimation of phenotypic change due to environmental variance and will strengthen predictions of crop productivity.

