Pub Date: 2024-07-17; DOI: 10.1093/insilicoplants/diae012
Mariana V Chiozza, Kyle A. Parmley, W. Schapaugh, A. R. Asebedo, Asheesh K. Singh, Fernando E Miguez
High-throughput crop phenotyping (HTP) in soybean [Glycine max (L.) Merr.] has been used to estimate seed yield with varying degrees of accuracy. Research in this area typically uses machine learning approaches to predict seed yield from crop images, with a strong focus on analytics. On the other hand, a significant part of the soybean breeding community still uses linear approaches to relate canopy traits to seed yield, relying on parsimony. Our research attempted to address the limitations in interpretability, scope and system comprehension inherent in previous modelling approaches. We used a combination of empirical and simulated data to augment the experimental footprint and to explore the combined effects of genetics (G), environment (E) and management (M). We used flexible functions that do not assume a pre-determined response between canopy traits and seed yield. Factors such as soybean maturity date, duration of vegetative and reproductive periods, harvest index (HI), potential leaf size, planting date and plant population affected the shape of the canopy-seed yield relationship as well as the canopy optimum values at which selection of high-yielding genotypes should be conducted. This work demonstrates that there are avenues for improved application of HTP in soybean breeding programs if similar modelling approaches are considered.
Title: Changes in the leaf area-seed yield relationship in soybean driven by genetic, management and environments: Implications for High-Throughput Phenotyping. Journal: in silico Plants.
Pub Date: 2024-06-03; DOI: 10.1093/insilicoplants/diae007
Juliana Simas Coutinho Barbosa, Wheaton L. Schroeder, P. Suthers, Sara S. Jawdy, Jin-Gui Chen, W. Muchero, C. Maranas
Populus trichocarpa (poplar) is a fast-growing model tree whose lignocellulosic biomass is a promising biofuel feedstock. Enhancing its viability and yield in non-arable drought-prone lands can reduce biomass cost and accelerate adoption as a biofuel crop. Data from extensive -omics and phenotypic studies were leveraged herein to reconstruct a multi-tissue (root, stem, and leaf) genome-scale model (GSM) of poplar, iPotri3463, encompassing 14,360 reactions, 12,402 metabolites, and 3,463 genes. Two condition-specific GSMs were extracted from iPotri3463: iPotri3016C (control) and iPotri2999D (drought), supported by condition-specific transcript levels and reaction essentiality for growth. Physiological constraints consistent with experimental measurements of drought-stressed plants were imposed on growth, photorespiration, and carbon assimilation rates. The calculated increase in flux capacity through the violaxanthin cycle and GABA biosynthetic pathways agrees with established key strategies for improving drought tolerance. Differential gene expression analysis was performed on existing transcriptomes of poplar under different watering regimes. Computational flux knockdown was applied to reactions that had increased flux capacity under drought and were associated with at least one downregulated gene. Several such reactions were essential for maintaining the observed biomass yield, and their associated genes are candidates for overexpression to improve drought tolerance. Glutamine synthetase is one such candidate whose overexpression in poplar confirms the in silico predictions. However, the two most promising candidates are genes encoding ferulate-5-hydroxylase, Potri.007G016400 and Potri.005G117500, as their overexpression in other plant species led to demonstrably improved drought tolerance while previous overexpression in poplar reduced biomass recalcitrance. iPotri3463 is the first poplar-specific whole-plant GSM and the second one available for a woody plant.
Title: A Multi-Tissue Genome-Scale Model of Populus trichocarpa Elucidates Overexpression Targets for Improving Drought Tolerance. Journal: in silico Plants.
Pub Date: 2024-05-09; DOI: 10.1093/insilicoplants/diae004
F. Grisafi, S. Tombesi, D. Farinelli, E. Costes, J.B. Durand, F. Boudon
Hazelnut (Corylus avellana) cultivation is increasing worldwide. A 3D model of its structure could improve management techniques such as pruning. This study aims to analyse hazelnut architectural development over two successive years in order to implement a functional-structural plant model. A total of 104 one-year-old shoots of own-rooted hazelnut trees were selected and analysed in winter 2020 and 2021. Exploratory analyses, generalized linear models, and multinomial regression models were used to describe the architectural processes. Sylleptic shoots, characterized by the presence of the male inflorescence in the apical position, were detected on hazelnut one-year-old shoots. Along proleptic shoots, the branching pattern was described by (1) blind nodes located in the proximal part, (2) sylleptic shoots and mixed buds in the median part, and (3) vegetative buds in the distal part. The apical bud died during the growing season, suggesting that Tonda di Giffoni has sympodial branching. The models revealed dependencies among buds located at the same node in the case of proleptic shoots. In particular, the probability that a bud would burst depended on both its type (i.e., mixed or vegetative) and the presence of other buds, either mixed or vegetative. Based on these local models and on a flow diagram that defines the steps leading to the construction of hazelnut tree architecture, a first functional-structural plant model of hazelnut tree architecture was built. Further experiments will be needed, repeated over the following years, to extend this study toward the juvenility phase and tree architecture over time.
Title: Modelling the architecture of hazelnut (Corylus avellana) Tonda di Giffoni over two successive years. Journal: in silico Plants.
Pub Date: 2024-02-21; DOI: 10.1093/insilicoplants/diae002
Shuangwei Li, W. van der Werf, Fang Gou, Junqi Zhu, Herman N C Berghuijs, Hu Zhou, Yan Guo, B. Li, Yuntao Ma, J. Evers
Dealing with heterogeneity in leaf canopies when calculating light interception per species in a mixed canopy is a challenge. Goudriaan developed a computationally simple, though conceptually sophisticated, model for light interception in strip canopies that can be reasonably represented as “blocks”, such as vineyards and crop rows. This model is widely used, but it has not been independently verified. Hence, we compared light interception calculated with Goudriaan’s model against detailed, spatially explicit three-dimensional functional-structural plant models (FSPM) of maize in which plant architecture can be represented explicitly. Two models were developed: one with small randomly oriented leaves in blocks, similar to Goudriaan’s assumption, which we refer to as the intermediate model (IM), and another with a realistic representation of individual plants with stems and leaves having shape, orientation, etc., referred to as the FSPM. In the IM and FSPM, light interception was calculated using ray tracing. In Goudriaan’s model, the light extinction coefficient (k), including both its daily and seasonal average values, was generated using the FSPM. Correspondence between the three models was excellent in terms of light capture for different levels of crop height, leaf area and uniformity, with differences of less than 3.3%. The results strongly support the use of Goudriaan’s summary model for calculating light interception in strip canopies.
Title: An evaluation of Goudriaan's summary model for light interception in strip canopies, using functional-structural plant models. Journal: in silico Plants.
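The role of the light extinction coefficient (k) mentioned in the abstract above can be illustrated with the standard Beer-Lambert form for a homogeneous canopy, where the intercepted fraction is 1 - exp(-k * LAI). This is only a minimal sketch: it is not Goudriaan's strip-canopy model, which additionally accounts for row geometry, and the function name and the example value k = 0.6 are assumptions chosen for illustration rather than values from the paper.

```python
import math


def fraction_intercepted(k: float, lai: float) -> float:
    """Fraction of incoming light intercepted by a homogeneous canopy.

    Uses the Beer-Lambert form 1 - exp(-k * LAI). This is the textbook
    homogeneous-canopy relation, not the strip-canopy model evaluated in
    the paper (hypothetical helper for illustration only).
    """
    if k < 0 or lai < 0:
        raise ValueError("k and lai must be non-negative")
    return 1.0 - math.exp(-k * lai)


if __name__ == "__main__":
    # k = 0.6 is an illustrative value, not one reported in the abstract.
    for lai in (0.5, 1.0, 2.0, 4.0):
        frac = fraction_intercepted(0.6, lai)
        print(f"LAI={lai:.1f}  intercepted fraction={frac:.3f}")
```

Because interception saturates as leaf area index grows, the same change in k matters more at low LAI than at high LAI, which is one reason a well-estimated k is important for sparse or row-structured canopies.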
Pub Date: 2023-12-12; DOI: 10.1093/insilicoplants/diad023
Simone Bregaglio, Giulia Carriero, Roberta Calone, Maddalena Romano, Sofia Bajocco
Simulation models are primary tools for synthesizing plant physiological knowledge, supporting farmers’ decisions, and predicting crop yields and functioning under climate change. The conventional approach within the scientific community consists of disseminating model outcomes through articles and technical reports, which often impedes the sharing of knowledge among science, policy, and society. This work presents the mandala (modeled and abstracted plant), a simulation model that translates crop phenology and physiology, as a function of environmental drivers, into symbols and sounds, focusing on plant responses to cold, drought, and heat stresses. The mandala has been realized with object-oriented (C#) and visual (vvvv) programming, and the source code is free for extension and improvement. We tested the mandala in six heterogeneous climates to show its potential to convey essential information on maize and wheat growth and responses to abiotic stresses. Despite lacking artistic refinement, this work attempts to illustrate that visual and sound art can serve as unconventional means of disseminating crop model insights while showing their potential to enhance the breadth of information delivered to the public.
Title: Playing a crop simulation model using symbols and sounds: the ‘mandala’. Journal: in silico Plants.
Pub Date: 2023-12-08; DOI: 10.1093/insilicoplants/diad022
D. Helmrich, F. Bauer, Mona Giraud, Andrea Schnepf, J. Göbbert, H. Scharr, E. Hvannberg, Morris Riedel
In plant science, image analysis is an established method for obtaining structural parameters of crops. In recent years, deep learning techniques have improved the underlying processes significantly. However, since data acquisition is time- and resource-consuming, reliable training data is currently a limiting factor. To overcome this bottleneck, synthetic data is a promising option, not only for enabling a higher order of correctness by offering more training data, but also for validation of results. However, the creation of synthetic data is complex and requires extensive knowledge of computer graphics, visualization and high-performance computing. We address this by introducing Synavis, a framework that allows users to train networks on data generated in real time. We created a pipeline that integrates realistic plant structures, simulated by the functional-structural plant model framework CPlantBox, into the game engine Unreal Engine. For this purpose, we needed to extend CPlantBox by introducing a new leaf geometrization that results in realistic leaves. All parameterized geometries of the plant are directly provided by the plant model. In the Unreal Engine, it is possible to alter the environment. WebRTC enables the streaming of the final image composition, which can then be used directly to train deep neural networks to increase parameter robustness, for further plant trait detection and for validation of the original parameters. We provide user-friendly, ready-to-use pipelines that offer virtual plant experiment and field visualizations, a Python binding library to access synthetic data, and a ready-to-run example to train models.
Title: A Scalable Pipeline to Create Synthetic Datasets from Functional-Structural Plant Models for Deep Learning. Journal: in silico Plants.
Pub Date: 2023-11-11; DOI: 10.1093/insilicoplants/diad021
Serena Lotreck, Kenia Segura Abá, Melissa Lehti-Shiu, Abigail Seeger, Brianna N I Brown, Thilanka Ranaweera, Ally Schumacher, Mohammad Ghassemi, Shin-Han Shiu
Natural language processing (NLP) techniques can enhance our ability to interpret plant science literature. Many state-of-the-art algorithms for NLP tasks require high-quality labeled data in the target domain, in which entities like genes and proteins, as well as the relationships between entities, are labeled according to a set of annotation guidelines. While such datasets exist for other domains, these resources need development in the plant sciences. Here, we present the Plant ScIenCe KnowLedgE Graph (PICKLE) corpus, a collection of 250 plant science abstracts annotated with entities and relations, along with its annotation guidelines. The annotation guidelines were refined by iterative rounds of overlapping annotations, in which inter-annotator agreement was leveraged to improve the guidelines. To demonstrate PICKLE’s utility, we evaluated the performance of pretrained models from other domains and trained a new, PICKLE-based model for entity and relation extraction. The PICKLE-trained models exhibit the second-highest in-domain entity performance of all models evaluated, as well as a relation extraction performance that is on par with other models. Additionally, we found that computer science-domain models outperformed models trained on a biomedical corpus (GENIA) in entity extraction, which was unexpected given the intuition that biomedical literature is more similar to PICKLE than computer science literature is. Upon further exploration, we established that the inclusion of new types on which the models were not trained substantially impacts performance. The PICKLE corpus is therefore an important contribution to training resources for entity and relation extraction in the plant sciences.
Title: In a PICKLE: A gold standard entity and relation corpus for the molecular plant sciences. Journal: in silico Plants.
Pub Date: 2023-11-08; DOI: 10.1093/insilicoplants/diad018
Maleana G Khoury, Kenneth S Berenhaut, Katherine E Moore, Edward E Allen, Alexandria F Harkey, Joëlle K Mühlemann, Courtney N Craven, Jiayi Xu, Suchi S Jain, David J John, James L Norris, Gloria K Muday
Transcriptome studies that provide temporal information about transcript abundance facilitate identification of gene regulatory networks (GRNs). Inferring GRNs from time series data using computational modeling remains a central challenge in systems biology. Commonly employed clustering algorithms identify modules of like-responding genes but do not provide information on how these modules are interconnected. These methods also require users to specify parameters such as cluster number and size, adding complexity to the analysis. To address these challenges, we employed a recently developed algorithm, Partitioned Local Depth (PaLD), to generate cohesive networks for 4 time series transcriptome datasets (3 hormone and 1 abiotic stress dataset) from the model plant Arabidopsis thaliana. PaLD provided a cohesive network representation of the data, revealing networks with distinct structures and varying numbers of connections between transcripts. We utilized the networks to make predictions about GRNs by examining local neighborhoods of transcripts with highly similar temporal responses. We also partitioned the networks into groups of like-responding transcripts and identified enriched functional and regulatory features in them. Comparison of these groups to clusters generated by commonly used approaches indicated that the commonly used approaches identified modules of transcripts with similar temporal and biological features, but that PaLD also identified unique groups, suggesting that a PaLD-based approach (supplemented with a community detection algorithm) can complement existing methods. These results revealed that PaLD could sort like-responding transcripts into biologically meaningful neighborhoods and groups while requiring minimal user input and producing cohesive network structure, offering an additional tool to the systems biology community to predict GRNs.
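To make the parameter-free character of PaLD more concrete, the following minimal sketch computes a cohesion matrix from pairwise distances, following the published description of the algorithm as we understand it. The function names, the use of Euclidean distance between response profiles, and the toy data are assumptions for illustration; this is not the authors' implementation, and the strong-tie threshold (half the mean self-cohesion) should be checked against the PaLD literature before use.

```python
from math import dist


def pald_cohesion(points):
    """Return the PaLD-style cohesion matrix for a list of numeric profiles.

    c[x][z] accumulates the support that z lends to x: for every other
    point y, z counts if it lies in the local focus of the pair (x, y)
    and is closer to x than to y (ties are split evenly).
    Hypothetical helper; Euclidean distance is an assumption.
    """
    n = len(points)
    d = [[dist(p, q) for q in points] for p in points]
    c = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for y in range(n):
            if x == y:
                continue
            # Local focus: points at least as close to x or to y as x and y are to each other.
            focus = [z for z in range(n)
                     if d[z][x] <= d[x][y] or d[z][y] <= d[x][y]]
            weight = 1.0 / len(focus)
            for z in focus:
                if d[z][x] < d[z][y]:
                    c[x][z] += weight
                elif d[z][x] == d[z][y]:
                    c[x][z] += 0.5 * weight
    return [[value / (n - 1) for value in row] for row in c]


def strong_ties(c):
    """Pairs whose mutual cohesion meets half the mean self-cohesion (assumed rule)."""
    n = len(c)
    threshold = 0.5 * sum(c[i][i] for i in range(n)) / n
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if min(c[i][j], c[j][i]) >= threshold]


if __name__ == "__main__":
    # Toy "temporal response profiles": two well-separated groups of three.
    profiles = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    cohesion = pald_cohesion(profiles)
    print(strong_ties(cohesion))
```

Note that no cluster count or neighborhood size is supplied: the local focus is defined by the distances themselves, which is the property the abstract highlights as reducing user input.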
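The abstract above names PaLD without describing its mechanics. As a rough illustration only, the following sketch computes pairwise cohesion and strong ties from a distance matrix, following the general definition of partitioned local depth in the published PaLD literature as understood here. It is not the authors' pipeline: the function names `pald_cohesion` and `strong_ties`, the equal split of tied distances, and the toy one-dimensional "profiles" are all assumptions made for this example.

```python
def pald_cohesion(dist):
    """Cohesion matrix from a symmetric distance matrix (list of lists).

    coh[x][z] is the support that point z lends to point x, accumulated
    over every pairing of x with another point y.
    """
    n = len(dist)
    coh = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for y in range(x + 1, n):
            d_xy = dist[x][y]
            # Local focus: points at least as close to x or to y
            # as x and y are to each other.
            focus = [z for z in range(n)
                     if dist[z][x] <= d_xy or dist[z][y] <= d_xy]
            w = 1.0 / len(focus)
            for z in focus:
                if dist[z][x] < dist[z][y]:
                    coh[x][z] += w          # z supports x
                elif dist[z][y] < dist[z][x]:
                    coh[y][z] += w          # z supports y
                else:
                    # Assumption: ties are split equally.
                    coh[x][z] += 0.5 * w
                    coh[y][z] += 0.5 * w
    return [[v / (n - 1) for v in row] for row in coh]


def strong_ties(coh):
    """Pairs whose mutual cohesion reaches half the mean self-cohesion."""
    n = len(coh)
    threshold = 0.5 * sum(coh[i][i] for i in range(n)) / n
    return {(i, j)
            for i in range(n) for j in range(i + 1, n)
            if min(coh[i][j], coh[j][i]) >= threshold}


# Hypothetical toy data: six "transcripts" summarized by one number each,
# forming two well-separated groups.
profiles = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
dist = [[abs(a - b) for b in profiles] for a in profiles]
ties = strong_ties(pald_cohesion(dist))
print(sorted(ties))
```

Note that the threshold is derived from the data themselves, which is consistent with the abstract's point that the method requires minimal user input: no cluster number or neighborhood size is supplied.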
{"title":"Informative community structure revealed using Arabidopsis time series transcriptome data via Partitioned Local Depth","authors":"Maleana G Khoury, Kenneth S Berenhaut, Katherine E Moore, Edward E Allen, Alexandria F Harkey, Joëlle K Mühlemann, Courtney N Craven, Jiayi Xu, Suchi S Jain, David J John, James L Norris, Gloria K Muday","doi":"10.1093/insilicoplants/diad018","DOIUrl":"https://doi.org/10.1093/insilicoplants/diad018","PeriodicalId":36138,"journal":{"name":"in silico Plants"},"PeriodicalIF":0.0,"publicationDate":"2023-11-08","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"135430060","RegionNum":0,"Total":0}
Pub Date : 2023-11-08 DOI: 10.1093/insilicoplants/diad020
Achraf Mamassi, Marie Lang, Bernard Tychon, Mouanis Lahlou, Joost Wellens, Mohamed El Gharous, Hélène Marrou
In the context of climate change, in-season and longer-term yield predictions are needed to anticipate local and regional food crises and propose adaptations to farmers’ practices. Mechanistic models and machine learning are two modelling options to consider in this perspective. In this study, regression (MR) and Random Forest (RF) models were calibrated for wheat yield prediction in Morocco, using data collected from 125 farmers’ wheat fields. Additionally, MR and RF models were calibrated both with and without remotely sensed leaf area index (LAI), either across all farmers’ fields or specifically by agroecological zone in Morocco. The same farmers’ fields were simulated using a mechanistic model (APSIM-wheat). We compared the predictive performance of the empirical models and APSIM-wheat. Both MR and RF showed rather good predictive quality (NRMSEs below 35%), but were always outperformed by the APSIM model. Both RF and MR selected remotely sensed LAI at heading, climate variables (maximum temperatures at emergence and tillering), and fertilization practices (amount of nitrogen applied at heading) as major yield predictors. Integrating remotely sensed LAI in the calibration process reduced NRMSE by 4.5% and 1.8% on average for MR and RF models, respectively. Calibrating region-specific models did not significantly improve predictive performance. These findings lead to the conclusion that mechanistic models are better at capturing the impacts of in-season climate variability and would be preferred to support short-term tactical adjustments to farmers’ practices, while machine learning models are easier to use for mid-term regional prediction.
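The abstract reports model quality as NRMSE without stating how the error was normalized. As a minimal sketch, assuming normalization by the mean of the observed yields (one common convention, not confirmed by the source), the metric could be computed as below. The function name `nrmse_percent` and the yield values are hypothetical and serve only to show the arithmetic.

```python
import math


def nrmse_percent(observed, predicted):
    """Root-mean-square error as a percentage of the mean observed value.

    Assumption: normalization by the observed mean; other conventions
    (range or standard deviation of observations) give different numbers.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    mean_obs = sum(observed) / len(observed)
    return 100.0 * math.sqrt(mse) / mean_obs


# Hypothetical field yields (t/ha), not data from the study.
observed_yield = [2.0, 4.0]
predicted_yield = [1.0, 5.0]
print(round(nrmse_percent(observed_yield, predicted_yield), 1))
```

Under this convention, an NRMSE below 35% means the typical prediction error is less than about a third of the average observed yield.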
{"title":"A comparison of empirical and mechanistic models for wheat yield prediction at field level in Moroccan rainfed areas","authors":"Achraf Mamassi, Marie Lang, Bernard Tychon, Mouanis Lahlou, Joost Wellens, Mohamed El Gharous, Hélène Marrou","doi":"10.1093/insilicoplants/diad020","DOIUrl":"https://doi.org/10.1093/insilicoplants/diad020","PeriodicalId":36138,"journal":{"name":"in silico Plants"},"PeriodicalIF":0.0,"publicationDate":"2023-11-08","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"135430034","RegionNum":0,"Total":0}
Pub Date : 2023-11-03 DOI: 10.1093/insilicoplants/diad019
G L Hammer, G McLean, J Kholová, E van Oosterom
Tillering affects canopy leaf area, and hence crop growth via capture of light, water, and nutrients. Depending on the season, variation in tillering can result in increased or decreased yield. Reduced tillering has been associated with water saving and enhanced yield in water-limited conditions. The objective of this study was to develop a generic model of the dynamics of tillering in sorghum incorporating key genetic and environmental controls. The dynamics of tillering were defined in four key phases: pre-tillering, tiller emergence, cessation of tiller emergence, and cessation of tiller growth. Tillering commenced at full expansion of leaf four and thereafter was synchronised with leaf appearance. The potential total number of tillers (TTN) depended on a genetic propensity to tiller and an index of assimilate availability dependent on the shoot source-sink balance. Cessation of tiller emergence could occur before TTN was reached, depending on the extent of competition from neighbours. Subsequent cessation of growth of emerged tillers was related to the extent of internal competition for assimilate among plant organs, resulting in prediction of final fertile tiller number (FTN). The model predicted tillering dynamics well in an experiment with a range of plant densities. Plausibility simulations of FTN conducted for diverse field conditions in the Australian sorghum belt reflected expectations. The model is able to predict fertile tiller number as an emergent property. Its utility to explore GxMxE crop adaptation landscapes, guide molecular discovery, provide a generic template for other cereals, and link to advanced methods for enhancing genetic gain in crops is discussed.
{"title":"Modelling the dynamics and phenotypic consequences of tiller outgrowth and cessation in sorghum","authors":"G L Hammer, G McLean, J Kholová, E van Oosterom","doi":"10.1093/insilicoplants/diad019","DOIUrl":"https://doi.org/10.1093/insilicoplants/diad019","PeriodicalId":36138,"journal":{"name":"in silico Plants"},"PeriodicalIF":0.0,"publicationDate":"2023-11-03","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"135874837","RegionNum":0,"Total":0}