Causal Artificial Intelligence Models of Food Quality Data
Ž. Kurtanjek
Food Technology and Biotechnology, published 2024-01-07. DOI: 10.17113/ftb.62.01.24.8301 (https://doi.org/10.17113/ftb.62.01.24.8301)
Research background. This study aims to emphasize the importance of artificial intelligence (AI) and causal modelling in the analysis of food quality with “big data”. AI with structural causal modelling (SCM), based on Bayes networks and deep learning, enables the integration of theoretical field knowledge in food technology with process production, physical-chemical analytics, and consumer organoleptic assessments. Food products are complex in nature, and their data are high-dimensional, with intricate interrelations (correlations), and are difficult to relate to consumer sensory perception of food quality. Standard regression modelling techniques such as multiple ordinary least squares (OLS) and partial least squares (PLS) are effectively applied for prediction by linear interpolation of observed data under cross-sectional stationary conditions. Upgrading linear regression models with machine learning (ML) accounts for nonlinear relations and reveals functional patterns, but remains prone to confounding and fails to predict under unobserved nonstationary conditions. Confounding of data variables is the main obstacle to applying regression models in food innovation under conditions not covered by the training data. Hence, this manuscript focuses on applying causal graphical models with Bayes networks to infer causal relationships and intervention effects between process variables and consumer sensory assessment of food quality.
Experimental approach. This study is based on data available in the literature on wheat bread baking quality, consumer sensory quality assessments of fermented milk products, and professional wine tasting. The data for wheat baking quality are regularized by the least absolute shrinkage and selection operator (LASSO elastic net). Bayes statistics is applied to evaluate the model joint probability function and to infer the network structure and parameters. The obtained SCM models are presented as directed acyclic graphs (DAG). The d-separation criterion is applied to block confounding effects when estimating the direct and total causal effects of process variables and consumer perception on food quality. Probability distributions of the causal effects of interventions on individual process variables are presented as partial dependence plots determined by Bayes neural networks. For wine quality, the total causal effects determined by the SCM models are validated by the double machine learning (DML) algorithm.
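To illustrate why blocking a confounding path matters, the following is a minimal sketch on synthetic data; it is not the study's data or code. It assumes a hypothetical linear SCM in which a confounder Z drives both a process variable X and a quality score Y (Z → X, Z → Y, X → Y). Conditioning on Z closes the back-door path, which is one application of the d-separation criterion. The variable names, coefficients and sample size are assumptions chosen for the example.

```python
import random

TRUE_EFFECT = 0.5  # assumed causal effect of X on Y in this synthetic model


def simulate(n, seed=0):
    """Draw n samples from the hypothetical SCM: Z -> X, Z -> Y, X -> Y."""
    rng = random.Random(seed)
    xs, zs, ys = [], [], []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)                    # confounder
        x = 0.8 * z + rng.gauss(0.0, 1.0)          # process variable
        y = TRUE_EFFECT * x + 1.0 * z + rng.gauss(0.0, 1.0)  # quality score
        xs.append(x)
        zs.append(z)
        ys.append(y)
    return xs, zs, ys


def cov(a, b):
    """Population covariance of two equally long sequences."""
    ma = sum(a) / len(a)
    mb = sum(b) / len(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / len(a)


def naive_slope(x, y):
    """Slope of Y regressed on X alone; the back-door path via Z stays open."""
    return cov(x, y) / cov(x, x)


def adjusted_slope(x, z, y):
    """Coefficient of X when Y is regressed on X and Z (back-door adjustment)."""
    sxx, szz, sxz = cov(x, x), cov(z, z), cov(x, z)
    sxy, szy = cov(x, y), cov(z, y)
    return (sxy * szz - sxz * szy) / (sxx * szz - sxz ** 2)


if __name__ == "__main__":
    xs, zs, ys = simulate(20000, seed=1)
    print("naive slope:   ", naive_slope(xs, ys))
    print("adjusted slope:", adjusted_slope(xs, zs, ys))
```

In this synthetic setting the naive slope overstates the effect because it absorbs the influence of Z, while the adjusted coefficient stays close to the assumed value. With real food quality data the adjustment set would have to be read from the inferred DAG rather than assumed.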
Results and conclusions. The analysed wheat data set comprises 45 continuous chemical, physical and biochemical variables of wheat properties from seven Croatian cultivars over two years of controlled cultivation. LASSO regularization of the data set yielded ten key predictors, accounting for 98 % of the variance of the baking quality data. From these key variables, a predictive random forest model of quality is derived with 75 % cross-validation accuracy. Causal analysis between quality and the key predictors is based on the Bayes model depicted as a DAG. Protein content shows the most important direct causal effect, with a path coefficient of 0.71; THMW (total high molecular weight glutenin subunits) content is an indirect cause with a path coefficient of 0.42; and the total average causal effect (ACE) of protein is 0.65. The large data set on the quality of fermented milk products includes binary consumer sensory data (taste, odour, turbidity), continuous physical variables (temperature, fat, pH, colour), and three grade classes of consumer quality assessment. A random forest model is derived for predicting the quality classification, with an out-of-bag (OOB) error of 0.28 %. The Bayes network model predicts that the direct causes of the taste classification are temperature, colour, and fat content, while the direct causes of the quality classification are temperature, turbidity, odour, and fat content. The key estimated average causal effects on quality grade are -0.04 grade/°C for temperature and 0.3 quality grade/fat content for fat. The dependence of ACE on temperature is nonlinear, a negative saturation with a “breaking” point at 60 °C, while the ACE of fat shows a positive linear trend. Causal quality analysis of red and white wine is based on a large data set of eleven continuous physical and chemical variables and quality assessments classified in ten classes, from 1 to 10. Each classification is obtained in triplicate by a panel of professional wine tasters. A non-structural double machine learning (DML) algorithm is applied to assess the total ACE on quality. The alcohol content of red and white wine has the key positive relative ACE of 0.35 quality/alcohol, while volatile acidity has the key negative ACE of -0.2 quality/acidity. The ACE predictions obtained by the unstructured DML algorithm are in close agreement with those obtained by the structural SCM models.
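The partialling-out idea behind double machine learning can be sketched as follows. This is a minimal illustration on synthetic data under stated assumptions, not the study's implementation or its wine data. A hypothetical confounder w affects both a treatment t and an outcome y nonlinearly, and the effect value 0.5 is an assumption of the simulation. A simple binned-mean regressor stands in for the machine learning nuisance models (in practice these would typically be learners such as random forests), and two-fold cross-fitting keeps the nuisance fits and the residuals on separate samples.

```python
import random

TRUE_EFFECT = 0.5  # assumed effect in the synthetic model, not a result from the study


def simulate(n, seed=0):
    """Synthetic rows (w, t, y): w confounds t and y through w**2."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        w = rng.uniform(-2.0, 2.0)
        t = w * w + rng.gauss(0.0, 1.0)
        y = TRUE_EFFECT * t + 2.0 * w * w + rng.gauss(0.0, 1.0)
        rows.append((w, t, y))
    return rows


def fit_binned_mean(ws, targets, n_bins=40, lo=-2.0, hi=2.0):
    """Piecewise-constant regressor: mean of the target within each bin of w."""
    def bin_of(w):
        return min(n_bins - 1, max(0, int((w - lo) / (hi - lo) * n_bins)))

    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for w, v in zip(ws, targets):
        b = bin_of(w)
        sums[b] += v
        counts[b] += 1
    overall = sum(targets) / len(targets)  # fallback for empty bins
    means = [s / c if c else overall for s, c in zip(sums, counts)]
    return lambda w: means[bin_of(w)]


def dml_effect(rows, n_folds=2):
    """Cross-fitted residual-on-residual estimate of the effect of t on y."""
    num = den = 0.0
    for k in range(n_folds):
        train = [r for i, r in enumerate(rows) if i % n_folds != k]
        held_out = [r for i, r in enumerate(rows) if i % n_folds == k]
        ws = [r[0] for r in train]
        m_hat = fit_binned_mean(ws, [r[1] for r in train])  # predicts t from w
        g_hat = fit_binned_mean(ws, [r[2] for r in train])  # predicts y from w
        for w, t, y in held_out:
            t_res = t - m_hat(w)
            y_res = y - g_hat(w)
            num += t_res * y_res
            den += t_res * t_res
    return num / den


def naive_slope(rows):
    """Slope of y regressed on t alone, ignoring the confounder."""
    mt = sum(r[1] for r in rows) / len(rows)
    my = sum(r[2] for r in rows) / len(rows)
    sty = sum((r[1] - mt) * (r[2] - my) for r in rows)
    stt = sum((r[1] - mt) ** 2 for r in rows)
    return sty / stt


if __name__ == "__main__":
    rows = simulate(8000, seed=1)
    print("naive slope:", naive_slope(rows))
    print("DML estimate:", dml_effect(rows))
```

Because the confounding here acts through w squared, a purely linear adjustment for w would not remove it; flexible nuisance models are what let the residual-on-residual regression recover a value near the assumed effect. No graph structure is needed for this estimate, which is why the approach is described as non-structural.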
Novelty and scientific contribution. Novel methodologies and results are presented for applying causal artificial intelligence models to the analysis of consumer assessment of food product quality. The application of Bayes network structural causal models (SCM) enables d-separation of the pronounced confounding effects between parameters that affect noncausal regression models. Based on SCM, inference of average causal effects (ACE) provides substantiated and validated research hypotheses for new products and supports decisions on potential interventions to improve product design, new process introduction, process control, management, and marketing.
About the journal:
Food Technology and Biotechnology (FTB) is a diamond open access, peer-reviewed international quarterly scientific journal that publishes papers covering a wide range of topics, including molecular biology, genetic engineering, biochemistry, microbiology, biochemical engineering and biotechnological processing, food science, analysis of food ingredients and final products, food processing and technology, oenology and waste treatment.
The Journal is published by the University of Zagreb, Faculty of Food Technology and Biotechnology, Croatia. It is an official journal of the Croatian Society of Biotechnology and the Slovenian Microbiological Society, financed by the Croatian Ministry of Science and Education, and supported by the Croatian Academy of Sciences and Arts.