Jihao You, Dan Tulpan, Cheryl Krziyzek, Jennifer L Ellis
Journal of Animal Science, published 2025-02-03. DOI: 10.1093/jas/skaf021 (https://doi.org/10.1093/jas/skaf021)
Prediction of Pellet Durability Index (PDI) in a Commercial Feed Mill Using Multiple Linear Regression with Variable Selection and Dimensionality Reduction
Pellet quality, measured as the Pellet Durability Index (PDI), is a key performance indicator (KPI) for commercial feed manufacturing, because it can affect both mill efficiency and the downstream performance of animals fed the manufactured diets. Controlling pellet quality remains an ongoing challenge for the feed industry, owing to the complexity of feed manufacturing and the large number of variables influencing the process. Previous studies have explored prediction of pellet quality using either simple empirical models with a few variables or machine learning models with many variables. The objective of the current study was to develop statistical regression models to predict PDI, and to describe the relationship between pellet quality and 55 available variables, based on a dataset of 2691 observations collected from a commercial feed mill. The response variable (PDI) was transformed using the Box-Cox approach into a transformed response variable (tPDI), which was more normally distributed. Three multiple regression models were developed on subsets of variables produced by variable selection and dimensionality reduction methods: Forward Selection, Principal Component Analysis, and Partial Least Squares. The results indicated that Model 1 (Forward Selection with manual removal of sparse variables), built on 9 variables, performed better than the other two models. It showed consistent prediction performance on the training and testing data in terms of MAE (1.93 ± 0.063 versus 1.96), RMSPE (2.45 ± 0.079 versus 2.45), and CCC (0.549 ± 0.0273 versus 0.550), and better prediction precision based on the fit plot. Based on a behavior analysis, Expanding Temperature (℃), Fat Content (%), ADF Content (%), and Indoor Humidity (Pelletizer) (%) were identified as more influential than the other variables on tPDI in Model 1. The models developed in the current study can help feed mills predict and understand the effect of a number of commonly measured variables on pellet quality in a commercial setting.
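The abstract names a Box-Cox transformation and three agreement metrics (MAE, RMSPE, CCC) without giving formulas. The following is a minimal sketch of how these quantities are commonly computed, not the authors' code. It makes several assumptions: the lambda value and the toy observations are hypothetical, RMSPE is taken to mean root mean square prediction error in the units of the response, and CCC is taken to be Lin's concordance correlation coefficient computed with population (divide by n) moments.

```python
import math
from statistics import fmean


def box_cox(y, lam):
    """Box-Cox transform of a single positive value for a fixed lambda."""
    if y <= 0:
        raise ValueError("Box-Cox requires strictly positive values")
    return math.log(y) if lam == 0 else (y ** lam - 1.0) / lam


def inv_box_cox(z, lam):
    """Inverse of box_cox, mapping a transformed value back to the original scale."""
    return math.exp(z) if lam == 0 else (lam * z + 1.0) ** (1.0 / lam)


def mae(obs, pred):
    """Mean absolute error between observed and predicted values."""
    return fmean(abs(o - p) for o, p in zip(obs, pred))


def rmspe(obs, pred):
    """Root mean square prediction error (assumed definition, response units)."""
    return math.sqrt(fmean((o - p) ** 2 for o, p in zip(obs, pred)))


def ccc(obs, pred):
    """Lin's concordance correlation coefficient (assumed definition)."""
    mean_o, mean_p = fmean(obs), fmean(pred)
    var_o = fmean((o - mean_o) ** 2 for o in obs)
    var_p = fmean((p - mean_p) ** 2 for p in pred)
    cov = fmean((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred))
    return 2.0 * cov / (var_o + var_p + (mean_o - mean_p) ** 2)


# Hypothetical PDI values (%), not taken from the study.
observed = [90.0, 85.0, 92.0, 88.0]
predicted = [89.0, 86.0, 90.0, 88.0]

# Hypothetical lambda; the abstract does not report the fitted value.
LAMBDA = 2.0
transformed = [box_cox(y, LAMBDA) for y in observed]

print("MAE:", mae(observed, predicted))
print("RMSPE:", rmspe(observed, predicted))
print("CCC:", ccc(observed, predicted))
```

CCC penalizes both scatter and systematic bias: a model whose predictions correlate well with the observations but are shifted or scaled still scores below 1. This is one reason it is often reported alongside error metrics such as MAE and RMSPE.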
About the journal:
The Journal of Animal Science (JAS) is the premier journal for animal science and serves as the leading source of new knowledge and perspective in this area. JAS publishes more than 500 fully reviewed research articles, invited reviews, technical notes, and letters to the editor each year.
Articles published in JAS encompass a broad range of research topics in animal production and fundamental aspects of genetics, nutrition, physiology, and preparation and utilization of animal products. Articles typically report research with beef cattle, companion animals, goats, horses, pigs, and sheep; however, studies involving other farm animals, aquatic and wildlife species, and laboratory animal species that address fundamental questions related to livestock and companion animal biology will be considered for publication.