Pub Date : 2025-04-10 DOI: 10.1016/j.chemolab.2025.105390
Sadik Bhattarai , Kil To Chong , Hilal Tayara
Limited and imbalanced data hinder anticancer peptide (ACP) prediction, often resulting in overfitting and poor performance on unseen peptides. To address these challenges, we propose a Deep Convolution Generative Adversarial Network (DC-GAN) based data augmentation method. This approach expands the training dataset by generating peptides with anticancer properties, particularly for underrepresented classes such as N+ type ACPs, which are characterized by abundant positive residues at the N-terminus and remain an overlooked problem in anticancer peptide prediction. Compared to traditional methods like Synthetic Minority Over-sampling Technique (SMOTE) and SMOTE with Edited Nearest Neighbors (SMOTEENN), DC-GAN demonstrates superior performance by addressing both limited training samples and within-class imbalances, such as those between C+ and N+ type peptides. The proposed framework, GAN-ML, cascades a linear model and an ensemble model, achieving accuracies of 82.96% (independent test), 96.06% (independent test), and 94.06% (5-fold cross-validation) for classifying peptides as anticancer, antimicrobial, or non-anticancer across various datasets, integrating ACP motif-based authentication and physicochemical property-based validation. These results highlight the efficacy of DC-GAN-based data augmentation in enhancing model generalization, improving performance by generating samples for minority classes, and serving as a powerful tool for generative anticancer drug discovery.
Title : GAN-ML: Advancing anticancer peptide prediction through innovative Deep Convolution Generative Adversarial Network data augmentation technique. Journal : Chemometrics and Intelligent Laboratory Systems, Volume 262, Article 105390.
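The N+ and C+ peptide types mentioned in the GAN-ML abstract above can be illustrated with a minimal sketch that compares the share of positively charged residues at each terminus. This is not the authors' procedure: the residue set (lysine and arginine only), the window length of 5, and the function names are all assumptions chosen for illustration.

```python
# Hypothetical illustration: label a peptide by which terminus carries
# more positively charged residues. Not the method used in the paper.

POSITIVE = set("KR")  # assumption: count only lysine (K) and arginine (R)


def positive_fraction(segment: str) -> float:
    """Fraction of residues in `segment` that are in POSITIVE."""
    if not segment:
        return 0.0
    return sum(res in POSITIVE for res in segment.upper()) / len(segment)


def terminal_type(peptide: str, window: int = 5) -> str:
    """Return 'N+', 'C+' or 'balanced' by comparing the two terminal windows."""
    n_frac = positive_fraction(peptide[:window])   # N-terminal window
    c_frac = positive_fraction(peptide[-window:])  # C-terminal window
    if n_frac > c_frac:
        return "N+"
    if c_frac > n_frac:
        return "C+"
    return "balanced"


def type_counts(peptides):
    """Count how many peptides fall in each terminal type."""
    counts = {"N+": 0, "C+": 0, "balanced": 0}
    for peptide in peptides:
        counts[terminal_type(peptide)] += 1
    return counts
```

Running `type_counts` on a training set would expose the kind of within-class imbalance the abstract describes, for example many C+ sequences and few N+ ones, before any augmentation is applied.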
Pub Date : 2025-04-08 DOI: 10.1016/j.chemolab.2025.105400
Guglielmo Emanuele Franceschi , Lisa Rita Magnaghi , Marta Guembe-Garcia , Raffaela Biesuz
The detection of human serum albumin (HSA) in urine is crucial for the early diagnosis of nephrotic syndromes and diabetic nephropathy. In this study, we developed a cost-effective colorimetric sensor based on Bromocresol Green (BCG) sorbed on Color Catcher® (CC) sheets for albumin detection. The sensor undergoes a visible color change from yellow to blue upon interaction with albumin at acidic pH, enabling qualitative detection. A Design of Experiments (DoE) approach was applied to optimize sensor preparation and application and to control experimental variability within the lab-scale preparation procedure, ensuring enhanced sensitivity and robustness. Several multivariate data analysis tools, including Principal Component Analysis (PCA) and Discriminant Analysis (LDA and QDA), were combined to describe the samples, develop robust and predictive models, and assess detection performance. The optimized sensor achieved a detection limit as low as 0.5 μM for albumin, making it a promising candidate for rapid, low-cost, and user-friendly point-of-care (PoC) applications.
Title : Enhancing Albumin Detection with Chemometrics: A Multivariate Approach to Bromocresol Green-based Colorimetric Sensor Development. Journal : Chemometrics and Intelligent Laboratory Systems, Volume 262, Article 105400.
Pub Date : 2025-04-07 DOI: 10.1016/j.chemolab.2025.105399
Mana Saleh Al Reshan , Samina Amin , Muhammad Ali Zeb , Adel Sulaiman , Asadullah Shaikh , Hani Alshahrani , Khairan Rajab
Breast cancer (BC) is a fatal illness that affects millions of people every year. After lung cancer, BC is one of the world's major causes of death for women. A breast cell-derived malignant tumor is referred to as BC. Both developed and developing countries are struggling with this widespread cancer. Machine learning (ML) and Deep Learning (DL) have emerged in recent years as effective, highly accurate technologies for BC prediction owing to their robust classification and diagnostic capabilities. This paper introduces a novel Deep Neural Networks-based Stacking Ensemble Model (DNN-SEM) enhanced with a hybrid stacking ensemble model (SEM) and Extra Tree Classifier (ETC) technique to extract the most essential features from the suggested BC datasets. The proposed DNN-SEM integrates Deep Belief Network (DBN) and Artificial Neural Network (ANN) as level-1 models, referred to as SEM-DBN and SEM-ANN, respectively. The level-1 models are built on four traditional ML algorithms, namely XGBoost Classifier (XGBC), Logistic Regression (LR), Random Forest (RF), and Support Vector Machine (SVM), which serve as level-0 models. The proposed DNN-SEM model is trained using four BC datasets, namely Diagnostic Wisconsin Breast Cancer Dataset (WBCD) (Dataset-I), Coimbra Breast Cancer Dataset (CBCD) (Dataset-II), Original Wisconsin Breast Cancer Dataset (WDBC) (Dataset-III), and Prognostic Wisconsin Breast Cancer (WBCP) (Dataset-IV). The efficacy of the proposed DNN-SEM is assessed through established evaluation metrics, including accuracy, sensitivity, specificity, Matthews correlation coefficient (MCC), F-score, confusion matrix, and ROC curves. To analyze the efficiency of the DNN-SEM, its performance is compared with the proposed single classifiers, ensemble, and state-of-the-art models present in the literature. The results demonstrate that DBN-SEM achieves the highest accuracy of 99.62 %, with the lowest error rate. The proposed DBN-SEM and ANN-SEM achieved promising accuracy scores against level-0 and state-of-the-art methods.
Title : Advanced breast cancer prediction using Deep Neural Networks integrated with ensemble models. Journal : Chemometrics and Intelligent Laboratory Systems, Volume 262, Article 105399.
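The evaluation metrics listed in the breast cancer entry above (accuracy, sensitivity, specificity, F-score, MCC) all follow from the four cells of a binary confusion matrix. The sketch below uses the standard textbook definitions; the function name and the choice to return 0.0 for undefined ratios are assumptions, and nothing here reproduces the paper's own evaluation code.

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary classification metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # recall, true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # true negative rate
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f_score = (
        2 * precision * sensitivity / (precision + sensitivity)
        if (precision + sensitivity)
        else 0.0
    )
    # Matthews correlation coefficient; set to 0.0 when the denominator vanishes
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_score": f_score,
        "mcc": mcc,
    }
```

MCC is worth reporting next to accuracy on imbalanced datasets because it uses all four cells, so a classifier that always predicts the majority class scores 0 rather than a misleadingly high value.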
Pub Date : 2025-04-03 DOI: 10.1016/j.chemolab.2025.105391
Aimei Liu , Wenjing Xuan , Yongjun Xiao
Recent advances in Artificial Intelligence (AI) have significantly influenced biodiesel production as a renewable source of energy, primarily through the enhancement of transesterification reactions and yield optimization. This review summarizes key findings from multiple studies on the optimization of biodiesel production from biomass using machine learning models, and analyzes the models and optimization techniques they employ. Several optimization strategies, including evolutionary algorithms and heuristic methods, are explored across different studies. Among the models evaluated, those employing advanced configurations and ensemble techniques demonstrated superior performance in accuracy and correlation with biodiesel datasets. In particular, enhanced versions of neural networks, extreme learning models, and fuzzy systems emerged as top performers, offering robust solutions for biodiesel optimization. Findings suggest that machine learning not only augments traditional catalyst development and yield prediction methods but also offers a consolidated framework enhancing overall process efficiency. This work intends to offer an extensive examination of the present status and future prospects of Artificial Intelligence applications in biodiesel production, synthesizing a broad range of relevant contemporary literature.
Title : State-of-the-Art Review on Applications of Various Machine Learning Models in Biodiesel Production. Journal : Chemometrics and Intelligent Laboratory Systems, Volume 262, Article 105391.
Pub Date : 2025-03-27 DOI: 10.1016/j.chemolab.2025.105384
Zhuangwei Shi , Jiale Wang , Yunhao Su , Xiaohong Liang , Jianchen Zi , Chenhui Wang , Hai Bi , Xia Xiang
Raman spectroscopy, a non-invasive analytical technique, reveals significant potential in clinical diagnosis of kidney disorders by detecting key biomolecules in urine samples, especially glucose and protein. Although machine learning models have been widely applied for efficiently analyzing Raman spectral data, the high-dimensionality, imbalance and sample-scarcity of Raman spectral data still pose challenges to the models in achieving accurate detection. To address these challenges, we propose a novel deep learning model, TCRaman, which integrates transfer learning and contrastive learning for urine detection using Raman spectral data. As contrastive learning is capable of representation learning on imbalanced data, TCRaman first utilizes a pretrained contrastive learning model on a large labeled Raman spectral dataset of bacteria, to enhance the model’s capability to learn meaningful low-dimensional representations from high-dimensional Raman spectral data. Then, the pretrained model is finetuned on clinical urine Raman spectral data. This transfer learning framework is a foundation model that can break through the limitation of sample-scarcity on different downstream tasks. The experiments demonstrate the superiority of TCRaman compared with current state-of-the-art models. The results show that TCRaman achieves 91% accuracy on the detection of both glucose and protein, and 95% accuracy on the prediction of kidney disorders, highlighting the effectiveness of our proposed method in detecting urine Raman spectra. The proposed TCRaman method provides a promising way for accurate, rapid, and cost-effective detection for spectral data of biochemical samples.
Title : Transfer contrastive learning for Raman spectra data of urine: Detection of glucose, protein, and prediction of kidney disorders. Journal : Chemometrics and Intelligent Laboratory Systems, Volume 261, Article 105384.
Pub Date : 2025-03-25 DOI: 10.1016/j.chemolab.2025.105389
Paula Beatriz Silva Passarin, Rogerio Takao Okamoto, Fabiane Dörr, Mauricio Yonamine, Felipe Rebello Lourenço
In the pharmaceutical industry, analytical procedures are used to conduct research, development and quality control of drugs and medicines. Given the importance of the Analytical Quality by Design (AQbD) approach in the rational development of analytical procedures, especially by minimizing the need for experiments, acquiring and improving knowledge during development and ensuring the flexibility of the method, this study aimed to develop an analytical procedure, based on AQbD principles, for the identification and quantification of different cephalosporins, as well as to create a tool for defining the method operable design region. The initial screening phase of analytical development was carried out using an in silico tool developed in a previous study. During the analytical development phase, a forced degradation study was performed using Liquid Chromatography Coupled to Mass Spectrometry (LC-MS) to identify cephalosporin degradation products. The developed method, along with its Method Operable Design Region (MODR), was validated, resulting in a robust and flexible analytical procedure for identifying and quantifying different cephalosporins in accordance with AQbD principles. The proposed tool considers multiple chromatographic responses, target uncertainty and desired confidence level, simplifying the verification of compliance with quality requirements defined in Analytical Target Profile, offering a reliable process that adapts to parameter changes, maintaining the quality of results.
Title : AQbD approach for the development of an analytical procedure for the separation of cephalosporin drugs and their degradation products. Journal : Chemometrics and Intelligent Laboratory Systems, Volume 261, Article 105389.
Pub Date : 2025-03-24 DOI: 10.1016/j.chemolab.2025.105388
Xunian Yang, Jieguang Yang, Xiaochen Hao
In the process of cement clinker calcination, the working conditions fluctuate dynamically, and multiple operational indices are interdependent. The inability to monitor key indicators, such as clinker quality and energy consumption, in real time, along with the absence of coordination mechanisms among various operational indicators, results in issues such as product instability, low energy efficiency, and insufficient robustness of the production system. To tackle these challenges under dynamic conditions, this paper proposes a robust optimization method for the cement calcination process (CCP). First, a prediction model for coal consumption and free calcium oxide (f-CaO) content is developed using a Time Series-Based Convolutional Neural Network (TS-CNN), incorporating the multi-time-scale characteristics and significant delays inherent in cement calcination data. Second, a multi-objective optimization model for the CCP is formulated by examining the relationships between process parameters and production indices. Subsequently, the mean effective function of the prediction model is defined as the fitness function, and a robust multi-objective difference algorithm (RMODE) is developed to solve the optimization model, yielding a robust optimal solution with high resistance to disturbances. Finally, comparative experiments are performed using real-world CCP data. The experimental results indicate that, compared to the baseline algorithm, the proposed method enhances system robustness while maintaining product quality and reducing coal consumption.
Title : Research on robust optimization of cement calcination process based on RMODE algorithm. Journal : Chemometrics and Intelligent Laboratory Systems, Volume 261, Article 105388.
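The cement calcination entry above uses the mean effective function of a prediction model as the fitness in a robust optimizer. A common reading of that idea is to score a candidate by its average objective over a neighbourhood of perturbed inputs rather than by its nominal value. The sketch below follows that reading with Monte Carlo sampling. The uniform perturbation, the `delta` radius, the sample count, and the toy objective are all assumptions; the paper's RMODE algorithm and TS-CNN model are not reproduced here.

```python
import math
import random


def mean_effective(objective, x, delta=0.05, samples=200, seed=0):
    """Average `objective` over uniform perturbations of x within +/- delta."""
    rng = random.Random(seed)  # fixed seed so repeated calls give the same estimate
    total = 0.0
    for _ in range(samples):
        perturbed = [xi + rng.uniform(-delta, delta) for xi in x]
        total += objective(perturbed)
    return total / samples


def toy_objective(x):
    """Hypothetical cost to minimize: a deep narrow well at 1 and a broad well at 3."""
    narrow = -1.2 * math.exp(-((x[0] - 1.0) ** 2) / 0.0002)
    broad = -1.0 * math.exp(-((x[0] - 3.0) ** 2) / 2.0)
    return narrow + broad


# Nominal values favour the narrow well; averaged values favour the broad one.
nominal_narrow = toy_objective([1.0])
nominal_broad = toy_objective([3.0])
robust_narrow = mean_effective(toy_objective, [1.0], delta=0.2)
robust_broad = mean_effective(toy_objective, [3.0], delta=0.2)
```

In this toy case the narrow well has the lower nominal cost, but a small disturbance pushes the operating point out of it, so its averaged cost is worse than that of the broad well. That is the sense in which a solution found under a mean effective fitness is more resistant to disturbances.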
Pub Date : 2025-03-22 DOI: 10.1016/j.chemolab.2025.105382
Marcelo Bourguignon , Diego I. Gallardo
The usual mean linear regression provides the average relationship between a response variable and explanatory variables, but it is not always the best metric for modeling right-skewed data in regression. In this paper, we extend the usual mean gamma regression model using a general and unified parameterization of this distribution that is indexed by some central tendency measure. Unlike the traditional gamma regression model, which focuses on the arithmetic mean, this new parameterization accommodates different measures of central tendency, including the median, mode, geometric mean, and harmonic mean, along with a precision parameter. We consider a regression structure for both components. The model provides a robust framework for regression, allowing for greater adaptability to different data characteristics. Estimation is performed by maximum likelihood. Furthermore, we discuss residuals. A Monte Carlo experiment is conducted to evaluate the performance of these estimators and residuals in finite samples, with a discussion of the obtained results. The methods developed are applied to two real data sets from minerals and nutrition.
Title : A general and unified class of gamma regression models. Journal : Chemometrics and Intelligent Laboratory Systems, Volume 261, Article 105382.
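To make the central tendency measures named in the gamma regression entry above concrete, the block below lists their standard closed forms for a gamma distribution with shape $\alpha$ and rate $\lambda$, followed by one possible regression structure. The symbols $\alpha$, $\lambda$, $\theta_i$, $\gamma$ and the log link are assumptions for illustration; the paper's own parameterization may differ.

```latex
% Gamma density with shape \alpha > 0 and rate \lambda > 0
f(y \mid \alpha, \lambda) = \frac{\lambda^{\alpha}}{\Gamma(\alpha)}\, y^{\alpha-1} e^{-\lambda y}, \qquad y > 0.

% Closed-form central tendency measures
\text{arithmetic mean: } \frac{\alpha}{\lambda}, \qquad
\text{mode: } \frac{\alpha-1}{\lambda} \ (\alpha \ge 1), \qquad
\text{geometric mean: } \frac{e^{\psi(\alpha)}}{\lambda}, \qquad
\text{harmonic mean: } \frac{\alpha-1}{\lambda} \ (\alpha > 1),

% where \psi is the digamma function; the median has no closed form.

% Illustrative regression on a chosen measure \theta_i (here the arithmetic mean),
% with the rate recovered from \theta_i and the shape:
\log \theta_i = x_i^{\top} \gamma, \qquad \lambda_i = \frac{\alpha}{\theta_i}.
```

Each measure is a function of $\alpha$ divided by $\lambda$, which is why a single parameterization indexed by the chosen measure is possible: fixing the shape, switching the target measure only changes how $\lambda_i$ is recovered from $\theta_i$.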
Pub Date : 2025-03-20 DOI: 10.1016/j.chemolab.2025.105385
Lucas Almir Cavalcante Minho , Walter Nei Lopes dos Santo
While infrared and mass spectral libraries are well documented, the same cannot be said for the ultraviolet and visible region (UV-VIS), which severely hampers identification by HPLC/DAD. Given recent advances in machine learning, the exhaustive task of compiling and maintaining extensive standardized libraries may become obsolete. When well tuned to the problem of spectral recognition, machine learning models can identify complex patterns and relationships within spectra, reducing the need for direct comparison with reference spectra. This study therefore proposed the development and validation of a heterogeneous ensemble model, integrating decision tree algorithms and meta-learning techniques, specialized in UV-VIS spectral recognition using HPLC/DAD. The ensemble performed satisfactorily on the test set (n = 97), with an accuracy of 95.88 ± 4.45 % and a precision of 96.74 % (MCC = 0.9571). On real application data, accuracy fell to approximately 80 %, although recall remained high at 93.00 %. A weighted quantitative index based on the feature importance parameter of random forests was developed and applied to estimate the model's probability of success. The model, its constituents, and additional resources were made available in an open repository.
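The abstract quotes accuracy, precision, recall, and MCC for a multi-class classifier without saying how they were aggregated. The sketch below shows one common way to obtain all four from a confusion matrix. It assumes macro-averaged precision and recall and the standard multi-class generalization of the Matthews correlation coefficient; the function name `multiclass_metrics` and the toy matrix are hypothetical and do not come from the paper.

```python
import math


def multiclass_metrics(cm):
    """Compute accuracy, macro precision, macro recall and multi-class MCC.

    cm[i][j] is the number of samples whose true class is i and whose
    predicted class is j (assumed layout, square matrix).
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(k))
    true_counts = [sum(cm[i]) for i in range(k)]                       # row sums
    pred_counts = [sum(cm[i][j] for i in range(k)) for j in range(k)]  # column sums

    accuracy = correct / total
    # Macro averages: unweighted mean of per-class scores (0 when undefined).
    precision = sum(
        cm[j][j] / pred_counts[j] if pred_counts[j] else 0.0 for j in range(k)
    ) / k
    recall = sum(
        cm[i][i] / true_counts[i] if true_counts[i] else 0.0 for i in range(k)
    ) / k

    # Multi-class MCC written in terms of the confusion-matrix marginals.
    cov_tp = correct * total - sum(t * p for t, p in zip(true_counts, pred_counts))
    cov_tt = total ** 2 - sum(t * t for t in true_counts)
    cov_pp = total ** 2 - sum(p * p for p in pred_counts)
    denom = math.sqrt(cov_tt * cov_pp)
    mcc = cov_tp / denom if denom else 0.0

    return {"accuracy": accuracy, "precision": precision, "recall": recall, "mcc": mcc}


# Toy 3-class confusion matrix (hypothetical counts, not the paper's data).
toy_cm = [[5, 0, 0],
          [0, 4, 1],
          [0, 0, 5]]
toy_metrics = multiclass_metrics(toy_cm)
```

Because MCC uses every cell of the confusion matrix, it can sit well below accuracy when errors concentrate in a few classes, which is one reason to report it alongside accuracy on small test sets.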
Title: Heterogeneous ensemble learning applied to UV-VIS identification of multi-class pesticides by high-performance liquid chromatography with diode array detector (HPLC/DAD). Chemometrics and Intelligent Laboratory Systems, volume 261, Article 105385.
Pub Date : 2025-03-20 DOI: 10.1016/j.chemolab.2025.105378
Haoran Li , Pengcheng Wu , Shihong Ding , Tao Chen , Xiaobo Zou , Jisheng Dai
In this paper, we propose a knowledge-informed spectroscopic regression method named burst-sparsity learning (BSL) to address limitations in interpretability and consistency analysis. The concept of burst-sparsity (BS) refers to the distribution of chemically relevant structures inspired by spectral response mechanisms, characterized by significant variables that are sparse and occur in clusters. First, we formulate spectroscopic regression as a sparse recovery problem using the sparse Bayesian learning (SBL) model, whose flexibility provides an accurate sparse representation and allows prior knowledge to be integrated. Second, since the BS structure is unavailable, an enhanced non-uniform pattern-coupled (PC) prior is developed to capture more BS structures by considering adjacent coefficients. Extensive experiments are conducted to verify the efficacy of the BSL method. The results show that BSL improves prediction performance in terms of RMSEP and R_p^2 across various spectroscopic techniques and dataset scales, highlighting its potential for real-world applications. In addition, the deep integration of domain knowledge into machine learning provides deeper insight into how chemically relevant features contribute to the model’s predictions.
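The abstract reports RMSEP and R_p^2 without defining them. The sketch below assumes the definitions commonly used in chemometrics: RMSEP as the root mean squared error on the prediction (test) set, and R_p^2 as the coefficient of determination on that same set. Some papers instead define R_p^2 as the squared correlation coefficient, so this is an illustration, not the authors' stated formula. The function names and toy values are hypothetical.

```python
import math


def rmsep(y_true, y_pred):
    """Root mean squared error of prediction on a held-out set."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r2_prediction(y_true, y_pred):
    """Coefficient of determination on the prediction set (assumed definition)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)          # total sum of squares
    return 1.0 - ss_res / ss_tot


# Toy reference values and predictions (hypothetical, not from the paper).
y_ref = [1.0, 2.0, 3.0, 4.0]
y_hat = [1.0, 2.0, 3.0, 5.0]
toy_rmsep = rmsep(y_ref, y_hat)
toy_r2 = r2_prediction(y_ref, y_hat)
```

A lower RMSEP and an R_p^2 closer to 1 both indicate better prediction. RMSEP is expressed in the units of the response, so it is only comparable across models evaluated on the same dataset.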
Title: A knowledge-informed burst-sparsity learning (BSL) with non-uniform pattern-coupled prior for spectroscopic regression. Chemometrics and Intelligent Laboratory Systems, volume 262, Article 105378.