Pub Date: 2026-02-05
DOI: 10.1016/j.chemolab.2026.105659
Erdem Önal, Zeynep Kalaycıoğlu
Protein–ligand binding studies are critical in drug discovery and development because they offer insight into the molecular interactions that underlie biological function, disease mechanisms, and therapeutic effects. This study evaluates the potential of combining text mining with cheminformatics to explore trends in protein–ligand binding studies across six widely used analytical techniques. Using an open-source Python platform (SCOPE), we analyzed over 33,000 scientific articles and more than 1.3 million chemical entities. The resulting data were visualized as two-dimensional hexbin plots of hydrophobicity (log P) against molecular weight (Da) for each technique. Rather than focusing solely on ligands, the study characterizes the overall chemical environments (including solvents, buffers, and supporting agents) associated with protein–ligand binding assays. By analyzing the physicochemical properties of compounds reported across different analytical techniques, we highlight how method-specific preferences shape the experimental design landscape. The analysis integrates unsupervised K-means clustering, multivariate principal component analysis (PCA), and nonparametric statistical testing to quantitatively compare technique-associated chemical spaces. The study also offers a data-driven perspective on methodologies and historical trends in protein–ligand binding research, and it is positioned as a method-centric literature analysis rather than a traditional narrative review.
Title: Text mining-based profiling of chemical environments in protein–ligand binding assays across analytical techniques
Journal: Chemometrics and Intelligent Laboratory Systems, vol. 271, Article 105659 (journal impact factor 3.8)
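The K-means step mentioned in the abstract above can be illustrated with a minimal sketch. It assumes each chemical entity is reduced to a (log P, molecular weight) pair and that features are standardized before clustering. The compound values, the function names, and the farthest-first initialization are all illustrative assumptions and are not taken from the paper or from SCOPE.

```python
import math
from statistics import mean, pstdev


def standardize(points):
    """Scale each column to zero mean and unit variance."""
    columns = list(zip(*points))
    mus = [mean(c) for c in columns]
    sds = [pstdev(c) or 1.0 for c in columns]  # guard against constant columns
    return [tuple((v - m) / s for v, m, s in zip(p, mus, sds)) for p in points]


def kmeans(points, k, n_iter=100):
    """Plain Lloyd's algorithm with deterministic farthest-first initialization."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = []
    for _ in range(n_iter):
        # Assignment step: nearest center for every point.
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        # Update step: each center moves to the mean of its members.
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            updated.append(tuple(mean(c) for c in zip(*members)) if members else centers[j])
        if updated == centers:
            break
        centers = updated
    return labels, centers


# Hypothetical (log P, molecular weight in Da) pairs: three small polar
# species (solvent- or buffer-like) and three larger hydrophobic ones.
compounds = [(-1.4, 60.0), (-0.8, 78.0), (-1.0, 92.0),
             (3.1, 410.0), (2.7, 385.0), (3.5, 450.0)]
labels, centers = kmeans(standardize(compounds), k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Standardizing first matters here because molecular weight spans hundreds of daltons while log P spans only a few units. Without scaling, the distance would be dominated by molecular weight.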
Pub Date: 2026-02-05
DOI: 10.1016/j.chemolab.2026.105652
Soumya Sahu, Thomas Mathew, Robert Gibbons, Dulal K. Bhaumik
This article addresses calibration challenges in analytical chemistry by employing a random-effects calibration curve model and its generalizations to capture variability in analyte concentrations. The model is motivated by a specific feature of analytical measurements: measurement errors remain constant at low concentrations but increase proportionally as concentrations rise. To account for this, the model permits the parameters of the calibration curve, which relate instrument responses to true concentrations, to vary across laboratories, thereby reflecting variability in the measurement processes. A calibration curve that captures the heteroscedastic nature of the data yields more reliable estimates across diverse laboratory conditions. Because traditional large-sample interval estimation methods are inadequate for small samples, an alternative, the fiducial approach, is explored in this work. When used to construct a confidence interval for an unknown concentration, the fiducial approach outperforms all other available approaches in maintaining the coverage probabilities. Applications considered include determining the presence of an analyte and interval estimation of an unknown true analyte concentration. The proposed method is demonstrated on both simulated and real interlaboratory data, including examples involving copper and cadmium in distilled water.
Title: Fiducial inference for random-effects calibration models: Advancing reliable quantification in environmental analytical chemistry
Journal: Chemometrics and Intelligent Laboratory Systems, vol. 271, Article 105652
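The error structure described in the abstract above (constant error at low concentration, proportional error at high concentration) can be illustrated with a small simulation. The sketch assumes a two-component form, response = alpha + beta * x * exp(eta) + eps, with independent normal eta and eps. Whether the paper uses exactly this parameterization is an assumption, and all parameter values are hypothetical. Laboratory-level random effects (letting alpha and beta vary by laboratory) are omitted to keep the example short.

```python
import math
import random
from statistics import pstdev


def simulate_response(x, alpha, beta, sigma_eta, sigma_eps, rng):
    """One simulated instrument response at true concentration x."""
    eta = rng.gauss(0.0, sigma_eta)  # multiplicative (proportional) error
    eps = rng.gauss(0.0, sigma_eps)  # additive (constant) error
    return alpha + beta * x * math.exp(eta) + eps


def response_sd(x, n=20000, seed=1, alpha=0.1, beta=1.0,
                sigma_eta=0.1, sigma_eps=0.5):
    """Empirical standard deviation of the response at concentration x."""
    rng = random.Random(seed)
    draws = [simulate_response(x, alpha, beta, sigma_eta, sigma_eps, rng)
             for _ in range(n)]
    return pstdev(draws)


for conc in (0.0, 1.0, 100.0, 1000.0):
    sd = response_sd(conc)
    print(f"x = {conc:7.1f}  sd = {sd:8.3f}")
```

Near zero concentration the standard deviation stays close to sigma_eps, while at high concentration it grows roughly as beta * x * sigma_eta. This is the heteroscedastic pattern that motivates interval methods tailored to the model rather than constant-variance regression.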
Pub Date: 2026-02-05
DOI: 10.1016/j.chemolab.2026.105661
Jiaxue Cui, Dawei Zhang, Banglian Xu, Jianzhong Fan, Xianglong Cao
This study addresses high-dimensional collinearity and regional information heterogeneity in near-infrared spectroscopy for gasoline olefin content prediction. It proposes a systematic optimization approach that combines a Continuous Region Utilizing Integrated Spectral Evaluation for Near-Infrared (CRUISE-NIR) algorithm with a Region-Sensitive Adaptive Ensemble Learning (RAEL) framework. The CRUISE-NIR algorithm shifts spectral analysis from a “point” to a “region” perspective, taking into account the physical correlation of adjacent wavelengths and chemical prior knowledge, and reduces 4443 original variables to 16 key features. The RAEL framework dynamically adjusts prediction weights according to each sample's performance characteristics in different spectral regions, yielding sample-specific predictions. On the test set, the proposed method achieves a root mean square error (RMSE) of 0.2795 and a coefficient of determination (R²) of 0.9646, significantly outperforming traditional methods in prediction accuracy and fitting capability. The framework was further validated on heterogeneous matrices including SWRI Diesel, IDRC Tablets, and Soil, demonstrating generalizability across liquid and solid samples. The results indicate that prioritizing feature quality over the number of variables significantly enhances model performance. Overall, the framework shows robust analytical capability for high-dimensional spectral data across diverse and complex molecular systems.
Title: Near-infrared spectroscopic prediction of gasoline olefin content: A systematic approach using continuous region feature selection and region-sensitive ensemble learning
Journal: Chemometrics and Intelligent Laboratory Systems, vol. 271, Article 105661
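For readers who want to reproduce the two figures of merit quoted in the abstract above, RMSE and R² can be computed as follows. This is a generic sketch using the standard definitions. The reference and predicted values are made up for illustration and have no connection to the gasoline dataset.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between reference and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical reference values and model predictions.
reference = [10.0, 12.0, 14.0, 16.0]
predicted = [10.5, 11.5, 14.5, 15.5]
print(rmse(reference, predicted))      # 0.5
print(r2_score(reference, predicted))  # 0.95
```

RMSE is expressed in the units of the predicted property, whereas R² is dimensionless, so the two are usually reported together.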
Pub Date: 2026-02-04
DOI: 10.1016/j.chemolab.2026.105654
Mohammed Faisal Noaman, Moinul Haq, Sanjog Chhetri Sapkota, Mehboob Anwer Khan, Kausar Ali, Hesam Kamyab
This study presents a framework that integrates experiments, machine learning (ML), and explainable artificial intelligence (XAI) to predict the swelling pressure and consolidation characteristics of clayey soil reinforced with polypropylene geo-fiber (PPGF). A dataset of laboratory consolidation tests was compiled, comprising PPGF content, coefficient of consolidation (Cv), coefficient of compressibility (av), compression index (Cc), coefficient of volume change (mv), settlement (S), and swelling pressure (ps). The experiments showed that Cc, mv, and S decreased on average by about 39.5%, 45.31%, and 90%, respectively, at the optimum PPGF content of 0.3%, demonstrating the effectiveness of reinforcing fibers in restraining time-dependent deformation. Six machine learning models (KNN, SVM, ANN, DT, RF, and XGB) were developed using five-fold cross-validation. The XGB regressor gave the best predictive performance, with R² = 0.994 (RMSE 3.14) on the training set and R² = 0.913 (RMSE 14.05) on the test set. The remaining models performed comparatively worse: ANN and DT exhibited pronounced overfitting, while KNN and SVM failed to adequately capture the nonlinear swelling response. The XAI analysis using SHAP indicates that PPGF content is the most influential factor governing swelling pressure, followed by mv and soil compressibility. An interactive graphical user interface was built on the optimized XGB model to predict and visualize swelling pressure in real time from user inputs. The proposed model combines experimental validation with robust predictive capability and interpretability, and it is complemented by a user-friendly interface that serves as a decision-support tool for geotechnical design and soil improvement.
Title: Prediction of consolidation behavior of modified clayey soil reinforced with artificial geo-fibers using explainable artificial intelligence
Journal: Chemometrics and Intelligent Laboratory Systems, vol. 271, Article 105654
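The five-fold cross-validation mentioned in the abstract above partitions the samples into five disjoint folds and uses each fold once for validation. A minimal sketch of the index bookkeeping is given below. The function name, the shuffling seed, and the sample count of 23 are illustrative assumptions, and the paper's actual splitting procedure is not described in the abstract.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal shuffled indices into k folds whose sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, validation


# Hypothetical dataset of 23 consolidation tests.
for fold_id, (train, validation) in enumerate(k_fold_indices(23, k=5)):
    print(fold_id, len(train), len(validation))
```

Comparing the score on each training split with the score on the matching validation split is the usual way to spot the kind of overfitting the abstract reports for the ANN and DT models.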
Pub Date: 2026-02-02
DOI: 10.1016/j.chemolab.2026.105656
Jingwen Ou, Yuhong Wang
Polypropylene is a fundamental material in consumer products and advanced technological applications, and accurate melt index (MI) prediction is critical for quality control in its polymerization. Existing offline analyses of MI are time-consuming and costly, so MI soft sensors have become an active research topic. The variables in the propylene polymerization process are related in a complex, nonlinear way through the polymerization reaction. Graph convolutional networks can capture the spatial dependence between variables, but they suffer from a fixed structure and insufficient propagation depth. To address this, this work proposes a Feature Expansion Multi-hop Graph Attention Network (FMGAT) framework that enlarges the receptive field and captures features at multiple levels. The novelty of the framework lies in its integrated design for MI soft sensing, which combines established attention and feature expansion mechanisms in a configuration tailored to polymerization processes. Attention diffusion links nodes that are not directly connected, which increases the receptive field of each layer. FMGAT extracts features in multiple subspaces in parallel, which effectively reduces feature homogenization. A Marginally Regression Conditional Tabular Generative Adversarial Network (MRCTGAN) is introduced to generate samples during data processing. Statistical and regression evaluation metrics are used to study the performance of MRCTGAN and FMGAT on an industrial dataset. The results show that MRCTGAN achieves the best histogram intersection dissimilarity among the sample generation methods compared. Models trained on MRCTGAN-augmented data achieve, on average, an 8.2% lower root mean square error (RMSE) than models trained on the original data. FMGAT significantly outperforms the baselines, reducing the RMSE to 0.4643 g/10 min. FMGAT establishes an interpretable, robust paradigm for complex industrial process modeling.
Title: A graph-based soft sensor using feature expansion and multi-hop attention for melt index prediction
Journal: Chemometrics and Intelligent Laboratory Systems, vol. 271, Article 105656
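The abstract above evaluates generated samples with a histogram intersection dissimilarity but does not define it. The sketch below assumes the common definition: one minus the sum of bin-wise minima of two normalized histograms, so that 0 means identical histograms and 1 means no overlap. The bin edges, the helper names, and the sample values are hypothetical.

```python
def normalized_histogram(values, edges):
    """Fraction of values in each bin; the last bin includes its right edge."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break
    total = sum(counts)
    if total == 0:
        raise ValueError("no values fall inside the bin edges")
    return [c / total for c in counts]


def intersection_dissimilarity(p, q):
    """1 - sum_i min(p_i, q_i) for two normalized histograms."""
    return 1.0 - sum(min(a, b) for a, b in zip(p, q))


# Hypothetical real and generated values of one process variable.
edges = [0.0, 2.0, 4.0, 6.0]
real = normalized_histogram([1, 2, 2, 3, 3, 3, 4, 4, 5], edges)
generated = normalized_histogram([1, 1, 2, 3, 3, 4, 4, 5, 5], edges)
print(intersection_dissimilarity(real, generated))
```

Under this definition a lower value indicates that the generated sample distribution is closer to the real one for the variable being compared.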
Pub Date: 2026-01-30
DOI: 10.1016/j.chemolab.2026.105653
Zhanchang Zhang, Qiao Ning, Xulun Shi, Shikai Guo, Hui Li
Protein S-sulfhydration is an important post-translational modification that regulates signaling pathways in animal cells by influencing protein activity and function. It also plays a crucial role in regulating plant metabolism and morphogenesis. Identifying S-sulfhydration sites is therefore important for cellular biology research. In this study, we propose a deep learning framework that uses a directional multi-LSTM (Long Short-Term Memory) network to predict protein S-sulfhydration sites. First, protein sequence data are preprocessed with an improved BERT strategy to extract high-dimensional sequence features. Hypothesizing that S-sulfhydration exhibits directionality, we partition sequences around cysteine residues and extract features with the directional multi-LSTM, simulating the enzymatic reaction conditions. A convolutional neural network (CNN) is then employed to capture deep local features. On an independent test set, the accuracy, sensitivity, specificity, Matthews correlation coefficient, area under the curve, and precision are 76.76%, 85.45%, 67.21%, 53.77%, 76.33%, and 74.11%, respectively. The results demonstrate that the directional multi-LSTM deep learning framework is an effective tool for predicting protein S-sulfhydration sites. The source code is available at https://github.com/endeavor-zzc/Multi-LSTM.
Title: A directional multi-LSTM framework integrated BERT for S-sulfhydration sites prediction
Journal: Chemometrics and Intelligent Laboratory Systems, vol. 271, Article 105653
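Most of the evaluation measures listed in the abstract above follow directly from the four cells of a binary confusion matrix. The sketch below uses the standard definitions. The counts are hypothetical and are not the paper's test-set counts. The area under the curve is omitted because it requires ranked prediction scores rather than counts.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard metrics from confusion-matrix counts (all cells assumed > 0)."""
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)     # positive predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "mcc": mcc,
    }


# Hypothetical counts for 100 modified and 100 unmodified cysteine sites.
metrics = binary_metrics(tp=80, fp=30, tn=70, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

The Matthews correlation coefficient ranges from -1 to 1, so a value reported as a percentage (such as 53.77%) corresponds to about 0.54 on that scale.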
Pub Date: 2026-01-26
DOI: 10.1016/j.chemolab.2026.105647
Sanatan Das, Poly Karmakar
This paper explores the application of artificial intelligence (AI) to understanding the behavior of silver and magnesium oxide nanoparticles in milk flow. The study considers a vibrating electromagnetic channel under controlled parabolic thermal ramping and oscillatory pressure variations. The framework couples the essential physical mechanisms (radiative emission, thermal sinks, and porous matrix interactions), with Darcy's law quantifying the permeability-driven viscous drag. The mechanics of milk flow through the electromagnetically activated channel are formulated mathematically, and the governing equations are solved with the Laplace transform (LT) technique. The analysis concentrates on flow metrics and presents the results graphically. Key findings include an increase in thermal conductivity and flow viscosity due to the nanoparticles, which improves heat transport efficiency and modifies flow patterns. The flow dynamics show two opposing dependencies: momentum is amplified by electromagnetic intensity (Hartmann number) and suppressed by electrode spacing. On the thermal side, shear stress (SS) increases with frequency, and the rate of heat transfer (RHT) is enhanced through an optimized heat uptake parameter. An artificial neural network (ANN) is calibrated to emulate the LT solver's outputs for wall SS and RHT. The ANN achieves high fidelity (R² > 0.99) in predicting these metrics across the parameter space explored in the LT simulations, but its generalization to experimental or real dairy systems remains unvalidated and is a focus of future work. The findings demonstrate the potential of integrating advanced materials and AI technologies to improve product characteristics and processing efficiency.
Title: Time-resolved simulation of hybrid nano-milk flow in an electromagnetic vibration channel with parabolic thermal ramping: A Python AI approach
Journal: Chemometrics and Intelligent Laboratory Systems, vol. 270, Article 105647
Pub Date : 2026-01-23 DOI: 10.1016/j.chemolab.2026.105649
Mingxi Ai, Jin Zhang, Zhaohui Tang, Yongfang Xie
Froth flotation is a widely used mineral beneficiation technique in which effective process monitoring is essential for optimizing mineral separation. In industrial practice, however, manual labeling is noisy, leaving a significant portion of the data incorrectly labeled. Although deep learning monitoring models are powerful at capturing complex visual patterns, their high capacity makes them prone to overfitting noisy labels, which hinders robust model development. To address this challenge, this study proposes a noise-robust contrastive ensemble learning method for practical industrial process monitoring. The method first constructs multiple diverse monitoring models in distinct representation spaces using a novel disparity contrastive learning strategy. Then, clean and mislabeled data for each sub-model are distinguished by measuring the inter-model consensus and intra-model uncertainty of its peer models. Finally, a structure-consistency-based semi-supervised learning strategy is proposed to refine these sub-models by treating mislabeled data as unlabeled, encouraging representation-aligned predictions through mutual information maximization. Through iterative noisy-label identification and semi-supervised refinement, robust monitoring models are obtained even with heavily corrupted training data. Extensive experiments on industrial froth flotation data demonstrate the effectiveness and advantages of the proposed method compared to existing state-of-the-art noise-robust learning techniques.
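The abstract's clean-versus-mislabeled split can be illustrated with a small sketch. The abstract does not give the actual criteria, so everything here is an assumption made for illustration: the function name `split_clean_noisy`, the thresholds `min_agree` and `max_entropy`, the use of label agreement as the consensus measure, and the use of normalized prediction entropy as the uncertainty measure. It is not the authors' method.

```python
import numpy as np

def split_clean_noisy(peer_probs, labels, min_agree=0.5, max_entropy=0.5):
    """Flag samples as clean using peer-model consensus and uncertainty.

    peer_probs: array (K, N, C) of softmax outputs from K peer models.
    labels:     array (N,) of the given (possibly noisy) class labels.
    Thresholds are hypothetical defaults, not values from the paper.
    """
    peer_probs = np.asarray(peer_probs, dtype=float)
    labels = np.asarray(labels)
    n_classes = peer_probs.shape[2]

    # Inter-model consensus: share of peer models whose top class
    # matches the given label.
    agree = (peer_probs.argmax(axis=2) == labels[None, :]).mean(axis=0)

    # Intra-model uncertainty: entropy of each model's prediction,
    # scaled to [0, 1] by log(C), then averaged over the peer models.
    eps = 1e-12
    entropy = -(peer_probs * np.log(peer_probs + eps)).sum(axis=2)
    uncertainty = (entropy / np.log(n_classes)).mean(axis=0)

    # A sample counts as clean only if peers agree with its label
    # and are confident about their predictions.
    clean = (agree >= min_agree) & (uncertainty <= max_entropy)
    return clean, agree, uncertainty
```

Samples flagged as not clean would then be treated as unlabeled in the semi-supervised refinement step the abstract describes.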
{"title":"Noise-robust contrastive ensemble learning for flotation process monitoring","authors":"Mingxi Ai , Jin Zhang , Zhaohui Tang , Yongfang Xie","doi":"10.1016/j.chemolab.2026.105649","DOIUrl":"10.1016/j.chemolab.2026.105649","url":null,"abstract":"<div><div>Froth flotation is a widely used mineral beneficiation technique, where effective process monitoring is essential for optimizing mineral separation. However, in practical industry, manual labeling suffers from noises, leading to a significant portion of incorrectly labeled data. Though deep learning monitoring models are powerful in capturing complex visual patterns, their high capacity makes them vulnerable to overfitting noisy labels, hindering robust model development. To address this challenge, this study proposes a noise-robust contrastive ensemble learning method for practical industrial process monitoring. The method first constructs multiple diverse monitoring models in distinct representation spaces using a novel disparity contrastive learning strategy. Then, clean and mislabeled data for each sub-model are distinguished by measuring the inter-model consensus and intra-model uncertainty of its peer models. Finally, a structure-consistency-based semi-supervised learning strategy is proposed to refine these sub-models by treating mislabeled data as unlabeled, encouraging representation-aligned predictions through mutual information maximization. Through iterative noisy-label identification and semi-supervised refinement, robust monitoring model are obtained even with heavily corrupted training data. 
Extensive experiments on industrial froth flotation data demonstrate the effectiveness and advantages of the proposed method compared to existing state-of-the-art noise-robust learning techniques.</div></div>","PeriodicalId":9774,"journal":{"name":"Chemometrics and Intelligent Laboratory Systems","volume":"270 ","pages":"Article 105649"},"PeriodicalIF":3.8,"publicationDate":"2026-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146074867","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Geology routinely employs isotopic geochemistry with the main objective of measuring radiogenic or stable isotopic compositions to reconstruct the history of the Earth. A critical aspect of this analytical process lies in verifying the accuracy and reliability of the measurements performed. To this end, standards or reference materials are repeatedly analyzed, enabling calibration or adjustment of experimental instruments. To ensure a strong correlation between the reference values and the averaged measurements, linear regression is the most widely adopted approach. Among the available methodologies, this work advocates for the use of models compliant with the ISO 28037:2010 standard, which is specifically designed to perform linear regression in a statistically robust manner. The guidelines established by this standard are, regrettably, not always implemented correctly, and the statistical nature of the measurements is frequently overlooked. This study provides a detailed examination of the methodologies advocated by the standard, with the objective of facilitating their application to geochemical problems, specifically issues related to isotopic measurement, by revisiting the underlying theoretical principles, assumptions, and the respective advantages and limitations inherent to each approach. To facilitate implementation in line with the standard's recommendations, we propose a software application developed in Python 3.14. This computational tool has been tested and validated using experimental datasets obtained from isotopic analyses of carbon, oxygen, and sulfur, elements of fundamental interest in geological studies. The objective of this study is therefore to clearly and practically illustrate the challenges involved in geochemical calibration and adjustment.
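As a rough illustration of the kind of calibration the abstract discusses, the sketch below fits a straight line y = a + b·x by textbook weighted least squares, assuming the reference values x are exact and each measured value y has a known standard uncertainty. The function name `wls_line` and its arguments are hypothetical. This is neither the authors' software nor a complete implementation of ISO 28037:2010, and it does not handle uncertainty in x or correlated data.

```python
import numpy as np

def wls_line(x, y, u_y):
    """Weighted least-squares fit of y = a + b*x.

    x:   reference values (assumed exact in this sketch).
    y:   measured values.
    u_y: standard uncertainties of y (assumed known and uncorrelated).
    Returns intercept a, slope b, their standard uncertainties,
    and the weighted sum of squared residuals (chi-square).
    """
    x, y, u_y = (np.asarray(v, dtype=float) for v in (x, y, u_y))
    w = 1.0 / u_y**2                      # weights = inverse variances
    sw = w.sum()
    x0 = (w * x).sum() / sw               # weighted means
    y0 = (w * y).sum() / sw
    sxx = (w * (x - x0) ** 2).sum()

    b = (w * (x - x0) * (y - y0)).sum() / sxx
    a = y0 - b * x0

    u_b = np.sqrt(1.0 / sxx)
    u_a = np.sqrt(1.0 / sw + x0**2 / sxx)

    # Compare chi-square with n - 2 degrees of freedom to judge
    # whether the stated uncertainties are consistent with the fit.
    chi2 = (w * (y - a - b * x) ** 2).sum()
    return a, b, u_a, u_b, chi2
```

A chi-square value far above the number of points minus two suggests that the stated uncertainties are too small or that a straight line does not describe the data.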
{"title":"Evaluating calibration models in isotope geochemistry: Lessons from carbonates and sulfides","authors":"Alban Petitjean , Olivier Musset , Ludovic Duponchel , Christophe Thomazo","doi":"10.1016/j.chemolab.2026.105640","DOIUrl":"10.1016/j.chemolab.2026.105640","url":null,"abstract":"<div><div>Geology routinely employs isotopic geochemistry with the main objective of measuring radiogenic or stable isotopic compositions to reconstruct the history of the Earth. A critical aspect of this analytical process lies in verifying the accuracy and reliability of the measurements performed. To this end, standards or reference materials are repeatedly analyzed enabling calibration or adjustment of experimental instruments. In order to ensure a strong correlation between the reference values and the averaged measurements, a linear regression is the most widely adopted. Among the available methodologies, this work advocates for the use of models compliant with the ISO 28037:2010 standard, which is specifically designed to perform linear regression in a statistically robust manner. The guidelines established by this standard are, regrettably, not always implemented correctly, and the statistical nature of the measurements is frequently overlooked. This study provides a detailed examination of the methodologies advocated by the standard, with the objective of facilitating their application to geochemical problems specifically, issues related to isotopic measurement by revisiting the underlying theoretical principles, assumptions, and the respective advantages and limitations inherent to each approach. To facilitate implementation and respect recommendations, we propose a software application developed in Python 3.14. This computational tool has been tested and validated using experimental datasets obtained from isotopic analyses of carbon, oxygen, and sulfur elements of fundamental interest in geological studies. 
The objective of this study is therefore to clearly and practically illustrate the challenges involved in geochemical calibration and adjustment.</div></div>","PeriodicalId":9774,"journal":{"name":"Chemometrics and Intelligent Laboratory Systems","volume":"270 ","pages":"Article 105640"},"PeriodicalIF":3.8,"publicationDate":"2026-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146075361","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2026-01-23 DOI: 10.1016/j.chemolab.2026.105648
Naif Almusallam, Maqsood Hayat
The biological functions of bacteria are significantly impacted by bacteriophage virion proteins (BVPs), the structural proteins of bacteriophages, the viruses that infect bacteria. BVPs play a major role in phage therapy and genetic engineering. Secure and accurate identification of these proteins is essential for understanding phage-host interactions and for bioinformatics and medical applications. However, ensuring privacy and robustness in computational models is challenging, especially when handling complex biological data. Previous approaches relied on wet-lab experiments and suffered from limited scalability, incomplete feature coverage, and low generalization ability. In this study, we introduce a privacy-preserving and adversarially robust deep learning framework. It integrates natural language processing (NLP) descriptors with transformer-guided ideal proximity matrix reconstruction to capture rich information from protein sequences. For post-hoc interpretability, we use SHapley Additive exPlanations (SHAP) and Local Interpretable Model-agnostic Explanations (LIME). These techniques increase transparency and confidence in predictions. SHAP analyzes the dataset to identify the most significant proximity-based and NLP-derived descriptors at global and class levels. LIME provides instance-specific explanations, emphasizing local decision boundaries for particular predictions. The proposed model achieved 95.75% and 90.27% accuracy on the training and independent datasets, respectively. We calculated statistical measures, such as the chi-square statistic and p-value, for each dataset to demonstrate reliability. Our model improves predictive outcomes, transparency, and security. The empirical results validate its outstanding performance compared to existing models, while preserving security and explainability. This makes it suitable and reliable for real-world applications in proteomics and bioinformatics.
{"title":"Explainable AI for secure and accurate prediction of bacteriophage virion proteins using NLP descriptors and transformer-guided ideal proximity matrix reconstruction","authors":"Naif Almusallam , Maqsood Hayat","doi":"10.1016/j.chemolab.2026.105648","DOIUrl":"10.1016/j.chemolab.2026.105648","url":null,"abstract":"<div><div>The biological functions of bacteria are significantly impacted by bacteriophage virion proteins (BVPs), which are bacterial viruses. BVPs play a major role in phage therapy and genetic engineering. Secure and accurate identification of these proteins is essential for understanding phage-host interactions and for bioinformatics and medical applications. However, ensuring privacy and robustness in computational models is challenging, especially when handling complex biological data. Previous works relied on wet-lab experiments, had limited scalability, incomplete feature coverage, and low generalization ability. In this study, we introduce a privacy-preserving and adversarial-robust deep learning framework. It integrates natural language processing (NLP) descriptors with transformer-guided ideal proximity matrix reconstruction to capture rich information from protein sequences. For post-hoc interpretability, we use SHapley Additive exPlanations (SHAP) and Local Interpretable Model-agnostic Explanations (LIME). These techniques increase openness and confidence in predictions. SHAP analyzes the dataset to identify the most significant proximity-based and NLP-derived descriptors at global and class levels. LIME provides instance-specific explanations, emphasizing local decision boundaries for particular predictions. The proposed model achieved 95.75 % and 90.27 % accuracy on the training and independent datasets, respectively. We calculated statistical measures, such as Chi-Square and P-value, for each dataset to demonstrate reliability. Our model improves predictive outcomes, transparency, and security. 
The empirical results validate its outstanding performance compared to existing models, while preserving security and explainable AI. This makes it suitable and reliable for real-world applications in proteomics and bioinformatics.</div></div>","PeriodicalId":9774,"journal":{"name":"Chemometrics and Intelligent Laboratory Systems","volume":"270 ","pages":"Article 105648"},"PeriodicalIF":3.8,"publicationDate":"2026-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146074875","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}