Pub Date: 2025-12-06. DOI: 10.1016/j.iswa.2025.200615
Ilya Kus , Cemal Kocak , Ayse Keles
Facial expression is one of the most important indicators used to convey human emotions. Facial expression recognition is the process of automatically detecting and classifying these expressions with computer systems. Multimodal facial expression recognition aims to perform a more accurate and comprehensive emotion analysis by combining facial expressions with different modalities such as image, speech, Electroencephalogram (EEG), or text. This study systematically reviews research conducted between 2021 and 2025 on Vision Transformer (ViT)-based approaches and Explainable Artificial Intelligence (XAI) techniques in multimodal facial expression recognition, as well as the datasets employed in these studies. The findings indicate that ViT-based models outperform conventional Convolutional Neural Networks (CNNs) by effectively capturing long-range dependencies between spatially distant facial regions, thereby enhancing emotion classification accuracy. However, significant challenges remain, including data privacy risks arising from the collection of multimodal biometric information, data imbalance and inter-modality incompatibility, high computational costs hindering real-time applications, and limited progress in model explainability. Overall, this study highlights that integrating advanced ViT architectures with robust XAI and privacy-preserving techniques can enhance the reliability, transparency, and ethical deployment of multimodal facial expression recognition systems.
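The long-range dependency advantage attributed to ViTs comes from self-attention, in which every image patch attends to every other patch regardless of spatial distance. A minimal single-head self-attention sketch over patch embeddings (random toy weights and shapes for illustration, not any model reviewed here):

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(patches, Wq, Wk, Wv):
    # patches: (n_patches, d) token embeddings, one per image patch
    q, k, v = patches @ Wq, patches @ Wk, patches @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])   # (n, n): every patch vs. every patch
    attn = softmax(scores, axis=-1)           # each row is a distribution over patches
    return attn @ v, attn

rng = np.random.default_rng(0)
n, d = 16, 8                                  # e.g. 16 patches of a 4x4 grid
x = rng.normal(size=(n, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out, attn = self_attention(x, Wq, Wk, Wv)
assert out.shape == (n, d) and np.allclose(attn.sum(axis=1), 1.0)
```

Because each row of `attn` spans all patches, spatially distant facial regions can influence one another within a single layer, unlike the local receptive fields of a CNN.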
Published as “A systematic review of vision transformer and explainable AI advances in multimodal facial expression recognition” in Intelligent Systems with Applications, vol. 29, Article 200615.
In major trading and export markets, the coffee bean grading process still relies heavily on manual labor to sort individual beans from large harvest volumes. This labor-intensive task is time-consuming, costly, and prone to human error, especially within Thailand’s rapidly expanding Robusta coffee sector. This study introduces AL–ViT, an end-to-end Active-Learning Vision Transformer framework that operationalizes active learning and transformer-based feature extraction within a single, production-oriented pipeline. The framework integrates a ViT-Base/16 backbone with seven active learning (AL) query strategies: random sampling, entropy-based selection, Bayesian Active Learning by Disagreement (BALD), Batch Active Learning by Diverse Gradient Embeddings (BADGE), Core-Set diversity sampling, ensemble disagreement, and a novel hybrid uncertainty–diversity strategy designed to balance informativeness and representativeness during sample acquisition. A high-resolution dataset of 2098 Robusta coffee bean images was collected under controlled-lighting conditions aligned with grading-machine setups, with only 5 % initially labeled and the remainder forming the AL pool. Across five random seeds, the hybrid strategy without MixUp augmentation achieved 97.1 % accuracy and an F1 score of 0.956 on the defective class (F1_bad) using just 850 labels (41 % of the dataset), within 0.3 percentage points of full supervision. Operational reliability, defined as 95 % accuracy in line with prior inspection benchmarks, was reached with only 407 labels, reflecting a 75 % reduction in annotation effort. Entropy sampling showed the fastest early-stage gains, whereas BADGE lagged by more than 1 percentage point; Core-Set and Ensemble provided moderate but stable results. Augmentation and calibration analyses indicated that explicit methods (MixUp, CutMix, RandAugment) offered no further benefit, with the hybrid pipeline already achieving well-calibrated probabilities. Statistical validation via paired t-tests, effect sizes, and bootstrap CIs confirmed consistent improvements of uncertainty-driven strategies over random sampling. Overall, the proposed AL–ViT framework establishes a label-efficient and practically deployable approach for agricultural quality control, achieving near-supervised accuracy at a fraction of the labeling cost.
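Entropy-based selection, reported above as the fastest early-stage strategy, ranks pool samples by the Shannon entropy of their predicted class distribution and sends the most uncertain ones to the annotator. A self-contained sketch with hypothetical probabilities (not the AL–ViT pipeline itself):

```python
import math

def entropy(probs):
    # Shannon entropy of one softmax output; higher = more uncertain
    return -sum(p * math.log(p) for p in probs if p > 0)

def entropy_query(pool_probs, batch_size):
    # pool_probs: {sample_id: class-probability list} for the unlabeled pool
    ranked = sorted(pool_probs, key=lambda i: entropy(pool_probs[i]), reverse=True)
    return ranked[:batch_size]  # most uncertain samples go to the annotator

pool = {
    "bean_01": [0.98, 0.01, 0.01],  # confident prediction -> low priority
    "bean_02": [0.34, 0.33, 0.33],  # near-uniform -> highest priority
    "bean_03": [0.70, 0.20, 0.10],
}
assert entropy_query(pool, 2) == ["bean_02", "bean_03"]
```

Its weakness, consistent with the results above, is that pure uncertainty sampling ignores diversity, which is what the hybrid uncertainty–diversity strategy is designed to add.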
Published as “AL-ViT: Label-efficient Robusta coffee-bean defect detection in Thailand using active learning vision transformers” by Sirawich Vachmanus, Wimolsiri Pridasawas, Worapan Kusakunniran, Kitti Thamrongaphichartkul, and Noppanan Phinklao in Intelligent Systems with Applications, vol. 29, Article 200612. Pub Date: 2025-11-28. DOI: 10.1016/j.iswa.2025.200612.
Pub Date: 2025-11-19. DOI: 10.1016/j.iswa.2025.200609
Floor Rademaker , Faizan Ahmed , Marcos R. Machado
The digitalization of asset management within the architecture, engineering and construction (AEC) sector needs effective methods for the automatic classification of documents. This study focuses on the development and evaluation of multimodal document classification models, utilizing visual, textual, and layout-related document information. We examine various state-of-the-art machine learning models and combine them through an iterative development process. The performance of these models is evaluated on two different AEC-document datasets. The results demonstrate that each modality is useful for classifying the documents, and that integrating the different information types is beneficial. This study contributes by applying AI techniques, specifically document classification, in the AEC sector; by taking the initial step toward automating information extraction and processing for Intelligent Asset Management; and by combining and comparing multimodal state-of-the-art classification models on real-life datasets.
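One standard way to combine visual, textual, and layout information is late fusion of per-modality class scores. The sketch below is a generic illustration under that assumption (the abstract does not specify the fusion mechanism, and the document classes shown are invented):

```python
def late_fusion(modality_probs, weights=None):
    """Weighted average of per-modality class-probability dicts."""
    weights = weights or {m: 1.0 for m in modality_probs}
    total = sum(weights.values())
    classes = next(iter(modality_probs.values())).keys()
    fused = {c: sum(weights[m] * modality_probs[m][c] for m in modality_probs) / total
             for c in classes}
    # return the winning class plus the fused distribution
    return max(fused, key=fused.get), fused

pred, fused = late_fusion({
    "visual": {"drawing": 0.7, "invoice": 0.3},
    "text":   {"drawing": 0.4, "invoice": 0.6},
    "layout": {"drawing": 0.8, "invoice": 0.2},
})
assert pred == "drawing"
```

Late fusion keeps each modality's model independent, which makes it easy to add or drop a modality when comparing models iteratively.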
Published as “Multi-modal document classification in AEC asset management” in Intelligent Systems with Applications, vol. 29, Article 200609.
Pub Date: 2025-11-19. DOI: 10.1016/j.iswa.2025.200608
Miguel De la Cruz Cabello , Tiago Prince Sales , Marcos R. Machado
Modern IT systems generate large volumes of log data that challenge timely and effective anomaly detection. Traditional methods often require intensive feature engineering and struggle to adapt to dynamic operational environments. This Systematic Literature Review (SLR) analyzes how Artificial Intelligence for IT Operations (AIOps) benefits from advanced language models, emphasizing Large Language Models (LLMs) for more effective log anomaly detection. By comparing state-of-the-art frameworks with LLM-driven methods, this study reveals that prompt engineering – the practice of designing and refining inputs to AI models to produce accurate and useful outputs – and Retrieval Augmented Generation (RAG) boost accuracy and interpretability without extensive fine-tuning. Experimental findings demonstrate that LLM-based approaches significantly outperform traditional methods across evaluation metrics that include F1-score, precision, and recall. Furthermore, the integration of LLMs with RAG techniques has shown strong adaptability to changing environments. The applicability of these methods also extends to the military industry. Consequently, the development of specialized LLM systems with RAG tailored for the military industry represents a promising research direction to improve operational effectiveness and responsiveness of defense systems.
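The RAG pattern described above retrieves similar historical log lines and injects them into the prompt, so the model judges a new line in context. A minimal sketch using token-overlap retrieval and a hypothetical prompt template (production systems would use embedding search and an actual LLM call, which are omitted here):

```python
def retrieve(query, corpus, k=2):
    # score historical log lines by token overlap with the query line
    q = set(query.lower().split())
    scored = sorted(corpus, key=lambda line: len(q & set(line.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(log_line, corpus):
    # assemble retrieved context plus the new line into one prompt string
    context = "\n".join(retrieve(log_line, corpus))
    return (f"Known log lines:\n{context}\n\n"
            f"New log line:\n{log_line}\n"
            "Is the new line anomalous? Answer yes or no with a short reason.")

history = [
    "INFO disk check completed on node-3",
    "ERROR disk failure detected on node-7",
    "INFO heartbeat ok",
]
prompt = build_prompt("ERROR disk failure detected on node-2", history)
assert "node-7" in prompt  # the most similar historical line was retrieved
```

Grounding the prompt in retrieved precedents is what lets the approach adapt to a changing log vocabulary without fine-tuning.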
Published as “AIOps for log anomaly detection in the era of LLMs: A systematic literature review” in Intelligent Systems with Applications, vol. 28, Article 200608.
Pub Date: 2025-11-19. DOI: 10.1016/j.iswa.2025.200611
Alfredo Baione , Giuseppe Rizzo , Luca Barco , Angelica Urbanelli , Luigi Di Biasi , Genoveffa Tortora
Generative art is a challenging area of research in deep generative modeling. Exploring AI’s role in human–machine co-creative processes requires understanding machine learning’s potential in the arts. Building on this premise, this paper presents Musipainter, a cross-modal generative framework adapted to create artistic images that are historically and stylistically aligned with 30-second musical inputs, with a focus on creative and semantic coherence. To support this goal, we introduce Museart, a dataset designed explicitly for this research, and GIILS, a creativity-oriented metric that enables us to assess both artistic-semantic consistency and diversity in the generated outputs. The results indicate that Musipainter, supported by the Museart dataset and the exploratory GIILS metric, can offer a foundation for further research on AI’s role in artistic generation, while also highlighting the need for systematic validation and future refinements.
Published as “Musipainter: A music-conditioned generative architecture for artistic image synthesis” in Intelligent Systems with Applications, vol. 29, Article 200611.
Pub Date: 2025-11-13. DOI: 10.1016/j.iswa.2025.200603
Xifeng Ning , Chao Yang , Hailu Sun , Xinyuan Song , Zifan Hu , Yu Feng , Jiawei Li , Yifan Zhu
In modern monitoring and operational management, whether in industrial systems, financial risk control, or infrastructure maintenance, decision-making increasingly relies on integrating heterogeneous data from multiple sources. However, due to data privacy regulations, distributed storage, communication constraints, and sensor failures, it is often difficult to centralize modeling when dealing with high-dimensional, incomplete datasets held by different institutions. Federated learning offers a privacy-preserving joint modeling solution, yet still faces challenges such as high communication overhead, low robustness to participant dropout, and risks of gradient leakage. In certain incomplete-data scenarios, not all data is private—labels such as equipment inspection results, fault reports, or corporate blacklists and whitelists published by authoritative bodies may be public—while feature data remains private and partially missing. To address this, we propose an innovative collaborative modeling framework tailored for incomplete-data monitoring and operations, in which each participant independently trains a model on its private features and exchanges only prediction results rather than gradients. Inspired by collective expert scoring, each “expert” evaluates based on its own data, then shares scores that are integrated into a comprehensive assessment. This approach offers multiple advantages: independent model training for each party, improved efficiency by transmitting only prediction results, enhanced security by avoiding gradient transmission, and higher robustness since the failure of one participant does not halt others’ training. We present three variants of this prediction-result fusion method and evaluate them on representative datasets, including enterprise credit risk assessment as a case study, comparing against vertical federated logistic regression.
Experimental results validate the effectiveness of the proposed approach, which can be widely applied to diverse monitoring and operational scenarios under incomplete data conditions.
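The collective expert-scoring idea can be sketched as each participant mapping its private features to a local score and a coordinator fusing only those scores. The toy logistic scorer and simple averaging below are illustrative assumptions, not the paper's three fusion variants:

```python
import math

def party_score(features, weights, bias=0.0):
    # each participant's local model; here a toy logistic scorer
    z = sum(w * f for w, f in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def fuse(scores):
    # the coordinator sees only scores, never raw features or gradients
    valid = [s for s in scores if s is not None]  # dropped-out parties are skipped
    return sum(valid) / len(valid)

bank = party_score([1.2, -0.4], [0.8, 0.5])  # party A: private financial features
tax  = party_score([0.3], [1.1])             # party B: private tax features
risk = fuse([bank, tax, None])               # third party offline; fusion still works
assert 0.0 < risk < 1.0
```

Because only scalar scores cross institutional boundaries, gradient leakage is avoided and one participant's failure does not halt the others, matching the robustness claims above.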
Published as “A corporate credit evaluation method considering strong feature privacy with non-private label: A vertical heterogeneous feature fusion approach” in Intelligent Systems with Applications, vol. 28, Article 200603.
Annotating vibration data from heavy-duty pumps in the mining industry is highly challenging because it demands domain knowledge and a complex inspection setup, and in many cases remains infeasible. A self-supervised data annotation (SSDA) framework is therefore proposed and evaluated on historical data of slurry-pump vibration signals. The framework began with the collection of heterogeneous information, followed by information fusion using an autoencoder. This was then followed by a datafication step for preprocessing and achieving a better representation of features through a feature embedding technique. As a result, redundant information was pushed into an eight-dimensional latent space, achieving a reconstruction loss of 0.0023. Furthermore, initial data annotation was obtained by combining the Isolation Forest and Kneedle algorithms to locate a data-driven knee, or threshold, which was found to be 0.58 for predicting labels. A subset of the samples was thereby labeled and considered accurate. Lastly, an attention-based fuzzy neural network (AFNN) is trained on those labels; membership functions convert each latent feature into graded truth values, while an attention layer highlights the most relevant rules. An iterative self-training loop was implemented to refine the training set and obtain labeled data with higher model confidence. Six baseline models were also tested, and AFNN compared favorably. After seven iterations, 2780 of 2872 samples were labeled; the remaining 92 were considered uncertain and still require expert review, and the AFNN model confidence was 96.8%. Statistical analysis confirmed that the model predictions were significantly associated with true labels (p < 0.05) and not driven by chance.
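The Kneedle-style thresholding step can be approximated by finding the point on the sorted anomaly-score curve farthest from the chord joining its endpoints. A simplified sketch with made-up scores (not the full Kneedle algorithm; the 0.58 threshold above is specific to the paper's data):

```python
def knee_threshold(scores):
    """Return the score at the knee of the sorted curve (max chord distance)."""
    s = sorted(scores)
    n = len(s)
    x0, y0, x1, y1 = 0, s[0], n - 1, s[-1]

    def dist(i):
        # perpendicular-distance numerator from (i, s[i]) to the endpoint chord
        return abs((y1 - y0) * i - (x1 - x0) * s[i] + x1 * y0 - y1 * x0)

    knee = max(range(n), key=dist)
    return s[knee]

# flat bulk of normal scores plus a sharp anomalous tail
scores = [0.1, 0.12, 0.11, 0.13, 0.15, 0.14, 0.6, 0.9, 0.95]
t = knee_threshold(scores)
assert 0.1 < t < 0.9  # knee sits at the edge of the flat bulk
```

Samples whose Isolation Forest score exceeds such a data-driven threshold would receive the anomalous pseudo-label, avoiding a hand-tuned cutoff.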
Published as “Attention-based fuzzy neural networks for self-supervised data annotation” by Md Rakibul Islam, Shahina Begum, Mobyen Uddin Ahmed, and Shaibal Barua in Intelligent Systems with Applications, vol. 28, Article 200610. Pub Date: 2025-11-13. DOI: 10.1016/j.iswa.2025.200610.
Pub Date: 2025-11-10. DOI: 10.1016/j.iswa.2025.200602
Kadri Kukk , Ants Torim , Erki Eessaar , Tarmo Kadak
The printing industry benefits from digitalizing workflows such as customer quoting. Intelligent printing process planning is essential to determine the near-optimal price for automated quoting. This paper addresses the automation of sheet imposition, a critical and computationally intensive step in optimizing the printing process that belongs to the general class of cutting and packing problems. We propose a simple recursive sheet imposition representation as the basis for our algorithms. The Brute Force algorithm for optimizing sheet imposition guarantees the cheapest solution but is computationally infeasible for complex tasks. As alternatives, we investigate heuristic algorithms, specifically Monte Carlo Tree Search (MCTS) and Simulated Annealing (SA). Our findings show that while Brute Force is prohibitively slow, MCTS strikes a robust balance between computational performance and solution quality, consistently finding solutions within a 5% margin of the optimal price. Although SA can occasionally find superior solutions, MCTS provides a more reliable and efficient approach by consistently delivering results close to the optimal price.
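Simulated annealing, one of the two heuristics compared, follows a generic recipe: propose a neighboring solution, always accept improvements, and accept worse solutions with probability exp(-Δ/T) as the temperature T cools. A sketch on a toy one-dimensional cost, not the paper's imposition model:

```python
import math
import random

def simulated_annealing(cost, neighbor, start, t0=1.0, cooling=0.95, steps=500, seed=0):
    rng = random.Random(seed)
    current, best = start, start
    t = t0
    for _ in range(steps):
        cand = neighbor(current, rng)
        delta = cost(cand) - cost(current)
        # always accept improvements; accept worsenings with prob exp(-delta/t)
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current = cand
        if cost(current) < cost(best):
            best = current
        t *= cooling  # cool down, making uphill moves ever rarer
    return best

# toy 1-D cost with a single minimum at x = 3
cost = lambda x: (x - 3) ** 2
neighbor = lambda x, rng: x + rng.uniform(-0.5, 0.5)
x = simulated_annealing(cost, neighbor, start=10.0)
assert abs(x - 3) < 1.0
```

For sheet imposition, `neighbor` would instead mutate the recursive imposition representation and `cost` would price the resulting layout; SA's occasional superior solutions come from those rare accepted uphill moves.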
Published as “Optimizing printing processes with MCTS” in Intelligent Systems with Applications, vol. 28, Article 200602.
Pub Date : 2025-11-10 DOI: 10.1016/j.iswa.2025.200607
Jake Street, Isibor Kennedy Ihianle, Funminiyi Olajide, Ahmad Lotfi
Online Grooming (OG) is a prevalent threat facing predominantly children online, with groomers using deceptive methods to prey on the vulnerability of children on social media and messaging platforms. These attacks can have severe psychological and physical impacts, including a tendency towards revictimization. Current technical measures are inadequate, especially with the advent of end-to-end encryption, which hampers message monitoring. Existing solutions focus on the signature analysis of child abuse media, which does not effectively address real-time OG detection. This paper proposes that OG attacks are complex, requiring the identification of specific communication patterns between adults and children alongside other insights (e.g., sexual language) to make an accurate determination. It introduces a novel approach leveraging advanced models such as BERT and RoBERTa for Message-Level Analysis, together with a Context Determination approach for classifying interactions between adults attempting to groom children and honeypot child actors. The approach introduces Actor Significance Thresholds and Message Significance Thresholds to support these determinations. The proposed method aims to enhance accuracy and robustness in detecting OG by considering the dynamic and multi-faceted nature of these attacks. Cross-dataset experiments evaluate the robustness and versatility of our approach. This paper’s contributions include improved detection methodologies and the potential for application in various scenarios, addressing gaps in current literature and practices.
{"title":"Enhanced Online Grooming detection employing Context Determination and Message-Level Analysis","authors":"Jake Street, Isibor Kennedy Ihianle, Funminiyi Olajide, Ahmad Lotfi","doi":"10.1016/j.iswa.2025.200607","journal":"Intelligent Systems with Applications","volume":"28","pages":"Article 200607","publicationDate":"2025-11-10"}
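The two-level thresholding the abstract describes — per-message scores aggregated into an actor-level decision — can be sketched as follows. This is a minimal illustration, not the authors' implementation: `score_fn` stands in for a fine-tuned BERT/RoBERTa message classifier, and the threshold names and values here are hypothetical placeholders.

```python
# Hypothetical thresholds -- the paper's actual values are not given here.
MESSAGE_THRESHOLD = 0.8   # a message counts as "significant" above this score
ACTOR_THRESHOLD = 0.3     # flag an actor when this fraction of messages is significant

def flag_actor(messages, score_fn,
               msg_thr=MESSAGE_THRESHOLD, actor_thr=ACTOR_THRESHOLD):
    """Aggregate per-message risk scores into an actor-level decision.

    `score_fn` maps a message to a grooming-risk probability in [0, 1];
    in the paper's setting it would be a transformer classifier.
    """
    msgs = list(messages)
    if not msgs:
        return False
    significant = sum(1 for m in msgs if score_fn(m) >= msg_thr)
    return significant / len(msgs) >= actor_thr

# Toy keyword scorer standing in for the model:
def toy_score(message):
    return 0.9 if "secret" in message else 0.1
```

Separating the message threshold from the actor threshold is what lets the detector tolerate isolated false positives at the message level while still flagging sustained grooming patterns across a conversation.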
This paper surveys the different approaches in semantic Simultaneous Localization and Mapping (SLAM), exploring how the incorporation of semantic information has enhanced performance in both indoor and outdoor settings, while highlighting key advancements in the field. It also identifies existing gaps and proposes potential directions for future improvements to address these issues. We provide a detailed review of the fundamentals of semantic SLAM, illustrating how incorporating semantic data enhances scene understanding and mapping accuracy. The paper presents semantic SLAM methods and core techniques that contribute to improved robustness and precision in mapping. A comprehensive overview of commonly used datasets for evaluating semantic SLAM systems is provided, along with a discussion of performance metrics used to assess their efficiency and accuracy. To demonstrate the reliability of semantic SLAM methodologies, we reproduce selected results from existing studies, offering insights into the reproducibility of these approaches. The paper also addresses key challenges such as real-time processing, dynamic scene adaptation, and scalability, while highlighting future research directions. Unlike prior surveys, this paper uniquely combines (i) a systematic taxonomy of semantic SLAM approaches across different sensing modalities and environments, (ii) a comparative review of datasets and evaluation metrics, and (iii) a reproducibility study of selected methods.
To our knowledge, this is the first survey that integrates methods, datasets, evaluation practices, and application insights into a single comprehensive review, thereby offering a unified reference for researchers and practitioners. In conclusion, this review underscores the vital role of semantic SLAM in driving advancements in autonomous systems and intelligent navigation by analyzing recent developments, validating findings, and highlighting future research directions.
{"title":"Semantic SLAM: A comprehensive survey of methods and applications","authors":"Houssein Kanso, Abhilasha Singh, Etaf El Zarif, Nooruldeen Almohammed, Jinane Mounsef, Noel Maalouf, Bilal Arain","doi":"10.1016/j.iswa.2025.200591","journal":"Intelligent Systems with Applications","volume":"28","pages":"Article 200591","publicationDate":"2025-11-10"}
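One concrete way semantic information aids SLAM, as the survey above discusses, is data association: class labels from an object detector can gate the geometric matching of observations to map landmarks. The sketch below is a generic illustration of that idea under simplified 2-D assumptions — the `Landmark` type, the 1-metre gate, and the nearest-neighbour rule are illustrative, not a method from any specific surveyed system.

```python
import math
from dataclasses import dataclass

@dataclass
class Landmark:
    x: float
    y: float
    label: str  # semantic class from a detector, e.g. "door", "car"

def associate(obs, landmarks, gate=1.0):
    """Semantic-gated nearest-neighbour data association.

    Candidates must share the observation's class label; among those,
    the closest landmark within `gate` metres is returned (else None).
    The semantic gate prunes geometrically ambiguous matches.
    """
    best, best_d = None, gate
    for lm in landmarks:
        if lm.label != obs.label:
            continue  # semantic gate: reject class mismatches outright
        d = math.hypot(lm.x - obs.x, lm.y - obs.y)
        if d <= best_d:
            best, best_d = lm, d
    return best
```

Even this toy version shows why semantics improves robustness: a geometrically nearer landmark of the wrong class is rejected, which in a full SLAM pipeline reduces wrong loop closures and drift in cluttered scenes.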