Pub Date: 2025-11-19 | DOI: 10.1016/j.iswa.2025.200609
Floor Rademaker, Faizan Ahmed, Marcos R. Machado
The digitalization of asset management within the architecture, engineering and construction (AEC) sector needs effective methods for the automatic classification of documents. This study focuses on the development and evaluation of multimodal document classification models that utilize visual, textual, and layout-related document information. We examine various state-of-the-art machine learning models and combine them through an iterative development process. The performance of these models is evaluated on two different AEC document datasets. The results demonstrate that each modality is useful for classifying the documents, and that integrating the different information types is beneficial as well. This study contributes by applying AI techniques, specifically document classification, in the AEC sector; by taking the initial step toward automating information extraction and processing for Intelligent Asset Management; and by combining and comparing multimodal state-of-the-art classification models on real-life datasets.
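The combination of modalities described above can be illustrated with a minimal late-fusion sketch: pre-extracted visual, textual, and layout feature vectors are concatenated and fed to a single classifier. All names, dimensions, and data here are illustrative stand-ins, not the authors' models.

```python
# Hypothetical late-fusion sketch: the three modality vectors are
# assumed to be pre-extracted per-document embeddings; a linear
# classifier is trained on their concatenation.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_docs = 200

# Stand-ins for per-document embeddings from each modality.
visual = rng.normal(size=(n_docs, 32))    # e.g. image-model features
textual = rng.normal(size=(n_docs, 64))   # e.g. text-model embeddings
layout = rng.normal(size=(n_docs, 16))    # e.g. bounding-box statistics

# Synthetic labels tied to a few feature dimensions, so the
# classifier has something learnable.
labels = (textual[:, 0] + 0.5 * visual[:, 0] > 0).astype(int)

fused = np.hstack([visual, textual, layout])  # simple concatenation
clf = LogisticRegression(max_iter=1000).fit(fused, labels)
train_acc = clf.score(fused, labels)
print(f"fused train accuracy: {train_acc:.2f}")
```

In practice the fusion step is where the design choices live (concatenation vs. attention-based mixing); this sketch only shows the simplest variant.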
"Multi-modal document classification in AEC asset management", Intelligent Systems with Applications, vol. 29, Article 200609.
Pub Date: 2025-11-19 | DOI: 10.1016/j.iswa.2025.200608
Miguel De la Cruz Cabello, Tiago Prince Sales, Marcos R. Machado
Modern IT systems generate large volumes of log data that challenge timely and effective anomaly detection. Traditional methods often require intensive feature engineering and struggle to adapt to dynamic operational environments. This Systematic Literature Review (SLR) analyzes how Artificial Intelligence for IT Operations (AIOps) benefits from advanced language models, emphasizing Large Language Models (LLMs) for more effective log anomaly detection. By comparing state-of-the-art frameworks with LLM-driven methods, this study reveals that prompt engineering – the practice of designing and refining inputs to AI models to produce accurate and useful outputs – and Retrieval Augmented Generation (RAG) boost accuracy and interpretability without extensive fine-tuning. Experimental findings demonstrate that LLM-based approaches significantly outperform traditional methods across evaluation metrics that include F1-score, precision, and recall. Furthermore, the integration of LLMs with RAG techniques has shown strong adaptability to changing environments. The applicability of these methods also extends to the military industry. Consequently, the development of specialized LLM systems with RAG tailored for the military industry represents a promising research direction to improve operational effectiveness and responsiveness of defense systems.
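The RAG idea referenced above can be sketched at its retrieval step: fetch the most similar labeled historical log lines and assemble them into a prompt, leaving the LLM call itself abstract. The log lines, labels, and prompt wording below are invented for illustration only.

```python
# Minimal sketch of the retrieval step in a RAG-style log anomaly
# pipeline: TF-IDF similarity stands in for a learned retriever, and
# the actual model call is out of scope.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

history = [
    ("disk /dev/sda1 at 95% capacity", "anomaly"),
    ("user login succeeded from 10.0.0.5", "normal"),
    ("kernel: I/O error on device sda", "anomaly"),
    ("scheduled backup completed", "normal"),
]
query = "I/O error reported on sda during write"

vec = TfidfVectorizer().fit([line for line, _ in history] + [query])
hist_m = vec.transform([line for line, _ in history])
sims = cosine_similarity(vec.transform([query]), hist_m)[0]
top = sims.argsort()[::-1][:2]  # two most similar labeled examples

context = "\n".join(f"{history[i][0]} -> {history[i][1]}" for i in top)
prompt = (
    "Given these labeled log lines:\n"
    f"{context}\n"
    f"Classify this line as normal or anomaly: {query}"
)
print(prompt)
```

Grounding the prompt in retrieved labeled examples is what lets such pipelines adapt to new environments without fine-tuning, which matches the adaptability finding reported in the review.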
"AIOps for log anomaly detection in the era of LLMs: A systematic literature review", Intelligent Systems with Applications, vol. 28, Article 200608.
Pub Date: 2025-11-19 | DOI: 10.1016/j.iswa.2025.200611
Alfredo Baione, Giuseppe Rizzo, Luca Barco, Angelica Urbanelli, Luigi Di Biasi, Genoveffa Tortora
Generative art is a challenging area of research in deep generative modeling. Exploring AI’s role in human–machine co-creative processes requires understanding machine learning’s potential in the arts. Building on this premise, this paper presents Musipainter, a cross-modal generative framework adapted to create artistic images that are historically and stylistically aligned with 30-second musical inputs, with a focus on creative and semantic coherence. To support this goal, we introduce Museart, a dataset designed explicitly for this research, and GIILS, a creativity-oriented metric that enables us to assess both artistic-semantic consistency and diversity in the generated outputs. The results indicate that Musipainter, supported by the Museart dataset and the exploratory GIILS metric, can offer a foundation for further research on AI’s role in artistic generation, while also highlighting the need for systematic validation and future refinements.
"Musipainter: A music-conditioned generative architecture for artistic image synthesis", Intelligent Systems with Applications, vol. 29, Article 200611.
Pub Date: 2025-11-13 | DOI: 10.1016/j.iswa.2025.200603
Xifeng Ning, Chao Yang, Hailu Sun, Xinyuan Song, Zifan Hu, Yu Feng, Jiawei Li, Yifan Zhu
In modern monitoring and operational management, whether in industrial systems, financial risk control, or infrastructure maintenance, decision-making increasingly relies on integrating heterogeneous data from multiple sources. However, due to data privacy regulations, distributed storage, communication constraints, and sensor failures, it is often difficult to centralize modeling when dealing with high-dimensional, incomplete datasets held by different institutions. Federated learning offers a privacy-preserving joint modeling solution, yet still faces challenges such as high communication overhead, low robustness to participant dropout, and risks of gradient leakage. In certain incomplete-data scenarios, not all data is private—labels such as equipment inspection results, fault reports, or corporate blacklists and whitelists published by authoritative bodies may be public—while feature data remains private and partially missing. To address this, we propose an innovative collaborative modeling framework tailored for incomplete-data monitoring and operations, in which each participant independently trains a model on its private features and exchanges only prediction results rather than gradients. Inspired by collective expert scoring, each “expert” evaluates based on its own data, then shares scores that are integrated into a comprehensive assessment. This approach offers multiple advantages: independent model training for each party, improved efficiency by transmitting only prediction results, enhanced security by avoiding gradient transmission, and higher robustness since the failure of one participant does not halt others’ training. We present three variants of this prediction-result fusion method and evaluate them on representative datasets, including enterprise credit risk assessment as a case study, comparing against vertical federated logistic regression.
Experimental results validate the effectiveness of the proposed approach, which can be widely applied to diverse monitoring and operational scenarios under incomplete data conditions.
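The prediction-result exchange described above can be sketched in a few lines: each party trains only on its own vertical slice of features and shares probability scores, which are averaged into a final decision. The data, the vertical split, and the simple score-averaging rule are assumptions for illustration; the paper presents three fusion variants.

```python
# Illustrative sketch of prediction-result fusion: parties share
# probability scores over public labels, never features or gradients.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 300
X = rng.normal(size=(n, 6))
y = (X[:, 0] + X[:, 3] > 0).astype(int)  # shared, non-private labels

# Vertical split: party A holds features 0-2, party B holds 3-5.
parts = [X[:, :3], X[:, 3:]]
models = [LogisticRegression(max_iter=1000).fit(p, y) for p in parts]

# Each "expert" shares only its probability scores.
scores = [m.predict_proba(p)[:, 1] for m, p in zip(models, parts)]
fused = np.mean(scores, axis=0)           # simple score averaging
pred = (fused >= 0.5).astype(int)
acc = (pred == y).mean()
print(f"fused accuracy: {acc:.2f}")
```

Note the robustness property claimed in the abstract: if one party drops out, the remaining scores can still be averaged, whereas gradient-based vertical federated training would stall.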
"A corporate credit evaluation method considering strong feature privacy with non-private label: A vertical heterogeneous feature fusion approach", Intelligent Systems with Applications, vol. 28, Article 200603.
Pub Date: 2025-11-13 | DOI: 10.1016/j.iswa.2025.200610
Md Rakibul Islam, Shahina Begum, Mobyen Uddin Ahmed, Shaibal Barua
Annotating vibration data from heavy-duty pumps in the mining industry is highly challenging because it demands domain knowledge and a complex inspection setup, and in many cases remains infeasible. A self-supervised data annotation (SSDA) framework is therefore proposed and evaluated on historical data of slurry-pump vibration signals. The framework began with the collection of heterogeneous information, followed by information fusion using an autoencoder. A datafication step then handled preprocessing and achieved a better representation of features through a feature embedding technique. As a result, the data was compressed into an eight-dimensional latent space, removing redundant information and achieving a reconstruction loss of 0.0023. Initial data annotation was obtained by combining the Isolation Forest and Kneedle algorithms to locate a data-driven knee or threshold, found to be 0.58, for predicting labels. Part of the samples were thereby labeled and considered accurate. Lastly, an attention-based fuzzy neural network (AFNN) was trained on those labels, where membership functions convert each latent feature into graded truth values while an attention layer highlights the most relevant rules. An iterative self-training loop was implemented to refine the training set and obtain labeled data with higher model confidence. We also tested six baseline models and found the AFNN to perform impressively. After seven iterations, 2780 of 2872 samples were labeled; the remaining 92 are considered uncertain and still need review from an expert, and the AFNN model confidence was 96.8%. Statistical analysis confirmed that the model predictions were significantly associated with true labels (p < 0.05) and not driven by chance.
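The initial-annotation step above can be sketched as follows: score samples with an Isolation Forest, then pick a data-driven threshold at the knee of the sorted score curve. The knee finder here is a minimal distance-below-the-chord variant of the Kneedle idea, and the synthetic 8-dimensional data merely stands in for the latent features; neither is the authors' exact implementation.

```python
# Sketch of Isolation Forest scoring plus a knee-based threshold.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(2)
# Stand-in for 8-dimensional latent features: mostly normal samples
# plus a small shifted cluster playing the role of anomalies.
normal = rng.normal(size=(280, 8))
faulty = rng.normal(loc=4.0, size=(20, 8))
X = np.vstack([normal, faulty])

# score_samples returns lower values for more abnormal points;
# negate so that higher means more anomalous.
scores = -IsolationForest(random_state=0).fit(X).score_samples(X)
s = np.sort(scores)

# Knee: point of the normalized sorted curve farthest below the
# straight chord from (0, 0) to (1, 1).
x = np.linspace(0.0, 1.0, len(s))
yn = (s - s.min()) / (s.max() - s.min())
dist = x - yn
threshold = s[int(dist.argmax())]
labels = (scores > threshold).astype(int)  # 1 = candidate anomaly
print(f"threshold={threshold:.2f}, flagged={int(labels.sum())}")
```

The resulting hard labels would then seed the self-training loop, with low-confidence samples deferred to an expert, as in the framework described above.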
Md Rakibul Islam, Shahina Begum, Mobyen Uddin Ahmed, Shaibal Barua, "Attention-based fuzzy neural networks for self-supervised data annotation", Intelligent Systems with Applications, vol. 28, Article 200610, doi:10.1016/j.iswa.2025.200610.
Pub Date: 2025-11-10 | DOI: 10.1016/j.iswa.2025.200602
Kadri Kukk, Ants Torim, Erki Eessaar, Tarmo Kadak
The printing industry benefits from digitalizing workflows such as customer quoting. Intelligent printing process planning is essential to determine the near-optimal price for automated quoting. This paper addresses the automation of sheet imposition, a critical and computationally intensive step in optimizing the printing process that belongs to the general class of cutting and packing problems. We propose a simple recursive sheet imposition representation as the basis for our algorithms. The Brute Force algorithm for optimizing sheet imposition guarantees the cheapest solution but is computationally infeasible for complex tasks. As alternatives, we investigate heuristic algorithms, specifically Monte Carlo Tree Search (MCTS) and Simulated Annealing (SA). Our findings show that while Brute Force is prohibitively slow, MCTS strikes a robust balance between computational performance and solution quality, consistently finding solutions within a 5% margin of optimal price. Although SA can occasionally find superior solutions, MCTS provides a more reliable and efficient approach by consistently delivering results close to the optimal price.
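Of the two heuristics compared above, Simulated Annealing is the easier to sketch compactly. The loop below applies a generic SA skeleton to a toy stand-in cost (sheets used when jobs are packed greedily in a given order); the real imposition cost model and the recursive representation are far richer, so treat this purely as an illustration of the search loop, with all job sizes invented.

```python
# Generic simulated annealing skeleton over job orderings.
import math
import random

random.seed(0)
jobs = [4, 8, 1, 4, 2, 1]   # hypothetical item counts per job
CAPACITY = 10                # items that fit on one sheet


def cost(order):
    """Sheets used when job sizes are placed greedily in this order."""
    sheets, used = 1, 0
    for j in order:
        if used + j > CAPACITY:
            sheets, used = sheets + 1, 0
        used += j
    return sheets


state = list(range(len(jobs)))
best = state[:]
temp = 2.0
for step in range(2000):
    cand = state[:]
    i, k = random.sample(range(len(cand)), 2)
    cand[i], cand[k] = cand[k], cand[i]          # swap two jobs
    delta = cost([jobs[n] for n in cand]) - cost([jobs[n] for n in state])
    # Accept improvements always, worse moves with Boltzmann probability.
    if delta <= 0 or random.random() < math.exp(-delta / temp):
        state = cand
    if cost([jobs[n] for n in state]) < cost([jobs[n] for n in best]):
        best = state[:]
    temp *= 0.995                                # geometric cooling
print("best sheets:", cost([jobs[n] for n in best]))
```

MCTS replaces this single random walk with a search tree over partial placements, which is one plausible reason for the more consistent solution quality the paper reports.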
"Optimizing printing processes with MCTS", Intelligent Systems with Applications, vol. 28, Article 200602.
Pub Date: 2025-11-10 | DOI: 10.1016/j.iswa.2025.200607
Jake Street, Isibor Kennedy Ihianle, Funminiyi Olajide, Ahmad Lotfi
Online Grooming (OG) is a prevalent threat facing predominantly children online, with groomers using deceptive methods to prey on the vulnerability of children on social media and messaging platforms. These attacks can have severe psychological and physical impacts, including a tendency towards revictimization. Current technical measures are inadequate, especially with the advent of end-to-end encryption, which hampers message monitoring. Existing solutions focus on the signature analysis of child abuse media, which does not effectively address real-time OG detection. This paper proposes that OG attacks are complex, requiring the identification of specific communication patterns between adults and children alongside other insights (e.g., sexual language) to make an accurate determination. It introduces a novel approach leveraging advanced models such as BERT and RoBERTa for Message-Level Analysis, together with a Context Determination approach for classifying interactions between adults attempting to groom children and honeypot child actors. This approach introduces Actor Significance Thresholds and Message Significance Thresholds to make these determinations. The proposed method aims to enhance accuracy and robustness in detecting OG by considering the dynamic and multi-faceted nature of these attacks. Cross-dataset experiments evaluate the robustness and versatility of our approach. This paper’s contributions include improved detection methodologies and the potential for application in various scenarios, addressing gaps in current literature and practices.
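The two-threshold idea can be sketched as follows: per-message risk scores (stand-ins for a BERT/RoBERTa classifier's outputs) are first binarized with a Message Significance Threshold, and an actor is flagged when the share of significant messages crosses an Actor Significance Threshold. Both threshold values and all scores below are made up for illustration; the paper's exact aggregation may differ.

```python
# Hypothetical two-threshold aggregation from messages to actors.
MESSAGE_THRESHOLD = 0.7   # min score for a message to count as risky
ACTOR_THRESHOLD = 0.3     # min fraction of risky messages per actor


def flag_actor(message_scores, msg_t=MESSAGE_THRESHOLD,
               actor_t=ACTOR_THRESHOLD):
    """Return True if an actor's conversation crosses both thresholds."""
    if not message_scores:
        return False
    risky = sum(s >= msg_t for s in message_scores)
    return risky / len(message_scores) >= actor_t


benign = [0.05, 0.10, 0.72, 0.08, 0.02]    # one ambiguous message
grooming = [0.15, 0.80, 0.91, 0.75, 0.40]  # sustained risky pattern
print(flag_actor(benign), flag_actor(grooming))
```

Separating the two thresholds lets a single ambiguous message be tolerated while a sustained pattern of risky messages still triggers an actor-level flag, which mirrors the context-over-single-message argument of the paper.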
"Enhanced Online Grooming detection employing Context Determination and Message-Level Analysis", Intelligent Systems with Applications, vol. 28, Article 200607.
Pub Date: 2025-11-10 | DOI: 10.1016/j.iswa.2025.200591
Houssein Kanso, Abhilasha Singh, Etaf El Zarif, Nooruldeen Almohammed, Jinane Mounsef, Noel Maalouf, Bilal Arain
This paper surveys the different approaches in semantic Simultaneous Localization and Mapping (SLAM), exploring how the incorporation of semantic information has enhanced performance in both indoor and outdoor settings, while highlighting key advancements in the field. It also identifies existing gaps and proposes potential directions for future improvements to address these issues. We provide a detailed review of the fundamentals of semantic SLAM, illustrating how incorporating semantic data enhances scene understanding and mapping accuracy. The paper presents semantic SLAM methods and core techniques that contribute to improved robustness and precision in mapping. A comprehensive overview of commonly used datasets for evaluating semantic SLAM systems is provided, along with a discussion of performance metrics used to assess their efficiency and accuracy. To demonstrate the reliability of semantic SLAM methodologies, we reproduce selected results from existing studies, offering insights into the reproducibility of these approaches. The paper also addresses key challenges such as real-time processing, dynamic scene adaptation, and scalability while highlighting future research directions. Unlike prior surveys, this paper uniquely combines (i) a systematic taxonomy of semantic SLAM approaches across different sensing modalities and environments, (ii) a comparative review of datasets and evaluation metrics, and (iii) a reproducibility study of selected methods. To our knowledge, this is the first survey that integrates methods, datasets, evaluation practices, and application insights into a single comprehensive review, thereby offering a unified reference for researchers and practitioners. In conclusion, this review underscores the vital role of semantic SLAM in driving advancements in autonomous systems and intelligent navigation by analyzing recent developments, validating findings, and highlighting future research directions.
Houssein Kanso, Abhilasha Singh, Etaf El Zarif, Nooruldeen Almohammed, Jinane Mounsef, Noel Maalouf, Bilal Arain, "Semantic SLAM: A comprehensive survey of methods and applications", Intelligent Systems with Applications, vol. 28, Article 200591, doi:10.1016/j.iswa.2025.200591.
Pub Date: 2025-11-08 | DOI: 10.1016/j.iswa.2025.200605
Hilya Tsaniya, Chastine Fatichah, Nanik Suciati, Takashi Obi, Joong-sun Lee
Current research in radiology report generation tends to overlook the utilization of abnormalities depicted in medical images. This study introduces a novel radiology report generator that integrates a multi-label learning approach for predicting abnormality tags and employs transformer models for generating reports. Additionally, the research explores contrast-based image enhancement to mitigate noise in medical images, evaluating its impact on model performance. The multi-label model is trained on a dataset with 180 abnormality labels, with the resulting features used as initial weights for MIMIC-CXR as a visual feature extractor. Imbalance handling and ensemble methods are employed to optimize multi-label model performance for abnormality tag prediction. Multi-head attention, in conjunction with GPT-2, facilitates context building for medical report generation, utilizing BERT embeddings for text feature extraction. Evaluation metrics demonstrate that the proposed model achieves superior performance in both multi-label prediction (77% accuracy) and text generation, showing an average similarity increase of 28% compared to the baseline model. These findings suggest that leveraging transfer learning with an ensemble classifier, combined with a transformer for context building and decoding, effectively utilizes visual and text features. Furthermore, the incorporation of image enhancement techniques significantly impacts model performance.
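The multi-label tag-prediction stage can be sketched with a one-vs-rest classifier using class weighting as a simple stand-in for the imbalance handling and ensembling described above. The tag count, features, and model choice here are all illustrative assumptions, not the paper's architecture.

```python
# Toy multi-label abnormality tagging with imbalance-aware weighting.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

rng = np.random.default_rng(3)
n = 400
X = rng.normal(size=(n, 20))              # stand-in image features
# Three hypothetical abnormality tags, each tied to one feature and
# deliberately imbalanced (rare positives).
Y = np.stack([(X[:, k] > 1.2).astype(int) for k in range(3)], axis=1)

clf = OneVsRestClassifier(
    LogisticRegression(class_weight="balanced", max_iter=1000)
).fit(X, Y)
pred = clf.predict(X)
per_tag_acc = (pred == Y).mean(axis=0)
print("per-tag accuracy:", per_tag_acc.round(2))
```

With 180 real tags the per-tag positives are even rarer, which is why the paper pairs imbalance handling with ensembling before the predicted tags condition the report generator.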
{"title":"Enhanced radiology report: Leveraging image enhancement and multi-label transfer learning with attention-based text generation","authors":"Hilya Tsaniya , Chastine Fatichah , Nanik Suciati , Takashi Obi , Joong-sun Lee","doi":"10.1016/j.iswa.2025.200605","DOIUrl":"10.1016/j.iswa.2025.200605","url":null,"abstract":"<div><div>Current research in radiology report generation tends to overlook the abnormalities depicted in medical images. This study introduces a novel radiology report generator that integrates a multi-label learning approach for predicting abnormality tags and employs transformer models for generating reports. Additionally, the research explores contrast-based image enhancement to mitigate noise in medical images, evaluating its impact on model performance. The multi-label model is trained on a dataset with 180 abnormality labels, and its features are used as initial weights for the visual feature extractor on MIMIC-CXR. Imbalance handling and ensemble methods are employed to optimize multi-label model performance for abnormality tag prediction. Multi-head attention, in conjunction with GPT-2, facilitates context building for medical report generation, utilizing BERT embeddings for text feature extraction. Evaluation metrics demonstrate that the proposed model achieves superior performance in both multi-label prediction (77 % accuracy) and text generation, showing a 28 % average increase in similarity compared to the baseline model. These findings suggest that leveraging transfer learning with an ensemble classifier, combined with a transformer for context building and decoding, effectively utilizes visual and text features. 
Furthermore, the incorporation of image enhancement techniques significantly impacts model performance.</div></div>","PeriodicalId":100684,"journal":{"name":"Intelligent Systems with Applications","volume":"28 ","pages":"Article 200605"},"PeriodicalIF":4.3,"publicationDate":"2025-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145520078","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-11-08DOI: 10.1016/j.iswa.2025.200601
Rim El Badaoui , Ester Bonmati , Vasileios Argyriou , Barbara Villarini
Deep learning for medical imaging has shown great potential in improving patient outcomes due to its high accuracy in disease diagnosis. However, a major challenge preventing the widespread adoption of such models in clinical settings is data accessibility, which conflicts with the General Data Protection Regulation (GDPR) in a traditional centralised training environment. Hence, to address this issue, Federated Learning (FL) was introduced as a decentralised alternative that enables collaborative model training among data owners without sharing any private data. Despite its significance in healthcare, limited research has explored FL for medical imaging, particularly in multimodal brain tumour segmentation, due to challenges such as data heterogeneity.
In this study, we present Federated E-CATBraTS, an advanced federated deep learning model derived from the existing E-CATBraTS framework. This model is designed to segment brain tumours from multimodal magnetic resonance imaging (MRI) while preserving data privacy. Our framework introduces a novel aggregation method, DaQAvg, which optimally combines model weights based on data size and quality, demonstrating resilience against corrupted medical images.
We evaluated the performance of Federated E-CATBraTS using two publicly available datasets: UPenn-GBM and UCSF-PDGM, including a degraded version of the latter to assess the efficacy of our aggregation method. The results indicate a 6% overall improvement over traditional centralised approaches. Furthermore, we conducted a comprehensive comparison against state-of-the-art FL aggregation algorithms, including FedAvg, FedProx and FedNova. While FedNova achieved the highest overall Dice similarity coefficient (DSC), DaQAvg showed superior robustness under noisy conditions, demonstrating its specific advantage in maintaining performance with variable data quality, a critical aspect in medical imaging.
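The aggregation idea described above can be sketched as a weighted average over client model parameters, where each client's contribution scales with both its data size and a quality score. This is a minimal illustration, assuming a multiplicative size-times-quality weighting; the paper's actual DaQAvg formula may differ:

```python
# Hypothetical sketch of a data-size- and quality-weighted federated
# aggregation rule in the spirit of DaQAvg. The exact weighting used by
# the authors is an assumption here.
def daq_avg(client_weights, data_sizes, quality_scores):
    """client_weights: one flat parameter list (floats) per client.
    data_sizes: number of training samples per client.
    quality_scores: per-client data-quality score in [0, 1].
    Returns the aggregated parameter list."""
    factors = [n * q for n, q in zip(data_sizes, quality_scores)]
    total = sum(factors)
    norm = [f / total for f in factors]  # normalized contribution per client
    n_params = len(client_weights[0])
    return [
        sum(norm[c] * client_weights[c][i] for c in range(len(client_weights)))
        for i in range(n_params)
    ]
```

With equal sizes and quality this reduces to plain FedAvg-style averaging; a client with corrupted (low-quality) images is down-weighted, which is the claimed source of robustness to degraded data.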
{"title":"Federated learning using quality-based aggregation method for brain tumour segmentation on multimodality medical images","authors":"Rim El Badaoui , Ester Bonmati , Vasileios Argyriou , Barbara Villarini","doi":"10.1016/j.iswa.2025.200601","DOIUrl":"10.1016/j.iswa.2025.200601","url":null,"abstract":"<div><div>Deep learning for medical imaging has shown great potential in improving patient outcomes due to its high accuracy in disease diagnosis. However, a major challenge preventing the widespread adoption of such models in clinical settings is data accessibility, which conflicts with the General Data Protection Regulation (GDPR) in a traditional centralised training environment. Hence, to address this issue, Federated Learning (FL) was introduced as a decentralised alternative that enables collaborative model training among data owners without sharing any private data. Despite its significance in healthcare, limited research has explored FL for medical imaging, particularly in multimodal brain tumour segmentation, due to challenges such as data heterogeneity.</div><div>In this study, we present Federated E-CATBraTS, an advanced federated deep learning model derived from the existing E-CATBraTS framework. This model is designed to segment brain tumours from multimodal magnetic resonance imaging (MRI) while preserving data privacy. Our framework introduces a novel aggregation method, DaQAvg, which optimally combines model weights based on data size and quality, demonstrating resilience against corrupted medical images.</div><div>We evaluated the performance of Federated E-CATBraTS using two publicly available datasets: UPenn-GBM and UCSF-PDGM, including a degraded version of the latter to assess the efficacy of our aggregation method. The results indicate a 6% overall improvement over traditional centralised approaches. 
Furthermore, we conducted a comprehensive comparison against state-of-the-art FL aggregation algorithms, including FedAvg, FedProx and FedNova. While FedNova achieved the highest overall Dice similarity coefficient (DSC), DaQAvg showed superior robustness under noisy conditions, demonstrating its specific advantage in maintaining performance with variable data quality, a critical aspect in medical imaging.</div></div>","PeriodicalId":100684,"journal":{"name":"Intelligent Systems with Applications","volume":"28 ","pages":"Article 200601"},"PeriodicalIF":4.3,"publicationDate":"2025-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145520079","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}