Pub Date: 2026-01-01 | Epub Date: 2025-12-05 | DOI: 10.1007/s44163-025-00713-y
Ahmed A Harby, Farhana Zulkernine, Hanady M Abdulsalam
The rapid growth of multimedia content has increased the demand for effective methods to reduce storage requirements while maintaining quality and enabling fast data transmission. Existing standards and generative model approaches often involve high computational cost, require extensive parameter tuning, and produce inconsistent results, particularly in environments with limited processing resources. This paper presents a convolutional autoencoder framework for reducing the storage footprint of image and video data. The proposed method is designed for efficient integration with existing storage and retrieval systems. Several autoencoder architectures are developed and evaluated on diverse datasets including CelebA, IMDb Faces, Oxford Flowers 102, MNIST, and UCF101. The results show a reduction of 56.6% to 70.8% in image data volume with minimal degradation in perceptual quality. The system incorporates a latent representation module that supports compact storage, efficient indexing, and accurate reconstruction. These capabilities are essential for practical deployment in multimedia platforms. Experimental evaluation demonstrates that the proposed approach performs competitively with recent techniques while providing greater consistency and reduced computational overhead. In comparison to generative models, the method achieves a higher peak signal-to-noise ratio and improved structural fidelity. This study offers a practical and reproducible solution for storage reduction, well suited for large-scale image and video archiving and retrieval under constrained or high-throughput conditions.
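Below is a minimal, illustrative sketch of a convolutional autoencoder of the kind described above, written in PyTorch. The layer counts, channel widths, latent size, and 64x64 input resolution are assumptions for illustration, not the architectures evaluated in the paper; the latent tensor stands in for the compact stored representation used for indexing and reconstruction.

```python
# Minimal convolutional autoencoder sketch for image storage reduction (illustrative only).
# Layer counts, channel widths, and latent size are assumptions, not the paper's configuration.
import torch
import torch.nn as nn

class ConvAutoencoder(nn.Module):
    def __init__(self, latent_channels: int = 16):
        super().__init__()
        # Encoder: downsample 3x64x64 -> latent_channels x 8 x 8 (the compact stored representation)
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1),   # 64 -> 32
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),  # 32 -> 16
            nn.ReLU(),
            nn.Conv2d(64, latent_channels, kernel_size=3, stride=2, padding=1),  # 16 -> 8
        )
        # Decoder: reconstruct the image from the stored latent tensor
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(latent_channels, 64, kernel_size=4, stride=2, padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(32, 3, kernel_size=4, stride=2, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        z = self.encoder(x)          # latent code kept in storage
        return self.decoder(z), z    # reconstruction plus latent for indexing

model = ConvAutoencoder()
images = torch.rand(8, 3, 64, 64)                 # stand-in for CelebA-style inputs
recon, latent = model(images)
loss = nn.functional.mse_loss(recon, images)      # reconstruction objective
print(latent.numel() / images.numel())            # rough size ratio of latent code vs. raw pixels
```

The ratio printed at the end is only a coarse indicator of how much smaller the latent code is than the raw pixels; the 56.6% to 70.8% reductions reported above refer to the paper's own trained models and datasets.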
{"title":"Ai-guided vectorization for efficient storage and semantic retrieval of visual data.","authors":"Ahmed A Harby, Farhana Zulkernine, Hanady M Abdulsalam","doi":"10.1007/s44163-025-00713-y","DOIUrl":"10.1007/s44163-025-00713-y","url":null,"abstract":"<p><p>The rapid growth of multimedia content has increased the demand for effective methods to reduce storage requirements while maintaining quality and enabling fast data transmission. Existing standards and generative model approaches often involve high computational cost, require extensive parameter tuning, and produce inconsistent results, particularly in environments with limited processing resources. This paper presents a convolutional autoencoder framework for reducing the storage footprint of image and video data. The proposed method is designed for efficient integration with existing storage and retrieval systems. Several autoencoder architectures are developed and evaluated on diverse datasets including CelebA, IMDb Faces, Oxford Flowers 102, MNIST, and UCF101. The results show 56.6% to 70.8% for image data volume with minimal degradation in perceptual quality. The system incorporates a latent representation module that supports compact storage, efficient indexing, and accurate reconstruction. These capabilities are essential for practical deployment in multimedia platforms. Experimental evaluation demonstrates that the proposed approach performs competitively with recent techniques while providing greater consistency and reduced computational overhead. In comparison to generative models, the method achieves a higher peak signal to noise ratio and improved structural fidelity. This study offers a practical and reproducible solution for storage reduction, well suited for large scale image and video archiving and retrieval under constrained or high-throughput conditions.</p>","PeriodicalId":520312,"journal":{"name":"Discover artificial intelligence","volume":"6 1","pages":"16"},"PeriodicalIF":0.0,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12795887/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145971579","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-01-01 | Epub Date: 2025-03-12 | DOI: 10.1007/s44163-025-00241-9
Hajar Rabie, Moulay A Akhloufi
Millions of people worldwide suffer from Parkinson's disease (PD), a neurodegenerative disorder marked by motor symptoms such as tremors, bradykinesia, and stiffness. Accurate early diagnosis is crucial for effective management and treatment. This article presents a novel review of Machine Learning (ML) and Deep Learning (DL) techniques for PD detection and progression monitoring, offering new perspectives by integrating diverse data sources. We examine the public datasets recently used in studies, including audio recordings, gait analysis, and medical imaging. We discuss the preprocessing methods applied, the state-of-the-art models utilized, and their performance. Our evaluation covered different algorithms such as support vector machines (SVM), random forests (RF), and convolutional neural networks (CNN). These algorithms have shown promising results in PD diagnosis, with accuracy rates exceeding 99% in some studies that combine data sources. Our analysis particularly highlights the effectiveness of audio analysis for early symptom detection and of gait analysis, together with the Unified Parkinson's Disease Rating Scale (UPDRS), for monitoring disease progression. Medical imaging, enhanced by DL techniques, has improved the identification of PD. The application of ML and DL in PD research offers significant potential for improving diagnostic accuracy. However, challenges such as the need for large and diverse datasets, data privacy concerns, and data quality in healthcare remain. Additionally, developing explainable AI is crucial to ensure that clinicians can trust and understand ML and DL models. Our review highlights these key challenges that must be addressed to enhance the robustness and applicability of AI models in PD diagnosis, setting the groundwork for future research to overcome these obstacles.
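As a rough illustration of one family of approaches surveyed here, the sketch below trains an SVM on tabular acoustic features with scikit-learn. The feature dimensionality, labels, and data are synthetic placeholders and do not correspond to any dataset or result discussed in the review.

```python
# Illustrative sketch: SVM classification of voice recordings into PD vs. control.
# The features and labels below are synthetic placeholders, not a real PD dataset.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 22))          # e.g., 22 acoustic measures per recording (assumed)
y = rng.integers(0, 2, size=200)        # 1 = PD, 0 = control (synthetic labels)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))  # scale features, then RBF SVM
clf.fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```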
{"title":"A review of machine learning and deep learning for Parkinson's disease detection.","authors":"Hajar Rabie, Moulay A Akhloufi","doi":"10.1007/s44163-025-00241-9","DOIUrl":"https://doi.org/10.1007/s44163-025-00241-9","url":null,"abstract":"<p><p>Millions of people worldwide suffer from Parkinson's disease (PD), a neurodegenerative disorder marked by motor symptoms such as tremors, bradykinesia, and stiffness. Accurate early diagnosis is crucial for effective management and treatment. This article presents a novel review of Machine Learning (ML) and Deep Learning (DL) techniques for PD detection and progression monitoring, offering new perspectives by integrating diverse data sources. We examine the public datasets recently used in studies, including audio recordings, gait analysis, and medical imaging. We discuss the preprocessing methods applied, the state-of-the-art models utilized, and their performance. Our evaluation included different algorithms such as support vector machines (SVM), random forests (RF), convolutional neural networks (CNN). These algorithms have shown promising results in PD diagnosis with accuracy rates exceeding 99% in some studies combining data sources. Our analysis particularly showcases the effectiveness of audio analysis in early symptom detection and gait analysis, including the Unified Parkinson's Disease Rating Scale (UPDRS), in monitoring disease progression. Medical imaging, enhanced by DL techniques, has improved the identification of PD. The application of ML and DL in PD research offers significant potential for improving diagnostic accuracy. However, challenges like the need for large and diverse datasets, data privacy concerns, and data quality in healthcare remain. Additionally, developing explainable AI is crucial to ensure that clinicians can trust and understand ML and DL models. Our review highlights these key challenges that must be addressed to enhance the robustness and applicability of AI models in PD diagnosis, setting the groundwork for future research to overcome these obstacles.</p>","PeriodicalId":520312,"journal":{"name":"Discover artificial intelligence","volume":"5 1","pages":"24"},"PeriodicalIF":0.0,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11903556/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143653218","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-01-01 | Epub Date: 2025-08-31 | DOI: 10.1007/s44163-025-00503-6
Martin Woo, Ahmed A Harby, Farhana Zulkernine, Hanady M Abdulsalam
Human Activity Recognition (HAR) using data streams from wearable sensors is challenging due to high data dimensionality, noise, and the lack of labeled data in unsupervised settings. Our prior work showed that traditional clustering models, which achieve state-of-the-art performance on simulated datasets, perform poorly on time-series numeric sensor data. This paper explores different autoencoder (AE) architectures to extract latent features with reduced dimensionality from streaming HAR datasets, which are then clustered using a clustering model to identify different activity patterns. Since the vanilla AE has shortcomings in learning distinguishing data patterns from spatio-temporal time-series sensor data, we augment the vanilla AE with convolutional layers, long short-term memory (LSTM) layers, and a combination of convolutional and LSTM layers across multiple design phases. We apply supervised learning to train a superior spatio-temporal feature extraction AE model. Using the data features extracted by the trained AE, we train a clustering model with an unsupervised learning approach. Our end-to-end integrated hybrid convolutional AE+LSTM feature extractor and K-Means clustering model achieves state-of-the-art clustering accuracy of up to 0.99 in terms of Normalized Mutual Information (NMI) and Adjusted Rand Index (ARI) scores on the MobiAct and UCI HAR datasets, improving clustering performance by over 50% compared to previous methods. Further improvements are achieved through rigorous experimentation and advanced data preprocessing methods. We also present a visualization of the clusters, which explains the transitional activity patterns in the overlapping parts of the clusters.
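A compact sketch of the two-phase idea, assuming PyTorch and scikit-learn: a convolutional+LSTM autoencoder summarizes each sensor window into a latent vector (phase one), and K-Means clusters those vectors, scored with NMI and ARI (phase two). Window length, sensor channel count, and layer sizes are illustrative assumptions, not the paper's configuration.

```python
# Sketch of the two-phase pipeline: AE-based feature extraction, then K-Means clustering.
# All sizes below are assumptions; the data is synthetic and labels are used only for scoring.
import torch
import torch.nn as nn
from sklearn.cluster import KMeans
from sklearn.metrics import normalized_mutual_info_score, adjusted_rand_score

class ConvLSTMAE(nn.Module):
    def __init__(self, n_channels=6, hidden=32):
        super().__init__()
        self.conv = nn.Conv1d(n_channels, 16, kernel_size=5, padding=2)   # per-window spatial filtering
        self.enc_lstm = nn.LSTM(16, hidden, batch_first=True)             # temporal encoding
        self.dec_lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_channels)                          # per-step reconstruction

    def forward(self, x):                      # x: (batch, time, channels)
        h = self.conv(x.transpose(1, 2)).transpose(1, 2)
        _, (z, _) = self.enc_lstm(h)           # z: (1, batch, hidden) latent summary of the window
        dec_in = z.transpose(0, 1).repeat(1, x.size(1), 1)
        dec, _ = self.dec_lstm(dec_in)
        return self.out(dec), z.squeeze(0)

# Phase 1: train the AE with a reconstruction loss (one illustrative forward pass shown).
windows = torch.randn(64, 128, 6)              # 64 windows of 128 samples x 6 sensor axes (synthetic)
labels = torch.randint(0, 4, (64,)).numpy()    # stand-in activity labels, used only for evaluation
model = ConvLSTMAE()
recon, feats = model(windows)
loss = nn.functional.mse_loss(recon, windows)

# Phase 2: cluster the latent features and score the assignments against the labels.
assign = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(feats.detach().numpy())
print("NMI:", normalized_mutual_info_score(labels, assign),
      "ARI:", adjusted_rand_score(labels, assign))
```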
{"title":"A two-phase hybrid clustering framework exploring transitional activities in HAR.","authors":"Martin Woo, Ahmed A Harby, Farhana Zulkernine, Hanady M Abdulsalam","doi":"10.1007/s44163-025-00503-6","DOIUrl":"10.1007/s44163-025-00503-6","url":null,"abstract":"<p><p>Human Activity Recognition (HAR) using data streams from wearable sensors is challenging due to high data dimensionality, noise, and the lack of labeled data in unsupervised settings. Our prior work proved that traditional clustering models, which achieve state-of-the-art performance on simulated datasets, perform poorly on time-series numeric sensor data. This paper explores different autoencoder (AE) architectures to extract latent features with reduced dimensionality from streaming HAR datasets, which is then clustered using a clustering model to identify different activity patterns. Since the vanilla AE has shortcomings in learning distinguishing data patterns from spatio temporal time-series sensor data, we leverage the vanilla AE with convolutional, long-short term memory (LSTM), and a combination of convolutional and LSTM layers in multiple design phases. We apply supervised learning to train a superior spatio-temporal feature extraction AE model. Using the data features extracted by the trained AE, we train a clustering model with unsupervised learning approach. Our end-to-end integrated hybrid convolutional AE+LSTM feature extractor and K-Means clustering model achieves state-of-the-art clustering accuracy of up to 0.99 in terms of Normalized Mutual Information (NMI) and Adjusted Rand Index (ARI) scores for MobiAct and UCI HAR datasets, improving clustering performance by over 50% compared to previous methods. Further improvements are achieved through rigorous experimentation and advanced data preprocessing methods. We also present a visualization of the clusters, which explains the transitional activity patterns in the overlapping parts of the clusters.</p>","PeriodicalId":520312,"journal":{"name":"Discover artificial intelligence","volume":"5 1","pages":"233"},"PeriodicalIF":0.0,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12399724/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144994989","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-01-01 | Epub Date: 2025-03-13 | DOI: 10.1007/s44163-025-00243-7
Sanchita Mondal, Debnarayan Khatua, Sourav Mandal, Dilip K Prasad, Arif Ahmed Sekh
Solving math word problems of varying complexities is one of the most challenging and exciting research questions in artificial intelligence (AI), particularly in natural language processing (NLP) and machine learning (ML). Foundational language models such as GPT must be evaluated for intelligence, and solving word problems is a key method for this assessment. These problems become especially difficult when presented in low-resource regional languages such as Bengali. Word problem solving integrates the cognitive domains of language processing, comprehension, and transformation into real-world solutions. During the past decade, advances in AI and machine learning have led to significant progress on this complex problem. Although researchers worldwide have primarily utilized datasets in English and some in Chinese, there has been a lack of standard datasets for low-resource languages such as Bengali. In this pioneering study, we introduce the first Bengali Math Word Problem Benchmark Dataset (BMWP), comprising 8653 word problems. We detail the creation of this dataset and the benchmarking methods employed. Furthermore, we investigate operation prediction from Bengali word problems using state-of-the-art deep learning (DL) techniques. We implemented and compared various standard DL-based neural network architectures, achieving an accuracy of 92 ± 2%. The dataset and code will be available at https://github.com/SanchitaMondal/BMWP.
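The sketch below frames operation prediction as short-text classification, which is the general setup benchmarked above; the tiny LSTM classifier, token IDs, and four-operation label set are placeholders rather than the architectures or label scheme actually used for BMWP.

```python
# Illustrative framing of operation prediction as short-text classification: each word
# problem is mapped to one arithmetic operation. Model, tokenization, and labels are placeholders.
import torch
import torch.nn as nn

OPS = ["+", "-", "*", "/"]                      # assumed label set for single-operation problems

class OperationClassifier(nn.Module):
    def __init__(self, vocab_size, embed_dim=64, hidden=64, n_ops=len(OPS)):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, n_ops)

    def forward(self, token_ids):                # token_ids: (batch, seq_len)
        emb = self.embed(token_ids)
        _, (h, _) = self.lstm(emb)               # final hidden state summarizes the problem text
        return self.fc(h.squeeze(0))             # logits over the operation labels

model = OperationClassifier(vocab_size=5000)
batch = torch.randint(1, 5000, (8, 20))          # 8 tokenized problems, 20 token IDs each (synthetic)
targets = torch.randint(0, len(OPS), (8,))
loss = nn.functional.cross_entropy(model(batch), targets)
loss.backward()                                  # one illustrative training step (optimizer omitted)
```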
{"title":"BMWP: the first Bengali math word problems dataset for operation prediction and solving.","authors":"Sanchita Mondal, Debnarayan Khatua, Sourav Mandal, Dilip K Prasad, Arif Ahmed Sekh","doi":"10.1007/s44163-025-00243-7","DOIUrl":"https://doi.org/10.1007/s44163-025-00243-7","url":null,"abstract":"<p><p>Solving math word problems of varying complexities is one of the most challenging and exciting research questions in artificial intelligence (AI), particularly in natural language processing (NLP) and machine learning (ML). Foundational language models such as GPT must be evaluated for intelligence, and solving word problems is a key method for this assessment. These problems become especially difficult when presented in low-resource regional languages such as Bengali. Word problem solving integrates the cognitive domains of language processing, comprehension, and transformation into real-world solutions. During the past decade, advances in AI and machine learning have significantly progressed in addressing this complex issue. Although researchers worldwide have primarily utilized datasets in English and some in Chinese, there has been a lack of standard datasets for low-resource languages such as Bengali. In this pioneering study, we introduce the first Bengali Math Word Problem Benchmark Data Set (BMWP), comprising 8653 word problems. We detail the creation of this dataset and the benchmarking methods employed. Furthermore, we investigate operation prediction from Bengali word problems using state-of-the-art deep learning (DL) techniques. We implemented and compared various standard DL-based neural network architectures, achieving an accuracy of <math><mrow><mn>92</mn> <mo>±</mo> <mn>2</mn> <mo>%</mo></mrow> </math> . The data set and the code will be available at https://github.com/SanchitaMondal/BMWP.</p>","PeriodicalId":520312,"journal":{"name":"Discover artificial intelligence","volume":"5 1","pages":"25"},"PeriodicalIF":0.0,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11903620/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143653156","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-01-01 | Epub Date: 2024-11-26 | DOI: 10.1007/s44163-024-00199-0
Janny Xue Chen Ke, Arunachalam DhakshinaMurthy, Ronald B George, Paula Branco
Purpose: The availability of population datasets and machine learning techniques has heralded a new era of sophisticated prediction models involving a large number of routinely collected variables. However, severe class imbalance in clinical datasets is a major challenge. The aim of this study is to investigate the impact of commonly used resampling techniques in combination with commonly used machine learning algorithms in a clinical dataset, to determine whether combination(s) of these approaches improve upon the original multivariable logistic regression with no resampling.
Methods: We previously developed and internally validated a multivariable logistic regression 30-day mortality prediction model in 30,619 patients using preoperative and intraoperative features. Using the same dataset, we systematically evaluated and compared model performances after application of resampling techniques [random under-sampling, near-miss under-sampling, random oversampling, and synthetic minority oversampling (SMOTE)] in combination with machine learning algorithms (logistic regression, elastic net, decision trees, random forest, and extreme gradient boosting).
Results: We found that in the setting of severe class imbalance, the impact of resampling techniques on model performance varied by the machine learning algorithm and the evaluation metric. Existing resampling techniques did not meaningfully improve the area under the receiver operating characteristic curve (AUROC). The area under the precision-recall curve (AUPRC) was increased only by random under-sampling and SMOTE for decision trees, and by random oversampling and SMOTE for extreme gradient boosting. Importantly, some combinations of algorithm and resampling technique decreased AUROC and AUPRC compared to no resampling.
Conclusion: Existing resampling techniques had a variable impact on models, depending on the algorithms and the evaluation metrics. Future research is needed to improve predictive performances in the setting of severe class imbalance.
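A minimal sketch of one cell of the comparison described above, assuming scikit-learn and imbalanced-learn: a resampling technique (SMOTE here) is applied to the training split only, a classifier is fit, and the model is scored on both AUROC and AUPRC. The synthetic data and the single SMOTE + logistic regression pairing are illustrative, not the study's cohort or its full grid of combinations.

```python
# One resampling-technique / algorithm pairing, evaluated on AUROC and AUPRC (illustrative only).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, average_precision_score
from imblearn.over_sampling import SMOTE

# Severely imbalanced binary outcome (roughly 2% positives), as a stand-in for 30-day mortality.
X, y = make_classification(n_samples=20000, n_features=20, weights=[0.98], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# Resample only the training split, then fit the classifier.
X_res, y_res = SMOTE(random_state=0).fit_resample(X_train, y_train)
clf = LogisticRegression(max_iter=1000).fit(X_res, y_res)

probs = clf.predict_proba(X_test)[:, 1]
print("AUROC:", roc_auc_score(y_test, probs))
print("AUPRC:", average_precision_score(y_test, probs))
```

Resampling only the training split, as above, avoids leaking synthetic minority samples into the held-out evaluation data.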
{"title":"The effect of resampling techniques on the performances of machine learning clinical risk prediction models in the setting of severe class imbalance: development and internal validation in a retrospective cohort.","authors":"Janny Xue Chen Ke, Arunachalam DhakshinaMurthy, Ronald B George, Paula Branco","doi":"10.1007/s44163-024-00199-0","DOIUrl":"https://doi.org/10.1007/s44163-024-00199-0","url":null,"abstract":"<p><strong>Purpose: </strong>The availability of population datasets and machine learning techniques heralded a new era of sophisticated prediction models involving a large number of routinely collected variables. However, severe class imbalance in clinical datasets is a major challenge. The aim of this study is to investigate the impact of commonly-used resampling techniques in combination with commonly-used machine learning algorithms in a clinical dataset, to determine whether combination(s) of these approaches improve upon the original multivariable logistic regression with no resampling.</p><p><strong>Methods: </strong>We previously developed and internally validated a multivariable logistic regression 30-day mortality prediction model in 30,619 patients using preoperative and intraoperative features.Using the same dataset, we systematically evaluated and compared model performances after application of resampling techniques [random under-sampling, near miss under-sampling, random oversampling, and synthetic minority oversampling (SMOTE)] in combination with machine learning algorithms (logistic regression, elastic net, decision trees, random forest, and extreme gradient boosting).</p><p><strong>Results: </strong>We found that in the setting of severe class imbalance, the impact of resampling techniques on model performance varied by the machine learning algorithm and the evaluation metric. Existing resampling techniques did not meaningfully improve area under receiving operating curve (AUROC). The area under the precision recall curve (AUPRC) was only increased by random under-sampling and SMOTE for decision trees, and oversampling and SMOTE for extreme gradient boosting. Importantly, some combinations of algorithm and resampling technique decreased AUROC and AUPRC compared to no resampling.</p><p><strong>Conclusion: </strong>Existing resampling techniques had a variable impact on models, depending on the algorithms and the evaluation metrics. Future research is needed to improve predictive performances in the setting of severe class imbalance.</p>","PeriodicalId":520312,"journal":{"name":"Discover artificial intelligence","volume":"4 1","pages":"91"},"PeriodicalIF":0.0,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11610218/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142776258","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}