Breaking the Complexity Barrier: Enhancing Quality of Service in Simultaneous Multithreading Processors
Pub Date : 2025-12-01 DOI: 10.1016/j.array.2025.100602
Sercan Sari, Ugur Nezir, Onur Demir, Gurhan Kucuk
Within the wide variety of applications running in the cloud, demands for service quality and performance guarantees need special attention. Since cloud nodes typically utilize Simultaneous Multithreading (SMT) cores, processor resources may need to be shared and managed among multiple applications. A customer application that demands high levels of Quality of Service (QoS) may require more resources than its best-effort allocated share. QoSMT is an earlier study that focuses on 2-thread SMT cores to address this issue. While a simple relationship between two threads may be relatively straightforward to manage, the interactions among multiple threads can quickly become highly complex and difficult to design and maintain. In this paper, we extend the original QoSMT study to 4-thread SMT cores by proposing three alternative scheduling algorithms that regulate resource usage between a high-priority thread and three ordinary batch-type threads. On average, we achieve 7.2% better overall throughput and up to 218% better throughput for the latency-sensitive, high-priority thread. More than 92% of 600 workloads display higher performance than the baseline configuration, and 8% of workloads perform ten times better than a traditional 4-thread baseline SMT with unmanaged resources.
{"title":"Breaking the Complexity Barrier: Enhancing Quality of Service in Simultaneous Multithreading Processors","authors":"Sercan Sari , Ugur Nezir , Onur Demir , Gurhan Kucuk","doi":"10.1016/j.array.2025.100602","DOIUrl":"10.1016/j.array.2025.100602","url":null,"abstract":"<div><div>Within the wide variety of applications running in the cloud, demands for service quality and performance guarantees need special attention. Since cloud nodes typically utilize Simultaneous Multithreading (SMT) cores, processor resources may need to be shared and managed among multiple applications. A customer application that demands high levels of Quality of Service (QoS) may require more resources than its best-effort allocated share. QoSMT is an earlier study that focuses on 2-thread SMT cores to address this issue. While a simple relationship between two threads may be relatively straightforward to manage, the interactions among multiple threads can quickly become highly complex and difficult to design and maintain. In this paper, we extend the original QoSMT study to 4-thread SMT cores by proposing three alternative scheduling algorithms that regulate resource usage between a high-priority thread and three ordinary batch-type threads. On average, we achieve 7.2% better overall throughput and up to 218% better throughput for the latency-sensitive, high-priority thread. More than 92% of 600 workloads display higher performance than the baseline configuration, and 8% of workloads perform ten times better compared to a traditional 4-thread baseline SMT with unmanaged resources.</div></div>","PeriodicalId":8417,"journal":{"name":"Array","volume":"28 ","pages":"Article 100602"},"PeriodicalIF":4.5,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145614763","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Efficient management of computational resources in the IoT FOG layer. Automation and load balancing using Simple FOG Management Protocol
Pub Date : 2025-12-01 DOI: 10.1016/j.array.2025.100597
Lucia Arnau Muñoz, José Vicente Berná Martínez, David Saavedra Pastor, Iren Lorenzo Fonseca
FOG computing is an essential paradigm for Internet of Things (IoT) environments, as it enables the decentralisation of data processing to make better use of resources. The rapid proliferation of IoT devices has created new challenges for this middle layer, especially in terms of reconfiguration and dynamic scalability, processes that have traditionally been managed manually. This study presents an architecture for the FOG layer that implements dynamic control and scaling automation through an advanced control layer composed of a coordinator, which calculates the optimal distribution of computational load, and a balancer, which manages traffic routing. The central component of the proposal is the Simple FOG Management Protocol (SFMP), designed specifically to facilitate automated resource management, ensuring efficient allocation of computational capacity, fair load balancing, and robust fault tolerance mechanisms. Experimental validation in several complex IoT environments at the University of Alicante demonstrated a substantial optimisation in resource utilisation, with a reduction in energy consumption of between 40% and 65%. These results confirm the flexibility and adaptability of the system to dynamic variations in computational demand in large-scale IoT implementations, and its benefits over traditional proposals.
{"title":"Efficient management of computational resources in the IoT FOG layer. Automation and load balancing using Simple FOG Management Protocol","authors":"Lucia Arnau Muñoz, José Vicente Berná Martínez, David Saavedra Pastor, Iren Lorenzo Fonseca","doi":"10.1016/j.array.2025.100597","DOIUrl":"10.1016/j.array.2025.100597","url":null,"abstract":"<div><div>FOG computing is an essential paradigm for Internet of Things (IoT) environments, as it enables the decentralisation of data processing to make better use of resources. The rapid proliferation of IoT devices has created new challenges for this middle layer, especially in terms of reconfiguration and dynamic scalability, processes that have traditionally been managed manually. This study presents an architecture for the FOG layer that implements dynamic control and scaling automation through an advanced control layer composed of a coordinator that calculates the optimal distribution of computational load and a balancer that manages traffic routing. The central component of the proposal is the Simple FOG Management Protocol (SFMP), designed specifically to facilitate automated resource management, ensuring efficient allocation of computational capacity, fair load balancing, and robust fault tolerance mechanisms. Experimental validation in several complex IoT environments at the University of Alicante demonstrated a substantial optimisation in resource utilisation with a reduction in energy consumption of between 40 % and 65 %. All this confirms the flexibility and adaptability of the system to dynamic variations in computational demand in large-scale IoT implementations and its benefits over traditional proposals.</div></div>","PeriodicalId":8417,"journal":{"name":"Array","volume":"28 ","pages":"Article 100597"},"PeriodicalIF":4.5,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145614858","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
GSMT: An explainable semi-supervised multi-label method based on Gower distance
Pub Date : 2025-12-01 DOI: 10.1016/j.array.2025.100596
José Carlos Mondragón, Andres Eduardo Gutierrez-Rodríguez, Victor Adrián Sosa Hernández
The financial, health, and education sectors produce vast amounts of data daily. Labeling entries such as assets, patients, and students is both costly and complex due to the evolution of databases into multi-label settings. Handling real-world data requires automatic labeling to circumvent slow manual procedures, and explanations for compliance with regulations. In this work, we introduce GSMT, an inductive Explainable Semi-Supervised Multi-Label Random Forest Method based on Gower Distance, which uses supervised and unsupervised data to provide a non-linear solution for mainly tabular multi-label datasets with fully unknown label vectors. GSMT splits the dataset using multi-dimensional manifolds, completes missing label information, and inductively predicts new observations while achieving explainability. We demonstrate state-of-the-art performance across Micro F1 Score, AUPRC, AUROC, and Label Rank Average Precision in a study involving 20 numerical and 5 mostly categorical datasets with five missing data ratios. By leveraging unsupervised information on top of numerical and categorical data, GSMT outputs the pattern rules annotated with performance measures, explanations of the attribute and label spaces, as well as an inductive model capable of predicting multi-label observations.
{"title":"GSMT: An explainable semi-supervised multi-label method based on Gower distance","authors":"José Carlos Mondragón , Andres Eduardo Gutierrez-Rodríguez , Victor Adrián Sosa Hernández","doi":"10.1016/j.array.2025.100596","DOIUrl":"10.1016/j.array.2025.100596","url":null,"abstract":"<div><div>The financial, health, and education sectors produce vast amounts of data daily. Labeling entries such as assets, patients, and students is both costly and complex due to the evolution of databases into multi-label settings. Handling real-world data requires automatic labeling to circumvent slow manual procedures and explanations for compliance with regulations. In this work, we introduce GSMT, an inductive Explainable Semi-Supervised Multi-Label Random Forest Method based on Gower Distance, which uses supervised and unsupervised data to provide a non-linear solution for mainly tabular multi-label datasets with fully unknown label vectors. GSMT splits the dataset using multi-dimensional manifolds, completes missing label information and inductively predicts new observations while achieving explainability. We demonstrate state-of-the-art performance across Micro F1 Score, AUPRC, AUROC, and Label Rank Average Precision in a study involving 20 numerical and 5 mostly categorical datasets with five missing data ratios. By leveraging unsupervised information on top of numerical and categorical data, GSMT outputs the pattern rules annotated with performance measures, explanations on attribute and label space as well as an inductive model capable of predicting multi-label observations.</div></div>","PeriodicalId":8417,"journal":{"name":"Array","volume":"28 ","pages":"Article 100596"},"PeriodicalIF":4.5,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145614857","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Contrastive text embeddings with effective sample mining for enhanced disease diagnosis
Pub Date : 2025-12-01 DOI: 10.1016/j.array.2025.100578
Xuanyi Zhang, Genghong Zhao, Yi Ren, Weiguang Wang, Yingying Feng, Xia Zhang, Jiren Liu
Disease diagnosis, a pivotal task in Clinical Decision Support (CDS) that aids physicians in differential diagnosis, faces challenges in achieving high precision and clinical adaptability. The development of deep learning models based on pre-trained transformers, especially Large Language Models (LLMs) and text embedding models, brings opportunities to construct advanced disease diagnosis models, but each kind of model faces challenges. LLM-based disease diagnosis models have not proven reliable in their generated diagnoses, and physicians find it difficult to trace the original evidence behind them. Text embedding models can address these challenges, but the domain-specific semantic misalignment caused by distribution differences between open-domain corpora and authentic clinical notes during the training of general text embedding models leads to inaccurate ranking. In this paper, we build Disease Diagnoser, based on a hybrid information retrieval architecture consisting of an augmented retriever and an augmented reranker. We propose an approach that constructs DD-retriever and DD-reranker through contrastive text embeddings with Effective Sample Mining, addressing the domain-specific semantic misalignment during contrastive learning and thereby enhancing diagnostic accuracy. Specifically, Effective Sample Mining provides high-quality positive and negative samples for model fine-tuning, augmenting contrastive learning. We define a Semantic Target to improve a text embedding model's capability to identify positive samples during contrastive learning. Extensive experiments demonstrate that Disease Diagnoser outperforms the best-performing SOTA LLM by 12.9%, 22.9%, and 24.8% on top-3, top-5, and top-10 diagnostic accuracy, respectively. Our approach is validated to generalize to any hospital, using its private annotated clinical notes, to construct a specific disease diagnosis model. Additionally, we construct PMC-Patients-DD, a new public clinical note dataset with ground truth, specifically designed for disease diagnosis related tasks. This dataset is available to researchers in the field of disease diagnosis to facilitate further research.
{"title":"Contrastive text embeddings with effective sample mining for enhanced disease diagnosis","authors":"Xuanyi Zhang , Genghong Zhao , Yi Ren , Weiguang Wang , Yingying Feng , Xia Zhang , Jiren Liu","doi":"10.1016/j.array.2025.100578","DOIUrl":"10.1016/j.array.2025.100578","url":null,"abstract":"<div><div>Disease diagnosis is a pivotal task in Clinical Decision Support (CDS), which aids physicians in differential diagnosis, faces challenges in achieving high precision and improving clinical adaptability. The development of deep learning models based on pre-trained transformers, especially Large Language Models (LLMs) and text embedding models, bring opportunities to construct advanced disease diagnosis models, but either kind of model meets challenges. LLM-based disease diagnosis models have not performed reliability on generated diagnoses and physicians difficultly trace the original evidence for these generated diagnoses. Text embedding models can address the previous challenges but the domain-specific semantic misalignment caused by corpus distribution differences between open-domain corpora and authentic clinical notes during the training of general text embedding models causes inaccurate ranking. In this paper, we build <em>Disease Diagnoser</em> based on a hybrid information retrieval model architecture of an augmented retriever and an augmented reranker. We propose an approach to respectively construct <em>DD-retriever</em> and <em>DD-reranker</em> through contrastive text embeddings with Effective Sample Mining, for addressing the domain-specific semantic misalignment during contrastive learning and thereby enhancing the diagnostic accuracy in disease diagnosis. Specifically, Effective Sample Mining provides high-quality positive and negative samples for model fine-tuning, augmenting contrastive learning. We define Semantic Target in order to improve the capability of a text embedding model in identifying positive sample during contrastive learning. Extensive experiments demonstrate that <em>Disease Diagnoser</em> outperforms the best performing SOTA LLM, by 12.9%, 22.9% and 24.8% respectively on top-3, top-5 and top-10 diagnostic accuracy. Our approach is validated to be generalized to any hospital, using its private annotated clinical notes, to construct a specific disease diagnosis model. Additionally, we construct PMC-Patients-DD, a new public clinical note dataset with grounded truth, specifically designed for disease diagnosis related tasks. This dataset is available for more researchers in the field of disease diagnosis to facilitate further researches.</div></div>","PeriodicalId":8417,"journal":{"name":"Array","volume":"28 ","pages":"Article 100578"},"PeriodicalIF":4.5,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145614756","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A scalable k-medoids clustering via whale optimization algorithm
Pub Date : 2025-12-01 DOI: 10.1016/j.array.2025.100599
Huang Chenan, Narumasa Tsutsumida
Unsupervised clustering has emerged as a critical tool for uncovering hidden patterns in vast, unlabeled datasets. However, traditional methods, such as Partitioning Around Medoids (PAM), struggle with scalability owing to their quadratic computational complexity. To address this limitation, we introduce WOA-kMedoids, a novel unsupervised clustering method that incorporates the Whale Optimization Algorithm (WOA), a nature-inspired metaheuristic modeled on the hunting strategies of humpback whales. By optimizing the centroid selection, WOA-kMedoids reduces the computational complexity from quadratic to near-linear with respect to the number of observations, enabling scalability to large datasets while maintaining high clustering accuracy. We evaluated WOA-kMedoids using 25 diverse time-series datasets from the UCR archive. Our empirical results show that WOA-kMedoids achieved a clustering performance comparable to PAM, with an average Rand Index (RI) of 0.731 compared to PAM's 0.739, outperforming PAM on 12 out of 25 datasets. While exhibiting a slightly higher runtime than PAM on small datasets (<300 observations), WOA-kMedoids outperformed PAM on larger datasets, with an average speedup of 1.7× and a maximum of 2.3×. The scalability of WOA-kMedoids, combined with its high accuracy, makes it a promising choice for unsupervised clustering in big data applications. This method has implications for efficient knowledge discovery in massive unlabeled datasets, particularly where traditional k-medoids methods are computationally infeasible, including IoT anomaly detection, biomedical signal analysis, and customer behavior clustering.
{"title":"A scalable k-medoids clustering via whale optimization algorithm","authors":"Huang Chenan , Narumasa Tsutsumida","doi":"10.1016/j.array.2025.100599","DOIUrl":"10.1016/j.array.2025.100599","url":null,"abstract":"<div><div>Unsupervised clustering has emerged as a critical tool for uncovering hidden patterns in vast, unlabeled datasets. However, traditional methods, such as Partitioning Around Medoids (PAM), struggle with scalability owing to their quadratic computational complexity. To address this limitation, we introduce WOA-kMedoids, a novel unsupervised clustering method that incorporates the Whale Optimization Algorithm (WOA), a nature-inspired metaheuristic inspired by the hunting strategies of humpback whales. By optimizing the centroid selection, WOA-kMedoids reduces the computational complexity from quadratic to near-linear with respect to the number of observations, enabling scalability to large datasets while maintaining high clustering accuracy. We evaluated WOA-kMedoids using 25 diverse time-series datasets from the UCR archive. Our empirical results show that WOA-kMedoids achieved a clustering performance comparable to PAM, with an average Rand Index (RI) of 0.731 compared to PAM’s 0.739, outperforming PAM on 12 out of 25 datasets. While exhibiting a slightly higher runtime than PAM on small datasets (<span><math><mrow><mo><</mo><mn>300</mn></mrow></math></span> observations), WOA-kMedoids outperformed PAM on larger datasets, with an average speedup of 1.7× and a maximum of 2.3×. The scalability of WOA-kMedoids, combined with its high accuracy, makes them a promising choice for unsupervised clustering in big data applications. This method has implications for efficient knowledge discovery in massive unlabeled datasets, particularly where traditional k-medoids methods are computationally infeasible, including IoT anomaly detection, biomedical signal analysis, and customer behavior clustering.</div></div>","PeriodicalId":8417,"journal":{"name":"Array","volume":"28 ","pages":"Article 100599"},"PeriodicalIF":4.5,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145614758","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
DOVE-FELM: A fusion-optimized feature selection and heterogeneous ensemble learning framework for early prediction of chronic kidney disease risk
Pub Date : 2025-12-01 DOI: 10.1016/j.array.2025.100613
Bilal Sowan, Li Zhang, Essam H. Houssein, Hazem Qattous, Mohammad Azzeh, Bayan Massad
Chronic Kidney Disease (CKD) affects over 800 million individuals worldwide, yet existing prediction models deliver suboptimal performance on imbalanced datasets and lack clinical interpretability for early detection. This study presents DOVE-FELM, a framework that integrates a Diverse Optimization Voting Ensemble (DOVE) with a Fusion Ensemble Learning Model (FELM) for early-stage CKD risk prediction. DOVE employs consensus-based feature selection through eight heterogeneous optimization algorithms and incorporates a Symptom-Weighted and Influenced Patient Network (SWIPN) to capture patient–symptom connectivity patterns. FELM combines Random Forest and REPtree classifiers through adaptive weighting optimized via meta-level learning. The framework was validated on six imbalanced medical datasets (imbalance ratios from 1.67:1 to 29.06:1) through 10-fold stratified cross-validation. Specifically, evaluated on the UCI CKD dataset, DOVE-FELM achieved 99.75%, 99.8%, 99.8%, 99.8%, and 0.9947 for accuracy, AUC, sensitivity, specificity, and Cohen's Kappa, respectively. The Wilcoxon signed-rank test further confirms the statistical significance of the proposed model's improvements over nine baseline methods. External validation on the Tawam Hospital CKD dataset, with an imbalance ratio of 7.77:1, yielded an accuracy of 95.19%. Cross-disease validation on thyroid disease, thoracic surgery, cervical cancer, and AIDS clinical trials datasets demonstrated consistent performance (96.80–99.71% accuracy). Feature dimensionality reduction of 70.8% (24 → 7 biomarkers) enhanced clinical interpretability. DOVE-FELM advances computational frameworks for early chronic disease prediction by integrating optimization-based feature selection with clinical domain knowledge. It shows strong potential for application in population-scale screening programs.
{"title":"DOVE-FELM: A fusion-optimized feature selection and heterogeneous ensemble learning framework for early prediction of chronic kidney disease risk","authors":"Bilal Sowan , Li Zhang , Essam H. Houssein , Hazem Qattous , Mohammad Azzeh , Bayan Massad","doi":"10.1016/j.array.2025.100613","DOIUrl":"10.1016/j.array.2025.100613","url":null,"abstract":"<div><div>Chronic Kidney Disease (CKD) affects over 800 million individuals worldwide, yet existing prediction models deliver suboptimal performance on imbalanced datasets and lack clinical interpretability for early detection. This study presents DOVE-FELM, a framework that integrates Diverse Optimization Voting Ensemble (DOVE) with Fusion Ensemble Learning Model (FELM) for early-stage CKD risk prediction. DOVE employs consensus-based feature selection through eight heterogeneous optimization algorithms and incorporates a Symptom-Weighted and Influenced Patient Network (SWIPN) to capture patient–symptom connectivity patterns. FELM combines Random Forest and REPtree classifiers through adaptive weighting optimized via meta-level learning. The framework was validated on six imbalanced medical datasets (imbalance ratios from 1.67:1 to 29.06:1) through 10-fold stratified cross-validation. Specifically, evaluated using the UCI CKD dataset, DOVE-FELM achieved 99.75%, 99.8%, 99.8%, 99.8%, and 0.9947 for accuracy, AUC, sensitivity, specificity, and Cohen’s Kappa scores, respectively. The Wilcoxon signed-rank test further ascertains statistical significance of the proposed model over nine baseline methods. Also, external validation on the Tawam Hospital CKD dataset with an imbalance ratio of 7.77:1, yielded an accuracy rate of 95.19%. Cross-disease validation on thyroid disease, thoracic surgery, cervical cancer, and AIDS clinical trials datasets demonstrated consistent performance (96.80–99.71% accuracy). Feature dimensionality reduction of 70.8% (24 <span><math><mo>→</mo></math></span> 7 biomarkers) enhanced clinical interpretability. DOVE-FELM advances computational frameworks for early chronic disease prediction through the integration of optimization-based feature selection with clinical domain knowledge. It shows strong potential for application in population-scale screening programs.</div></div>","PeriodicalId":8417,"journal":{"name":"Array","volume":"28 ","pages":"Article 100613"},"PeriodicalIF":4.5,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145614860","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zero-knowledge proofs for anonymous authentication of patients on public and private blockchains
Pub Date : 2025-11-20 DOI: 10.1016/j.array.2025.100590
Mohammad Madine, Khaled Salah, Raja Jayaraman, Ibrar Yaqoob
In recent years, the healthcare sector has faced increasing challenges in securing patient identities and medical records on blockchain due to rising privacy demands and strict regulatory requirements. Although advanced techniques like self-sovereign identity and zero-knowledge proofs (ZKPs) show promise, these solutions fail to limit unwarranted patient data disclosure effectively. In this paper, we propose a ZKP-based solution that combines STARKs and anonymous credentials to enable anonymous authentication and enhance privacy across both public and private blockchains. Leveraging transparent ZKP schemes and anonymous credentials, our approach ensures unlinkability by preventing the correlation of multiple patient interactions. We present sequence diagrams of real-world interactions, detailed algorithms for on- and off-chain computations, and implement the system on the Ethereum and Starknet blockchains. We present a rigorous evaluation of the proposed solution, encompassing smart contract testing on Starknet networks, transaction cost analysis, performance benchmarking, scalability assessment, and static security auditing. The results demonstrate consistent and economically viable transaction costs; millisecond-level execution times for credential issuance, presentation generation, and verification; and linear scalability with increasing claim count and size. We compare our solution with state-of-the-art ZKP-based identity systems to demonstrate its superiority. We further discuss its broader applicability beyond healthcare, including domains such as finance, education, and supply chain management. We make the smart contract codes publicly available on GitHub.
{"title":"Zero-knowledge proofs for anonymous authentication of patients on public and private blockchains","authors":"Mohammad Madine , Khaled Salah , Raja Jayaraman , Ibrar Yaqoob","doi":"10.1016/j.array.2025.100590","DOIUrl":"10.1016/j.array.2025.100590","url":null,"abstract":"<div><div>In recent years, the healthcare sector has been increasingly challenged in securing patient identities and medical records on blockchain due to rising privacy demands and strict regulatory requirements. Although advanced techniques like self-sovereign identity and zero-knowledge proofs (ZKPs) show promise, these solutions fail to limit unwarranted patient data disclosure effectively. In this paper, we propose a ZKP-based solution that combines STARKs and anonymous credentials to enable anonymous authentication and enhance privacy across both public and private blockchains. Leveraging transparent ZKP schemes and anonymous credentials, our approach ensures unlinkability by preventing the correlation of multiple patient interactions. We present sequence diagrams of real-world interactions, detailed algorithms for on- and off-chain computations, and implement the system on Ethereum and Starknet blockchains. We present a rigorous evaluation of the proposed solution, encompassing smart contract testing on Starknet networks, transaction cost analysis, performance benchmarking, scalability assessment, and static security auditing. The results demonstrate consistent and economically viable transaction costs, millisecond-level execution times for credential issuance, presentation generation, and verification, linear scalability with increasing claim count and size. We compare our solution with state-of-the-art ZKP-based identity systems to demonstrate its superiority. We further discuss its broader applicability beyond healthcare, including domains such as finance, education, and supply chain management. We make the smart contract codes publicly available on GitHub.</div></div>","PeriodicalId":8417,"journal":{"name":"Array","volume":"28 ","pages":"Article 100590"},"PeriodicalIF":4.5,"publicationDate":"2025-11-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145568318","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A hybrid architecture with separable convolutions and attention for lung and colon cancer detection
Pub Date : 2025-11-20 DOI: 10.1016/j.array.2025.100591
Md. Darun Nayeem, Md. Emdadul Hasan Shishir, Munshi Touibur Rahman, Zeeshan Chowdhury Juwel, Sagor Sutradhar, Sudipto Chaki, Md. Saifur Rahman, A.B.M. Shawkat Ali
Early and accurate detection of lung and colon cancer is critical for improving patient survival rates. However, conventional classification techniques often struggle with the complexity of medical imaging data. In this study, we propose a novel hybrid architecture, the Separable Convolution and Attention Mechanism (SCA-mechanism), to enhance the accuracy of cancer classification. The proposed model integrates separable convolutions with residual connections to facilitate efficient feature extraction while incorporating attention mechanisms to emphasize key regions indicative of malignancy. We evaluated the performance of the SCA-mechanism model against state-of-the-art deep learning techniques using a benchmark dataset of lung and colon cancer images. Experimental results demonstrate that our proposed approach achieves superior classification performance, outperforming conventional deep learning methods in both accuracy and computational efficiency. These findings highlight the potential of the SCA-mechanism as a promising tool for automated cancer diagnosis, offering significant advancements in computationally efficient medical applications and clinical decision making.
{"title":"A hybrid architecture with separable convolutions and attention for lung and colon cancer detection","authors":"Md. Darun Nayeem, Md. Emdadul Hasan Shishir, Munshi Touibur Rahman, Zeeshan Chowdhury Juwel, Sagor Sutradhar, Sudipto Chaki, Md. Saifur Rahman, A.B.M. Shawkat Ali","doi":"10.1016/j.array.2025.100591","DOIUrl":"10.1016/j.array.2025.100591","url":null,"abstract":"<div><div>Early and accurate detection of lung and colon cancer is critical for improving patient survival rates. However, conventional classification techniques often struggle with the complexity of medical imaging data. In this study, we propose a novel hybrid architecture, the Separable Convolution and Attention Mechanism (SCA-mechanism), to enhance the accuracy of cancer classification. The proposed model integrates separable convolutions with residual connections to facilitate efficient feature extraction while incorporating attention mechanisms to emphasize key regions indicative of malignancy. We evaluated the performance of the SCA-mechanism model against state-of-the-art deep learning techniques using a benchmark dataset of lung and colon cancer images. Experimental results demonstrate that our proposed approach achieves superior classification performance, outperforming conventional deep learning methods in both accuracy and computational efficiency. These findings highlight the potential of the SCA-mechanism as a promising tool for automated cancer diagnosis, offering significant advancements in computationally efficient medical applications and clinical decision making.</div></div>","PeriodicalId":8417,"journal":{"name":"Array","volume":"28 ","pages":"Article 100591"},"PeriodicalIF":4.5,"publicationDate":"2025-11-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145568321","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Robust photovoltaic power output forecasting and ramping capacity estimation for smart energy systems
Pub Date : 2025-11-19 DOI: 10.1016/j.array.2025.100576
Sheng Chen, Zhiqi Chen, Quanxi Yu, Honglue Zhang, Chao Lin
The transition to decarbonized energy systems has been accelerated by the rapid deployment of photovoltaic (PV) power generation. Accurate PV power output forecasting is crucial for maintaining grid stability and for effective, optimal operation that accounts for ramping requirements. However, traditional forecasting methods often struggle with unreliable weather predictions and regional discrepancies in forecast accuracy, leading to suboptimal operational results. In this paper, a novel PV power output forecasting model that combines an enhanced temporal convolutional network (TCN) with pre-trained heuristic invariant risk minimization (PHIRM) is proposed to address these challenges. The enhanced TCN captures long-term temporal dependencies in the data, while IRM improves the model's generalization under distributional shifts caused by regional variations in weather forecast errors. Additionally, a method for estimating the ramping capacity of power systems based on the predicted PV power output is also proposed. Case studies on a modified IEEE 33-bus distribution system demonstrate that the proposed method significantly improves forecasting accuracy compared to traditional approaches. The results also indicate that effective ramping capacity estimation can enhance a power grid's capability to integrate PV generation.
{"title":"Robust photovoltaic power output forecasting and ramping capacity estimation for smart energy systems","authors":"Sheng Chen , Zhiqi Chen , Quanxi Yu , Honglue Zhang , Chao Lin","doi":"10.1016/j.array.2025.100576","DOIUrl":"10.1016/j.array.2025.100576","url":null,"abstract":"<div><div>The transition to decarbonized energy systems has been accelerated by the rapid deployment of photovoltaic (PV) power generation. Accurate forecasting technology of PV power output is crucial for maintaining grid stability and effectively optimal operation considering ramping requirements. However, traditional forecasting methods often struggle with unreliable weather predictions and regional discrepancies in forecast accuracy, leading to suboptimal performance of operation results. In this paper, a novel PV power output forecasting model that combines an enhanced temporal convolutional network (TCN) with pre-trained heuristic invariant risk minimization (PHIRM) was proposed to address these challenges. The enhanced TCN captures long-term temporal dependencies in the data, while IRM improves the model's generalization under distributional shifts caused by regional variations in weather forecast errors. Additionally, a method for estimating the ramping capacity of power systems based on the predicted PV power output was also proposed. Case studies on a modified IEEE 33-bus distribution system demonstrated that the proposed method significantly improved forecasting accuracies compared to traditional approaches. The results also indicated that effective ramping capacity estimation results could enhance the integrating capability of PV generation for power grids.</div></div>","PeriodicalId":8417,"journal":{"name":"Array","volume":"28 ","pages":"Article 100576"},"PeriodicalIF":4.5,"publicationDate":"2025-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145568299","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Data enhancement to improve CNN accuracy for early detection of colorectal cancer
Pub Date : 2025-11-19 DOI: 10.1016/j.array.2025.100581
Ummi Athiyah, Supriadi Rustad, Moch Arief Soeleman, Muljono Muljono, Muhamad Akrom, Rahel Amanda Konoralma
Multifactorial colorectal cancer is a frequently diagnosed disease and a leading cause of cancer-related mortality worldwide. Early detection is essential to optimize the patient's chances of recovery. This research aims to implement a Convolutional Neural Network (CNN) model that can classify colon endoscopic images into three categories: normal, polyp, and cancer, using the VGG-16 architecture. This study employed the following techniques: K-means clustering to manage outliers, data augmentation to enhance the diversity of datasets, and Pearson correlation analysis to confirm the relationship between the augmented and initial datasets. These data processing steps resulted in an enhanced dataset. Compared to models trained on the initial dataset, the results demonstrated a significant increase in accuracy and generalizability for CNN models trained on the enhanced dataset, which included outlier handling, augmentation, validation, and class balancing. The model's performance evaluation showed an accuracy of 86%, with notable improvements in F1-score and recall for the cancer class. These findings indicate that the model classified images more accurately after dataset enhancement.
{"title":"Data enhancement to improve CNN accuracy for early detection of colorectal cancer","authors":"Ummi Athiyah , Supriadi Rustad , Moch Arief Soeleman , Muljono Muljono , Muhamad Akrom , Rahel Amanda Konoralma","doi":"10.1016/j.array.2025.100581","DOIUrl":"10.1016/j.array.2025.100581","url":null,"abstract":"<div><div>Multifactorial colorectal cancer is a frequently diagnosed disease that is a leading cause of cancer-related mortality worldwide. Optimizing the patient's chances of recovery requires early detection. This research aims to implement a Convolutional Neural Network (CNN) model that can classify colon endoscopic images into three categories: normal, polyp, and cancer, using the VGG-16 architecture. This study employed the following techniques: K-means clustering to manage outliers, data augmentation to enhance the diversity of datasets, and Pearson correlation analysis to confirm the relationship between the augmentation and initial datasets. Data processing steps resulted in an enhanced dataset. Compared to models trained on the initial dataset, the results demonstrated a significant increase in accuracy and generalizability for CNN models trained on the enhanced dataset, which included outlier handling, augmentation, validation, and class balancing. The model's performance evaluation showed an accuracy of 86 %, with notable improvements in F1-score and recall for the cancer class. These findings indicate that the model better-classified images after dataset enhancement.</div></div>","PeriodicalId":8417,"journal":{"name":"Array","volume":"28 ","pages":"Article 100581"},"PeriodicalIF":4.5,"publicationDate":"2025-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145568298","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}