
Array: Latest Publications

Breaking the Complexity Barrier: Enhancing Quality of Service in Simultaneous Multithreading Processors
IF 4.5 Q2 COMPUTER SCIENCE, THEORY & METHODS Pub Date: 2025-12-01 DOI: 10.1016/j.array.2025.100602
Sercan Sari, Ugur Nezir, Onur Demir, Gurhan Kucuk
Within the wide variety of applications running in the cloud, demands for service quality and performance guarantees need special attention. Since cloud nodes typically utilize Simultaneous Multithreading (SMT) cores, processor resources may need to be shared and managed among multiple applications. A customer application that demands high levels of Quality of Service (QoS) may require more resources than its best-effort allocated share. QoSMT is an earlier study that focuses on 2-thread SMT cores to address this issue. While a simple relationship between two threads may be relatively straightforward to manage, the interactions among multiple threads can quickly become highly complex and difficult to design and maintain. In this paper, we extend the original QoSMT study to 4-thread SMT cores by proposing three alternative scheduling algorithms that regulate resource usage between a high-priority thread and three ordinary batch-type threads. On average, we achieve 7.2% better overall throughput and up to 218% better throughput for the latency-sensitive, high-priority thread. More than 92% of 600 workloads display higher performance than the baseline configuration, and 8% of workloads perform ten times better compared to a traditional 4-thread baseline SMT with unmanaged resources.
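The abstract does not detail the three scheduling algorithms, but the underlying idea — guaranteeing the high-priority thread a fixed share of a partitioned core resource and splitting the remainder among batch threads — can be sketched roughly as follows (the `allocate_entries` helper and the 50% share are hypothetical illustrations, not the paper's algorithms):

```python
def allocate_entries(total, hp_share, n_batch=3):
    """Give the high-priority (HP) thread a guaranteed fraction of a shared
    resource (e.g., issue-queue entries) and split the rest evenly among the
    batch threads. Illustrative sketch only, not the paper's schedulers."""
    hp = int(total * hp_share)
    rest = total - hp
    batch = [rest // n_batch] * n_batch
    for i in range(rest % n_batch):  # hand out any remainder entries
        batch[i] += 1
    return hp, batch

hp, batch = allocate_entries(64, 0.5)
# hp = 32, batch = [11, 11, 10]
```

In a real SMT core the shares would be recomputed dynamically as the QoS target is met or missed; this static split only illustrates the partitioning step.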
Citations: 0
Efficient management of computational resources in the IoT FOG layer. Automation and load balancing using Simple FOG Management Protocol
IF 4.5 Q2 COMPUTER SCIENCE, THEORY & METHODS Pub Date: 2025-12-01 DOI: 10.1016/j.array.2025.100597
Lucia Arnau Muñoz, José Vicente Berná Martínez, David Saavedra Pastor, Iren Lorenzo Fonseca
FOG computing is an essential paradigm for Internet of Things (IoT) environments, as it enables the decentralisation of data processing to make better use of resources. The rapid proliferation of IoT devices has created new challenges for this middle layer, especially in terms of reconfiguration and dynamic scalability, processes that have traditionally been managed manually. This study presents an architecture for the FOG layer that implements dynamic control and scaling automation through an advanced control layer composed of a coordinator that calculates the optimal distribution of computational load and a balancer that manages traffic routing. The central component of the proposal is the Simple FOG Management Protocol (SFMP), designed specifically to facilitate automated resource management, ensuring efficient allocation of computational capacity, fair load balancing, and robust fault tolerance mechanisms. Experimental validation in several complex IoT environments at the University of Alicante demonstrated a substantial optimisation in resource utilisation with a reduction in energy consumption of between 40 % and 65 %. All this confirms the flexibility and adaptability of the system to dynamic variations in computational demand in large-scale IoT implementations and its benefits over traditional proposals.
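SFMP itself is not specified in the abstract; to make the coordinator/balancer division of labor concrete, here is a toy sketch of greedy least-loaded dispatch (the `balance` helper and its task-cost model are this sketch's own assumptions, not the protocol):

```python
import heapq

def balance(tasks, nodes):
    """Greedy least-loaded dispatch: each task goes to the FOG node with the
    smallest current load. A toy stand-in for the coordinator/balancer pair;
    node capacities, routing, and fault tolerance are not modeled."""
    heap = [(0.0, name, []) for name in sorted(nodes)]
    heapq.heapify(heap)
    for cost in sorted(tasks, reverse=True):  # largest-first (LPT heuristic)
        load, name, assigned = heapq.heappop(heap)
        assigned.append(cost)
        heapq.heappush(heap, (load + cost, name, assigned))
    return {name: (load, assigned) for load, name, assigned in heap}

result = balance([5, 3, 2, 7, 1], ["a", "b"])
# result["a"] == (9, [7, 2]); result["b"] == (9, [5, 3, 1])
```

A real fog coordinator would also weigh energy cost and node heterogeneity, which is where the reported 40–65 % energy reduction would come from.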
Citations: 0
GSMT: An explainable semi-supervised multi-label method based on Gower distance
IF 4.5 Q2 COMPUTER SCIENCE, THEORY & METHODS Pub Date: 2025-12-01 DOI: 10.1016/j.array.2025.100596
José Carlos Mondragón, Andres Eduardo Gutierrez-Rodríguez, Victor Adrián Sosa Hernández
The financial, health, and education sectors produce vast amounts of data daily. Labeling entries such as assets, patients, and students is both costly and complex due to the evolution of databases into multi-label settings. Handling real-world data requires automatic labeling to circumvent slow manual procedures and explanations for compliance with regulations. In this work, we introduce GSMT, an inductive Explainable Semi-Supervised Multi-Label Random Forest Method based on Gower Distance, which uses supervised and unsupervised data to provide a non-linear solution for mainly tabular multi-label datasets with fully unknown label vectors. GSMT splits the dataset using multi-dimensional manifolds, completes missing label information and inductively predicts new observations while achieving explainability. We demonstrate state-of-the-art performance across Micro F1 Score, AUPRC, AUROC, and Label Rank Average Precision in a study involving 20 numerical and 5 mostly categorical datasets with five missing data ratios. By leveraging unsupervised information on top of numerical and categorical data, GSMT outputs the pattern rules annotated with performance measures, explanations on attribute and label space as well as an inductive model capable of predicting multi-label observations.
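The Gower distance at the heart of GSMT handles mixed numerical/categorical records: numerical features contribute a range-normalized absolute difference, categorical ones a simple match/mismatch, averaged over all features. A minimal sketch (the feature-kind and range arguments are this sketch's own convention, not the authors' API):

```python
def gower(x, y, kinds, ranges):
    """Gower distance between two mixed-type records.
    kinds[i] is 'num' or 'cat'; ranges[i] is the feature's value range
    (used only for numerical features). Minimal sketch of the distance
    GSMT builds on, not the authors' implementation."""
    total = 0.0
    for xi, yi, kind, rng in zip(x, y, kinds, ranges):
        if kind == 'num':
            total += abs(xi - yi) / rng if rng else 0.0
        else:  # categorical: simple matching coefficient
            total += 0.0 if xi == yi else 1.0
    return total / len(x)

d = gower([1.0, 'a'], [3.0, 'b'], ['num', 'cat'], [4.0, None])
# d == 0.75  (numeric part |1-3|/4 = 0.5, categorical mismatch = 1, mean = 0.75)
```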
Citations: 0
Contrastive text embeddings with effective sample mining for enhanced disease diagnosis
IF 4.5 Q2 COMPUTER SCIENCE, THEORY & METHODS Pub Date: 2025-12-01 DOI: 10.1016/j.array.2025.100578
Xuanyi Zhang, Genghong Zhao, Yi Ren, Weiguang Wang, Yingying Feng, Xia Zhang, Jiren Liu
Disease diagnosis, a pivotal task in Clinical Decision Support (CDS) that aids physicians in differential diagnosis, faces challenges in achieving high precision and improving clinical adaptability. The development of deep learning models based on pre-trained transformers, especially Large Language Models (LLMs) and text embedding models, brings opportunities to construct advanced disease diagnosis models, but both kinds of model face challenges. LLM-based disease diagnosis models have not proven reliable in their generated diagnoses, and physicians can hardly trace the original evidence for those diagnoses. Text embedding models can address these challenges, but domain-specific semantic misalignment, caused by distribution differences between the open-domain corpora used to train general text embedding models and authentic clinical notes, leads to inaccurate ranking. In this paper, we build Disease Diagnoser based on a hybrid information retrieval architecture of an augmented retriever and an augmented reranker. We propose an approach to construct DD-retriever and DD-reranker through contrastive text embeddings with Effective Sample Mining, addressing the domain-specific semantic misalignment during contrastive learning and thereby enhancing diagnostic accuracy in disease diagnosis. Specifically, Effective Sample Mining provides high-quality positive and negative samples for model fine-tuning, augmenting contrastive learning. We define a Semantic Target to improve the capability of a text embedding model in identifying positive samples during contrastive learning. Extensive experiments demonstrate that Disease Diagnoser outperforms the best-performing SOTA LLM by 12.9%, 22.9%, and 24.8% on top-3, top-5, and top-10 diagnostic accuracy, respectively. Our approach is validated to generalize to any hospital, using its private annotated clinical notes, to construct a specific disease diagnosis model.
Additionally, we construct PMC-Patients-DD, a new public clinical note dataset with ground truth, specifically designed for disease-diagnosis-related tasks. This dataset is available to researchers in the field of disease diagnosis to facilitate further research.
Citations: 0
A scalable k-medoids clustering via whale optimization algorithm
IF 4.5 Q2 COMPUTER SCIENCE, THEORY & METHODS Pub Date: 2025-12-01 DOI: 10.1016/j.array.2025.100599
Huang Chenan, Narumasa Tsutsumida
Unsupervised clustering has emerged as a critical tool for uncovering hidden patterns in vast, unlabeled datasets. However, traditional methods, such as Partitioning Around Medoids (PAM), struggle with scalability owing to their quadratic computational complexity. To address this limitation, we introduce WOA-kMedoids, a novel unsupervised clustering method that incorporates the Whale Optimization Algorithm (WOA), a metaheuristic inspired by the hunting strategies of humpback whales. By optimizing the centroid selection, WOA-kMedoids reduces the computational complexity from quadratic to near-linear with respect to the number of observations, enabling scalability to large datasets while maintaining high clustering accuracy. We evaluated WOA-kMedoids using 25 diverse time-series datasets from the UCR archive. Our empirical results show that WOA-kMedoids achieved a clustering performance comparable to PAM, with an average Rand Index (RI) of 0.731 compared to PAM's 0.739, outperforming PAM on 12 out of 25 datasets. While exhibiting a slightly higher runtime than PAM on small datasets (<300 observations), WOA-kMedoids outperformed PAM on larger datasets, with an average speedup of 1.7× and a maximum of 2.3×. The scalability of WOA-kMedoids, combined with its high accuracy, makes it a promising choice for unsupervised clustering in big data applications. This method has implications for efficient knowledge discovery in massive unlabeled datasets, particularly where traditional k-medoids methods are computationally infeasible, including IoT anomaly detection, biomedical signal analysis, and customer behavior clustering.
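The objective WOA searches over is the standard k-medoids cost: the total distance from each point to its nearest medoid. A tiny sketch of that objective, paired with the exhaustive baseline that a metaheuristic like WOA replaces (both helpers are illustrative, not the paper's code):

```python
from itertools import combinations

def kmedoids_cost(points, medoids, dist):
    """Total cost = sum over points of distance to the nearest medoid --
    the objective WOA-kMedoids minimizes when choosing medoid indices."""
    return sum(min(dist(p, points[m]) for m in medoids) for p in points)

def best_medoids(points, k, dist):
    """Exhaustive search over medoid index sets; feasible for tiny inputs
    only. WOA replaces this exponential search with a near-linear,
    whale-inspired stochastic search over candidate medoid sets."""
    return min(combinations(range(len(points)), k),
               key=lambda m: kmedoids_cost(points, m, dist))

dist = lambda a, b: abs(a - b)
pts = [0.0, 0.2, 5.0, 5.1]
# best_medoids picks one medoid from each tight cluster, e.g. indices (0, 2)
```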
Citations: 0
DOVE-FELM: A fusion-optimized feature selection and heterogeneous ensemble learning framework for early prediction of chronic kidney disease risk
IF 4.5 Q2 COMPUTER SCIENCE, THEORY & METHODS Pub Date: 2025-12-01 DOI: 10.1016/j.array.2025.100613
Bilal Sowan, Li Zhang, Essam H. Houssein, Hazem Qattous, Mohammad Azzeh, Bayan Massad
Chronic Kidney Disease (CKD) affects over 800 million individuals worldwide, yet existing prediction models deliver suboptimal performance on imbalanced datasets and lack clinical interpretability for early detection. This study presents DOVE-FELM, a framework that integrates Diverse Optimization Voting Ensemble (DOVE) with Fusion Ensemble Learning Model (FELM) for early-stage CKD risk prediction. DOVE employs consensus-based feature selection through eight heterogeneous optimization algorithms and incorporates a Symptom-Weighted and Influenced Patient Network (SWIPN) to capture patient–symptom connectivity patterns. FELM combines Random Forest and REPtree classifiers through adaptive weighting optimized via meta-level learning. The framework was validated on six imbalanced medical datasets (imbalance ratios from 1.67:1 to 29.06:1) through 10-fold stratified cross-validation. Specifically, evaluated on the UCI CKD dataset, DOVE-FELM achieved 99.75%, 99.8%, 99.8%, 99.8%, and 0.9947 for accuracy, AUC, sensitivity, specificity, and Cohen's Kappa, respectively. The Wilcoxon signed-rank test further confirms the statistical significance of the proposed model's gains over nine baseline methods. Also, external validation on the Tawam Hospital CKD dataset, with an imbalance ratio of 7.77:1, yielded an accuracy of 95.19%. Cross-disease validation on thyroid disease, thoracic surgery, cervical cancer, and AIDS clinical trials datasets demonstrated consistent performance (96.80–99.71% accuracy). Feature dimensionality reduction of 70.8% (24 → 7 biomarkers) enhanced clinical interpretability. DOVE-FELM advances computational frameworks for early chronic disease prediction through the integration of optimization-based feature selection with clinical domain knowledge. It shows strong potential for application in population-scale screening programs.
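DOVE's consensus step can be pictured as majority voting over the feature subsets returned by each optimizer: a feature survives only if enough selectors agree on it. A hedged sketch (the vote threshold and feature names are invented for illustration; the eight real optimizers are not reproduced):

```python
from collections import Counter

def consensus_select(selections, min_votes):
    """Combine the feature subsets proposed by several feature-selection
    algorithms: keep a feature only if at least min_votes selectors chose
    it. Mirrors the consensus-voting idea in DOVE, nothing more."""
    votes = Counter(f for sel in selections for f in set(sel))
    return sorted(f for f, v in votes.items() if v >= min_votes)

subsets = [["age", "bp"], ["age", "sugar"], ["age", "bp"]]
kept = consensus_select(subsets, min_votes=2)
# kept == ["age", "bp"]  (sugar was chosen by only one selector)
```

Raising `min_votes` trades recall of weakly supported biomarkers for a smaller, more interpretable panel, which is the spirit of the 24 → 7 biomarker reduction reported above.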
Citations: 0
Zero-knowledge proofs for anonymous authentication of patients on public and private blockchains
IF 4.5 Q2 COMPUTER SCIENCE, THEORY & METHODS Pub Date: 2025-11-20 DOI: 10.1016/j.array.2025.100590
Mohammad Madine, Khaled Salah, Raja Jayaraman, Ibrar Yaqoob
In recent years, the healthcare sector has been increasingly challenged in securing patient identities and medical records on blockchain due to rising privacy demands and strict regulatory requirements. Although advanced techniques like self-sovereign identity and zero-knowledge proofs (ZKPs) show promise, these solutions fail to limit unwarranted patient data disclosure effectively. In this paper, we propose a ZKP-based solution that combines STARKs and anonymous credentials to enable anonymous authentication and enhance privacy across both public and private blockchains. Leveraging transparent ZKP schemes and anonymous credentials, our approach ensures unlinkability by preventing the correlation of multiple patient interactions. We present sequence diagrams of real-world interactions, detailed algorithms for on- and off-chain computations, and implement the system on the Ethereum and Starknet blockchains. We present a rigorous evaluation of the proposed solution, encompassing smart contract testing on Starknet networks, transaction cost analysis, performance benchmarking, scalability assessment, and static security auditing. The results demonstrate consistent and economically viable transaction costs, millisecond-level execution times for credential issuance, presentation generation, and verification, and linear scalability with increasing claim count and size. We compare our solution with state-of-the-art ZKP-based identity systems to demonstrate its superiority. We further discuss its broader applicability beyond healthcare, including domains such as finance, education, and supply chain management. We make the smart contract codes publicly available on GitHub.
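The paper's STARK-plus-anonymous-credentials construction is far richer than anything sketched here, but the core idea of authenticating without revealing a secret can be illustrated with a classic Schnorr identification round (toy, insecure demo parameters; not the authors' scheme):

```python
import secrets

# Toy Schnorr identification: prove knowledge of x with y = g^x mod p
# without revealing x. G = 2 has prime order Q = 11 in Z_23*. These tiny
# parameters are for illustration only and offer no security.
P, Q, G = 23, 11, 2

def keygen():
    x = secrets.randbelow(Q)      # prover's secret
    return x, pow(G, x, P)        # (secret x, public key y)

def prove(x, challenge, r):
    """Response s = r + c*x mod q, using the prover's ephemeral nonce r."""
    return (r + challenge * x) % Q

def verify(y, commitment, challenge, s):
    """Accept iff g^s == t * y^c (mod p), where t = g^r is the commitment."""
    return pow(G, s, P) == (commitment * pow(y, challenge, P)) % P

# One protocol round: commit, challenge, respond, verify.
x, y = keygen()
r = secrets.randbelow(Q)
t = pow(G, r, P)                  # prover's commitment
c = secrets.randbelow(Q)          # verifier's random challenge
s = prove(x, c, r)
assert verify(y, t, c, s)         # verifier learns nothing about x itself
```

The verifier's check works because g^s = g^(r+cx) = g^r · (g^x)^c = t · y^c (mod p); the paper's STARK-based scheme additionally provides transparency (no trusted setup) and unlinkability across presentations, which this single-key sketch does not.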
Array, Volume 28, Article 100590.
Citations: 0
A hybrid architecture with separable convolutions and attention for lung and colon cancer detection
IF 4.5 Q2 COMPUTER SCIENCE, THEORY & METHODS Pub Date : 2025-11-20 DOI: 10.1016/j.array.2025.100591
Md. Darun Nayeem, Md. Emdadul Hasan Shishir, Munshi Touibur Rahman, Zeeshan Chowdhury Juwel, Sagor Sutradhar, Sudipto Chaki, Md. Saifur Rahman, A.B.M. Shawkat Ali
Early and accurate detection of lung and colon cancer is critical for improving patient survival rates. However, conventional classification techniques often struggle with the complexity of medical imaging data. In this study, we propose a novel hybrid architecture, the Separable Convolution and Attention Mechanism (SCA-mechanism), to enhance the accuracy of cancer classification. The proposed model integrates separable convolutions with residual connections to facilitate efficient feature extraction while incorporating attention mechanisms to emphasize key regions indicative of malignancy. We evaluated the performance of the SCA-mechanism model against state-of-the-art deep learning techniques using a benchmark dataset of lung and colon cancer images. Experimental results demonstrate that our proposed approach achieves superior classification performance, outperforming conventional deep learning methods in both accuracy and computational efficiency. These findings highlight the potential of the SCA-mechanism as a promising tool for automated cancer diagnosis, offering significant advancements in computationally efficient medical applications and clinical decision making.
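The efficiency claim for separable convolutions can be made concrete by counting parameters: a depthwise-separable layer factors a standard k x k convolution into a per-channel spatial filter plus a 1 x 1 pointwise channel mix. A minimal sketch (function names are illustrative, not the authors' code):

```python
def conv_params(c_in, c_out, k):
    """Parameters of a standard k x k convolution (bias omitted)."""
    return c_in * c_out * k * k

def separable_conv_params(c_in, c_out, k):
    """Depthwise k x k conv (one filter per input channel)
    followed by a 1 x 1 pointwise conv mixing channels."""
    depthwise = c_in * k * k
    pointwise = c_in * c_out
    return depthwise + pointwise

# Example: 64 -> 128 channels with 3 x 3 kernels
std = conv_params(64, 128, 3)            # 64*128*9  = 73728
sep = separable_conv_params(64, 128, 3)  # 64*9 + 64*128 = 8768
```

For 64 to 128 channels with 3 x 3 kernels this is 8768 versus 73728 parameters, roughly an 8.4x reduction, which illustrates where the computational-efficiency gains of separable-convolution architectures such as the one described here come from.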
Array, Volume 28, Article 100591.
Citations: 0
Robust photovoltaic power output forecasting and ramping capacity estimation for smart energy systems
IF 4.5 Q2 COMPUTER SCIENCE, THEORY & METHODS Pub Date : 2025-11-19 DOI: 10.1016/j.array.2025.100576
Sheng Chen , Zhiqi Chen , Quanxi Yu , Honglue Zhang , Chao Lin
The transition to decarbonized energy systems has been accelerated by the rapid deployment of photovoltaic (PV) power generation. Accurate PV power output forecasting is crucial for maintaining grid stability and effectively optimizing operation under ramping requirements. However, traditional forecasting methods often struggle with unreliable weather predictions and regional discrepancies in forecast accuracy, leading to suboptimal operation results. In this paper, a novel PV power output forecasting model that combines an enhanced temporal convolutional network (TCN) with pre-trained heuristic invariant risk minimization (PHIRM) is proposed to address these challenges. The enhanced TCN captures long-term temporal dependencies in the data, while the invariant-risk-minimization component improves the model's generalization under distributional shifts caused by regional variations in weather forecast errors. Additionally, a method for estimating the ramping capacity of power systems based on the predicted PV power output is also proposed. Case studies on a modified IEEE 33-bus distribution system demonstrated that the proposed method significantly improved forecasting accuracy compared to traditional approaches. The results also indicated that effective ramping capacity estimation could enhance the grid-integration capability of PV generation.
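The building block of a TCN is the causal dilated convolution: the output at time t depends only on inputs at or before t, and stacking layers with dilations 1, 2, 4, ... grows the receptive field exponentially. A minimal pure-Python sketch of one such layer (illustrative, not the paper's enhanced TCN):

```python
def causal_dilated_conv1d(x, w, dilation=1):
    """y[t] = sum_i w[i] * x[t - i*dilation], with implicit zero
    padding for negative indices, so no future values leak in."""
    out = []
    for t in range(len(x)):
        s = 0.0
        for i, wi in enumerate(w):
            j = t - i * dilation
            if j >= 0:          # causal: only past/present samples
                s += wi * x[j]
        out.append(s)
    return out

# With dilation 2, each output mixes x[t] and x[t-2]:
# causal_dilated_conv1d([1, 2, 3, 4], [1, 1], dilation=2) -> [1.0, 2.0, 4.0, 6.0]
```

A kernel of width k at dilation d sees k samples spread over (k - 1) * d + 1 time steps, which is why dilated stacks can cover the long histories that PV forecasting needs without very deep networks.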
Array, Volume 28, Article 100576.
Citations: 0
Data enhancement to improve CNN accuracy for early detection of colorectal cancer
IF 4.5 Q2 COMPUTER SCIENCE, THEORY & METHODS Pub Date : 2025-11-19 DOI: 10.1016/j.array.2025.100581
Ummi Athiyah , Supriadi Rustad , Moch Arief Soeleman , Muljono Muljono , Muhamad Akrom , Rahel Amanda Konoralma
Colorectal cancer is a frequently diagnosed, multifactorial disease and a leading cause of cancer-related mortality worldwide. Early detection is essential to maximize the patient's chances of recovery. This research implements a Convolutional Neural Network (CNN) model, based on the VGG-16 architecture, that classifies colon endoscopic images into three categories: normal, polyp, and cancer. The study employed the following techniques: K-means clustering to manage outliers, data augmentation to enhance the diversity of the dataset, and Pearson correlation analysis to confirm the relationship between the augmented and initial datasets. These data processing steps resulted in an enhanced dataset. Compared to models trained on the initial dataset, CNN models trained on the enhanced dataset (which included outlier handling, augmentation, validation, and class balancing) showed a significant increase in accuracy and generalizability. The model's performance evaluation showed an accuracy of 86%, with notable improvements in F1-score and recall for the cancer class. These findings indicate that the model classified images better after dataset enhancement.
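The outlier-management step can be illustrated with a simple distance-to-centroid rule: after K-means, samples that lie far from every cluster centre are flagged for review or removal. This is a generic sketch under assumed parameters (the `factor` threshold and function names are hypothetical), not the authors' exact procedure:

```python
import math

def nearest_centroid_distance(point, centroids):
    """Euclidean distance from a sample to its closest cluster centre."""
    return min(math.dist(point, c) for c in centroids)

def flag_outliers(points, centroids, factor=2.0):
    """Flag points whose nearest-centroid distance exceeds
    factor * mean distance (a simple distance-based outlier rule)."""
    d = [nearest_centroid_distance(p, centroids) for p in points]
    mean_d = sum(d) / len(d)
    return [di > factor * mean_d for di in d]

# Four samples near two centroids plus one far-away sample:
points = [(0, 0), (1, 0), (10, 10), (10, 11), (5, 50)]
centroids = [(0, 0), (10, 10)]
flags = flag_outliers(points, centroids)  # only (5, 50) is flagged
```

In an image pipeline the "points" would typically be feature vectors extracted per image rather than raw pixels; the flagged samples are then inspected or excluded before augmentation and training.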
Array, Volume 28, Article 100581.
Citations: 0