Synthesizing Explainability Across Multiple ML Models for Structured Data
Emir Veledar, Lili Zhou, Omar Veledar, Hannah Gardener, Carolina M Gutierrez, Jose G Romano, Tatjana Rundek
Pub Date: 2025-06-01 | Epub Date: 2025-06-18 | DOI: 10.3390/a18060368 | Algorithms 18(6) | Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12885564/pdf/
Explainable Machine Learning (XML) in high-stakes domains demands reproducible methods to aggregate feature importance across multiple models applied to the same structured dataset. We propose the Weighted Importance Score and Frequency Count (WISFC) framework, which combines importance magnitude and consistency by aggregating ranked outputs from diverse explainers. WISFC assigns a weighted score to each feature based on its rank and frequency across model-explainer pairs, providing a robust ensemble feature-importance ranking. Unlike simple consensus voting or ranking heuristics, which are insufficient for capturing complex relationships among different explainer outputs, WISFC offers a more principled approach to reconciling and aggregating this information. By aggregating many "weak signals" from brute-force modeling runs, WISFC can surface a stronger consensus on which variables matter most. The framework is designed to be reproducible and generalizable, capable of taking importance outputs from any set of machine-learning models and producing an aggregated ranking that highlights consistently important features. This approach acknowledges that any single model is a simplification of complex, multidimensional phenomena; by using multiple diverse models, each optimized from a different perspective, WISFC systematically captures different facets of the problem space to create a more structured and comprehensive view. As a consequence, this study offers a useful strategy for researchers and practitioners who seek innovative ways of exploring complex systems, not by discovering entirely new variables but by introducing a novel mindset for systematically combining multiple modeling perspectives.
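A minimal sketch of this rank-and-frequency aggregation, assuming a simple linear rank weight; the paper's exact weighting scheme is not given in the abstract, and the `top_k` cutoff, feature names, and tie-breaking rule below are illustrative assumptions only:

```python
# Hedged sketch of WISFC-style aggregation: sum linear rank weights and
# count how often each feature lands in a top-k list across pairs.
from collections import defaultdict

def wisfc_style_ranking(ranked_lists, top_k=10):
    """ranked_lists: one feature ranking per model-explainer pair,
    each ordered from most to least important."""
    score = defaultdict(float)  # weighted importance score per feature
    freq = defaultdict(int)     # how often the feature appears in a top-k
    for ranking in ranked_lists:
        for rank, feature in enumerate(ranking[:top_k]):
            score[feature] += top_k - rank  # higher rank -> larger weight
            freq[feature] += 1
    # order by aggregated score, breaking ties by frequency
    return sorted(score, key=lambda f: (score[f], freq[f]), reverse=True)

# three hypothetical model-explainer pairs with partially agreeing rankings
pairs = [["age", "bmi", "bp"], ["bmi", "age", "chol"], ["age", "chol", "bp"]]
print(wisfc_style_ranking(pairs, top_k=3))  # 'age' emerges as consensus top
```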
{"title":"Synthesizing Explainability Across Multiple ML Models for Structured Data.","authors":"Emir Veledar, Lili Zhou, Omar Veledar, Hannah Gardener, Carolina M Gutierrez, Jose G Romano, Tatjana Rundek","doi":"10.3390/a18060368","DOIUrl":"10.3390/a18060368","url":null,"abstract":"<p><p>Explainable Machine Learning (XML) in high-stakes domains demands reproducible methods to aggregate feature importance across multiple models applied to the same structured dataset. We propose the Weighted Importance Score and Frequency Count (WISFC) framework, which combines importance magnitude and consistency by aggregating ranked outputs from diverse explainers. WISFC assigns a weighted score to each feature based on its rank and frequency across model-explainer pairs, providing a robust ensemble feature-importance ranking. Unlike simple consensus voting or ranking heuristics that are insufficient for capturing complex relationships among different explainer outputs, WISFC offers a more principled approach to reconciling and aggregating this information. By aggregating many \"weak signals\" from brute-force modeling runs, WISFC can surface a stronger consensus on which variables matter most. The framework is designed to be reproducible and generalizable, capable of taking important outputs from any set of machine-learning models and producing an aggregated ranking highlighting consistently important features. This approach acknowledges that any single model is a simplification of complex, multidimensional phenomena; using multiple diverse models, each optimized from a different perspective, WISFC systematically captures different facets of the problem space to create a more structured and comprehensive view. As a consequence, this study offers a useful strategy for researchers and practitioners who seek innovative ways of exploring complex systems, not by discovering entirely new variables but by introducing a novel mindset for systematically combining multiple modeling perspectives.</p>","PeriodicalId":7636,"journal":{"name":"Algorithms","volume":"18 6","pages":""},"PeriodicalIF":2.1,"publicationDate":"2025-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12885564/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146155614","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Finding Multiple Optimal Solutions to an Integer Linear Program by Random Perturbations of Its Objective Function
Noah Schulhof, Pattara Sukprasert, Eytan Ruppin, Samir Khuller, Alejandro A Schäffer
Pub Date: 2025-03-01 | DOI: 10.3390/a18030140 | Algorithms 18(3) | Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11970949/pdf/
Integer linear programs (ILPs) and mixed integer programs (MIPs) often have multiple distinct optimal solutions, yet the widely used Gurobi optimization solver returns certain solutions at disproportionately high frequencies. This behavior is disadvantageous because, in fields such as biomedicine, the identification and analysis of distinct optima yields valuable domain-specific insights that inform future research directions. In the present work, we introduce MORSE (Multiple Optima via Random Sampling and careful choice of the parameter Epsilon), a randomized, parallelizable algorithm that efficiently generates multiple optima for ILPs. MORSE applies multiplicative perturbations to the coefficients in an instance's objective function, generating a modified instance that retains an optimum of the original problem. We formalize and prove this claim under practical conditions. Furthermore, we prove that for 0/1 selection problems, MORSE finds each distinct optimum with equal probability. We evaluate MORSE using two measures: the number of distinct optima found in r independent runs, and the diversity of the list (with repetitions) of solutions, measured by average pairwise Hamming distance and Shannon entropy. Using these metrics, we provide empirical results demonstrating that MORSE outperforms the Gurobi method and unweighted variations of the MORSE method on a set of 20 Mixed Integer Programming Library (MIPLIB) instances and on a combinatorial optimization problem in cancer genomics.
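A hedged sketch of the perturb-and-re-solve idea on a toy 0/1 selection problem, using SciPy's `milp` in place of Gurobi; the fixed `eps` below is an illustrative guess, not the paper's carefully chosen epsilon:

```python
# Perturb the objective multiplicatively, re-solve, and collect the
# distinct optima of the original (all-ties) instance across runs.
import numpy as np
from scipy.optimize import Bounds, LinearConstraint, milp

rng = np.random.default_rng(0)
c = np.ones(4)                                   # identical costs: many ties
cons = LinearConstraint(np.ones((1, 4)), 2, 2)   # select exactly 2 of 4 items

optima = set()
for _ in range(20):
    eps = 1e-6  # illustrative magnitude only; MORSE chooses epsilon carefully
    c_pert = c * (1.0 + eps * rng.uniform(-1.0, 1.0, size=c.size))
    res = milp(c_pert, constraints=cons,
               integrality=np.ones(4), bounds=Bounds(0, 1))
    optima.add(tuple(np.round(res.x).astype(int)))

print(f"{len(optima)} distinct optima found (6 possible)")
```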
Anomaly Detection in High-Dimensional Time Series Data with Scaled Bregman Divergence
Yunge Wang, Lingling Zhang, Tong Si, Graham Bishop, Haijun Gong
Pub Date: 2025-02-01 | Epub Date: 2025-01-24 | DOI: 10.3390/a18020062 | Algorithms 18(2) | Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11790285/pdf/
The purpose of anomaly detection is to identify data points or patterns that deviate significantly from the expected or typical behavior of the majority of the data; it has a wide range of applications across many domains. Most existing statistical and machine learning anomaly detection algorithms face challenges when applied to high-dimensional data. For instance, the unconstrained least-squares importance fitting (uLSIF) method, a state-of-the-art anomaly detection approach, encounters an unboundedness problem under certain conditions. In this study, we propose a scaled Bregman divergence-based anomaly detection algorithm that uses both least absolute deviation and least-squares loss for parameter learning. The new algorithm effectively addresses the unboundedness problem, making it particularly suitable for high-dimensional data. The proposed technique was evaluated on both synthetic and real-world high-dimensional time series datasets, demonstrating its effectiveness in detecting anomalies, and its performance was compared to other density ratio estimation-based anomaly detection methods.
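For context, a minimal sketch of the uLSIF-style density-ratio estimation that the abstract cites as the baseline (not the proposed scaled Bregman divergence method); the kernel width `sigma`, ridge term `lam`, and the synthetic data are assumptions:

```python
# uLSIF baseline sketch: fit r(x) = p_test(x)/p_ref(x) with Gaussian
# kernels and a ridge-regularized least-squares solve, then score points.
import numpy as np

def ulsif_ratio(x_ref, x_test, sigma=1.0, lam=1e-3):
    """Estimate the density ratio r(x) at the test points."""
    centers = x_test  # Gaussian kernel centers placed on the test sample
    def phi(a):       # kernel design matrix between rows of a and centers
        d2 = ((a[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * sigma ** 2))
    phi_ref, phi_test = phi(x_ref), phi(x_test)
    H = phi_ref.T @ phi_ref / len(x_ref)   # second moments under p_ref
    h = phi_test.mean(axis=0)              # first moments under p_test
    alpha = np.linalg.solve(H + lam * np.eye(len(centers)), h)
    return phi_test @ alpha

rng = np.random.default_rng(1)
x_ref = rng.normal(size=(200, 5))                        # reference window
x_test = np.vstack([rng.normal(size=(95, 5)),
                    rng.normal(5.0, 1.0, size=(5, 5))])  # last 5 shifted
scores = ulsif_ratio(x_ref, x_test)
print(np.argsort(scores)[-5:])  # largest ratios flag the shifted points
```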
{"title":"Anomaly Detection in High-Dimensional Time Series Data with Scaled Bregman Divergence.","authors":"Yunge Wang, Lingling Zhang, Tong Si, Graham Bishop, Haijun Gong","doi":"10.3390/a18020062","DOIUrl":"10.3390/a18020062","url":null,"abstract":"<p><p>The purpose of anomaly detection is to identify special data points or patterns that significantly deviate from the expected or typical behavior of the majority of the data, and it has a wide range of applications across various domains. Most existing statistical and machine learning-based anomaly detection algorithms face challenges when applied to high-dimensional data. For instance, the unconstrained least-squares importance fitting (uLSIF) method, a state-of-the-art anomaly detection approach, encounters the unboundedness problem under certain conditions. In this study, we propose a scaled Bregman divergence-based anomaly detection algorithm using both least absolute deviation and least-squares loss for parameter learning. This new algorithm effectively addresses the unboundedness problem, making it particularly suitable for high-dimensional data. The proposed technique was evaluated on both synthetic and real-world high-dimensional time series datasets, demonstrating its effectiveness in detecting anomalies. Its performance was also compared to other density ratio estimation-based anomaly detection methods.</p>","PeriodicalId":7636,"journal":{"name":"Algorithms","volume":"18 2","pages":""},"PeriodicalIF":1.8,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11790285/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143121729","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Navigating the Maps: Euclidean vs. Road Network Distances in Spatial Queries
Pornrawee Tatit, Kiki Adhinugraha, David Taniar
Pub Date: 2024-01-10 | DOI: 10.3390/a17010029 | Algorithms 17(1)
Using spatial data in mobile applications has grown significantly, empowering users to explore locations, navigate unfamiliar areas, find transportation routes, employ geomarketing strategies, and model environmental factors. Spatial databases are pivotal in efficiently storing, retrieving, and manipulating spatial data to fulfill users' needs. Two fundamental spatial query types, k-nearest neighbors (kNN) and range search, enable users to access specific points of interest (POIs) based on their location, with proximity measured by actual road distance. However, retrieving the nearest POIs using actual road distance can be computationally intensive due to the need to find the shortest path. Using straight-line measurements could expedite the process but might compromise accuracy. Consequently, this study aims to evaluate the accuracy of the Euclidean distance method in POI retrieval by comparing it with the road network distance method. The primary focus is determining whether the trade-off between computational time and accuracy is justified; the Open Source Routing Machine (OSRM) is employed for distance extraction. The assessment encompasses diverse scenarios and analyzes factors influencing the accuracy of the Euclidean distance method. The methodology employs a quantitative approach, categorizing query points based on density and analyzing them using kNN and range query methods. Accuracy of the Euclidean distance method is evaluated against the road network distance method. The results demonstrate peak accuracy for kNN queries at k=1, exceeding 85% across classes but declining as k increases. Range queries show varied accuracy based on POI density, with higher-density classes exhibiting earlier accuracy increases. Notably, datasets with fewer POIs exhibit unexpectedly higher accuracy, providing valuable insights into spatial query processing.
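A small sketch of the accuracy measure implied here: the overlap between the top-k POIs under Euclidean distance and under road-network distance. A random detour factor stands in for the road-distance matrix, which the paper obtains from OSRM:

```python
# Compare Euclidean kNN results against road-distance ground truth.
import numpy as np

rng = np.random.default_rng(7)
pois = rng.uniform(0, 10, size=(500, 2))   # POI coordinates
query = np.array([5.0, 5.0])

euclid = np.linalg.norm(pois - query, axis=1)
road = euclid * rng.uniform(1.0, 1.6, size=len(pois))  # synthetic detours

def topk_overlap(k):
    """Share of Euclidean top-k POIs that are also road-distance top-k."""
    e_top = set(np.argsort(euclid)[:k])
    r_top = set(np.argsort(road)[:k])  # ground truth: road-network distance
    return len(e_top & r_top) / k

for k in (1, 5, 10, 20):
    print(k, topk_overlap(k))  # the paper reports accuracy peaking at k=1
```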
Specification Mining Based on the Ordering Points to Identify the Clustering Structure Clustering Algorithm and Model Checking
Y. Fan, Meng Wang
Pub Date: 2024-01-10 | DOI: 10.3390/a17010028 | Algorithms 17(1)
Software specifications are of great importance for improving software quality. To automatically mine specifications from software systems, several specification mining approaches based on finite-state automatons have been proposed. However, these approaches are inaccurate when dealing with large-scale systems. To improve the accuracy of mined specifications, we propose a specification mining approach based on the ordering points to identify the clustering structure (OPTICS) clustering algorithm and model checking. In this approach, a neural network model is first used to produce feature values for the states in the program's traces. Then, according to the feature values, finite-state automatons are generated using the OPTICS clustering algorithm, and the finite-state automaton with the highest F-measure is selected. To improve its quality, the selected automaton is refined based on model checking. The proposed approach was implemented in a tool named MCLSM, and experiments on 13 target classes were conducted to evaluate its effectiveness. The experimental results show that the average F-measure of the finite-state automatons generated by our method reaches 92.19%, which is higher than that of most related tools.
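A hedged sketch of the clustering-to-automaton step, assuming scikit-learn's OPTICS and toy two-dimensional state features; the neural-network feature extraction and the model-checking refinement are out of scope here:

```python
# Cluster state feature vectors with OPTICS, then read an FSA off the
# traces as transitions between consecutive cluster IDs.
import numpy as np
from sklearn.cluster import OPTICS

# toy stand-in: three traces, each a sequence of 2-D state feature vectors
traces = [np.array([[0.1, 0.0], [1.0, 1.1], [2.0, 2.1]]),
          np.array([[0.0, 0.1], [1.1, 1.0], [2.1, 2.0]]),
          np.array([[0.1, 0.1], [2.0, 2.0]])]

states = np.vstack(traces)
labels = OPTICS(min_samples=2).fit_predict(states)  # state -> cluster ID

# rebuild per-trace label sequences; FSA edges are consecutive label pairs
transitions, i = set(), 0
for t in traces:
    seq = labels[i:i + len(t)]
    i += len(t)
    transitions.update(zip(seq[:-1], seq[1:]))
print(sorted(transitions))  # edges of the mined automaton
```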
{"title":"Specification Mining Based on the Ordering Points to Identify the Clustering Structure Clustering Algorithm and Model Checking","authors":"Y. Fan, Meng Wang","doi":"10.3390/a17010028","DOIUrl":"https://doi.org/10.3390/a17010028","url":null,"abstract":"Software specifications are of great importance to improve the quality of software. To automatically mine specifications from software systems, some specification mining approaches based on finite-state automatons have been proposed. However, these approaches are inaccurate when dealing with large-scale systems. In order to improve the accuracy of mined specifications, we propose a specification mining approach based on the ordering points to identify the clustering structure clustering algorithm and model checking. In the approach, the neural network model is first used to produce the feature values of states in the traces of the program. Then, according to the feature values, finite-state automatons are generated based on the ordering points to identify the clustering structure clustering algorithm. Further, the finite-state automaton with the highest F-measure is selected. To improve the quality of the finite-state automatons, we refine it based on model checking. The proposed approach was implemented in a tool named MCLSM and experiments, including 13 target classes, were conducted to evaluate its effectiveness. The experimental results show that the average F-measure of finite-state automatons generated by our method reaches 92.19%, which is higher than most related tools.","PeriodicalId":7636,"journal":{"name":"Algorithms","volume":"3 5","pages":""},"PeriodicalIF":2.3,"publicationDate":"2024-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139439250","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Personalized Advertising in E-Commerce: Using Clickstream Data to Target High-Value Customers
V. Sakalauskas, D. Kriksciuniene
Pub Date: 2024-01-10 | DOI: 10.3390/a17010027 | Algorithms 17(1)
The growing popularity of e-commerce has prompted researchers to take a greater interest in a deeper understanding of online shopping behavior, consumer interest patterns, and the effectiveness of advertising campaigns. This paper presents a fresh approach for targeting high-value e-shop clients by utilizing clickstream data. We propose a new algorithm to measure customer engagement and recognize high-value customers. Clickstream data is employed in the algorithm to compute a Customer Merit (CM) index that measures the customer's level of engagement and anticipates their purchase intent. The CM index is evaluated dynamically by the algorithm, examining the customer's activity level, efficiency in selecting items, and time spent browsing, and it combines tracking of customers' browsing and purchasing behaviors with other relevant factors: time spent on the website and frequency of visits to e-shops. This strategy proves highly beneficial for e-commerce enterprises, enabling them to pinpoint potential buyers and design targeted advertising campaigns exclusively for high-value customers of e-shops. It not only boosts e-shop sales but also effectively minimizes advertising expenses. The proposed method was tested on actual clickstream data from two e-commerce websites and showed that the personalized advertising campaign outperformed the non-personalized campaign in terms of click-through and conversion rates. In general, the findings suggest that personalized advertising scenarios can be a useful tool for boosting e-commerce sales and reducing advertising costs. By utilizing clickstream data and adopting a targeted approach, e-commerce businesses can attract and retain high-value customers, leading to higher revenue and profitability.
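Since the abstract does not give the CM formula, the following sketch shows one plausible shape for such an index: a weighted sum of normalized activity, selection efficiency, dwell time, and visit frequency. The weights, scales, and `Session` fields are all illustrative assumptions:

```python
# Hypothetical Customer Merit style score built from clickstream signals.
from dataclasses import dataclass

@dataclass
class Session:
    clicks: int        # activity level within the session
    items_viewed: int  # products opened
    items_carted: int  # products added to the cart
    minutes: float     # time spent on the website
    visits_30d: int    # visit frequency over the last 30 days

def customer_merit(s: Session, w=(0.3, 0.3, 0.2, 0.2)) -> float:
    """Weighted sum of normalized engagement signals, each clipped to [0, 1]."""
    activity = min(s.clicks / 50.0, 1.0)                  # saturating scale
    efficiency = s.items_carted / max(s.items_viewed, 1)  # selection skill
    dwell = min(s.minutes / 30.0, 1.0)
    frequency = min(s.visits_30d / 10.0, 1.0)
    return (w[0] * activity + w[1] * efficiency
            + w[2] * dwell + w[3] * frequency)

print(customer_merit(Session(40, 12, 3, 22.5, 6)))  # ~0.59 -> high value
```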
{"title":"Personalized Advertising in E-Commerce: Using Clickstream Data to Target High-Value Customers","authors":"V. Sakalauskas, D. Kriksciuniene","doi":"10.3390/a17010027","DOIUrl":"https://doi.org/10.3390/a17010027","url":null,"abstract":"The growing popularity of e-commerce has prompted researchers to take a greater interest in deeper understanding online shopping behavior, consumer interest patterns, and the effectiveness of advertising campaigns. This paper presents a fresh approach for targeting high-value e-shop clients by utilizing clickstream data. We propose the new algorithm to measure customer engagement and recognizing high-value customers. Clickstream data is employed in the algorithm to compute a Customer Merit (CM) index that measures the customer’s level of engagement and anticipates their purchase intent. The CM index is evaluated dynamically by the algorithm, examining the customer’s activity level, efficiency in selecting items, and time spent in browsing. It combines tracking customers browsing and purchasing behaviors with other relevant factors: time spent on the website and frequency of visits to e-shops. This strategy proves highly beneficial for e-commerce enterprises, enabling them to pinpoint potential buyers and design targeted advertising campaigns exclusively for high-value customers of e-shops. It allows not only boosts e-shop sales but also minimizes advertising expenses effectively. The proposed method was tested on actual clickstream data from two e-commerce websites and showed that the personalized advertising campaign outperformed the non-personalized campaign in terms of click-through and conversion rate. In general, the findings suggest, that personalized advertising scenarios can be a useful tool for boosting e-commerce sales and reduce advertising cost. By utilizing clickstream data and adopting a targeted approach, e-commerce businesses can attract and retain high-value customers, leading to higher revenue and profitability.","PeriodicalId":7636,"journal":{"name":"Algorithms","volume":"91 8","pages":""},"PeriodicalIF":2.3,"publicationDate":"2024-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139440304","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hybrid Sparrow Search-Exponential Distribution Optimization with Differential Evolution for Parameter Prediction of Solar Photovoltaic Models
Amr A. Abd El-Mageed, A. Al-Hamadi, Samy Bakheet, Asmaa H. Abd El-Rahiem
Pub Date: 2024-01-09 | DOI: 10.3390/a17010026 | Algorithms 17(1)
It is difficult to determine unknown solar cell and photovoltaic (PV) module parameters owing to the nonlinearity of the characteristic current-voltage (I-V) curve. Despite this, precise parameter estimation is necessary because of the substantial effect the parameters have on PV system efficacy with respect to current and energy output. The problem's characteristics make parameter-extraction algorithms susceptible to local optima and resource-intensive processing. To effectively extract PV model parameter values, an improved hybrid Sparrow Search Algorithm (SSA) with Exponential Distribution Optimization (EDO), based on the Differential Evolution (DE) technique and a bound-constraint modification procedure, called ISSAEDO, is presented in this article. The hybrid strategy utilizes EDO to improve global exploration and SSA to explore the solution space effectively, while DE facilitates local search to refine parameter estimates. The proposed method is compared to standard optimization methods using solar PV system data to demonstrate its effectiveness and speed in obtaining the parameters of PV models such as the single diode model (SDM) and the double diode model (DDM). The results indicate that the hybrid technique is a viable instrument for enhancing solar PV system design and performance analysis because it can predict PV model parameters accurately.
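For concreteness, a sketch of the single diode model objective that such parameter-extraction metaheuristics minimize; the ISSAEDO optimizer itself is not reproduced, and the I-V samples below are synthetic placeholders:

```python
# RMSE of the implicit SDM equation over measured I-V pairs; a
# metaheuristic searches theta = (Iph, I0, Rs, Rsh, a) to minimize it.
import numpy as np

K_B, Q = 1.380649e-23, 1.602176634e-19  # Boltzmann constant, electron charge

def sdm_residual(theta, v, i, t_cell=306.15):
    """Residual of Iph - I0*(exp((V+I*Rs)/(a*Vt)) - 1) - (V+I*Rs)/Rsh - I."""
    iph, i0, rs, rsh, a = theta
    vt = K_B * t_cell / Q  # thermal voltage at cell temperature t_cell
    return (iph - i0 * (np.exp((v + i * rs) / (a * vt)) - 1.0)
            - (v + i * rs) / rsh - i)

def rmse(theta, v, i):
    return np.sqrt(np.mean(sdm_residual(theta, v, i) ** 2))

# placeholder I-V samples; an optimizer drives this objective toward zero
theta = (0.76, 3.2e-7, 0.036, 53.7, 1.48)  # candidate (Iph, I0, Rs, Rsh, a)
v = np.linspace(-0.2, 0.58, 26)
i = np.full_like(v, 0.75)
print(rmse(theta, v, i))
```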
Particle Swarm Optimization-Based Unconstrained Polygonal Fitting of 2D Shapes
C. Panagiotakis
Pub Date: 2024-01-07 | DOI: 10.3390/a17010025 | Algorithms 17(1)
In this paper, we present a general version of the polygonal fitting problem called Unconstrained Polygonal Fitting (UPF). Our goal is to represent a given 2D shape S with an N-vertex polygonal curve P, so that the Intersection over Union (IoU) metric between S and P is maximized without any assumption or prior knowledge of the object structure; the N vertices of P can be placed anywhere in the 2D space. The search space of the UPF problem is a superset of that of the classical polygonal approximation (PA) problem, where the vertices are constrained to lie on the boundary of the given 2D shape. Therefore, the resulting solutions of the UPF may better approximate the given curve than the solutions of the PA problem. For a given number of vertices N, a Particle Swarm Optimization (PSO) method is used to maximize the IoU metric, which yields almost optimal solutions. Furthermore, the proposed method has also been implemented under the equal-area principle, so that the total area covered by P is equal to the area of the original 2D shape, to measure how this constraint affects the IoU metric. The quantitative results obtained on more than 2800 2D shapes included in two standard datasets quantify the performance of the proposed methods and illustrate that their solutions outperform baselines from the literature.
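A minimal sketch of the UPF fitness: decode a particle (a flat vector of 2N coordinates) into a polygon and score it by IoU against the target shape, with `shapely` for the geometry and a random-search loop standing in for the PSO update rule:

```python
# IoU fitness for unconstrained polygonal fitting; vertices may lie
# anywhere in the plane, so candidate rings can self-intersect.
import numpy as np
from shapely.geometry import Polygon

target = Polygon([(0, 0), (4, 0), (4, 3), (0, 3)])  # toy 2D shape S

def iou_fitness(particle):
    pts = particle.reshape(-1, 2)      # N free vertices, anywhere in 2D
    cand = Polygon(pts).buffer(0)      # buffer(0) repairs self-intersections
    if cand.is_empty:
        return 0.0
    union = cand.union(target).area
    return cand.intersection(target).area / union if union > 0 else 0.0

rng = np.random.default_rng(3)
# random-search stand-in for the PSO update rule, N = 4 vertices
best = max((rng.uniform(-1, 5, size=8) for _ in range(2000)), key=iou_fitness)
print(round(iou_fitness(best), 3))     # a real PSO run would score higher
```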
Exploring the Use of Artificial Intelligence in Agent-Based Modeling Applications: A Bibliometric Study
Ștefan-Andrei Ionescu, Camelia Delcea, Nora Chirita, I. Nica
Pub Date: 2024-01-03 | DOI: 10.3390/a17010021 | Algorithms 17(1)
This research provides a comprehensive analysis of the dynamic interplay between agent-based modeling (ABM) and artificial intelligence (AI) through a meticulous bibliometric study. The study reveals a substantial increase in scholarly interest, particularly post-2006, peaking in 2021 and 2022, indicating a contemporary surge in research on the synergy between AI and ABM. Temporal trends and fluctuations prompt questions about influencing factors, potentially linked to technological advancements or shifts in research focus. The sustained increase in citations per document per year underscores the field's impact, with the 2021 peak suggesting cumulative influence. Reference Publication Year Spectroscopy (RPYS) reveals historical patterns, and the recent decline prompts exploration into shifts in research focus. Lotka's law is reflected in the authors' contributions, supported by Pareto analysis. Journal diversity signals extensive exploration of AI applications in ABM. Identifying impactful journals and clustering them per Bradford's Law provides insights for researchers. Global scientific production dominance and regional collaboration maps emphasize the worldwide landscape. Despite acknowledging limitations, such as citation lag and interdisciplinary challenges, our study offers a global perspective with implications for future research and serves as a resource in the evolving AI and ABM landscape.
{"title":"Exploring the Use of Artificial Intelligence in Agent-Based Modeling Applications: A Bibliometric Study","authors":"Ștefan-Andrei Ionescu, Camelia Delcea, Nora Chirita, I. Nica","doi":"10.3390/a17010021","DOIUrl":"https://doi.org/10.3390/a17010021","url":null,"abstract":"This research provides a comprehensive analysis of the dynamic interplay between agent-based modeling (ABM) and artificial intelligence (AI) through a meticulous bibliometric study. This study reveals a substantial increase in scholarly interest, particularly post-2006, peaking in 2021 and 2022, indicating a contemporary surge in research on the synergy between AI and ABM. Temporal trends and fluctuations prompt questions about influencing factors, potentially linked to technological advancements or shifts in research focus. The sustained increase in citations per document per year underscores the field’s impact, with the 2021 peak suggesting cumulative influence. Reference Publication Year Spectroscopy (RPYS) reveals historical patterns, and the recent decline prompts exploration into shifts in research focus. Lotka’s law is reflected in the author’s contributions, supported by Pareto analysis. Journal diversity signals extensive exploration of AI applications in ABM. Identifying impactful journals and clustering them per Bradford’s Law provides insights for researchers. Global scientific production dominance and regional collaboration maps emphasize the worldwide landscape. Despite acknowledging limitations, such as citation lag and interdisciplinary challenges, our study offers a global perspective with implications for future research and as a resource in the evolving AI and ABM landscape.","PeriodicalId":7636,"journal":{"name":"Algorithms","volume":"38 3","pages":""},"PeriodicalIF":2.3,"publicationDate":"2024-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139452010","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Leveraging Machine Learning for Weed Management and Crop Enhancement: Vineyard Flora Classification
Ana Corceiro, Nuno Pereira, Khadijeh Alibabaei, Pedro D. Gaspar
Pub Date: 2023-12-31 | DOI: 10.3390/a17010019 | Algorithms 17(1)
The global population's rapid growth necessitates a 70% increase in agricultural production, posing challenges exacerbated by weed infestation and herbicide drawbacks. To address this, machine learning (ML) models, particularly convolutional neural networks (CNNs), are employed in precision agriculture (PA) for weed detection. This study focuses on testing CNN architectures for image classification tasks using the PyTorch framework, emphasizing hyperparameter optimization. Four groups of experiments were carried out: the first trained all the PyTorch architectures; the second created a baseline; the third evaluated a new, extended dataset on the best models; and finally, a test phase was conducted using a web application developed for this purpose. Of the 80 CNN sub-architectures tested in the first phase, the MaxVit, ShuffleNet, and EfficientNet models stood out, achieving maximum accuracies of 96.0%, 99.3%, and 99.3%, respectively. In addition, EfficientNet_B1 and EfficientNet_B5 stood out compared to all other models: in experiment 3, with the new dataset, the two models achieved high accuracies of 95.13% and 94.83%, respectively, and in experiment 4 both achieved a maximum accuracy of 96.15%, the highest overall. ML models can help automate crop problem detection, promote organic farming, optimize resource use, aid precision farming, reduce waste, boost efficiency, and contribute to a greener, sustainable agricultural future.
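A hedged sketch of fine-tuning one of the best-performing architectures reported here (EfficientNet_B1) with torchvision; the dataset path, class count, and training schedule are illustrative assumptions, not the paper's settings:

```python
# Swap the pretrained classifier head for the flora classes and fine-tune.
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

num_classes = 5  # hypothetical number of vineyard flora classes
tfm = transforms.Compose([transforms.Resize((240, 240)),
                          transforms.ToTensor()])
train_ds = datasets.ImageFolder("data/flora/train", transform=tfm)  # assumed path
loader = torch.utils.data.DataLoader(train_ds, batch_size=32, shuffle=True)

model = models.efficientnet_b1(weights=models.EfficientNet_B1_Weights.DEFAULT)
model.classifier[1] = nn.Linear(model.classifier[1].in_features, num_classes)

opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()
model.train()
for epoch in range(3):  # short illustrative schedule
    for x, y in loader:
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
```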
{"title":"Leveraging Machine Learning for Weed Management and Crop Enhancement: Vineyard Flora Classification","authors":"Ana Corceiro, Nuno Pereira, Khadijeh Alibabaei, Pedro D. Gaspar","doi":"10.3390/a17010019","DOIUrl":"https://doi.org/10.3390/a17010019","url":null,"abstract":"The global population’s rapid growth necessitates a 70% increase in agricultural production, posing challenges exacerbated by weed infestation and herbicide drawbacks. To address this, machine learning (ML) models, particularly convolutional neural networks (CNNs), are employed in precision agriculture (PA) for weed detection. This study focuses on testing CNN architectures for image classification tasks using the PyTorch framework, emphasizing hyperparameter optimization. Four groups of experiments were carried out: the first one trained all the PyTorch architectures, followed by the creation of a baseline, the evaluation of a new and extended dataset in the best models, and finally, the test phase was conducted using a web application developed for this purpose. Of 80 CNN sub-architectures tested, the MaxVit, ShuffleNet, and EfficientNet models stand out, achieving a maximum accuracy of 96.0%, 99.3%, and 99.3%, respectively, for the first test phase of PyTorch classification architectures. In addition, EfficientNet_B1 and EfficientNet_B5 stood out compared to all other models. During experiment 3, with a new dataset, both models achieved a high accuracy of 95.13% and 94.83%, respectively. Furthermore, in experiment 4, both EfficientNet_B1 and EfficientNet_B5 achieved a maximum accuracy of 96.15%, the highest one. ML models can help to automate crop problem detection, promote organic farming, optimize resource use, aid precision farming, reduce waste, boost efficiency, and contribute to a greener, sustainable agricultural future.","PeriodicalId":7636,"journal":{"name":"Algorithms","volume":"115 25","pages":""},"PeriodicalIF":2.3,"publicationDate":"2023-12-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139135272","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}