Pub Date: 2025-03-01 | Epub Date: 2025-01-03 | DOI: 10.1016/j.simpa.2024.100740
Samir Brahim Belhaouari , Ashhadul Islam , Khelil Kassoul , Ala Al-Fuqaha , Abdesselam Bouzerdoum
KNNOR-Reg is a Python package designed to address the challenge of imbalanced regression. While popular Python packages exist for tackling imbalanced classification, support for imbalanced regression remains limited. Imbalanced regression involves the underrepresentation of important ranges within a continuous target variable. KNNOR-Reg implements an oversampling technique that generates synthetic samples through interpolation between minority class samples and their nearest neighbors. The labels for synthetic samples are computed based on the inverse distance-weighted average of the nearest neighbors’ labels. KNNOR-Reg offers a user-friendly and extensible Python implementation for oversampling imbalanced regression data, aiming to reduce regressor bias and enhance model outcomes.
KNNOR-Reg: A python package for oversampling in imbalanced regression (Software Impacts, vol. 23, Article 100740)
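The oversampling idea the abstract describes can be sketched in a few lines of plain Python. This is an illustrative re-implementation of the general technique (interpolate between a rare sample and a nearest neighbour, then label the synthetic point with the inverse-distance-weighted mean of neighbour labels), not the KNNOR-Reg package's actual API; the function name and parameters are hypothetical.

```python
import math
import random

def knnor_reg_oversample(X, y, n_new, k=3, seed=0):
    """Sketch of KNNOR-style oversampling for regression.

    For each synthetic point: pick a sample, interpolate toward one of
    its k nearest neighbours, and label the new point with the
    inverse-distance-weighted average of the neighbours' labels.
    """
    rng = random.Random(seed)
    X_new, y_new = [], []
    for _ in range(n_new):
        i = rng.randrange(len(X))
        # k nearest neighbours of X[i] (excluding X[i] itself)
        nbrs = sorted((j for j in range(len(X)) if j != i),
                      key=lambda j: math.dist(X[i], X[j]))[:k]
        j = rng.choice(nbrs)
        alpha = rng.random()  # interpolation factor in [0, 1)
        x_syn = [a + alpha * (b - a) for a, b in zip(X[i], X[j])]
        # inverse-distance-weighted average of neighbour labels
        w = [1.0 / (math.dist(x_syn, X[m]) + 1e-12) for m in nbrs]
        y_syn = sum(wi * y[m] for wi, m in zip(w, nbrs)) / sum(w)
        X_new.append(x_syn)
        y_new.append(y_syn)
    return X_new, y_new
```

Because each synthetic label is a weighted average of real neighbour labels, it always stays within the range of those labels, which is one way the method avoids generating implausible targets.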
Pub Date: 2025-03-01 | Epub Date: 2024-12-27 | DOI: 10.1016/j.simpa.2024.100738
Wenzhen Li , Hongyan Lin , Lvxin Peng , Qianhu Jiang , Yushu Gou , Lu Xie , Jian Huang
Identifying complementarity-determining regions (CDRs) and antigen-binding regions (ABRs) requires accurate antibody numbering, which is essential for therapeutic antibody development. AbNumPro is a comprehensive offline toolkit developed for antibody numbering and ABR prediction, addressing the limitations of existing tools, which often lack comprehensiveness and rely solely on online services. By integrating five established numbering schemes (Kabat, Chothia, IMGT, Aho, and Martin), AbNumPro provides precise delineation of CDRs and ABRs, offering both compatibility with diverse research applications and the assurance of data security.
AbNumPro: A comprehensive offline toolkit for antibody numbering and antigen-binding region prediction (Software Impacts, vol. 23, Article 100738)
Pub Date: 2025-03-01 | Epub Date: 2024-12-28 | DOI: 10.1016/j.simpa.2024.100735
Mihaela Orić , Vlatko Galić , Filip Novoselnik
The success of machine learning models for object detection depends heavily on the size and quality of the training data. Generating synthetic data speeds up data acquisition by removing the need for human annotation; moreover, since annotation is done automatically, there is no room for human error. We present a pipeline that automatically generates and annotates aerial images of vehicles on roads. The pipeline is structured so that new vehicle types can be added easily; it is not limited to cars. The resolution of the generated images and the level of detail can be modified through the output settings.
Synthetic dataset generation system for vehicle detection (Software Impacts, vol. 23, Article 100735)
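The key point of the abstract — that annotation comes for free when the generator itself decides where each object goes — can be illustrated with a toy sketch. This is a hypothetical stand-in for the rendering pipeline, not the authors' code; the function name and size ranges are invented for illustration.

```python
import random

def generate_annotated_scene(n_vehicles, img_w=1024, img_h=1024, seed=0):
    """Toy sketch of automatic annotation: because the generator places
    each vehicle itself, the exact bounding box is known by construction,
    so no human labelling step (and no labelling error) is needed."""
    rng = random.Random(seed)
    annotations = []
    for _ in range(n_vehicles):
        w = rng.randint(20, 60)    # vehicle footprint in pixels (illustrative)
        h = rng.randint(40, 100)
        x = rng.randint(0, img_w - w)
        y = rng.randint(0, img_h - h)
        # the label is a by-product of placement, not a separate step
        annotations.append({"class": "car", "bbox": (x, y, x + w, y + h)})
    return annotations
```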
Pub Date: 2025-03-01 | DOI: 10.1016/j.simpa.2025.100744
Hardik Jayswal, Hetvi Desai, Hasti Vakani, Mithil Mistry, Nilesh Dubey
This paper investigates a novel approach to plant disease classification, addressing cases where symptoms are not visually apparent. Traditional machine learning methods, reliant on observable symptoms, face challenges such as limited training data, high costs, and low interpretability. To overcome these limitations, a spectroscopy-based classification technique was developed. Experimental data, collected over 15 months at Anand Agriculture University, Gujarat, and Charotar University Space Research Centre, utilized spectral signatures (400–1000 nm) to detect mango diseases. The SSTAS software, built around a fine-tuned deep learning model, Deep-Spectro, demonstrated superior accuracy using an 80:20 training-to-testing ratio, surpassing existing models reported in prior research.
Plant diseases classification with Spectral Signature Taxonomy & Analysis Software (SSTAS) (Software Impacts, vol. 23, Article 100744)
Pub Date: 2025-03-01 | Epub Date: 2024-12-13 | DOI: 10.1016/j.simpa.2024.100728
Gaofeng Zhu , Qiang Chen , Xiangyu Yu , Cong Xu , Kun Zhang , Yunquan Wang , Wei Gong , Tao Che
Bayesian inference is crucial for optimizing parameters in complex models, but it often requires sampling because the posteriors are high-dimensional and intractable. Beyond Markov-Chain Monte Carlo (MCMC) methods, Sequential Monte Carlo (SMC) algorithms offer an alternative. This paper introduces a Matlab toolbox for the Particle Evolution Metropolis Sequential Monte Carlo (PEM-SMC) algorithm, which combines the strengths of population-based MCMC and SMC. Two case studies, a complex multi-modal probability distribution and a land surface model, demonstrate the toolbox's capabilities. This tool is valuable for Bayesian inference across fields such as statistics, ecology, hydrology, and land surface processes.
PEM-SMC: An algorithm for optimizing model parameters (Software Impacts, vol. 23, Article 100728)
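One standard building block of any SMC sampler, including particle methods like PEM-SMC, is the resampling step that replaces low-weight particles with copies of high-weight ones. The sketch below shows the classic systematic resampling scheme in Python; it is illustrative only (the toolbox itself is implemented in Matlab), and is not claimed to be the specific resampler PEM-SMC uses.

```python
import random

def systematic_resample(weights, seed=0):
    """Systematic resampling: draw one uniform offset, then select
    particle indices with probability proportional to weight using
    n evenly spaced positions on the cumulative weight axis."""
    n = len(weights)
    total = sum(weights)
    cum, acc = [], 0.0
    for w in weights:
        acc += w / total
        cum.append(acc)
    cum[-1] = max(cum[-1], 1.0)  # guard against floating-point shortfall
    u = random.Random(seed).random()
    positions = [(u + i) / n for i in range(n)]
    indices, j = [], 0
    for pos in positions:
        while pos >= cum[j]:
            j += 1
        indices.append(j)
    return indices
```

A particle with all the weight is selected every time, while equal weights reproduce each particle exactly once — the low-variance property that makes systematic resampling popular.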
Pub Date: 2025-03-01 | Epub Date: 2024-12-05 | DOI: 10.1016/j.simpa.2024.100726
Joel Antonio Trejo-Sánchez , Candelaria E. Sansores , Francisco J. Hernandez-Lopez , Jonás Velasco , Daniel Fajardo Delgado , Jose Luis Lopez-Martinez , Julio Cesar Ramirez-Pacheco
Several scheduling optimization problems belong to the NP-complete class, including task scheduling, job shop scheduling, and patient admission. These problems commonly require heuristic approaches to find near-optimal solutions within reasonable timeframes. In this work, we present MaSchedule, an open-source multi-agent tool for the design of heuristics for scheduling problems.
MaSchedule. A multi-agent tool for scheduling problems (Software Impacts, vol. 23, Article 100726)
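To make "heuristics for scheduling problems" concrete, here is the classic longest-processing-time (LPT) list-scheduling heuristic for minimizing makespan on identical machines — the kind of near-optimal heuristic a tool like MaSchedule is designed to help build. This is an illustrative sketch in Python, not MaSchedule's own API.

```python
import heapq

def greedy_list_schedule(durations, n_machines):
    """LPT heuristic: sort jobs longest-first, then repeatedly assign
    the next job to the currently least-loaded machine."""
    loads = [(0, m) for m in range(n_machines)]  # (current load, machine id)
    heapq.heapify(loads)
    assignment = {}
    for job, d in sorted(enumerate(durations), key=lambda t: -t[1]):
        load, m = heapq.heappop(loads)   # least-loaded machine
        assignment[job] = m
        heapq.heappush(loads, (load + d, m))
    makespan = max(load for load, _ in loads)
    return assignment, makespan
```

LPT is fast and provably within 4/3 of the optimal makespan, which illustrates why NP-complete scheduling problems are attacked with heuristics rather than exact search.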
Pub Date: 2025-03-01 | Epub Date: 2024-11-28 | DOI: 10.1016/j.simpa.2024.100721
Gabriela Pedraza-Jiménez, Gerardo Tinoco-Guerrero, Francisco Javier Domínguez-Mota, José Alberto Guzmán-Torres, José Gerardo Tinoco-Ruiz
This work introduces mGFD: CloudGenerator, a web-based software for generating unstructured clouds of points that is useful in numerical analysis, particularly in applying the Meshless Generalized Finite Difference Method (mGFD). mGFD: CloudGenerator lets users manually define external and internal boundary nodes, using an image as a guide, providing precise control over boundary conditions. It supports image uploads (.png, .jpg, .jpeg) to guide node placement and automatically generates the internal cloud of points. The software is open-source, accessible for research, and has been used to produce results in previously published papers.
mGFD: CloudGenerator (Software Impacts, vol. 23, Article 100721)
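The "automatically generates the internal cloud of points" step can be sketched as rejection sampling inside the user-drawn boundary polygon. The sketch below is a simplified stand-in for CloudGenerator's own placement logic (which we have not inspected), using the standard even-odd ray-casting point-in-polygon test.

```python
import random

def point_in_polygon(p, poly):
    """Even-odd ray-casting test for a point against a simple polygon."""
    x, y = p
    inside = False
    n = len(poly)
    for i in range(n):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):               # edge straddles the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:                     # crossing to the right of p
                inside = not inside
    return inside

def fill_cloud(boundary, n_points, seed=0):
    """Rejection-sample uniform internal points inside the boundary:
    draw from the bounding box, keep only points inside the polygon."""
    rng = random.Random(seed)
    xs = [p[0] for p in boundary]
    ys = [p[1] for p in boundary]
    cloud = []
    while len(cloud) < n_points:
        p = (rng.uniform(min(xs), max(xs)), rng.uniform(min(ys), max(ys)))
        if point_in_polygon(p, boundary):
            cloud.append(p)
    return cloud
```

Real mesh-free point generators usually also enforce a minimum spacing between points; that refinement is omitted here to keep the sketch short.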
Pub Date: 2025-03-01 | Epub Date: 2025-03-04 | DOI: 10.1016/j.simpa.2025.100745
Aleix Seguí , Arantza Ugalde , Juan José Egozcue
hvarma is a Python software for estimating the horizontal-to-vertical (H/V) spectral ratio from seismic ambient vibration measurements. It employs a parametric approach that models the H/V transfer function with an AutoRegressive Moving Average (ARMA) filter. Compared to traditional methods, this technique enhances accuracy and reliability in spectral estimates, determining the ground fundamental resonance frequency with high spectral resolution, which is important for engineering geology projects. The program performs an inversion to find the optimal filter coefficients and computes the coherence between horizontal and vertical components, generating H/V transfer function visualizations across both negative and positive frequencies. Results are saved as image and text files.
hvarma: Autoregressive moving average model of microtremor H/V spectral ratio (Software Impacts, vol. 23, Article 100745)
Pub Date: 2025-03-01 | Epub Date: 2024-12-13 | DOI: 10.1016/j.simpa.2024.100730
Débora Pina , Liliane Kunstmann , Daniel de Oliveira , Marta Mattoso
To train a Deep Learning (DL) model, a workflow must be executed with four well-defined activities: (i) acquiring data, (ii) preprocessing, (iii) splitting and balancing the dataset, and (iv) building and training the model. After several DL models are generated, they undergo a process called model selection, and the selected DL model is put into a production environment to make predictions on new data. One of the challenges in supporting these analyses is establishing the relationships between candidate models, their training, test, and validation datasets, input data, and other derivation paths. These relationships are also essential for trust, reproducibility, and evolution of the selected model. While existing solutions allow monitoring and analyzing the artifacts generated throughout the DL workflow, they often fail to establish the relationships that support data derivation within the DL workflow. DLProv is a provenance-centric service that supports DL workflow analyses and reproducibility. DLProv captures provenance data and exports provenance graphs for DL model reproducibility. DLProv is W3C PROV compliant, ensuring standardized prospective and retrospective provenance, and enables provenance capture in arbitrary execution frameworks.
Breadcrumbs for your Deep Learning Model: Following Provenance Traces with DLProv (Software Impacts, vol. 23, Article 100730)
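The derivation relationships the abstract describes amount to a graph whose edges say "this artifact was generated by this activity from this source artifact" — the core pattern of W3C PROV's wasGeneratedBy/used relations. The toy sketch below shows that idea in plain Python; it is a stand-in for DLProv's capture service, not its actual interface, and the helper names are hypothetical.

```python
def record_derivation(graph, entity, derived_from, activity):
    """Record one provenance edge in the spirit of W3C PROV:
    `entity` wasGeneratedBy `activity`, which used `derived_from`."""
    graph.setdefault(entity, []).append((activity, derived_from))
    return graph

def lineage(graph, entity):
    """Walk derivation edges back from an artifact to its original
    inputs, yielding (entity, activity, source) triples."""
    trail = []
    frontier = [entity]
    while frontier:
        e = frontier.pop()
        for activity, src in graph.get(e, []):
            trail.append((e, activity, src))
            frontier.append(src)
    return trail
```

For the four-activity workflow above, a chain like raw data → preprocessed data → training split → trained model lets `lineage` answer exactly the question model selection needs: which inputs and which processing steps produced this candidate model.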
Pub Date: 2025-03-01 | Epub Date: 2024-12-05 | DOI: 10.1016/j.simpa.2024.100724
Bladimir Toaza, Domokos Esztergár-Kiss
SpatialzOSM is a Python package that spatializes aggregated locations into coordinates, thereby supporting population synthesis processes. It addresses the need for high-resolution data while ensuring data privacy. SpatialzOSM generates coordinates using three random distribution techniques: across zones, along road networks, and within buildings for residential locations. For non-residential locations, the package extracts points of interest from open sources. By leveraging open-source data, SpatialzOSM minimizes the reidentification risks associated with census and survey datasets, ensuring privacy protection. This package is valuable for researchers and modelers engaged in synthetic population generation for models requiring explicit geographic location data.
SpatialzOSM: A Python package for supporting the explicit spatialization in the population synthesis process (Software Impacts, vol. 23, Article 100724)
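The simplest of the three placement strategies the abstract lists — uniform random distribution across a zone — can be sketched directly. This is an illustrative stand-in, not the SpatialzOSM API; road-network and building-based placement would constrain the draw further, and the rectangular zone is a simplifying assumption.

```python
import random

def spatialize_zone(count, bbox, seed=0):
    """Turn an aggregated count for one zone into `count` synthetic
    coordinates drawn uniformly inside the zone's bounding box
    (min_x, min_y, max_x, max_y)."""
    rng = random.Random(seed)
    min_x, min_y, max_x, max_y = bbox
    return [(rng.uniform(min_x, max_x), rng.uniform(min_y, max_y))
            for _ in range(count)]
```

Because the output points are synthetic draws rather than surveyed households, no individual record from the source census data can be read back out of them — the privacy property the abstract emphasizes.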