Pub Date: 2026-02-01 | Epub Date: 2026-01-22 | DOI: 10.1016/j.softx.2026.102516
Samriddha Das, C. Igathinathane, Xin Sun
The growing reliance on AI and deep learning in vision-based applications requires efficient dataset preparation tools; however, existing solutions are often commercially licensed or lack integrated, multi-format workflows. This study presents An-Augmenter, a cross-platform, open-source software that integrates image annotation and augmentation within an offline environment. It supports YOLO, XML, and JSON formats and ensures annotation-consistent augmentation for labeled and unlabeled datasets. Processing 1200 images with all possible augmentation techniques required 50 s on a standard CPU. Validation using the YOLO11n object detection model improved mAP@0.5 from 0.905 to 0.941 on a custom egg dataset and from 0.799 to 0.825 on a public apple dataset, demonstrating improved detection performance with augmented data.
Title: "An-augmenter: A unified platform for efficient image annotation and data augmentation" (SoftwareX, Volume 33, Article 102516)
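Annotation-consistent augmentation means every geometric transform applied to an image must also be applied to its labels. A minimal sketch for one such transform, a horizontal flip of YOLO-format boxes (the function name and tuple layout are illustrative, not An-Augmenter's API):

```python
def hflip_yolo_boxes(boxes):
    """Horizontally flip YOLO-format boxes (class, x_center, y_center, w, h),
    with all coordinates normalized to [0, 1].  Flipping mirrors x_center
    about the image midline; y, width, and height are unchanged."""
    return [(c, 1.0 - x, y, w, h) for (c, x, y, w, h) in boxes]

# A box centered at x = 0.25 moves to x = 0.75 after the flip.
flipped = hflip_yolo_boxes([(0, 0.25, 0.6, 0.2, 0.3)])
```

Other geometric augmentations (rotations, crops, scaling) need their own matching coordinate transforms, which is exactly the consistency the tool automates.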
Pub Date: 2026-02-01 | Epub Date: 2026-01-21 | DOI: 10.1016/j.softx.2026.102523
Michael Herbert Ziegler , Mariusz Nowostawski , Basel Katt
The tension between blockchain transparency and user privacy has driven innovation in mixing protocols, creating a need for comprehensive analytical frameworks that can rigorously evaluate privacy properties across different implementations. Dakar is an open-source framework that unifies ingestion and provides reproducible classification and analysis of CoinJoin transactions on UTXO blockchains. Its graph database captures the relationships between mixing transactions, while a web interface enables experimentation with built-in privacy tools such as CoinJoin transaction heuristics and similarity measures. By enabling researchers to compare and quantify CoinJoin activity across multiple protocols, Dakar facilitates studies on privacy-enhancing techniques and supports the discovery and analysis of differences in CoinJoin implementations.
Title: "Dakar: A CoinJoin forensic software" (SoftwareX, Volume 33, Article 102523)
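One widely used CoinJoin heuristic flags transactions whose outputs contain many equal values, the mixing denominations shared by participants. A toy sketch of that idea (not Dakar's actual classifier, whose heuristics are configurable and richer):

```python
from collections import Counter

def looks_like_coinjoin(output_values, min_equal=3):
    """Flag a transaction as a CoinJoin candidate when at least `min_equal`
    of its output values are identical (the equal-value denominations that
    mixing participants receive)."""
    counts = Counter(output_values)
    return any(n >= min_equal for n in counts.values())

# Three identical 0.1 outputs among change outputs -> flagged.
suspect = looks_like_coinjoin([0.1, 0.1, 0.1, 0.0423, 0.77])
```

Real forensic pipelines combine several such heuristics with similarity measures over the transaction graph, which is what the framework's graph database supports.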
Pub Date: 2026-02-01 | Epub Date: 2026-01-21 | DOI: 10.1016/j.softx.2026.102518
Wojciech Sałabun , Damian Kedziora , Andrii Shekhovtsov
In this paper, we present an extension of the AsymIntervals library, designed to enhance the modelling and processing of uncertainty using Asymmetric Interval Numbers (AINs). In response to the growing demand for expressive and mathematically consistent tools for interval-based uncertainty representation, the library has been extended with a comprehensive set of interval characteristics, logical predicates, relational operators, and mathematical transformations implemented within a unified core class. The extension introduces support for advanced algebraic, trigonometric, exponential, and logarithmic operations, flexible construction of AIN objects from multiple input formats, sampling-based data generation, and normalisation of AIN collections. Additionally, enhanced export and serialisation mechanisms enable seamless integration with numerical workflows and scientific applications. These improvements substantially broaden the applicability of AsymIntervals in decision analysis, uncertainty modelling, and computational research.
Title: "Version [1.2]-[AsymIntervals: A Python library for uncertainty modeling with asymmetric interval numbers]" (SoftwareX, Volume 33, Article 102518)
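To illustrate the kind of object the library works with: an asymmetric interval number is a value inside an interval that need not sit at the midpoint. The class below is a stripped-down hypothetical sketch with plain component-wise addition; AsymIntervals' actual class layout and arithmetic rules may differ:

```python
class AIN:
    """Sketch of an asymmetric interval number: an expected value `a`
    inside [lower, upper] that need not be the interval midpoint.
    (Hypothetical layout, not the library's actual class.)"""

    def __init__(self, lower, a, upper):
        assert lower <= a <= upper, "expected value must lie in the interval"
        self.lower, self.a, self.upper = lower, a, upper

    def __add__(self, other):
        # Simplest possible rule: bounds and expected values add
        # component-wise (the library's rule may be more elaborate).
        return AIN(self.lower + other.lower, self.a + other.a,
                   self.upper + other.upper)

    def contains(self, x):
        return self.lower <= x <= self.upper

s = AIN(1, 2, 5) + AIN(0, 1, 2)
print(s.lower, s.a, s.upper)  # 1 3 7
```

Note the asymmetry: in `AIN(1, 2, 5)` the expected value 2 sits closer to the lower bound than the midpoint 3 would.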
Pub Date: 2026-02-01 | Epub Date: 2026-01-16 | DOI: 10.1016/j.softx.2025.102501
Muhammad Wajeeh Uz Zaman , Umer Rashid , Qaisar Abbas , Abdur Rehman Khan
The proliferation of online multimedia content has transformed user information-seeking behavior from lookup to exploratory search. Existing web search engines present search results in disjoint, linearly ranked search result lists called verticals to bridge the information-exploration gap. However, search results presented by vertical search engines require extensive cognitive effort, hindering users’ ability to explore relevant content across verticals. We propose ExSMuV: [Ex]ploration Software for [S]ummarized [Mu]ltimedia [V]ertical Search Results, a framework that aggregates search results across verticals into coherent multimedia documents based on the most prominent topics, using a customized frequent-term scoring algorithm. Based on the identified important topics, a cosine similarity measure is used to aggregate the top-k similar results across verticals into a multimedia document. These documents combine conceptually similar web, image, and video search results into a comprehensive, unified Search User Interface (SUI) to reduce user navigation effort and improve exploration of relevant search results. We conducted a cognitive user study (N=23) comparing ExSMuV with a Bing vertical search baseline. The proposed framework enabled participants to perform exploratory search tasks with +37 % processing speed, +34 % selective attention, and +41 % better working memory compared to the baseline, with statistically significant results (p ≤ 0.01).
Title: "ExSMuV: [Ex]ploration software for [S]ummarized [Mu]ltimedia [V]ertical search results" (SoftwareX, Volume 33, Article 102501)
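The aggregation step described above (score candidate results against a topic vector, keep the top-k most similar) can be sketched with plain bag-of-words counts; the helper names are illustrative, not ExSMuV's code:

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def top_k_similar(topic_terms, results, k=2):
    """Rank search results (each a list of terms) against a topic vector
    and keep the k most similar, mimicking the aggregation step."""
    topic = Counter(topic_terms)
    scored = [(cosine(topic, Counter(r)), r) for r in results]
    return [r for s, r in sorted(scored, key=lambda p: -p[0])[:k]]

ranked = top_k_similar(["apple", "pie"],
                       [["apple", "pie"], ["car"], ["apple", "tart"]], k=2)
```

In the framework this scoring runs per topic across the web, image, and video verticals, and the surviving results are merged into one multimedia document.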
Pub Date: 2026-02-01 | Epub Date: 2025-12-17 | DOI: 10.1016/j.softx.2025.102486
Sipeng Luo, Tianyang Zhao, Zhaohong Bie
Modern distribution networks are being transformed into hybrid AC/DC active systems through large-scale deployment of converter-interfaced resources (photovoltaic generators, battery energy storage systems). Existing open-source time-series power flow tools lack unified medium/low-voltage DC modeling, multi-mode converter control, and endogenous multi-interval scheduling, which limits hybrid analysis. To address this gap, the open-source HyDistFlow.jl package (Julia) is introduced for accurate and efficient hybrid distribution studies. A unified component model set for AC, DC, and coupling interfaces is provided, where distributed generation and battery storage systems are explicitly represented. Seven consistent voltage source converter control modes are implemented for AC/DC converters. Endogenous scheduling automatically generates storage charge/discharge profiles under network constraints. Switching topology changes in medium/low-voltage distribution networks are accommodated in an engineering-oriented manner. CPU/GPU heterogeneous computation is enabled for scalability. Accuracy has been benchmarked against open-source AC solvers and ETAP hybrid AC/DC results, while functional correctness of all control modes and automatic loss-reducing storage dispatch has been demonstrated in designed studies.
Title: "HyDistFlow.jl: A unified dynamic hybrid AC/DC power flow package for DER-rich distribution systems" (SoftwareX, Volume 33, Article 102486)
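For intuition on the underlying computation, a textbook backward/forward sweep for a toy radial AC feeder is sketched below in Python (HyDistFlow.jl itself is written in Julia, and its solver handles far more: DC sections, converters, storage, and switching topologies):

```python
def bf_sweep(z, s_load, v_slack=1.0 + 0j, iters=20):
    """Backward/forward sweep power flow for a single radial feeder.
    z[k]     : series impedance of the line feeding bus k+1 (p.u.)
    s_load[k]: complex power drawn at bus k+1 (p.u.)
    Returns bus voltages [v_slack, v1, ..., vn]."""
    n = len(z)
    v = [v_slack] * (n + 1)
    for _ in range(iters):
        # Backward sweep: load currents I = conj(S / V), accumulated
        # from the feeder end toward the slack bus.
        i_inj = [(s_load[k] / v[k + 1]).conjugate() for k in range(n)]
        i_branch = [0j] * n
        acc = 0j
        for k in range(n - 1, -1, -1):
            acc += i_inj[k]
            i_branch[k] = acc
        # Forward sweep: voltage drops applied outward from the slack bus.
        for k in range(n):
            v[k + 1] = v[k] - z[k] * i_branch[k]
    return v

# Two-line feeder with identical impedances and loads: voltage sags
# progressively along the feeder.
v = bf_sweep(z=[0.01 + 0.02j] * 2, s_load=[0.1 + 0.05j, 0.1 + 0.05j])
```

The sweep converges quickly for radial networks, which is one reason DistFlow-style formulations are popular for distribution studies.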
Pub Date: 2026-02-01 | Epub Date: 2025-12-13 | DOI: 10.1016/j.softx.2025.102488
Mariana Bárcenas Castañeda , Luis Enrique Calatayud Velázquez , Manuel Sabino Lazo Cortes , Mauricio Gabriel Orozco del Castillo , Víctor Augusto Castellanos Escamilla
Corrosion is a complex phenomenon that deteriorates metal surfaces, generating significant economic and operational challenges across industries. Its assessment often requires expert interpretation of macroscopic damage. This work presents the development and validation of two graphical interfaces for SEAViM-CORR, a fuzzy logic-based expert system for corrosion diagnosis using surface images. The desktop version supports laboratory analysis with image editing tools, while the mobile version enables in situ diagnosis in under 200 ms. Using a dual-output model to identify primary and secondary corrosion mechanisms, the interfaces achieved up to 71.4 % efficiency in seven documented case studies, enhancing interpretability, usability for non-expert users, and applicability in both industrial and field environments.
Title: "Advancing corrosion detection: A fuzzy expert system with desktop and mobile interfaces" (SoftwareX, Volume 33, Article 102488)
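A fuzzy expert system of this kind grades an input feature against overlapping membership functions and reports the best-matching label. The sketch below uses triangular memberships with invented labels and thresholds; it is not SEAViM-CORR's rule base:

```python
def tri(x, a, b, c):
    """Triangular membership function: 0 outside (a, c), peaking at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def corrosion_severity(pit_density):
    """Toy fuzzy classifier: map a normalized pit-density reading in [0, 1]
    to the label with the highest membership (labels and breakpoints are
    illustrative, not the expert system's actual rules)."""
    grades = {
        "low":    tri(pit_density, -0.01, 0.0, 0.5),
        "medium": tri(pit_density, 0.2, 0.5, 0.8),
        "high":   tri(pit_density, 0.5, 1.0, 1.01),
    }
    return max(grades, key=grades.get)

label = corrosion_severity(0.55)  # "medium": memberships overlap at 0.55
```

The overlapping memberships are what let the real system report a primary and a secondary corrosion mechanism for the same image.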
Pub Date: 2026-02-01 | DOI: 10.1016/j.softx.2025.102482
N.A. Cruz , O.O. Melo , C.A. Martinez , R. Alberich
Crossover designs are widely applied in medicine, agriculture, and other biological sciences, yet their analysis remains challenging due to longitudinal observations within each unit and the presence of carry-over effects. Despite their prevalence, there is no comprehensive R package dedicated to the statistical modeling of crossover data. The CrossCarry package addresses this gap by providing a flexible and open-source framework for analyzing any crossover design with response variables from the exponential family, with or without washout periods. It extends the generalized estimating equations (GEE) methodology by incorporating correlation structures specifically tailored to crossover data, capturing both within- and between-period dependencies. Moreover, CrossCarry integrates a parametric component for treatment effects and a nonparametric spline-based component for time and carry-over effects. This combination allows users to model complex correlation patterns and temporal structures with minimal coding effort. By offering a domain-independent implementation of advanced statistical methodology, CrossCarry facilitates reproducible research and promotes the reuse of robust analytical tools across disciplines. Its potential applications span medical trials, agricultural field experiments, and other areas where crossover designs are essential, thus contributing to broader scientific discovery and cross-domain methodological standardization.
Title: "CrossCarry: An R package for the analysis of data from a crossover design with GEE" (SoftwareX, Volume 33, Article 102482)
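The tailored correlation structures can be pictured as block matrices: observations within the same period share one correlation, observations in different periods another. A simplified stand-in for one subject's working correlation (in Python rather than R, and not the package's actual structures):

```python
def crossover_working_corr(periods, times, alpha, beta):
    """Working correlation matrix for one subject in a crossover design:
    observations in the same period correlate with `alpha`, observations
    in different periods with `beta`.  Rows/columns are period-major,
    i.e. index i belongs to period i // times."""
    n = periods * times
    r = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                r[i][j] = alpha if i // times == j // times else beta
    return r

# Two periods, two time points each: a 4x4 block-structured matrix.
R = crossover_working_corr(periods=2, times=2, alpha=0.6, beta=0.2)
```

In a GEE fit this matrix plays the role of the working correlation; the package estimates its parameters and supports richer within-period patterns than the exchangeable blocks shown here.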
Pub Date: 2026-02-01 | Epub Date: 2025-12-03 | DOI: 10.1016/j.softx.2025.102457
Carlos Sandoval Olascoaga , Nicholas de Monchaux
While architects and planners routinely rely on Geospatial Information Systems (GIS) and Computer Aided Design (CAD) tools, the two are infrastructurally incompatible, leading to cumbersome workarounds, lack of adoption in practice, and missed opportunities to combine the large-scale geographic insights of GIS with the building-scale precision of CAD in a seamless design process. Local Software (LS) inquires into how bringing together CAD and GIS tools and workflows can lead to more sustainable urban design proposals. The framework introduces Site Packages (SP), a cross-platform information model based on GeoJSON that enables seamless integration between design and analysis tools, and a new design methodology that connects large-scale modeling with small-scale design decisions. LS provides a web interface and open-source plugins for Grasshopper and QGIS that allow designers to parametrically generate networked urban interventions while evaluating their ecological and social impacts through GIS. Case studies have demonstrated that proposals created with the LS framework can replace 88–96 % of traditional stormwater systems at 50 % lower cost of underground work, while enhancing urban resilience, reducing heat island effects, and providing community benefits.
Title: "Local software: Integrated design and geo-computing workflows for urban design" (SoftwareX, Volume 33, Article 102457)
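A Site-Package-style exchange file can be pictured as an ordinary GeoJSON FeatureCollection whose features carry design metadata in their properties; the property names below are illustrative, not the LS schema:

```python
import json

def make_site_package(name, interventions):
    """Build a GeoJSON FeatureCollection describing point interventions
    on a site.  `interventions` is a list of (kind, (lon, lat)) pairs.
    (Hypothetical property names, not the actual Site Package schema.)"""
    features = [
        {
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": list(xy)},
            "properties": {"site": name, "intervention": kind},
        }
        for kind, xy in interventions
    ]
    return json.dumps({"type": "FeatureCollection", "features": features})

doc = make_site_package("demo-block", [("bioswale", (-122.27, 37.80))])
```

Because GeoJSON is readable by both CAD plugins (Grasshopper) and GIS tools (QGIS), a single file like this can round-trip between the design and analysis sides of the workflow.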
Pub Date: 2026-02-01 | Epub Date: 2025-12-05 | DOI: 10.1016/j.softx.2025.102463
Damian Frąszczak, Edyta Frąszczak
Website phishing represents a significant cyber threat, where attackers create fraudulent websites that imitate legitimate sites to deceive users. Continuous monitoring and detection of malicious websites are crucial for mitigating this threat. This paper introduces PhishingWebCollector, an open-source Python library designed to simplify the collection and integration of phishing feeds. It is an appropriate tool for real-time blacklist updates, creating historical datasets for research, and serving as a foundation for developing AI-based phishing detection systems. Identifying phishing and spoofed websites helps generate high-quality datasets necessary for training models in automated website classification and threat identification. Leveraging Python’s asyncio, it processes multiple feeds concurrently to achieve optimal performance. Available on PyPI with extensive documentation and examples, PhishingWebCollector offers a resource-efficient solution for cybersecurity professionals and researchers.
Title: "PhishingWebCollector: Async python library for automated phishing feed collection" (SoftwareX, Volume 33, Article 102463)
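The concurrency model is straightforward asyncio fan-out: all feeds are awaited together, so total wall time tracks the slowest feed rather than the sum of all of them. A sketch with stubbed downloads (the feed names, delays, and URLs are invented; real code would use an HTTP client):

```python
import asyncio

async def fetch_feed(name, delay):
    """Stand-in for one feed download; the sleep simulates network latency."""
    await asyncio.sleep(delay)
    return (name, ["http://bad.example/" + name])

async def collect(feeds):
    """Download all feeds concurrently with asyncio.gather and merge the
    results into one {feed_name: [urls]} mapping."""
    results = await asyncio.gather(*(fetch_feed(n, d) for n, d in feeds))
    return dict(results)

entries = asyncio.run(collect([("openphish", 0.01), ("phishtank", 0.02)]))
```

Since phishing feeds are mostly I/O-bound, this pattern gives near-linear speedups over sequential polling without threads or processes.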
Pub Date: 2026-02-01 | Epub Date: 2026-02-05 | DOI: 10.1016/j.softx.2026.102543
Manuel Couto , Javier Parapar , David E. Losada
Python’s flexibility accelerates research prototyping but frequently results in unmaintainable code and duplicated computational effort. The absence of software engineering practices in academic development leads to fragile experiments where even minor modifications require rerunning expensive computations from scratch. LabChain addresses this through a pipeline-and-filter architecture with hash-based caching that automatically identifies and reuses intermediate results. When evaluating multiple classifiers on the same embeddings, the framework computes embeddings once—regardless of how many classifiers are tested. This automatic reuse extends across research teams: if another researcher applies different models to the same preprocessed data, LabChain detects existing results and eliminates redundant computation. Beyond efficiency, the framework’s modular structure reduces technical debt that obscures experimental logic. Pipelines serialize to JSON for reproducibility and distributed execution across computational clusters. A mental health detection case study demonstrates dual impact: computational savings exceeding 12 hours per task with reduced CO2 emissions, alongside substantial scientific improvements—performance gains up to 192.3% in some tasks. These improvements emerged from clearer experimental organization that exposed a critical preprocessing bug hidden in the original monolithic implementation. LabChain proves that software engineering discipline amplifies scientific discovery.
LabChain proves that software engineering discipline amplifies scientific discovery.
Title: "LabChain: Enabling reproducible and modular scientific experiments in Python" (SoftwareX, Volume 33, Article 102543)
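Hash-based caching of intermediate results can be sketched in a few lines: key each step by a digest of its name, parameters, and input, and recompute only on a cache miss. The names and the toy "embedding" below are illustrative, not LabChain's API:

```python
import hashlib
import json

CACHE = {}
CALLS = {"embed": 0}  # counts actual computations, to show reuse

def step_key(name, params, input_data):
    """Digest of (step name, parameters, input): identical combinations
    always map to the same cache entry, across runs and across users."""
    payload = json.dumps([name, params, input_data], sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def cached_embed(texts, dim=4):
    key = step_key("embed", {"dim": dim}, texts)
    if key not in CACHE:                      # recompute only on a miss
        CALLS["embed"] += 1
        CACHE[key] = [[float(len(t))] * dim for t in texts]  # toy "embedding"
    return CACHE[key]

docs = ["a post", "another post"]
e1 = cached_embed(docs)   # computed once
e2 = cached_embed(docs)   # served from cache; no second computation
```

Evaluating ten classifiers on `e1` would hit the cache ten times, which is exactly how the framework avoids recomputing embeddings per classifier, and why a shared cache lets a second researcher reuse a colleague's intermediate results.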