With the rapid development of the global digital economy, cross-border e-commerce has emerged as a crucial bridge connecting global markets. Focusing on the cross-border e-commerce sector for outdoor sports products, and in response to common problems in the field such as "information overload" and "insufficient recommendation accuracy," this study proposes a personalized recommendation optimization framework that integrates customer value segmentation with collaborative filtering. Building on the classic RFM model, a purchase quantity indicator (Quantity) is introduced to construct the RFMQ model, characterizing user behavior more comprehensively. Customer value stratification is then achieved using both an indicator segmentation method and the K-means clustering algorithm, and a differentiated collaborative filtering recommendation mechanism is designed for the segmented groups. Five-fold cross-validation experiments show that the proposed method significantly outperforms the traditional collaborative filtering model on the Top-N recommendation task. Specifically, when the number of recommended products is between 3 and 7, the indicator-segmentation-based RFMQ model performs best on F1 score (for example, at Top-N = 5 the F1 value rises from 0.1709 to 0.3093), and the K-means-based variant also shows a stable improvement (F1 of 0.267 at the same setting). The results indicate that the indicator segmentation method has a clear advantage when the number of recommendations is small.
This study verifies the effectiveness of the RFMQ model for customer segmentation and recommendation optimization, providing an operational solution for e-commerce platforms to implement precise marketing and enhance user stickiness and commercial competitiveness; it is particularly suitable for the low-cost, high-efficiency personalized recommendation scenarios of small and medium-sized enterprises.
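As a minimal sketch of the idea (field names, toy transactions, and the median-based rule below are illustrative, not the paper's actual scheme), the four RFMQ indicators can be computed per customer from a transaction log, after which an indicator-segmentation rule splits customers into value tiers:

```python
from datetime import date

# Hypothetical transaction log: (customer_id, order_date, quantity, amount).
transactions = [
    ("c1", date(2025, 6, 1), 2, 40.0), ("c1", date(2025, 6, 20), 5, 90.0),
    ("c2", date(2025, 3, 5), 1, 15.0),
    ("c3", date(2025, 6, 25), 8, 200.0), ("c3", date(2025, 5, 2), 3, 60.0),
]

def rfmq(rows, now):
    """Recency (days since last order), Frequency, Monetary, Quantity per customer."""
    acc = {}
    for cid, d, qty, amt in rows:
        r = acc.setdefault(cid, {"last": d, "F": 0, "M": 0.0, "Q": 0})
        r["last"] = max(r["last"], d)
        r["F"] += 1
        r["M"] += amt
        r["Q"] += qty
    return {cid: {"R": (now - r["last"]).days, "F": r["F"], "M": r["M"], "Q": r["Q"]}
            for cid, r in acc.items()}

def tier(f, medians):
    """Indicator segmentation: high value iff F, M, and Q reach the cohort median
    and the customer bought recently (R at or below the median)."""
    high = (f["R"] <= medians["R"] and f["F"] >= medians["F"]
            and f["M"] >= medians["M"] and f["Q"] >= medians["Q"])
    return "high" if high else "low"

features = rfmq(transactions, date(2025, 7, 1))
medians = {"R": 11, "F": 2, "M": 130.0, "Q": 7}  # medians of this toy cohort
tiers = {cid: tier(f, medians) for cid, f in features.items()}
```

A recommender can then apply collaborative filtering separately within each tier, which is the differentiated mechanism the abstract describes.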
{"title":"Research on optimization of personalized recommendation method based on RFMQ model- taking outdoor sports products in cross-border e-commerce as an example.","authors":"Qianlan Chen, Chupeng Chen, Zubai Jiang, Chaoling Li, Yangxizi Tan, Niannian Li, Bolin Zhou, Bingxian Yang","doi":"10.3389/fdata.2025.1680669","DOIUrl":"10.3389/fdata.2025.1680669","url":null,"abstract":"<p><p>With the rapid development of the global digital economy, cross-border e-commerce has rapidly emerged and developed at a high speed, and has become a crucial bridge connecting global markets. This research focuses on the cross-border e-commerce sector of outdoor sports products, in response to the common problems in the cross-border e-commerce field, such as \"information overload\" and \"insufficient recommendation accuracy,\" a personalized recommendation optimization framework integrating customer value segmentation and collaborative filtering is proposed. Based on the classic RFM model, the purchase quantity indicator (Quantity) is introduced to construct the RFMQ model, thereby more comprehensively characterizing user behavior characteristics. Further, the customer value stratification is achieved by using the indicator segmentation method and the K-means clustering algorithm, and a differentiated collaborative filtering recommendation mechanism is designed based on the segmented groups. Through a five-fold cross-validation experiment, it is shown that the proposed method significantly outperforms the traditional collaborative filtering model in the TOPN recommendation task. Specifically, when the number of recommended products is between 3 and 7, the RFMQ recommendation model based on indicator segmentation performs best in terms of F1 score (for example, when TOPN = 5, the F1 value increases from 0.1709 to 0.3093), and the method based on K-means clustering also shows a stable improvement (with the F1 value reaching 0.267 at the same time). 
The results indicate that the indicator segmentation method has a significant advantage in smaller recommendation quantity scenarios. This study verifies the effectiveness of the RFMQ model in customer segmentation and recommendation performance optimization, providing an operational solution for e-commerce platforms to implement precise marketing, enhance user stickiness and commercial competitiveness, and is particularly suitable for low-cost and high-efficiency personalized recommendation scenarios of small and medium-sized enterprises.</p>","PeriodicalId":52859,"journal":{"name":"Frontiers in Big Data","volume":"8 ","pages":"1680669"},"PeriodicalIF":2.4,"publicationDate":"2025-10-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12558725/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145402935","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-10-13. eCollection Date: 2025-01-01. DOI: 10.3389/fdata.2025.1532397
Gabriella Waters, Phillip Honenberger
The understanding of bias in AI is currently undergoing a revolution. Often assumed to be errors or flaws, biases are increasingly recognized as integral to AI systems and sometimes preferable to less biased alternatives. In this paper we review the reasons for this changed understanding and provide new guidance on three questions. First, how should we think about and measure biases in AI systems, consistent with the new understanding? Second, what kinds of bias in an AI system should we accept or even amplify, and why? Third, what kinds should we attempt to minimize or eliminate, and why? In answer to the first question, we argue that biases are "violations of a symmetry standard" (following Kelly). Per this definition, many biases in AI systems are benign, which raises the question of how to identify biases that are problematic or undesirable when they occur. To address this question, we distinguish three main ways that asymmetries in AI systems can be problematic or undesirable (erroneous representation, unfair treatment, and violation of process ideals) and highlight places in the pipeline of AI development and application where bias of these types can occur.
{"title":"AI biases as asymmetries: a review to guide practice.","authors":"Gabriella Waters, Phillip Honenberger","doi":"10.3389/fdata.2025.1532397","DOIUrl":"10.3389/fdata.2025.1532397","url":null,"abstract":"<p><p>The understanding of bias in AI is currently undergoing a revolution. Often assumed to be errors or flaws, biases are increasingly recognized as integral to AI systems and sometimes preferable to less biased alternatives. In this paper we review the reasons for this changed understanding and provide new guidance on three questions: First, how should we think about and measure biases in AI systems, consistent with the new understanding? Second, what kinds of bias in an AI system should we accept or even amplify, and why? And, third, what kinds should we attempt to minimize or eliminate, and why? In answer to the first question, we argue that biases are \"violations of a symmetry standard\" (following Kelly). Per this definition, many biases in AI systems are benign. This raises the question of how to identify biases that <i>are</i> problematic or undesirable when they occur. 
To address this question, we distinguish three main ways that asymmetries in AI systems can be problematic or undesirable-erroneous representation, unfair treatment, and violation of process ideals-and highlight places in the pipeline of AI development and application where bias of these types can occur.</p>","PeriodicalId":52859,"journal":{"name":"Frontiers in Big Data","volume":"8 ","pages":"1532397"},"PeriodicalIF":2.4,"publicationDate":"2025-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12554557/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145394968","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-10-06. eCollection Date: 2025-01-01. DOI: 10.3389/fdata.2025.1605258
Jie Su, Yuechao Tang, Yanan Wang, Chao Chen, Biao Song
Objective: Lower limb deep vein thrombosis (DVT) is a serious health problem, causing local discomfort and hindering walking. It can lead to severe complications, including pulmonary embolism, chronic post-thrombotic syndrome, and limb amputation, posing risks of death or severe disability. This study aims to develop a diagnostic model for DVT using routine blood analysis and evaluate its effectiveness in early diagnosis.
Methods: This study retrospectively analyzed patient medical records from January 2022 to June 2023, including 658 DVT patients (case group) and 1,418 healthy subjects (control group). SHAP (SHapley Additive exPlanations) analysis was employed for feature selection to identify key blood indices significantly impacting DVT risk prediction. Based on the selected features, six machine learning models were constructed: k-Nearest Neighbors (kNN), Logistic Regression (LR), Decision Tree (DT), Random Forest (RF), Support Vector Machine (SVM), and Artificial Neural Network (ANN). Model performance was assessed using the area under the curve (AUC).
Results: SHAP analysis identified ten key routine blood indices. The six models built on these indices demonstrated strong predictive performance, with AUC values exceeding 0.8, accuracy above 70%, and sensitivity and specificity over 70%. Notably, the RF model performed best in assessing DVT risk.
Conclusions: Our study successfully developed machine learning models for predicting DVT risk using routine blood tests. These models achieved high predictive performance, suggesting their potential for early DVT diagnosis without additional medical burden on patients. Future research will focus on further validation and refinement of these models to enhance their clinical applicability.
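Since AUC is the headline metric here, a self-contained sketch of how it is computed from raw classifier scores may be useful. This uses the rank-sum (Mann-Whitney U) identity with tie-averaged ranks; the labels and scores below are toy values, not study data:

```python
def auc(labels, scores):
    """ROC AUC via the rank-sum (Mann-Whitney U) identity, averaging tied ranks."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    rank_of = [0.0] * n
    i = 0
    while i < n:  # assign the average 1-based rank to each group of tied scores
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            rank_of[k] = avg
        i = j + 1
    n_pos = sum(y for _, y in pairs)
    n_neg = n - n_pos
    pos_rank_sum = sum(r for r, (_, y) in zip(rank_of, pairs) if y == 1)
    # U statistic for positives, normalized by the number of pos/neg pairs.
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

An AUC above 0.8, as reported for all six models, means a randomly chosen DVT case receives a higher risk score than a randomly chosen control more than 80% of the time.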
{"title":"Predicting deep vein thrombosis using machine learning and blood routine analysis.","authors":"Jie Su, Yuechao Tang, Yanan Wang, Chao Chen, Biao Song","doi":"10.3389/fdata.2025.1605258","DOIUrl":"10.3389/fdata.2025.1605258","url":null,"abstract":"<p><strong>Objective: </strong>Lower limb deep vein thrombosis (DVT) is a serious health problem, causing local discomfort and hindering walking. It can lead to severe complications, including pulmonary embolism, chronic post-thrombotic syndrome, and limb amputation, posing risks of death or severe disability. This study aims to develop a diagnostic model for DVT using routine blood analysis and evaluate its effectiveness in early diagnosis.</p><p><strong>Methods: </strong>This study retrospectively analyzed patient medical records from January 2022 to June 2023, including 658 DVT patients (case group) and 1,418 healthy subjects (control group). SHAP (SHapley Additive exPlanations) analysis was employed for feature selection to identify key blood indices significantly impacting DVT risk prediction. Based on the selected features, six machine learning models were constructed: k-Nearest Neighbors (kNN), Logistic Regression (LR), Decision Tree (DT), Random Forest (RF), Support Vector Machine (SVM), and Artificial Neural Network (ANN). Model performance was assessed using the area under the curve (AUC).</p><p><strong>Results: </strong>SHAP analysis identified ten key blood routine indices. The six models constructed using these indices demonstrated strong predictive performance, with AUC values exceeding 0.8, accuracy above 70%, and sensitivity and specificity over 70%. Notably, the RF model exhibited superior performance in assessing the risk of DVT.</p><p><strong>Conclusions: </strong>Our study successfully developed machine learning models for predicting DVT risk using routine blood tests. 
These models achieved high predictive performance, suggesting their potential for early DVT diagnosis without additional medical burden on patients. Future research will focus on further validation and refinement of these models to enhance their clinical applicability.</p>","PeriodicalId":52859,"journal":{"name":"Frontiers in Big Data","volume":"8 ","pages":"1605258"},"PeriodicalIF":2.4,"publicationDate":"2025-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12535902/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145349693","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-10-01. eCollection Date: 2025-01-01. DOI: 10.3389/fdata.2025.1682151
Immanuel Azaad Moonesar, M V Manoj Kumar, Khulood Alsayegh, Ayat Abu-Agla, Likewin Thomas
{"title":"Editorial: Navigating the nexus of big data, AI, and public health: transformations, triumphs, and trials in multiple sclerosis care access.","authors":"Immanuel Azaad Moonesar, M V Manoj Kumar, Khulood Alsayegh, Ayat Abu-Agla, Likewin Thomas","doi":"10.3389/fdata.2025.1682151","DOIUrl":"10.3389/fdata.2025.1682151","url":null,"abstract":"","PeriodicalId":52859,"journal":{"name":"Frontiers in Big Data","volume":"8 ","pages":"1682151"},"PeriodicalIF":2.4,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12520868/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145310090","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-09-30. eCollection Date: 2025-01-01. DOI: 10.3389/fdata.2025.1581734
Rakesh M Verma, Nachum Dershowitz, Victor Zeng, Dainis Boumber, Xuting Liu
Introduction: Internet-based economies and societies are drowning in deceptive attacks. These attacks take many forms, such as fake news, phishing, and job scams, which we call "domains of deception." Machine learning and natural language processing researchers have been attempting to ameliorate this precarious situation by designing domain-specific detectors. Only a few recent works have considered domain-independent deception. We collect these disparate threads of research and investigate domain-independent deception.
Methods: First, we provide a new computational definition of deception and organize its forms into a new taxonomy. Then we briefly review the debate on linguistic cues for deception. We build a new comprehensive real-world dataset for studying deception, and investigate common linguistic features for deception using both classical and deep learning models in a variety of settings, including cross-domain experiments.
Results: We find common linguistic cues for deception and give significant evidence for knowledge transfer across different forms of deception.
Discussion: We list several directions for future work based on our results.
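The cross-domain protocol (fit on one deception domain, evaluate on another) can be sketched with a deliberately toy cue-word classifier; the texts, labels, and "training" rule below are invented for illustration and stand in for the classical and deep models the paper actually uses:

```python
# Invented labeled texts per deception domain (1 = deceptive, 0 = benign).
domains = {
    "phishing":  [("verify your account urgently", 1), ("team lunch at noon", 0)],
    "job_scams": [("urgently send a fee to start", 1), ("interview scheduled monday", 0)],
}

def fit(train_texts):
    """Toy stand-in for training: keep words that occur only in deceptive texts."""
    deceptive = {w for t, y in train_texts if y == 1 for w in t.split()}
    benign = {w for t, y in train_texts if y == 0 for w in t.split()}
    return deceptive - benign

def cross_domain_accuracy(train_domain, test_domain):
    """Fit on one domain, score on another: the transfer test from the paper."""
    cues = fit(domains[train_domain])
    data = domains[test_domain]
    preds = [1 if any(w in cues for w in t.split()) else 0 for t, _ in data]
    return sum(p == y for p, (_, y) in zip(preds, data)) / len(data)
```

In this toy setup the cue "urgently" learned from phishing transfers to job scams, which is the kind of cross-domain knowledge transfer the Results report evidence for.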
{"title":"Domain-independent deception: a new taxonomy and linguistic analysis.","authors":"Rakesh M Verma, Nachum Dershowitz, Victor Zeng, Dainis Boumber, Xuting Liu","doi":"10.3389/fdata.2025.1581734","DOIUrl":"10.3389/fdata.2025.1581734","url":null,"abstract":"<p><strong>Introduction: </strong>Internet-based economies and societies are drowning in deceptive attacks. These attacks take many forms, such as fake news, phishing, and job scams, which we call \"domains of deception.\" Machine learning and natural language processing researchers have been attempting to ameliorate this precarious situation by designing domain-specific detectors. Only a few recent works have considered domain-independent deception. We collect these disparate threads of research and investigate domain-independent deception.</p><p><strong>Methods: </strong>First, we provide a new computational definition of deception and break down deception into a new taxonomy. Then, we briefly mention the debate on linguistic cues for deception. We build a new comprehensive real-world dataset for studying deception. 
We investigate common linguistic features for deception using both classical and deep learning models in a variety of situations including cross-domain experiments.</p><p><strong>Results: </strong>We find common linguistic cues for deception and give significant evidence for knowledge transfer across different forms of deception.</p><p><strong>Discussion: </strong>We list several directions for future work based on our results.</p>","PeriodicalId":52859,"journal":{"name":"Frontiers in Big Data","volume":"8 ","pages":"1581734"},"PeriodicalIF":2.4,"publicationDate":"2025-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12521749/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145310044","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-09-25. eCollection Date: 2025-01-01. DOI: 10.3389/fdata.2025.1648730
Willy A Valdivia-Granda
Illicit firearms trafficking imposes severe social and economic costs, eroding public safety, distorting markets, and weakening state capacity while affecting vulnerable populations. Despite its profound consequences for global health, trade, and security, the network structure and dynamics of illicit firearms trafficking remain one of the most elusive dimensions of transnational organized crime. News reports documenting these events are fragmented across countries, languages, and outlets of varying quality and bias. Motivated by the disproportionate impact in Latin America, this study operationalizes the International Classification of Crime for Statistical Purposes (ICCS) to convert multilingual news into structured, auditable indicators through a three-part analytic pipeline using a BERT architecture and zero-shot prompts for entity resolution. The pipeline generated outputs enriched with named entities, geocodes, and timestamps, stored as structured JSON to enable reproducible analysis. This implementation identified 8,171 firearms trafficking reports published from 2014 through July 2024. The number of firearms-related reports rose sharply over the decade: incidents increased roughly tenfold, and the geographic footprint expanded from about 20 to more than 80 countries, with a 155% increase from 2022 to 2023. Correlation analysis links firearms trafficking to twelve other ICCS Level 1 categories, including drug trafficking, human trafficking, homicide, terrorism, and environmental crimes. Entity extraction and geocoding show a clear maritime bias; ports are referenced about six times more often than land or air routes. The analysis yielded 85 distinct points of entry or exit and 41 named transnational criminal organizations, though attribution appears in only about 40% of reports.
This is the first automated, multilingual application of ICCS to firearms trafficking using modern language technologies. The outputs enable early warning through signals associated with ICCS categories, cross-border coordination focused on recurrent routes and high-risk ports, and evaluation of interventions. In short, embedding ICCS in a reproducible pipeline transforms fragmented media narratives into comparable evidence for strategic, tactical, and operational environments.
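As a sketch of the kind of auditable record such a pipeline might emit (the schema and field names below are assumed for illustration; the paper does not publish its exact schema, and the category label is only indicative of an ICCS section):

```python
import json

def make_record(text, iccs_level1, entities, lat, lon, published):
    """Shape one classified news item as a structured, auditable JSON record."""
    return {
        "text": text,                        # source snippet being classified
        "iccs_level1": iccs_level1,          # illustrative ICCS Level 1 label
        "entities": entities,                # named entities after resolution
        "geocode": {"lat": lat, "lon": lon}, # resolved location
        "published": published,              # ISO-8601 timestamp from the source
    }

record = make_record(
    "Rifles seized at a coastal port",
    "acts against public safety and state security",
    ["coastal port"], 10.39, -75.51, "2024-07-01T00:00:00+00:00",
)
serialized = json.dumps(record, sort_keys=True)
```

Storing every classification as a flat, sorted JSON record is what makes the downstream counts (reports per year, per country, per route type) reproducible and auditable.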
{"title":"Structure and dynamics mapping of illicit firearms trafficking using artificial intelligence models.","authors":"Willy A Valdivia-Granda","doi":"10.3389/fdata.2025.1648730","DOIUrl":"10.3389/fdata.2025.1648730","url":null,"abstract":"<p><p>Illicit firearms trafficking imposes severe social and economic costs, eroding public safety, distorting markets, and weakening state capacity while affecting vulnerable populations. Despite its profound consequences for global health, trade, and security, the network structure and dynamics of illicit firearms trafficking are one of the most elusive dimensions of transnational organized crime. News reports documenting these events are fragmented across countries, languages, and outlets with different levels of quality and bias. Motivated by the disproportionate impact in Latin America, this study operationalizes the International Classification of Crime for Statistical Purposes (ICCS) to convert multilingual news into structured and auditable indicators through a three-part analytic pipeline using BERT architecture and zero-shot prompts for entity resolution. This analytical approach generated outputs enriched with named entities, geocodes, and timestamps and stored as structured JSON, enabling reproducible analysis. The results of this implementation identified 8,171 firearms trafficking reports published from 2014 through July 2024. The number of firearms-related reports rose sharply over the decade. Incidents increase roughly tenfold, and the geographic footprint expands from about twenty to more than eighty countries, with a one hundred fifty five percent increase from 2022 to 2023. Correlation analysis links firearms trafficking to twelve other ICCS Level 1 categories, including drug trafficking, human trafficking, homicide, terrorism, and environmental crimes. Entity extraction and geocoding show a clear maritime bias; ports are referenced about six times more often than land or air routes. 
The analysis yielded eighty-five distinct points of entry or exit and forty-one named transnational criminal organizations, though attribution appears in only about forty percent of reports. This is the first automated and multilingual application of ICCS to firearms trafficking using modern language technologies. The outputs enable early warning through signals associated with ICCS categories, cross-border coordination focused on recurrent routes and high-risk ports, and evaluation of interventions. In short, embedding ICCS in a reproducible pipeline transforms fragmented media narratives into comparable evidence for strategic, tactical, and operational environments.</p>","PeriodicalId":52859,"journal":{"name":"Frontiers in Big Data","volume":"8 ","pages":"1648730"},"PeriodicalIF":2.4,"publicationDate":"2025-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12507642/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145281761","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-09-22. eCollection Date: 2025-01-01. DOI: 10.3389/fdata.2025.1448785
Qaiser Abbas, Waqas Nawaz, Sadia Niazi, Muhammad Awais
Introduction: Navigating legal texts like a national constitution is notoriously difficult due to specialized jargon and complex internal references. For the Constitution of Pakistan, no automated, user-friendly search tool existed to address this challenge. This paper introduces ULBERT, a novel AI-powered information retrieval framework designed to make the constitution accessible to all users, from legal experts to ordinary citizens, in both English and Urdu.
Methods: The system is built around a custom AI model that moves beyond keyword matching to understand the semantic meaning of a user's query. It processes questions in English or Urdu and compares them to the constitutional text, identifying the most relevant passages based on contextual and semantic similarity.
Results: In performance testing, the ULBERT framework proved highly effective. It successfully retrieved the correct constitutional information with an accuracy of 86% for English queries and 73% for Urdu queries.
Discussion: These results demonstrate a significant breakthrough in enhancing the accessibility of foundational legal documents through artificial intelligence. The framework provides an effective and intuitive tool for legal inquiry, empowering a broader audience to understand the Constitution of Pakistan.
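The core retrieval step (ranking passages by cosine similarity between query and passage embeddings) can be sketched as follows; the 3-dimensional vectors are toy stand-ins for real ULBERT sentence embeddings, and the article snippets are illustrative, not quotations from the constitution:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Toy embeddings standing in for ULBERT sentence vectors; in the real system
# each constitutional passage and the user's query would be encoded by the model.
articles = {
    "Article 25: equality of citizens": [0.9, 0.1, 0.0],
    "Article 19: freedom of speech":    [0.1, 0.9, 0.2],
}
query_vec = [0.85, 0.15, 0.05]  # e.g. an embedded query about equality before law

best = max(articles, key=lambda k: cosine(articles[k], query_vec))
```

Ranking by embedding similarity rather than keyword overlap is what lets the system match an Urdu or English question to a passage that shares no surface vocabulary with it.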
{"title":"ULBERT: a domain-adapted BERT model for bilingual information retrieval from Pakistan's constitution.","authors":"Qaiser Abbas, Waqas Nawaz, Sadia Niazi, Muhammad Awais","doi":"10.3389/fdata.2025.1448785","DOIUrl":"10.3389/fdata.2025.1448785","url":null,"abstract":"<p><strong>Introduction: </strong>Navigating legal texts like a national constitution is notoriously difficult due to specialized jargon and complex internal references. For the Constitution of Pakistan, no automated, user-friendly search tool existed to address this challenge. This paper introduces ULBERT, a novel AI-powered information retrieval framework designed to make the constitution accessible to all users, from legal experts to ordinary citizens, in both English and Urdu.</p><p><strong>Methods: </strong>The system is built around a custom AI model that moves beyond keyword matching to understand the semantic meaning of a user's query. It processes questions in English or Urdu and compares them to the constitutional text, identifying the most relevant passages based on contextual and semantic similarity.</p><p><strong>Results: </strong>In performance testing, the ULBERT framework proved highly effective. It successfully retrieved the correct constitutional information with an accuracy of 86% for English queries and 73% for Urdu queries.</p><p><strong>Discussion: </strong>These results demonstrate a significant breakthrough in enhancing the accessibility of foundational legal documents through artificial intelligence. 
The framework provides an effective and intuitive tool for legal inquiry, empowering a broader audience to understand the Constitution of Pakistan.</p>","PeriodicalId":52859,"journal":{"name":"Frontiers in Big Data","volume":"8 ","pages":"1448785"},"PeriodicalIF":2.4,"publicationDate":"2025-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12497596/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145245803","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-09-22. eCollection Date: 2025-01-01. DOI: 10.3389/fdata.2025.1640539
Lucas Wafula Wekesa, Stephen Korir
Introduction: The effectiveness of intelligence operations depends heavily on the reliability and performance of human intelligence (HUMINT) sources. Yet, source behavior is often unpredictable, deceptive or shaped by operational context, complicating resource allocation and tasking decisions.
Methods: This study developed a hybrid framework combining Machine Learning (ML) techniques and Two-Stage Stochastic Programming (TSSP) for HUMINT source performance management under uncertainty. A synthetic dataset reflecting HUMINT operational patterns was generated and used to train classification and regression models. Extreme Gradient Boosting (XGBoost) and Support Vector Machine (SVM) models were applied for behavioral classification and for predicting reliability and deception scores. The predictive outputs were then transformed into scenario probabilities and integrated into the TSSP model to optimize task allocation under varying behavioral uncertainties.
Results: The classifiers achieved 98% overall accuracy, with XGBoost exhibiting higher precision and SVM demonstrating superior recall for rare but operationally significant categories. The regression models achieved R-squared scores of 93% for reliability and 81% for deception. Compared to a deterministic optimization baseline, the hybrid framework delivered a 16.8% reduction in expected tasking costs and a 19.3% improvement in mission success rates.
Discussion and conclusion: The findings demonstrated that scenario-based probabilistic planning offers significant advantages over static heuristics in managing uncertainty in HUMINT operations. While the simulation results are promising, validation through field data is required before operational deployment.
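A minimal illustration of how ML-predicted reliability can feed a scenario-based tasking decision (all numbers are invented; the paper's TSSP model optimizes a much richer allocation with second-stage recourse actions, not a single-source choice):

```python
# First-stage decision: which source to task. Scenarios: the source behaves
# reliably or deceptively, with probabilities supplied by the ML models.
sources = {
    "S1": {"p_reliable": 0.9, "cost": 5.0},  # expensive but dependable
    "S2": {"p_reliable": 0.6, "cost": 2.0},  # cheap but risky
}
PENALTY = 20.0  # second-stage recourse cost incurred if the source is deceptive

def expected_cost(src):
    """Expected total cost over the two behavioral scenarios."""
    s = sources[src]
    return s["cost"] + (1 - s["p_reliable"]) * PENALTY

best = min(sources, key=expected_cost)
```

Here the deterministic view would task the cheaper source S2, while weighting scenarios by predicted reliability reverses the choice, which is the qualitative effect behind the reported cost and success-rate gains.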
{"title":"Enhancing intelligence source performance management through two-stage stochastic programming and machine learning techniques.","authors":"Lucas Wafula Wekesa, Stephen Korir","doi":"10.3389/fdata.2025.1640539","DOIUrl":"10.3389/fdata.2025.1640539","url":null,"abstract":"<p><strong>Introduction: </strong>The effectiveness of intelligence operations depends heavily on the reliability and performance of human intelligence (HUMINT) sources. Yet, source behavior is often unpredictable, deceptive or shaped by operational context, complicating resource allocation and tasking decisions.</p><p><strong>Methods: </strong>This study developed a hybrid framework combining Machine Learning (ML) techniques and Two-Stage Stochastic Programming (TSSP) for HUMINT source performance management under uncertainty. A synthetic dataset reflecting HUMINT operational patterns was generated and used to train classification and regression models. The extreme Gradient Boosting (XGBoost) and Support Vector Machines (SVM) were applied for behavioral classification and prediction of reliability and deception scores. The predictive outputs were then transformed into scenario probabilities and integrated into the TSSP model to optimize task allocation under varying behavioral uncertainties.</p><p><strong>Results: </strong>The classifiers achieved 98% overall accuracy, with XGBoost exhibiting higher precision and SVM demonstrating superior recall for rare but operationally significant categories. The regression models achieved R-squared scores of 93% for reliability and 81% for deception. These predictive outputs were transformed into scenario probabilities for integration into the TSSP model, optimizing task allocation under varying behavioral risks. 
When compared to a deterministic optimization baseline, the hybrid framework delivered a 16.8% reduction in expected tasking costs and a 19.3% improvement in mission success rates.</p><p><strong>Discussion and conclusion: </strong>The findings demonstrated that scenario-based probabilistic planning offers significant advantages over static heuristics in managing uncertainty in HUMINT operations. While the simulation results are promising, validation through field data is required before operational deployment.</p>","PeriodicalId":52859,"journal":{"name":"Frontiers in Big Data","volume":"8 ","pages":"1640539"},"PeriodicalIF":2.4,"publicationDate":"2025-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12498342/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145245750","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multistakeholder fairness in tourism: what can algorithms learn from tourism management?
Peter Müllner, Anna Schreuer, Simone Kopeinik, Bernhard Wieser, Dominik Kowald
Frontiers in Big Data, 8:1632766 · Pub Date: 2025-09-18 · eCollection Date: 2025-01-01 · DOI: 10.3389/fdata.2025.1632766

Algorithmic decision-support systems, i.e., recommender systems, are popular digital tools that help tourists decide which places and attractions to explore. However, algorithms often unintentionally direct tourist streams in ways that negatively affect the environment, local communities, or other stakeholders. This issue can be partly attributed to the computer science community's limited understanding of the complex relationships and trade-offs among stakeholders in the real world. In this work, we draw on practical findings and methods from tourism management to inform research on multistakeholder fairness in algorithmic decision-support. Through a semi-systematic literature review, we synthesize literature from tourism management as well as from computer science. Our findings suggest that tourism management actively tries to identify the specific needs of stakeholders and employs qualitative, inclusive, and participatory methods to study fairness from a normative and holistic research perspective. In contrast, computer science lacks sufficient understanding of stakeholder needs and primarily considers fairness through descriptive factors, such as measurable discrimination, while relying heavily on a few mathematically formalized fairness criteria that fail to capture the multidimensional nature of fairness in tourism. With this work, we aim to illustrate the shortcomings of purely algorithmic research and stress the potential of, and particular need for, future interdisciplinary collaboration. We believe such collaboration is a fundamental and necessary step toward algorithmic decision-support systems that understand and support true multistakeholder fairness in tourism.
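To make concrete what the review means by a "mathematically formalized fairness criterion," the sketch below computes one common descriptive measure: position-discounted exposure per stakeholder group in a ranked recommendation list. The items, groups, and the 1/rank discount are illustrative assumptions; the point of the abstract is precisely that a single scalar like this cannot capture fairness's multidimensional nature.

```python
# Minimal sketch of a formalized fairness criterion: exposure parity
# across stakeholder groups in a top-N ranking. Hypothetical data.

# A hypothetical top-5 ranking of attractions, each tagged with the
# community ("group") that hosts it.
ranking = [
    {"item": "old_town_walk",    "group": "city_center"},
    {"item": "harbor_museum",    "group": "city_center"},
    {"item": "hill_village_tour", "group": "rural"},
    {"item": "cathedral_visit",  "group": "city_center"},
    {"item": "vineyard_trail",   "group": "rural"},
]

def exposure_by_group(ranking):
    """Position-discounted exposure (1/rank), summed per stakeholder group."""
    exposure = {}
    for pos, rec in enumerate(ranking, start=1):
        exposure[rec["group"]] = exposure.get(rec["group"], 0.0) + 1.0 / pos
    return exposure

def exposure_disparity(ranking):
    """Max/min exposure ratio: 1.0 means parity, larger values mean skew."""
    exp = exposure_by_group(ranking)
    return max(exp.values()) / min(exp.values())
```

Here the city-center group receives over three times the exposure of the rural group, which a parity criterion would flag, yet the number says nothing about which distribution local communities would actually consider fair.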
FAST: framework for AI-based surgical transformation
Harmehr Sekhon, Farid Al Zoubi, Paul E Beaulé, Pascal Fallavollita
Frontiers in Big Data, 8:1655260 · Pub Date: 2025-09-12 · eCollection Date: 2025-01-01 · DOI: 10.3389/fdata.2025.1655260

Background: The use of machine learning (ML) in surgery to date has largely focused on prediction of surgical variables, which has not been found to significantly improve operating room (OR) efficiency or surgical success rates (SSR). Given long surgery wait times, limited healthcare resources, and growing population need, innovative ML models are required. The Framework for AI-based Surgical Transformation (FAST) was therefore created to make real-time recommendations for improving OR efficiency.

Methods: The FAST model was developed and evaluated using a dataset of n = 4,796 orthopedic cases, drawing on surgery- and team-specific variables (e.g., team composition, OR turnover time, procedure duration), combined with regular positive deviance (PD) seminars with stakeholders to support adherence and uptake. FAST was built from six ML algorithms, including decision trees and neural networks, and was implemented in orthopedic surgeries at a hospital in Ottawa, Canada's capital.

Results: FAST proved feasible and implementable in the hospital's orthopedic OR, with good team engagement attributable to the PD seminars. FAST achieved an SSR of 93% over 23 weeks (57 arthroplasty surgery days), compared with 39% at baseline. Key variables affecting SSR included starting the first surgery on time, turnover time, and team composition.

Conclusions: FAST is a novel ML framework that provides real-time feedback for improving OR efficiency and SSR. Stakeholder integration is key to its uptake and adherence. The framework can be implemented in different hospitals and for diverse surgeries, offering an innovative application of ML for improving OR efficiency without additional resources.
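The kind of real-time recommendation described above can be illustrated with a decision-stump-style rule over the three variables the abstract reports as most predictive (first-case start time, turnover time, team composition). The thresholds, function names, and recommendation strings below are illustrative assumptions, not the trained FAST model.

```python
# Minimal sketch of a FAST-style recommendation step: a hand-written
# decision rule predicts whether the surgical day will hit its target,
# then maps the prediction to an actionable OR recommendation.
# All thresholds and logic are hypothetical.

def predict_day_success(first_case_on_time: bool,
                        avg_turnover_min: float,
                        familiar_team: bool) -> bool:
    """Decision-stump-style prediction of hitting the daily case target."""
    if not first_case_on_time:
        return False                  # a late first case is rarely recoverable
    if avg_turnover_min > 30 and not familiar_team:
        return False                  # slow turnover compounded by ad-hoc team
    return True

def recommend(first_case_on_time: bool,
              avg_turnover_min: float,
              familiar_team: bool) -> str:
    """Turn the prediction into an actionable recommendation for the OR team."""
    if predict_day_success(first_case_on_time, avg_turnover_min, familiar_team):
        return "proceed as scheduled"
    if not first_case_on_time:
        return "prioritize on-time first-case start"
    return "shorten turnover or assign a familiar team"
```

In the paper's framework such rules are learned from the 4,796-case dataset by six ML algorithms rather than hand-written, and the recommendations are reinforced through the positive deviance seminars.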