Pub Date: 2025-06-16 | DOI: 10.1007/s40745-025-00619-7
Lizhen Zhang
With the increasing complexity of oil and gas pipeline networks, early identification of leaks and defects is crucial for safe pipeline operation. This study proposes a graph neural network (GNN) method for data processing and defect identification, aimed at optimizing monitoring and maintenance strategies for oil and gas pipelines. Through analysis of historical leakage data, we constructed a graph database of 5000 samples, each with 10 features such as pressure, flow, and temperature. Using a graph convolutional network (GCN) and a graph attention network (GAT) for feature extraction and pattern recognition on the nodes of the pipeline network, our model achieves 92% accuracy in defect recognition, 15% higher than traditional methods. In addition, we developed a leakage prediction model based on time series analysis that can predict potential leakage risks 24 hours in advance with an accuracy of 85%. These results not only improve the safety management of oil and gas pipelines but also provide a new technical path for future intelligent pipeline maintenance.
Title: Optimization of Oil and Gas Pipeline Leakage Data and Defect Identification Based on Graph Neural Processing. Annals of Data Science 12(4): 1413–1430. Open-access PDF: https://link.springer.com/content/pdf/10.1007/s40745-025-00619-7.pdf
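The node-classification idea in the abstract can be sketched with a single graph-convolution layer. This is a minimal illustration in NumPy, not the paper's model: the adjacency matrix, feature values, and weights below are toy placeholders standing in for the pipeline graph and its pressure/flow/temperature sensors.

```python
import numpy as np

def gcn_layer(adj, features, weights):
    """One graph-convolution layer: symmetric-normalized neighborhood
    averaging D^-1/2 (A+I) D^-1/2, then a linear map and ReLU."""
    a_hat = adj + np.eye(adj.shape[0])            # add self-loops
    deg = a_hat.sum(axis=1)
    d_inv_sqrt = np.diag(deg ** -0.5)
    norm = d_inv_sqrt @ a_hat @ d_inv_sqrt
    return np.maximum(norm @ features @ weights, 0.0)

# Toy pipeline: 4 junction nodes in a line, 3 sensor features per node
# (hypothetical stand-ins for pressure, flow, temperature).
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))
w = rng.normal(size=(3, 2))                       # 2 hidden channels
h = gcn_layer(adj, x, w)                          # per-node embeddings
```

A defect classifier would stack such layers and read a label off each node's final embedding.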
Pub Date: 2025-06-03 | DOI: 10.1007/s40745-025-00618-8
Xiao Zhang
This research investigates the application of Machine Learning (ML) models for effective and equitable essay scoring in education. Unlike their human counterparts, ML models can rapidly analyze large numbers of essays, providing timely and equitable scores that account for varying student demographics and writing styles. This capability helps identify classroom problems and supports the design of focused teaching methodologies. For the study, a Light Gradient Boosting Classification (LGBC) model was optimized with three optimizers: Black Widow Optimization (BWO), the Zebra Optimization Algorithm (ZOA), and Leader Harris Hawks Optimization (LHHO), to develop hybrid models with improved prediction quality. These hybrid models were compared with the base LGBC model across the training, validation, and testing phases. The findings show that the LGLH model performed best with an accuracy of 0.981, followed by the LGZO model at 0.971 and the LGBW model at 0.963. The lowest accuracy, 0.946, was observed for the base LGBC model. The results demonstrate the efficacy of hybrid models, which harness several optimization techniques to deliver more robust results on complicated tasks. The study emphasizes the importance of selecting the appropriate model architecture to achieve optimal performance, providing valuable insights into model efficacy at various stages of evaluation.
Title: Improving Predictive Accuracy in Writing Assessment Through Advanced Machine Learning Techniques. Annals of Data Science 12(4): 1389–1412.
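The hybrid-model workflow, a metaheuristic tuning a classifier's hyperparameters, can be sketched generically. The optimizer and the accuracy surface below are illustrative placeholders (not BWO, ZOA, LHHO, or a trained LGBC model): a simple population-based search over two hypothetical hyperparameters.

```python
import random

def optimize(objective, bounds, pop_size=20, generations=30, seed=42):
    """Minimal population-based hyperparameter search: alternate uniform
    exploration with Gaussian jitter around the incumbent best (a simple
    stand-in for metaheuristics such as BWO, ZOA, or LHHO)."""
    rnd = random.Random(seed)
    best, best_score = None, float("-inf")
    for _ in range(generations):
        for _ in range(pop_size):
            if best is None or rnd.random() < 0.5:
                cand = {k: rnd.uniform(lo, hi) for k, (lo, hi) in bounds.items()}
            else:
                cand = {k: min(hi, max(lo, best[k] + rnd.gauss(0, 0.1 * (hi - lo))))
                        for k, (lo, hi) in bounds.items()}
            score = objective(cand)
            if score > best_score:
                best, best_score = cand, score
    return best, best_score

# Hypothetical accuracy surface peaking at learning_rate=0.1, num_leaves=31;
# the study would instead train and validate an LGBC model here.
def toy_accuracy(p):
    return 1.0 - (p["learning_rate"] - 0.1) ** 2 - ((p["num_leaves"] - 31) / 100) ** 2

best, score = optimize(toy_accuracy,
                       {"learning_rate": (0.01, 0.3), "num_leaves": (8, 128)})
```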
Pub Date: 2025-05-16 | DOI: 10.1007/s40745-025-00598-9
Dee Bruce Sun, Stephen Sun
The construction of magic squares is an ancient mathematical problem with numerous applications in areas such as information transmission in network security. While there have been various approaches to constructing high-order magic squares, they are typically computationally intensive. We explore and adapt approaches with roots in the Chinese classics to develop a set of simple yet comprehensive methods for constructing magic squares of any order. For generating odd-order magic squares, the traditional "Siamese method" (the Siamese method, or De la Loubère method: starting from the central box of the first row with the number 1 (or the first number of any arithmetic progression), the fundamental movement for filling the boxes is diagonally up and right (↗), one step at a time; when a move would leave the square, it wraps around to the last row or first column, respectively; if a filled box is encountered, one instead moves vertically down one box (↓) and then continues as before) was introduced over 300 years ago by Simon de la Loubère, a French ambassador to Thailand in the late seventeenth century. While workable, the Siamese method is cumbersome and lacks an underlying logic (Benjamin et al. in Coll Math J 45:92–100, 2014). We employ a concept derived from the circular structure of the Five Elements (the Five Elements (Wuxing) is one of the two fundamental theories defining Chinese civilization; it interprets everything in terms of five components. In endless cycles, the Five Elements are mutually generative and at the same time mutually restrictive. In the generative relationship, Metal generates Water, which generates Wood, which generates Fire, which generates Earth, which generates Metal, and so on (see Fig. 1B below); in the restraining relationship, Metal destructs Wood, which destructs Earth, which destructs Water, which destructs Fire, which destructs Metal, and so on. This key concept of a dynamic, constant process of balancing, unbalancing, and rebalancing applies universally to explaining natural phenomena, cosmology, medicine, politics, and human affairs) in Chinese culture to establish a prototype for constructing odd-order magic squares. This approach significantly simplifies the algorithm and reveals the underlying mathematical logic and mechanism of the Siamese method. We also adapt the composition of the Bagua (the Yin-Yang is another fundamental theory of Chinese thought, originating from the Yijing; it is characterized by the Yin-Yang duality and the interdependence of the two for all things in the universe, where Yin represents the passive, dark, cold, feminine, and receptive aspects of nature, and Yang represents the active, bright, hot, masculine, and expansive aspects of nature. Yin and Yang are contrasting yet complementary; the Yin sphere and the Yang sphere are in opposition, eternally embracing each other, moving toward and transforming into each other) in the Yijing, drawing on the ancient wisdom of the Chinese classics, to establish a prototype for constructing k-order magic squares (Zongxi in Yixue Xiangshu Lun, Jiuzhou Press, 2007; Zhu in Yixue Zhexue Shi, Peking University Press, 1986). This in turn enables us to build a comprehensive algorithm covering the construction of magic squares of all orders. Our combined algorithm can construct magic squares of very high order that exhibit a high degree of symmetry and admit countless variations (McCranie in Math Teach 81:674–678, 1988). It is therefore promising for network security in data transmission, as well as other applications such as password encryption.
Title: New Algorithms for Constructing Magic Squares with Ancient Chinese Wisdoms. Annals of Data Science 12(6): 1775–1798.
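The Siamese method as described in the abstract (start at the center of the first row, move up-right with wraparound, drop down one cell when blocked) translates directly into code. This is a sketch of the classical method only, not the paper's Five-Elements or Bagua constructions.

```python
def siamese_magic_square(n):
    """Build an n x n magic square (n odd) by the Siamese method."""
    if n % 2 == 0:
        raise ValueError("the Siamese method applies to odd orders only")
    square = [[0] * n for _ in range(n)]
    r, c = 0, n // 2                          # central box of the first row
    for k in range(1, n * n + 1):
        square[r][c] = k
        nr, nc = (r - 1) % n, (c + 1) % n     # up-right (↗), wrapping around
        if square[nr][nc]:                    # occupied: step down (↓) instead
            nr, nc = (r + 1) % n, c
        r, c = nr, nc
    return square

sq = siamese_magic_square(3)   # rows: [8, 1, 6], [3, 5, 7], [4, 9, 2]
```

Every row, column, and diagonal of the result sums to the magic constant n(n² + 1)/2.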
Pub Date: 2025-05-08 | DOI: 10.1007/s40745-025-00611-1
Hendrik Santoso Sugiarto, Yozef Tjandra
In university admissions, interaction networks naturally emerge between prospective students and available majors. Understanding hidden patterns in such a vast network is crucial for decision-making but poses technical challenges due to its complexity and data limitations. Many existing models rely heavily on user profiling, raising privacy concerns and making data collection difficult. Instead, this work extracts meaningful insights using only the adjacency information of the network, avoiding the need for personal data. We leverage Graph Convolutional Networks (GCN) to generate compact representations for major recommendation and clustering tasks. Our GCN-based approach outperforms classical methods such as popularity-based and Non-negative Matrix Factorization (NMF), as well as the neural Generalized Matrix Factorization (GMF) model, achieving up to 61.06% and 12.17% improvements in smaller (dimension 40) and larger (dimension 80) embeddings, respectively. Furthermore, hierarchical clustering on these embeddings reveals implicit patterns in student preferences, particularly regarding fields of study and geographic locations, even without explicit data on these attributes. These findings demonstrate that meaningful insights can be derived from interaction networks while mitigating privacy concerns associated with user profiling.
Title: Uncovering University Application Patterns Through Graph Representation Learning. Annals of Data Science 12(4): 1343–1368.
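The adjacency-only premise can be illustrated without any GCN machinery: even raw co-application structure supports major recommendation. The sketch below is a toy cosine-similarity recommender over a hypothetical applicant-major incidence matrix, not the paper's GCN embedding model.

```python
import numpy as np

# Rows: anonymous applicants; columns: majors; 1 = applied.
# No personal attributes are used, only the adjacency itself.
adj = np.array([[1, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [1, 1, 1, 0]], dtype=float)

def recommend(adj, user, k=1):
    """Score each major by similarity-weighted votes of other applicants,
    using only co-application structure."""
    norms = np.linalg.norm(adj, axis=1, keepdims=True)
    unit = adj / np.where(norms == 0, 1, norms)
    sim = unit @ unit[user]            # cosine similarity to every applicant
    sim[user] = 0.0                    # exclude self
    scores = sim @ adj                 # aggregate similar applicants' choices
    scores[adj[user] > 0] = -np.inf    # don't re-recommend applied majors
    return np.argsort(scores)[::-1][:k]
```

A GCN replaces the raw adjacency rows with learned low-dimensional embeddings, but the privacy property, needing no user profile, is the same.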
Pub Date: 2025-05-04 | DOI: 10.1007/s40745-025-00606-y
Xuefan Dong, Lei Tang
In the rapidly evolving landscape of online information dissemination, managing rumors has become an imperative challenge for governments worldwide. This study employs a tripartite evolutionary game model to examine the behavioral evolution of the government, online media, and netizens during rumor propagation under uncertain conditions. The model's innovation lies in considering the probability of successful rumor detection under government regulation and the uncertainty of rumor dissemination by online media and netizens, and in introducing a dynamic government penalty mechanism. Through simulation and analysis, we identify the evolutionarily stable strategies of each participant under different scenarios and provide specific governance strategies for each party involved. The results reveal that appropriate government penalties, proactive regulation by online media, and rational choices by netizens can effectively curb rumor spreading. In uncertain environments, adopting flexible policies and dynamic adjustment mechanisms is crucial for effective rumor governance. This study not only enriches the application of evolutionary game theory but also offers practical strategic recommendations for policymakers addressing rumor propagation.
Title: Rumor Governance Under Uncertain Conditions: An Evolutionary Game Theory Analysis. Annals of Data Science 12(3): 1073–1111.
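A tripartite evolutionary game of this kind is typically simulated with replicator dynamics: each population's strategy share grows in proportion to its payoff advantage. The sketch below uses made-up payoff-difference coefficients purely for illustration; the paper's actual payoff matrices and penalty mechanism are not reproduced.

```python
def replicator_step(x, y, z, dt=0.01):
    """One Euler step of tripartite replicator dynamics.
    x, y, z: shares of the government regulating strictly, online media
    verifying content, and netizens refraining from spreading rumors.
    The payoff-difference expressions are illustrative placeholders."""
    fx = x * (1 - x) * (0.6 * (1 - y) + 0.4 * (1 - z) - 0.3)   # regulation gain
    fy = y * (1 - y) * (0.5 * x + 0.3 * z - 0.2)               # verification gain
    fz = z * (1 - z) * (0.4 * x + 0.4 * y - 0.3)               # restraint gain
    return x + dt * fx, y + dt * fy, z + dt * fz

x, y, z = 0.3, 0.3, 0.3
for _ in range(20000):
    x, y, z = replicator_step(x, y, z)
# (x, y, z) now approximates an evolutionarily stable strategy profile
# of this toy payoff structure.
```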
Pub Date: 2025-05-04 | DOI: 10.1007/s40745-025-00600-4
An Yingjian, La Ping
Automatic generation of test cases using heuristic methods is an active research topic. Although the advantages of these methods are obvious, they fall somewhat short in selecting optimal individuals. Addressing the existing problems in evaluating and selecting the optimal individual, this paper proposes a test case evaluation algorithm based on a comprehensive analysis of layer proximity and the branch distance function, combined in a joint "layer proximity plus branch distance function" structure. The basic idea is that, when selecting guiding individuals during the evolutionary process, we first select the individuals whose actual execution paths are closest to the target path, and then, among these, select the individuals with the smallest branch distance, thereby obtaining the individuals with the best guiding ability. Experiments show that the proposed algorithm can quickly find optimal test cases, especially when generating test cases for multi-layer nested programs.
Title: Optimal Individual Selection Algorithm Based on Layer Proximity and Branch Distance Functions. Annals of Data Science 12(3): 1041–1054.
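The two-stage selection rule, layer proximity first, branch distance as tie-breaker, matches the classic search-based-testing fitness of approach level plus a normalized branch distance. A minimal sketch of that joint structure (the specific numbers below are hypothetical individuals, not the paper's benchmark):

```python
def normalized_branch_distance(d):
    """Map a raw branch distance d >= 0 into [0, 1)."""
    return d / (d + 1.0)

def fitness(approach_level, branch_distance):
    """Joint 'layer proximity + branch distance' fitness (lower is better).
    Whole layers still separating execution from the target path dominate;
    the normalized branch distance only breaks ties within a layer."""
    return approach_level + normalized_branch_distance(branch_distance)

def select_best(population):
    """population: list of (approach_level, branch_distance) pairs;
    returns the index of the individual with the best guiding ability."""
    return min(range(len(population)), key=lambda i: fitness(*population[i]))

# Three candidate test cases: individual 2 reaches the closest layer
# AND has the smallest branch distance there, so it is selected.
pop = [(2, 5.0), (1, 0.4), (1, 0.1)]
best = select_best(pop)   # index 2
```

Because the branch-distance term is normalized below 1, an individual one layer closer always beats any branch-distance advantage, which encodes the "first proximity, then distance" selection order.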
Pub Date: 2025-05-02 | DOI: 10.1007/s40745-025-00604-0
Jun Li, Chenwu Shan, Liyan Shen, Yawei Ren, Jiajie Zhang
Detectors have been extensively utilized in scenarios such as autonomous driving and video surveillance. Nonetheless, recent studies have revealed that these detectors are vulnerable to adversarial attacks, particularly adversarial patch attacks. Adversarial patches are specifically crafted to disrupt deep learning models by perturbing image regions, thereby misleading the models when added to normal images. Traditional adversarial patches often lack semantics, making it difficult for them to remain concealed in physical-world scenarios. To tackle this issue, this paper proposes a Prompt-based Natural Adversarial Patch generation method, which creates patches controllable by textual descriptions to ensure flexibility in application. The approach leverages a recent text-to-image generation model, the Latent Diffusion Model (LDM), to produce adversarial patches. We optimize the attack performance of the patches by updating the latent variables of the LDM through a combined loss function. Experimental results indicate that our method generates more natural, semantically rich adversarial patches that achieve effective attacks on various detectors.
Title: PNAP-YOLO: An Improved Prompts-Based Naturalistic Adversarial Patch Model for Object Detectors. Annals of Data Science 12(3): 1055–1072.
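The core mechanism, gradient-based optimization of a patch region to suppress a detector's confidence, can be shown on a deliberately tiny example. The "detector" here is a made-up linear scorer, not YOLO or an LDM; the point is only the shape of the attack loop (compute the gradient of the detection score with respect to the patch pixels, then descend).

```python
import numpy as np

rng = np.random.default_rng(1)
w = rng.normal(size=(8, 8))            # toy linear "detector" weights
image = rng.normal(size=(8, 8))

def response(img):
    """Detector's pre-sigmoid response to an image."""
    return float((w * img).sum())

def detect_score(img):
    """Detection confidence in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-response(img)))

# Optimize a 3x3 additive patch (top-left corner) to suppress the response.
# For this linear detector, d(response)/d(patch) is just the weights over
# the patched region, so each step is plain gradient descent.
patch = np.zeros((3, 3))
lr = 0.1
for _ in range(50):
    patch -= lr * w[:3, :3]

adv = image.copy()
adv[:3, :3] += patch                   # "patched" image fools the scorer
```

In the paper's setting the descent happens in the LDM's latent space instead of pixel space, which is what keeps the resulting patch natural-looking.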
Pub Date: 2025-04-21 | DOI: 10.1007/s40745-025-00588-x
Jixia Zheng, Rui Chen, Qinggen Zeng, Yanan Chen, Qianlin Ye
This study examines the issue of green information distortion and its impact on tourists’ purchasing decisions, as well as the associated high transaction costs within the green tourism supply chain. By selecting a green tourism supply chain with varying government subsidy schemes as the focus of the research, the objective is to explore optimal subsidy strategies and assess the implications of blockchain integration. A three-level Stackelberg game model is established, featuring the government as the leader and a scenic spot (SS) and travel agency (TA) as participants. Key findings include: (1) Production subsidies are more effective in boosting market demand than environmental investment subsidies, particularly when tourist green trust and preferences are high. (2) Blockchain enhances greenness, market demand, and social welfare, positively influencing the green tourism supply chain (GTSC). (3) Tourist green preference and trust significantly affect GTSC optimization, especially as preferences increase. Additionally, a cost-sharing smart contract mechanism is designed to mitigate environmental investment's negative impact and optimize social welfare and product greenness.
Title: Sustainable Development of Green Tourism Supply Chain Considering Blockchain Traceability and Government Subsidies. Annals of Data Science 12(4): 1315–1342.
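Stackelberg models of this kind are solved by backward induction: the follower's best response is computed first, and the leader optimizes while anticipating it. The sketch below is a standard linear-demand toy (hypothetical demand and cost parameters), not the paper's three-level model with subsidies and blockchain costs.

```python
def follower_best_response(q1, a=10.0, b=1.0, c=2.0):
    """Follower's best reply to the leader's quantity q1 in a
    linear-demand Stackelberg game: price P = a - b*(q1+q2), unit cost c."""
    return max(0.0, (a - c - b * q1) / (2 * b))

def leader_profit(q1, a=10.0, b=1.0, c=2.0):
    """Leader's profit, anticipating the follower's best response."""
    q2 = follower_best_response(q1, a, b, c)
    price = a - b * (q1 + q2)
    return (price - c) * q1

# Backward induction via grid search over the leader's choice.
grid = [i / 1000 for i in range(8001)]          # q1 in [0, 8]
q1_star = max(grid, key=leader_profit)
q2_star = follower_best_response(q1_star)
```

With these parameters the analytic solution is q1* = (a-c)/(2b) = 4 and q2* = (a-c)/(4b) = 2, which the grid search recovers. Adding a government tier on top repeats the same logic once more: the government optimizes while anticipating both firms' equilibrium play.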
Pub Date : 2025-01-28DOI: 10.1007/s40745-025-00587-y
Dominic B. Dayta, Erniel B. Barrios
Common methods used for topic modeling have generally suffered from overfitting, leading to diminished predictive performance, as well as a weakness in reconstructing sparse topic structures that involve only a few critical words to aid interpretation. Considering the text typically contained in customer feedback, this paper proposes a semiparametric topic model built on a two-step approach: (1) nonnegative matrix factorization recovers topic distributions from word co-occurrences; and (2) semiparametric regression identifies the factors driving the expression of particular topics in the documents, given additional auxiliary information such as location, time of writing, and other features of the author. This approach provides a generative model that can be useful for predicting topics in new documents based on these auxiliary variables, and is demonstrated to accurately identify topics even for documents limited in length or vocabulary size. In an application to real customer feedback, the topics produced by our model prove as interpretable and as useful for downstream analysis tasks as those produced by current legacy methods.
{"title":"Semiparametric Latent Topic Modeling on Consumer-Generated Corpora","authors":"Dominic B. Dayta, Erniel B. Barrios","doi":"10.1007/s40745-025-00587-y","DOIUrl":"10.1007/s40745-025-00587-y","url":null,"abstract":"<div><p>Common methods used for topic modeling have generally suffered from overfitting, leading to diminished predictive performance, as well as a weakness in reconstructing sparse topic structures that involve only a few critical words to aid interpretation. Considering the text typically contained in customer feedback, this paper proposes a semiparametric topic model built on a two-step approach: (1) nonnegative matrix factorization recovers topic distributions from word co-occurrences; and (2) semiparametric regression identifies the factors driving the expression of particular topics in the documents, given additional auxiliary information such as location, time of writing, and other features of the author. This approach provides a generative model that can be useful for predicting topics in new documents based on these auxiliary variables, and is demonstrated to accurately identify topics even for documents limited in length or vocabulary size. 
In an application to real customer feedback, the topics provided by our model are shown to be as interpretable and useful for downstream analysis tasks as with those produced by current legacy methods.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":"12 6","pages":"1941 - 1963"},"PeriodicalIF":0.0,"publicationDate":"2025-01-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s40745-025-00587-y.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145537720","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
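The two-step approach described above can be sketched on a toy corpus: step (1) factorizes a word-count matrix with NMF to recover document-topic weights, and step (2) regresses those weights on an auxiliary covariate. The tiny planted corpus, the single "location" dummy, and the plain linear second stage are illustrative assumptions; the paper's second stage is semiparametric.

```python
# Two-step sketch: (1) NMF on a document-word count matrix,
# (2) regression of topic proportions on an auxiliary covariate.
import numpy as np

rng = np.random.default_rng(0)

# Toy corpus: 8 documents x 6 words with two planted, disjoint topics.
topic_a = np.array([5, 4, 3, 0, 0, 0], dtype=float)
topic_b = np.array([0, 0, 0, 4, 5, 3], dtype=float)
location = np.array([0, 0, 0, 0, 1, 1, 1, 1])   # auxiliary covariate
V = np.vstack([(1 - loc) * topic_a + loc * topic_b for loc in location])
V += rng.uniform(0, 0.1, V.shape)               # small positive noise

# Step 1: rank-2 NMF via Lee-Seung multiplicative updates.
k, eps = 2, 1e-9
W = rng.uniform(0.1, 1.0, (V.shape[0], k))
H = rng.uniform(0.1, 1.0, (k, V.shape[1]))
for _ in range(500):
    H *= (W.T @ V) / (W.T @ W @ H + eps)
    W *= (V @ H.T) / (W @ H @ H.T + eps)

# Normalize rows of W into per-document topic proportions.
theta = W / W.sum(axis=1, keepdims=True)

# Step 2: regress each topic proportion on the covariate (least squares).
X = np.column_stack([np.ones(len(location)), location.astype(float)])
beta, *_ = np.linalg.lstsq(X, theta, rcond=None)
predicted = X @ beta

# The covariate should cleanly separate the two planted topics.
print(np.round(predicted[0], 2), np.round(predicted[-1], 2))
```

Because the planted topics have disjoint word support, the recovered topic proportions are near one-hot, and the fitted regression shows the location covariate driving topic expression, which is the role auxiliary variables play in the paper's generative model.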
With digitisation on the rise globally, corporates are compelled to better understand the usage of their websites. In doing so, they will be empowered to better understand consumers and make the adjustments needed to improve their standing in today’s competitive global landscape. However, online website-visit data have proven to be highly complex, large in volume, and highly transactional, with users expressing unique behaviours; extracting insight from them can therefore be a complex problem. This study aimed to employ unsupervised machine learning models to identify the intentions behind visits to the observed website. The data studied were sourced from the Google Analytics tracking tool deployed on a corporate informative website. The study employed k-means, hierarchical, and DBSCAN unsupervised machine learning models to understand the intents of visitors to the studied website. All three models detected five major intents within the observed data, labelled “accidentals”, “drop-offs”, “engrossed”, “get-in-touch” and “seekers”. On the observed data, all three unsupervised machine learning methods performed well. However, in the context of the study, which investigated the intents driving online visits, the hierarchical clustering method yielded superior results by maintaining the best balance between cluster homogeneity (stronger silhouette coefficients) and cluster size.
{"title":"Identifying the Intents Behind Website Visits by Employing Unsupervised Machine Learning Models","authors":"Judah Soobramoney, Retius Chifurira, Temesgen Zewotir, Knowledge Chinhamu","doi":"10.1007/s40745-024-00586-5","DOIUrl":"10.1007/s40745-024-00586-5","url":null,"abstract":"<div><p>With digitisation on the rise globally, corporates are compelled to better understand the usage of their websites. In doing so, they will be empowered to better understand consumers and make the adjustments needed to improve their standing in today’s competitive global landscape. However, online website-visit data have proven to be highly complex, large in volume, and highly transactional, with users expressing unique behaviours; extracting insight from them can therefore be a complex problem. This study aimed to employ unsupervised machine learning models to identify the intentions behind visits to the observed website. The data studied were sourced from the Google Analytics tracking tool deployed on a corporate informative website. The study employed k-means, hierarchical, and DBSCAN unsupervised machine learning models to understand the intents of visitors to the studied website. All three models detected five major intents within the observed data, labelled “accidentals”, “drop-offs”, “engrossed”, “get-in-touch” and “seekers”. On the observed data, all three unsupervised machine learning methods performed well. 
However, in the context of the study, which investigated the intents that drove online visits, the hierarchical clustering method yielded superior results by maintaining the best balance between cluster homogeneity (stronger silhouette coefficients) and cluster size.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":"12 1","pages":"413 - 437"},"PeriodicalIF":0.0,"publicationDate":"2025-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s40745-024-00586-5.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143521572","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
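The model comparison described above, fitting k-means, hierarchical, and DBSCAN clusterings and scoring them with silhouette coefficients, can be sketched as follows. The synthetic per-session features and all parameter choices (five blobs, `eps`, linkage defaults) are illustrative assumptions standing in for the study's Google Analytics data.

```python
# Compare k-means, hierarchical, and DBSCAN clusterings by silhouette
# score on synthetic stand-ins for per-session visit features
# (e.g. pages viewed, dwell time): five well-separated behaviour groups.
from sklearn.cluster import KMeans, AgglomerativeClustering, DBSCAN
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

X, _ = make_blobs(n_samples=500, centers=5, cluster_std=0.6,
                  random_state=42)
X = StandardScaler().fit_transform(X)

models = {
    "k-means": KMeans(n_clusters=5, n_init=10, random_state=42),
    "hierarchical": AgglomerativeClustering(n_clusters=5),
    "dbscan": DBSCAN(eps=0.3, min_samples=5),
}

scores = {}
for name, model in models.items():
    labels = model.fit_predict(X)
    mask = labels != -1            # drop DBSCAN noise points, if any
    if len(set(labels[mask])) > 1:  # silhouette needs >= 2 clusters
        scores[name] = silhouette_score(X[mask], labels[mask])

for name, s in scores.items():
    print(f"{name}: silhouette = {s:.3f}")
```

On real session data the silhouette ranking would depend on feature scaling and the DBSCAN neighbourhood radius; the study found hierarchical clustering struck the best balance between silhouette strength and cluster size.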