
Data Science and Management: Latest Articles

Climate physical risks: catalyst or constraint for the convergence of the digital and low-carbon economies?
Pub Date : 2025-12-01 DOI: 10.1016/j.dsm.2025.01.004
Ya Cui, Bo Yang
The digital economy and the low-carbon economy are crucial catalysts of global economic change and sustainable development. Focusing on China, this research examines the coupling and coordination mechanisms between the digital and low-carbon economies. It employs the coupling coordination degree model to assess how far the two economies have developed in a coupled and coordinated way, investigates how climate-related physical risks affect that development, and presents the following findings. First, the integrated, synchronized development of China's digital and low-carbon economies shows a general upward trend, although the average level remains relatively low and varies considerably across regions; the dynamic distribution of coupling coordination levels across regions also reveals notable disparities. Second, the coupling coordination degree between China's digital and low-carbon economies exhibits distinct spatial correlation, following a “high-high, low-low” distribution pattern, with its transfer paths exhibiting spatial dependence. Third, Tobit model analysis reveals that climate-related physical risks substantially impede the integrated, synchronized development of the digital and low-carbon economies. The underlying mechanism is the suppression of green low-carbon technological innovation, with extremely low temperatures exerting the strongest suppressive effect. This suppression varies with geographic location, data development, and resource endowment.
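For readers new to the coupling coordination degree model, here is a minimal two-system sketch. It assumes the commonly used formulation, which the abstract does not spell out. Each subsystem index U1, U2 is already normalized to [0, 1]. The coupling degree is C = 2·sqrt(U1·U2)/(U1 + U2). The comprehensive index is T = a·U1 + b·U2, with equal weights a = b = 0.5 taken here as an assumption. The coordination degree is D = sqrt(C·T). The input values are hypothetical.

```python
import math


def coupling_coordination(u1: float, u2: float,
                          a: float = 0.5, b: float = 0.5) -> tuple[float, float, float]:
    """Return (C, T, D) for two normalized subsystem indices.

    Assumed two-system formulation (weights a, b are illustrative defaults):
      C = 2 * sqrt(u1 * u2) / (u1 + u2)   coupling degree, in [0, 1]
      T = a * u1 + b * u2                 comprehensive coordination index
      D = sqrt(C * T)                     coupling coordination degree
    """
    if not (0.0 <= u1 <= 1.0 and 0.0 <= u2 <= 1.0):
        raise ValueError("indices must be normalized to [0, 1]")
    if u1 + u2 == 0.0:
        return 0.0, 0.0, 0.0  # both subsystems undeveloped: no coupling
    c = 2.0 * math.sqrt(u1 * u2) / (u1 + u2)
    t = a * u1 + b * u2
    d = math.sqrt(c * t)
    return c, t, d


# Hypothetical province-year: digital-economy index 0.36, low-carbon index 0.64.
c, t, d = coupling_coordination(0.36, 0.64)
print(f"C={c:.2f}, T={t:.2f}, D={d:.3f}")
```

With equal weights, D rewards balanced development. An unbalanced pair such as (0.9, 0.1) gives D ≈ 0.548, lower than the 0.707 obtained for (0.5, 0.5), even though both pairs share T = 0.5.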
Data Science and Management, Vol. 8, No. 4, pp. 500-518
Citations: 0
Prediction of retail commodity hot-spots: a machine learning approach
Pub Date : 2025-12-01 DOI: 10.1016/j.dsm.2025.02.003
Chao Deng, Xipeng Liu, Jinyu Zhang, Yuhua Mo, Paiyu Li, Xuexia Liang, Na Li
The accurate prediction of hot-spot commodities in the retail market is crucial for inventory management and market strategy formulation. Traditional methods identify hot-spot commodities by analyzing the sales trends of listed commodities; however, they cannot handle new commodities that lack historical sales data. Existing studies have used commodity features to predict hot-spot commodities. However, these features are complex and diverse, and a single prediction model struggles to fully capture the relationship between commodity features and sales hot-spots. This study proposes a commodity hot-spot classification method based on an ensemble learning model. The model combines three machine learning methods, namely gradient boosting decision tree (GBDT), eXtreme gradient boosting (XGBoost), and light gradient-boosting machine (LightGBM), to analyze the relationship between commodity features and retail market sales trends. A voting strategy is designed to fuse the models' classification results. Experimental results show that the ensemble model achieves an accuracy of 0.91, outperforming the individual models and other approaches. The method can help enterprises identify hot-spot commodities and support new commodity design and sales strategy adjustments.
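The abstract does not say which voting rule fuses the three models. Below is a minimal sketch that assumes simple hard (majority) voting over predicted class labels, with ties going to the label of the earliest model in the list. The label vectors are hypothetical placeholders, not real GBDT, XGBoost, or LightGBM outputs.

```python
from collections import Counter
from typing import Sequence


def hard_vote(predictions: Sequence[Sequence[int]]) -> list[int]:
    """Fuse per-model class labels by majority vote.

    predictions[m][i] is the label model m assigns to sample i.
    Ties are broken in favour of the earliest model's label (an assumption).
    """
    if not predictions:
        raise ValueError("need at least one model")
    n = len(predictions[0])
    if any(len(p) != n for p in predictions):
        raise ValueError("all models must predict the same number of samples")

    fused = []
    for i in range(n):
        votes = [p[i] for p in predictions]
        counts = Counter(votes)
        top = max(counts.values())
        # First label, in model order, that reaches the top vote count.
        fused.append(next(v for v in votes if counts[v] == top))
    return fused


# Hypothetical labels (1 = hot-spot, 0 = not) from three boosted-tree models.
gbdt = [1, 0, 1, 0, 1]
xgb = [1, 1, 0, 0, 1]
lgbm = [0, 1, 1, 0, 0]
print(hard_vote([gbdt, xgb, lgbm]))  # → [1, 1, 1, 0, 1]
```

scikit-learn's `VotingClassifier(voting="hard")` offers similar fusion for fitted estimators, though it breaks ties differently. Soft voting averages predicted probabilities instead; the abstract does not say whether the authors used it.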
Data Science and Management, Vol. 8, No. 4, pp. 414-422
Citations: 0
AI-driven innovation in emerging markets: extending the technology acceptance model–technology-organization-environment framework in small- and medium-sized enterprises
Pub Date : 2025-12-01 DOI: 10.1016/j.dsm.2025.04.002
Faizan ul Haq, Norazah Mohd Suki
This study examined the impact of artificial intelligence (AI) adoption on innovation outcomes in Pakistani manufacturing small- and medium-sized enterprises (SMEs). It addresses a critical gap in understanding how AI drives product and process innovation in resource-constrained settings. While AI adoption (AIA) has been well studied in large enterprises and developed markets, its implications for SMEs in emerging economies remain underexplored. This study investigates how leadership support moderates the effects of AIA and AI-driven marketing (AIM) in amplifying innovation outcomes. A quantitative survey of 500 senior and middle management executives from manufacturing SMEs was conducted, and the data were analyzed using partial least squares structural equation modeling (PLS-SEM). The findings show that technological readiness, organizational preparedness, and strong leadership significantly enhance AIA, resulting in notable improvements in both product and process innovation. Additionally, AIM amplifies the impact of AIA on innovation, thereby maximizing SMEs’ innovative capacity. Theoretically, this study extends the technology acceptance model–technology-organization-environment (TAM-TOE) framework by incorporating leadership support as a moderator of the effects of AIA and AIM on innovation outcomes, providing insights into the role of AI in emerging markets. From a managerial standpoint, it highlights the need for robust leadership, technological infrastructure, and AIM strategies to fully leverage the potential of AI. Policymakers are urged to create supportive environments with strong technological and leadership infrastructures that enable SMEs to achieve sustained growth through AI-driven innovation.
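The study estimates moderation within PLS-SEM, which is not reproduced here. As a simpler illustration of what a moderating effect means, the sketch below uses a swapped-in technique: an ordinary least-squares regression with an interaction term, y = b0 + b1·x + b2·z + b3·x·z. Here x stands in for AI adoption, z for leadership support, and y for an innovation outcome. The data are synthetic and noise-free so that the recovered coefficients can be checked. A nonzero b3 means the slope of y on x changes with the level of z.

```python
import numpy as np


def fit_moderation(x: np.ndarray, z: np.ndarray, y: np.ndarray) -> np.ndarray:
    """OLS fit of y = b0 + b1*x + b2*z + b3*x*z; returns [b0, b1, b2, b3]."""
    design = np.column_stack([np.ones_like(x), x, z, x * z])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def simple_slope(coef: np.ndarray, z_value: float) -> float:
    """Effect of x on y at a given level of the moderator z: b1 + b3*z."""
    return float(coef[1] + coef[3] * z_value)


# Synthetic, noise-free data (illustrative only): higher leadership support (z)
# strengthens the effect of AI adoption (x) on innovation (y).
rng = np.random.default_rng(0)
x = rng.normal(size=200)
z = rng.normal(size=200)
y = 1.0 + 0.4 * x + 0.2 * z + 0.3 * x * z

coef = fit_moderation(x, z, y)
print(np.round(coef, 3))
# Slope of x at z = +1 and z = -1 (about one SD above/below the mean here).
print(simple_slope(coef, 1.0), simple_slope(coef, -1.0))
```

Comparing the simple slopes at high and low z is the usual way to read an interaction: here the effect of x is about 0.7 at z = +1 and 0.1 at z = -1.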
Data Science and Management, Vol. 8, No. 4, pp. 485-499
Citations: 0
Automatic method for identification of cycles in COVID-19 time-series data
Pub Date : 2025-12-01 DOI: 10.1016/j.dsm.2025.02.002
Miaotian Li, Ciprian Doru Giurcăneanu, Jiamou Liu
All previous methods identify cycles in daily and weekly COVID-19 data through a subjective interpretation of the results. This makes it difficult for researchers to conduct comprehensive studies of the presence of cycles in each country/territory/area (CTA). Hence, we propose an algorithm that automatically detects the fundamental period T₀ and its harmonics. Following previous literature, we used T₀ = 7 days for daily data and T₀ = 52 weeks for weekly data. The new algorithm was applied to the time series of 236 CTAs collected by the WHO. The detection results are reported by grouping the CTAs according to the WHO region to which each belongs or the latitude of its capital. For the WHO-region and latitude-based groups, our results confirm the findings of other researchers. At the same time, the results provide new information about CTAs whose COVID-19 time-series data have not previously been examined carefully.
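The abstract does not describe the authors' detection algorithm. The sketch below is a hypothetical heuristic that shows what checking for T₀ and its harmonics involves. It computes a periodogram with NumPy's FFT and reports which frequencies k/T₀ (k = 1, 2, ...) hold more than a chosen share of the total power. The 5% threshold and the synthetic series are assumptions, and a real method would need a proper statistical test.

```python
import numpy as np


def detect_harmonics(series, t0: float, max_harmonic: int = 3,
                     min_share: float = 0.05) -> list[int]:
    """Return harmonic orders k (1 = fundamental period t0) whose periodogram
    ordinate at frequency k / t0 holds more than `min_share` of the total
    non-zero-frequency power. A hypothetical heuristic, not a statistical test.
    """
    x = np.asarray(series, dtype=float)
    x = x - x.mean()                            # drop the zero-frequency component
    power = np.abs(np.fft.rfft(x)) ** 2         # periodogram (up to a constant)
    freqs = np.fft.rfftfreq(x.size, d=1.0)      # cycles per sample (day or week)
    total = power[1:].sum()
    if total == 0.0:
        return []                               # constant series: no cycles
    found = []
    for k in range(1, max_harmonic + 1):
        f = k / t0
        if f > 0.5:                             # above the Nyquist frequency
            break
        idx = int(np.argmin(np.abs(freqs - f)))  # nearest Fourier frequency
        if power[idx] / total > min_share:
            found.append(k)
    return found


# Synthetic daily counts over 20 weeks: a weekly cycle plus its 2nd harmonic.
days = np.arange(140)
daily = 100 + 10 * np.sin(2 * np.pi * days / 7) + 4 * np.sin(4 * np.pi * days / 7)
print(detect_harmonics(daily, t0=7))
```

The series length here, 140 = 20 × 7 days, puts each harmonic exactly on a Fourier frequency. Other lengths cause spectral leakage, which spreads a harmonic's power across neighbouring bins and weakens a nearest-bin check like this one.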
Data Science and Management, Vol. 8, No. 4, pp. 447-457
Citations: 0
Counterfactual synthetic minority oversampling technique: solving healthcare's imbalanced learning challenge
Pub Date : 2025-12-01 DOI: 10.1016/j.dsm.2025.01.006
Goncalo Almeida, Fernando Bacao
The application of machine learning in healthcare has groundbreaking potential across a wide range of scenarios. However, this potential is often held back by data-related challenges, such as the imbalanced nature of the domain's data, in which critical outcomes tend to be inherently rare. To address this challenge, we propose a novel oversampling approach, the counterfactual synthetic minority oversampling technique (Counterfactual SMOTE), which combines SMOTE with a counterfactual generation framework. Our method performs oversampling near the decision boundary but within a safe region of the feature space, generating minority samples that are informative without being noisy. To validate the proposed framework, a rigorous experimental procedure was conducted on a set of highly imbalanced binary classification problems in healthcare. The results demonstrate the superiority of the proposed method over several of the most commonly used oversampling alternatives in the literature. Notably, Counterfactual SMOTE was the only method to show convincingly superior performance compared with the original SMOTE. The method was validated specifically in healthcare, a domain chosen for its relevance and its frequently imbalanced data. Even so, we expect the findings of this study to generalize to any imbalanced scenario.
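Counterfactual SMOTE builds on the standard SMOTE interpolation step. The abstract does not describe the counterfactual search that keeps samples inside a safe region in enough detail to reproduce. The sketch below therefore shows only classic SMOTE: each synthetic point lies on the segment between a minority sample and one of its k nearest minority neighbours, x_new = x + u·(x_nn - x), with u drawn uniformly from [0, 1). The function name, parameters, and data are illustrative.

```python
import math
import random
from typing import Optional


def smote(minority: list[list[float]], n_new: int, k: int = 5,
          seed: Optional[int] = None) -> list[list[float]]:
    """Classic SMOTE: interpolate between minority samples and their
    k nearest minority neighbours (Euclidean distance)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of `base`, excluding itself.
        others = [m for j, m in enumerate(minority) if j != i]
        others.sort(key=lambda m: math.dist(base, m))
        neighbour = rng.choice(others[:k])
        u = rng.random()  # random position along the base-neighbour segment
        synthetic.append([b + u * (nb - b) for b, nb in zip(base, neighbour)])
    return synthetic


# Hypothetical two-feature minority class (e.g. rare adverse outcomes).
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
for point in smote(minority, n_new=4, k=2, seed=42):
    print([round(v, 3) for v in point])
```

Plain SMOTE can place points in regions dominated by the majority class, because it ignores where the decision boundary lies. That limitation is what the counterfactual, boundary-aware variant described above is designed to address.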
Data Science and Management, Vol. 8, No. 4, pp. 436-446
Citations: 0
DeepSeek: implications for data science and management in the AI era
Pub Date : 2025-12-01 DOI: 10.1016/j.dsm.2025.09.001
Zongben Xu, Shan Liu, Wei Huang, Junmin Shi, Fengmin Xu, Xin Tang, Haibing Lu
DeepSeek has emerged as a disruptive force in artificial intelligence. Unlike traditional large language models (LLMs), which demand extensive computational resources, DeepSeek delivers performance comparable to industry-leading models at a fraction of the cost. Its impact spans both data science and business. In data science, DeepSeek enhances data processing, feature engineering, and statistical modeling, enabling enterprises to deploy AI-driven analytics more cost-effectively. In business, it supports a range of industry sectors, making advanced AI more accessible to small and medium-sized enterprises (SMEs). Additionally, its source-available licensing model promotes transparency and adaptability, challenging the dominance of proprietary AI ecosystems. This paper explores the implications of DeepSeek, reviews its technological advances and challenges, and outlines future research directions.
Data Science and Management, Vol. 8, No. 4, pp. 536-541
Citations: 0
Unlocking the power of machine learning in big data: a scoping survey
Pub Date : 2025-12-01 DOI: 10.1016/j.dsm.2025.02.004
Fadil Mohammed Surur, Abiy Abinet Mamo, Bealu Girma Gebresilassie, Kidus Abebe Mekonen, Abenezer Golda, Rajat Kumar Behera, Kumod Kumar
Machine learning (ML) plays a crucial role in big data (BD) as the cornerstone of efficient data processing and analysis; in particular, it enables valuable insights to be extracted from large data sets. This study therefore conducted a scoping survey to define the role of ML in BD by exploring its history and evolution. It then proposes a framework of ML in BD, emphasizing practical applications that address the challenges posed by the volume, velocity, variety, and veracity of data. BD analytics is also described, showing how ML paradigms contribute to decision-making. Real-world applications across diverse industries then illustrate the transformative impact of ML on BD. The survey findings highlight avenues for integrating ML in BD with respect to volume, variety, velocity, and veracity. For volume, scalable storage solutions, advanced computational architectures, and distributed ML are the avenues shaping the ML landscape. For variety, embedding intelligence in preprocessing and interoperable models is an avenue. For velocity, the avenues are real-time processing frameworks, temporally aware ML, and edge-computing integration. For veracity, the avenues are automated data quality assurance, explainable artificial intelligence and transparency, and blockchain technology for data provenance.
Data Science and Management, Vol. 8, No. 4, pp. 519-535
Citations: 0
Particle swarm optimization-enhanced machine learning and deep learning techniques for Internet of Things intrusion detection
Pub Date : 2025-12-01 DOI: 10.1016/j.dsm.2025.02.005
Mourad Benmalek, Abdessamed Seddiki
The exponential escalation of cyber threats and attacks targeting Internet of Things (IoT) devices over this decade necessitates effective intrusion detection methods. This paper presents an anomaly-based intrusion detection system that uses machine learning (ML) and deep learning (DL) models to secure IoT networks. The study uses the RT_IoT2022 dataset, a recent dataset that captures complex IoT attack scenarios. It applies particle swarm optimization (PSO), a bio-inspired metaheuristic, for feature selection and optimization, reducing computational overhead while improving model performance. Several models were evaluated for their ability to distinguish normal traffic from malicious attacks: support vector machine, k-nearest neighbors, categorical boosting (CatBoost), naïve Bayes, convolutional neural network, and long short-term memory. Our findings underscore the crucial roles of ML and DL in safeguarding IoT networks and the importance of continuously evaluating models on real-world data. Experiments demonstrate that CatBoost combined with PSO outperforms state-of-the-art methods reported in the literature on the same dataset across all metrics.
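The abstract does not give the PSO variant, fitness function, or hyperparameters used for feature selection. Below is a minimal sketch assuming binary PSO with a sigmoid transfer function, searching over 0/1 feature masks against a caller-supplied fitness. In a real wrapper setup the fitness would be something like a classifier's validation score minus a penalty per selected feature. The toy fitness here only mimics that, and the inertia and acceleration constants are common defaults rather than the authors' values.

```python
import math
import random
from typing import Callable, Optional


def binary_pso(fitness: Callable[[list[int]], float], n_features: int,
               n_particles: int = 20, n_iter: int = 50,
               w: float = 0.7, c1: float = 1.5, c2: float = 1.5,
               v_max: float = 4.0,
               seed: Optional[int] = None) -> tuple[list[int], float]:
    """Maximise `fitness` over 0/1 feature masks with binary PSO.

    Velocities are real-valued; each bit is resampled as 1 with
    probability sigmoid(velocity).
    """
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # personal best masks
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]  # global best mask

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-v_max, min(v_max, v))  # clamp velocity
                prob = 1.0 / (1.0 + math.exp(-vel[i][d]))
                pos[i][d] = 1 if rng.random() < prob else 0
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


# Toy fitness (hypothetical): features 0 and 3 are informative and every
# selected feature costs 0.1, mimicking "accuracy minus a size penalty".
def toy_fitness(mask: list[int]) -> float:
    return 1.0 * mask[0] + 1.0 * mask[3] - 0.1 * sum(mask)


mask, score = binary_pso(toy_fitness, n_features=8, seed=0)
print(mask, score)
```

If `toy_fitness` is replaced with a function that trains and scores a model on the selected columns, this becomes wrapper-style feature selection. That model evaluation is where nearly all of the run time goes.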
Data Science and Management, Vol. 8, No. 4, pp. 423-435
Citations: 0
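The abstract above does not specify the PSO variant, fitness function, or hyperparameters. As a rough illustration only, the sketch below implements the standard binary PSO variant for feature selection: each particle is a 0/1 feature mask, and a sigmoid of each velocity component gives the probability that the corresponding feature is selected. The function name `binary_pso`, the default coefficients, and the toy fitness function are assumptions; in a real pipeline the fitness would more likely be a validation score of a classifier such as CatBoost, possibly penalized by the number of selected features.

```python
import math
import random
from typing import Callable, List, Sequence


def binary_pso(
    n_features: int,
    fitness: Callable[[Sequence[int]], float],
    n_particles: int = 20,
    n_iters: int = 50,
    w: float = 0.7,       # inertia weight (assumed value)
    c1: float = 1.5,      # pull toward each particle's personal best (assumed value)
    c2: float = 1.5,      # pull toward the swarm's global best (assumed value)
    v_max: float = 4.0,   # velocity clamp: selection probabilities stay roughly in [0.018, 0.982]
    seed: int = 0,
) -> List[int]:
    """Binary PSO that maximizes `fitness` over 0/1 feature masks; returns the best mask found."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_particles)]
    vel = [[rng.uniform(-1.0, 1.0) for _ in range(n_features)] for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-v_max, min(v_max, v))
                # Sigmoid transfer function: probability that feature d is selected
                prob = 1.0 / (1.0 + math.exp(-vel[i][d]))
                pos[i][d] = 1 if rng.random() < prob else 0
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest


if __name__ == "__main__":
    # Hypothetical toy fitness: features 0, 2 and 5 are "informative"; every selected
    # feature costs 0.1, standing in for a computational-overhead penalty.
    informative = {0, 2, 5}

    def toy_fitness(mask):
        return sum(mask[j] for j in informative) - 0.1 * sum(mask)

    best = binary_pso(n_features=6, fitness=toy_fitness, n_particles=30, n_iters=60, seed=1)
    print("selected features:", [j for j, bit in enumerate(best) if bit])
```

Because the toy fitness rewards three informative features and charges a small cost for each selected feature, the search should settle on the mask that keeps only features 0, 2, and 5. Swapping in a cross-validated classifier score changes only the `fitness` callable.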
ESG performance and executive compensation levels: an empirical study
Pub Date : 2025-12-01 DOI: 10.1016/j.dsm.2025.06.002
Cunbo Yang, Jindong Wang, Haoxiang Huang, Ying Liu
As global environmental concerns grow, corporate environmental, social, and governance (ESG) performance has become an essential indicator of organizational value and sustainability. Given the inherent complexity of measuring ESG, this study relies on ESG ratings. Using a decade-long dataset (2012–2021) of A-share listed firms in China, it empirically examines the impact of ESG performance on executive compensation (EC) and the role of firms' financial performance as the mechanism linking the two. Applying robust empirical methods, we show how financial performance mediates this relationship and offer new insights into aligning sustainability initiatives with organizational reward systems. We find that ESG performance significantly affects EC levels; specifically, ESG performance raises EC levels by improving corporate financial performance. Heterogeneity analysis shows that executive pay is more sensitive to ESG performance in heavily polluting firms and in firms where the general manager and the board chairperson are different people. The empirical results indicate that integrating sustainability criteria into executive incentive structures and prioritizing robust ESG practices can yield measurable improvements in financial outcomes. Critically, organizational strategies must account for sector-specific environmental impact thresholds and inherent differences in corporate governance structures. This study broadens research on ESG performance, offers guidance for designing ESG-linked executive compensation, and provides a basis for motivating executives to improve corporate ESG performance and thereby achieve sustainable growth.
Journal: Data Science and Management, Vol. 8, No. 4, pp. 403-413
Citations: 0
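The mediation claim in the ESG abstract (ESG performance raises executive compensation partly through financial performance) can be illustrated with the classic product-of-coefficients decomposition: regress the mediator M on X (path a), regress Y on X and M (direct effect c' and path b), and take a*b as the indirect effect; with OLS and an intercept, the total effect c equals c' + a*b. The sketch below is a hypothetical, stdlib-only illustration on synthetic numbers. The function names and data are assumptions, and the authors' actual specification (control variables, fixed effects, standard errors, bootstrap or Sobel tests) is not given in the abstract and is omitted here.

```python
from typing import Dict, List, Sequence


def ols(y: Sequence[float], *xs: Sequence[float]) -> List[float]:
    """OLS with an intercept, solved from the normal equations (X'X) beta = X'y.
    Returns [intercept, coef_1, coef_2, ...]; adequate only for tiny, well-conditioned toy data."""
    X = [[1.0, *row] for row in zip(*xs)]
    n, k = len(X), len(X[0])
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)] for i in range(k)]
    rhs = [sum(X[r][i] * y[r] for r in range(n)) for i in range(k)]
    # Gaussian elimination with partial pivoting
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        rhs[col], rhs[piv] = rhs[piv], rhs[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= f * A[col][c]
            rhs[r] -= f * rhs[col]
    # Back substitution
    beta = [0.0] * k
    for i in reversed(range(k)):
        beta[i] = (rhs[i] - sum(A[i][j] * beta[j] for j in range(i + 1, k))) / A[i][i]
    return beta


def mediation(x: Sequence[float], m: Sequence[float], y: Sequence[float]) -> Dict[str, float]:
    """Product-of-coefficients mediation for the path X -> M -> Y."""
    total = ols(y, x)[1]           # c: total effect of X on Y
    a = ols(m, x)[1]               # a: effect of X on the mediator M
    _, direct, b = ols(y, x, m)    # c' (direct effect) and b (effect of M on Y, holding X fixed)
    return {"total": total, "a": a, "b": b, "direct": direct, "indirect": a * b}


if __name__ == "__main__":
    # Synthetic, noise-free toy data (hypothetical): M = 2X + u with u uncorrelated with X,
    # and Y = X + 3M, so a = 2, b = 3, c' = 1, indirect = 6, total = 7.
    esg = [1, 2, 3, 4, 5, 6]
    u = [1, -1, 0, 0, -1, 1]
    fin_perf = [2 * x + e for x, e in zip(esg, u)]
    exec_comp = [x + 3 * fp for x, fp in zip(esg, fin_perf)]
    print(mediation(esg, fin_perf, exec_comp))
```

On real panel data one would normally use a statistics library and report confidence intervals for a*b rather than point estimates alone; the hand-rolled solver here only keeps the example dependency-free.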
Enhancing imbalanced text classification: an overlap-based refinement approach
Pub Date : 2025-12-01 DOI: 10.1016/j.dsm.2025.03.001
Sihem Nouas, Lamia Oukid, Fatima Boumahdi
The inherent class imbalance in textual data poses a significant challenge for machine learning-based techniques, because the available data often fail to represent all classes adequately. This scarcity of instances becomes even more problematic when regions of different classes overlap. To address these limitations, this study introduces a refinement model for classifying imbalanced textual datasets. The proposed approach, refined classification using overlap data with bagging and genetic algorithms (ReCO-BGA), refines classification predictions through a two-tier process. First, a bagging model is trained on three distinct classes: majority, minority, and an additional class extracted specifically for overlapping instances. Second, the predictions for instances assigned to the overlap class are rectified using a genetic algorithm-based oversampling technique. To evaluate ReCO-BGA, we conducted several experiments on two practical use cases: hate speech detection and sentiment analysis. The results demonstrate the effectiveness of the proposed method and show that it outperforms state-of-the-art methods.
Journal: Data Science and Management, Vol. 8, No. 4, pp. 474-484
Citations: 0
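The ReCO-BGA abstract does not say how the overlap class is extracted. One common way to operationalize "overlapping instances" is a k-nearest-neighbour purity rule: an instance whose k nearest neighbours include any other class is moved into a third, overlap class. The sketch below assumes that rule (the function name `relabel_overlap`, Euclidean distance on already-vectorized text, and the value of k are all hypothetical) and produces the three-class labels (majority, minority, overlap) on which a bagging classifier could then be trained. The genetic algorithm-based oversampling of the second tier is not shown.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def relabel_overlap(X: Sequence[Point], y: Sequence[str], k: int = 3,
                    overlap_label: str = "overlap") -> List[str]:
    """Three-class relabelling (hypothetical rule): keep an instance's label if all of its
    k nearest neighbours (itself excluded) share it; otherwise assign `overlap_label`."""
    new_labels = []
    for i, xi in enumerate(X):
        # Sort all other instances by Euclidean distance (ties broken by index)
        neighbours = sorted((math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i)
        neighbour_classes = {y[j] for _, j in neighbours[:k]}
        is_overlap = bool(neighbour_classes - {y[i]})
        new_labels.append(overlap_label if is_overlap else y[i])
    return new_labels


if __name__ == "__main__":
    # Toy 2-D points standing in for text embeddings: two clean clusters plus a
    # majority/minority pair sitting close together around (5, 5)
    X = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5),
         (10, 10), (10, 11), (11, 10), (5, 6)]
    y = ["majority"] * 5 + ["minority"] * 4
    labels = relabel_overlap(X, y, k=3)
    print(labels)
    print(Counter(labels))
```

In this toy example the two neighbouring points around (5, 5) move to the overlap class while both clean clusters keep their labels; on real text embeddings, k and the purity rule would need tuning.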