
Big Data Research: Latest Articles

E-word of mouth in sales volume forecasting: Toyota Camry case study
IF 3.5; Zone 3 (Computer Science); Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2025-05-15; DOI: 10.1016/j.bdr.2025.100542
Domenica Fioredistella Iezzi, Roberto Monte
In recent years, electronic word of mouth has become a significant factor in purchasing decisions, with consumers' sentiments playing a crucial role in shaping the sales of products and services.
This paper introduces a novel approach to sales forecasting that addresses consumers' sentiments toward goods or services by combining the sales volume time series with a quantitative proxy of the unobservable true sentiment. Numerous studies have explored various methods to capture sentiment and accurately predict sales. We have integrated an estimated sentiment signal, variously built via lexicon-based, machine-learning, and deep-learning approaches, into a multivariate autoregressive state space (MARSS) model. We have tested our model on a dataset of 163,000 tweets about the Toyota Camry, covering the period from June 2009 to December 2022 and sales volumes in the US market over the same timeframe.
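The idea of treating true sentiment as an unobservable state that is seen only through a noisy proxy can be illustrated with a much simpler model than the paper's MARSS specification. The sketch below is a univariate local-level Kalman filter, not the authors' model; the function name, the noise variances `q` and `r`, and the simulated data are all assumptions made for illustration.

```python
import numpy as np

def local_level_filter(y, q, r, m0=0.0, p0=1.0):
    """Kalman filter for the model x_t = x_{t-1} + w_t, y_t = x_t + v_t.

    q is the variance of the state noise w_t, r the variance of the
    observation noise v_t. Returns the filtered state means.
    """
    m, p = m0, p0
    means = []
    for obs in y:
        p = p + q              # predict: uncertainty grows by the state noise
        k = p / (p + r)        # Kalman gain
        m = m + k * (obs - m)  # correct the mean with the innovation
        p = (1.0 - k) * p      # shrink the variance after the update
        means.append(m)
    return np.array(means)

# Simulated (hypothetical) data: a slowly drifting latent sentiment
# observed through a noisy proxy.
rng = np.random.default_rng(0)
true_sentiment = np.cumsum(rng.normal(0.0, 0.1, size=200))
proxy = true_sentiment + rng.normal(0.0, 0.5, size=200)
filtered = local_level_filter(proxy, q=0.01, r=0.25)
```

In a multivariate setting the same predict/update recursion runs on vectors and matrices, which is where a sales series and a sentiment proxy could share latent states.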
Big Data Research, Volume 41, Article 100542.
Citations: 0
Exploring the impact of high schools, socioeconomic factors, and degree programs on higher education success in Italy
IF 3.5; Zone 3 (Computer Science); Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2025-05-15; DOI: 10.1016/j.bdr.2025.100539
Cristian Usala, Isabella Sulis, Mariano Porcu
This study investigates the determinants of tertiary education success in Italy, focusing on students' outcomes between the first and second years. We use population data of students enrolled between 2015 and 2019, integrating information on high school environments and degree program characteristics. This rich dataset has been exploited with a two-step approach: the first step defines indicators for high school quality and degree program difficulty; the second estimates a multinomial logit to assess the determinants of students' probability of being classified as regulars, churners, at risk of dropout, and dropouts. Data regarding the 2019 cohort have been further investigated by exploiting the additional information on students' socioeconomic backgrounds and schools' self-assessed effectiveness evaluations. Results indicate that students' high school backgrounds, socioeconomic conditions, and post-graduation prospects in terms of net wages and occupation rates of graduates in the chosen degree program significantly influence academic success and students' academic persistence. Overall, the results offer a comprehensive view of the determinants of university success, with specific patterns observed across the different student categories.
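As a rough illustration of the second step, a multinomial logit turns a student's covariates into probabilities over the four outcome categories. The sketch below shows only the probability formula with one category as the zero-utility baseline; the coefficient layout, the function name, and the choice of baseline are assumptions, and no estimation is performed.

```python
import numpy as np

# Outcome categories named in the abstract; the first is used as baseline.
CATEGORIES = ["regular", "churner", "at risk of dropout", "dropout"]

def multinomial_logit_probs(x, coefs):
    """Category probabilities for one student.

    x     : feature vector, shape (n_features,)
    coefs : coefficients for the non-baseline categories,
            shape (n_categories - 1, n_features)
    """
    x = np.asarray(x, dtype=float)
    coefs = np.asarray(coefs, dtype=float)
    utilities = np.concatenate([[0.0], coefs @ x])  # baseline utility is 0
    utilities = utilities - utilities.max()         # numerical stability
    expu = np.exp(utilities)
    return expu / expu.sum()

# Hypothetical covariates and coefficients, for illustration only.
probs = multinomial_logit_probs([1.0, 0.5], [[0.2, -0.1], [0.0, 0.3], [-0.4, 0.1]])
```

Each coefficient row is read relative to the baseline: a positive entry raises the odds of that category against "regular" as the covariate grows.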
Big Data Research, Volume 41, Article 100539.
Citations: 0
Business digitalization in Italy: A comprehensive analysis using supplementary fuzzy set approach
IF 3.5; Zone 3 (Computer Science); Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2025-05-15; DOI: 10.1016/j.bdr.2025.100538
Ilaria Benedetti, Federico Crescenzi, Tiziana Laureti, Niccolò Salvini
In an era where digital technologies such as AI, cloud computing and IoT are reshaping global business dynamics, the digital transformation of enterprises has become a pivotal factor for maintaining competitive advantage. This paper provides an in-depth analysis of the digitalization process among Italian firms, leveraging data from the ISTAT ICT survey. Using a fuzzy set approach, we develop a refined index to measure technological deprivation across multiple dimensions, providing a detailed understanding of how digitalization is adopted at the firm level. The results indicate a moderate level of technological development among firms. The dimension related to online sales emerges as the most underdeveloped, highlighting it as a critical area for improvement for Italian companies and underscoring the need for targeted policy interventions to bridge these digital gaps. Moreover, the analysis reveals significant disparities across sectors, geographic areas, and firm sizes, with smaller enterprises and those in certain regions exhibiting lower levels of digital adoption. Our study underscores the utility of the fuzzy set methodology for analyzing high-dimensional big data and provides actionable insights for enhancing digital adoption among firms in Italy.
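The abstract does not specify the membership functions behind the index, so the following is only a generic sketch of how a fuzzy set replaces a deprived/not-deprived cutoff with a degree in [0, 1]. The piecewise-linear shape, the thresholds `low` and `high`, the weighted-mean aggregation, and both function names are assumptions, not the paper's method.

```python
import numpy as np

def deprivation_membership(score, low=0.2, high=0.8):
    """Degree of membership in the 'technologically deprived' set.

    score is an adoption score in [0, 1]. Scores at or below `low` are
    fully deprived (1.0), scores at or above `high` are not deprived
    (0.0), and membership falls linearly in between.
    """
    s = np.asarray(score, dtype=float)
    return np.clip((high - s) / (high - low), 0.0, 1.0)

def overall_deprivation(scores_by_dimension, weights):
    """Weighted mean of the per-dimension membership degrees."""
    m = deprivation_membership(scores_by_dimension)
    w = np.asarray(weights, dtype=float)
    return float((m * w).sum() / w.sum())

# Hypothetical firm with adoption scores on three dimensions.
firm_index = overall_deprivation([0.1, 0.5, 0.9], weights=[1.0, 1.0, 1.0])
```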
Big Data Research, Volume 41, Article 100538.
Citations: 0
Big data analytics for smart home energy management system based on IOMT using AHP and WASPAS
IF 3.5; Zone 3 (Computer Science); Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2025-05-10; DOI: 10.1016/j.bdr.2025.100534
Jingze Zhou, Salem Alkhalaf, S. Abdel-Khalek, Shah Nazir
The convergence of edge computing and 5G network speed provides an innovative way to address the energy efficiency and low latency requirements in medical data processing, especially from the perspective of the Internet of Medical Things (IoMT). Together, these technologies allow for the quick and effective handling of the enormous volumes of medical data produced by different IoMT devices in the context of smart healthcare systems. The IoMT is bringing cutting-edge technologies, social benefits, and economic advantages to transform modern healthcare systems entirely. Digital healthcare is transforming due to machine learning, which uses sophisticated algorithms to forecast patients’ health status efficiently. These approaches predict the onset of disease, hospital readmissions, and treatment customization by analyzing large medical datasets. Strong data security and good forecast accuracy are still issues. The quality and variety of training data are key factors in making accurate predictions, and strict encryption, safe storage, and regulatory compliance are necessary for data security. By including various significant components from existing research, the current study seeks to determine the most collective features. The goal of the study is to offer a systematic approach for assessing these features using the Analytic Hierarchy Process (AHP) and the Weighted Aggregated Sum Product Assessment (WASPAS). These approaches are effective for efficient big data analytics in the context of a smart home energy management system based on IoMT.
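A minimal sketch of how AHP and WASPAS fit together: AHP derives criterion weights from a pairwise comparison matrix, and WASPAS ranks alternatives by blending a weighted sum with a weighted product of normalised scores. The geometric-mean approximation of the AHP priority vector, the blend parameter `lam = 0.5`, and the toy matrices are assumptions; the study's actual features and judgments are not reproduced here.

```python
import numpy as np

def ahp_weights(pairwise):
    """Approximate AHP priority weights via row geometric means."""
    a = np.asarray(pairwise, dtype=float)
    gm = np.exp(np.log(a).mean(axis=1))  # geometric mean of each row
    return gm / gm.sum()

def waspas_scores(decision, weights, benefit, lam=0.5):
    """WASPAS score per alternative (rows) over criteria (columns).

    benefit[j] is True when larger values of criterion j are better,
    False for cost criteria. All entries must be positive.
    """
    x = np.asarray(decision, dtype=float)
    w = np.asarray(weights, dtype=float)
    benefit = np.asarray(benefit, dtype=bool)
    # Linear normalisation: x / max for benefit, min / x for cost criteria.
    norm = np.where(benefit, x / x.max(axis=0), x.min(axis=0) / x)
    wsm = (norm * w).sum(axis=1)      # weighted sum model
    wpm = np.prod(norm ** w, axis=1)  # weighted product model
    return lam * wsm + (1.0 - lam) * wpm

# Toy example: a perfectly consistent comparison matrix built from known
# weights, then two alternatives scored on two benefit criteria.
known = np.array([0.5, 0.3, 0.2])
weights = ahp_weights(np.outer(known, 1.0 / known))
scores = waspas_scores([[10.0, 5.0], [8.0, 4.0]], [0.5, 0.5], [True, True])
```

The alternative with the highest score ranks first; setting `lam` to 1 or 0 reduces the method to the pure weighted sum or the pure weighted product.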
Big Data Research, Volume 41, Article 100534.
Citations: 0
Saving food surplus and developing new business models: Exploring the potential of ‘Too Good To Go’ at territorial level using web-scraped data
IF 3.5; Zone 3 (Computer Science); Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2025-05-10; DOI: 10.1016/j.bdr.2025.100536
Mengting Yu, Luca Secondi, Tiziana Laureti, Luigi Palumbo
Food surplus, fit for consumption, is often excluded from the consumption loop for commercial reasons, leading to wasted food, nutrients, resources, and costs. Digital innovations with diverse business models aim to combat this through food redistribution. However, it is critical to assess their effectiveness from stakeholder and consumer perspectives; meanwhile, new research focuses on the value of these business models.
This study employs web scraping technology to collect multi-dimensional data from two Italian cities on Too Good To Go. The analysis results confirm its positive contribution to food surplus redistribution with economic benefits, despite a weaker presence of certain food establishment types and a lack of social motivation among consumers. Furthermore, strong business-customer relationships can be established when businesses commit to reducing food waste and effectively communicate with their customers using the platform.
Big Data Research, Volume 40, Article 100536.
Citations: 0
A novel study of kernel graph regularized semi-non-negative matrix factorization with orthogonal subspace for clustering
IF 3.5; Zone 3 (Computer Science); Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2025-04-22; DOI: 10.1016/j.bdr.2025.100531
Yasong Chen, Wen Li, Junjian Zhao
As a nonlinear extension of Non-negative Matrix Factorization (NMF), Kernel Non-negative Matrix Factorization (KNMF) has demonstrated greater effectiveness in revealing latent features from raw data. Building on this, this paper introduces kernel theory and effectively combines the advantages of semi-nonnegative constraints, graph regularization, and orthogonal subspace constraints to propose a novel model: Kernel Graph Regularized Semi-Non-negative Matrix Factorization with Orthogonal Subspaces and Auxiliary Variables (semi-KGNMFOSV). This model introduces auxiliary variables and reformulates the optimization problem, successfully overcoming the convergence proof challenges typically associated with orthogonal subspace-constrained methods. Furthermore, the model utilizes kernel methods to effectively capture complex nonlinear structures in the data. The semi-nonnegative constraint, along with orthogonal subspace constraints incorporating auxiliary variables, enhances optimization efficiency, while graph regularization preserves the local geometric structure of the data. We develop an efficient optimization algorithm to solve the proposed model and conduct extensive experiments on multiple real-world datasets. Additionally, we investigate the impact of three different initialization strategies on the performance of the proposed algorithm. Experimental results demonstrate that, compared to classical and state-of-the-art methods, the proposed model exhibits superior performance across all three initialization strategies.
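For orientation, the semi-nonnegative constraint on its own can be sketched with plain semi-NMF, where a mixed-sign data matrix X is approximated by F Gᵀ with only G constrained to be non-negative. The code below uses the standard alternating scheme (a least-squares solve for F, a multiplicative update for G); it omits the kernel, graph regularisation, orthogonal subspace constraint, and auxiliary variables of the proposed model, and the function name, random initialisation, and `eps` safeguard are assumptions.

```python
import numpy as np

def _pos(a):
    """Positive part of a matrix, elementwise."""
    return (np.abs(a) + a) / 2.0

def _neg(a):
    """Negative part of a matrix (returned as non-negative values)."""
    return (np.abs(a) - a) / 2.0

def semi_nmf(x, k, n_iter=100, eps=1e-9, seed=0):
    """Plain semi-NMF: X (d x n) ~ F (d x k) @ G.T, with G >= 0."""
    rng = np.random.default_rng(seed)
    g = rng.random((x.shape[1], k)) + 0.1  # strictly positive start
    errors = []
    for _ in range(n_iter):
        # F has no sign constraint: exact least-squares solution given G.
        f = x @ g @ np.linalg.pinv(g.T @ g)
        errors.append(float(np.linalg.norm(x - f @ g.T)))
        # Multiplicative update keeps G non-negative.
        xtf, ftf = x.T @ f, f.T @ f
        num = _pos(xtf) + g @ _neg(ftf)
        den = _neg(xtf) + g @ _pos(ftf)
        g = g * np.sqrt(num / (den + eps))
    f = x @ g @ np.linalg.pinv(g.T @ g)
    errors.append(float(np.linalg.norm(x - f @ g.T)))
    return f, g, errors

# Hypothetical mixed-sign data: 5 features, 40 samples, 3 clusters.
data = np.random.default_rng(1).normal(size=(5, 40))
F, G, errs = semi_nmf(data, k=3)
cluster_of_sample = G.argmax(axis=1)  # read cluster labels off the rows of G
```

Because G is non-negative, each of its rows can be read as soft cluster memberships for one sample, which is why the factorisation is used for clustering.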
Big Data Research, Volume 40, Article 100531.
Citations: 0
Hourglass pattern matching for deep aware neural network text recommendation model
IF 3.5; Zone 3 (Computer Science); Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2025-04-17; DOI: 10.1016/j.bdr.2025.100532
Li Gao, Hongjun Li, Qingkui Chen, Dunlu Peng
In recent years, with the rapid development of deep learning, big data mining, and natural language processing (NLP) technologies, the application of NLP in the field of recommendation systems has attracted significant attention. However, current text recommendation systems still face challenges in handling word distribution assumptions, preprocessing design, network inference models, and text perception technologies. Traditional RNN layers often encounter issues such as exploding or vanishing gradients, which hinder their ability to effectively handle long-term dependencies and reverse text inference among long texts. Therefore, this paper proposes a new type of depth-aware neural network recommendation model (Hourglass Deep-aware neural network Recommendation Model, HDARM), whose structure presents an hourglass shape. This model consists of three parts. The top of the hourglass takes word embeddings produced by a fine-tuned BERT as input, treating the text embeddings as word distribution assumptions, and then uses a bidirectional LSTM integrated with Transformer models to learn critical information. The middle of the hourglass retains key features of network outputs through CNN layers, which are combined with pooling layers to extract and enhance critical information from user text. The bottom of the hourglass avoids a decline in generalization performance through deep neural network layers. Finally, the model performs pattern matching between text vectors and word embeddings, recommending texts based on their relevance. In experiments, this model improved MSE and NDCG@10 by 8.74% and 10.89%, respectively, compared to the best-performing baseline model.
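Since the reported gains are stated in terms of NDCG@10, a short sketch of that metric may help readers interpret them. This version uses linear gains (the relevance value itself rather than 2^rel - 1); which variant the paper uses is not stated, so that choice, like the function names, is an assumption.

```python
import numpy as np

def dcg_at_k(relevances, k):
    """Discounted cumulative gain of the first k items, in ranked order."""
    rel = np.asarray(relevances, dtype=float)[:k]
    discounts = np.log2(np.arange(2, rel.size + 2))  # log2(rank + 1)
    return float((rel / discounts).sum())

def ndcg_at_k(relevances, k=10):
    """DCG normalised by the DCG of the ideal (sorted) ranking."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Relevance of recommended texts in the order the model ranked them
# (hypothetical values).
score = ndcg_at_k([3, 2, 3, 0, 1, 2], k=10)
```

A value of 1.0 means the model ranked items exactly in decreasing order of relevance; lower values penalise relevant items that appear further down the list.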
Big Data Research, Volume 40, Article 100532.
Citations: 0
A decision tree algorithm based on adaptive entropy of feature value importance
IF 3.5; Zone 3 (Computer Science); Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2025-04-14; DOI: 10.1016/j.bdr.2025.100530
Shaobo Deng, Weili Yuan, Sujie Guan, Xing Lin, Zemin Liao, Min Li
Constructing an optimal decision tree remains a challenging task. Existing algorithms often utilize power coefficient methods or standardization techniques to weight the entropy value; however, these approaches do not sufficiently account for the importance of attributes. This paper introduces an Adaptive Entropy Decision Tree (EWDT) algorithm, which leverages eigenvalue importance and integrates singular value decomposition into the calculation of entropy values. Experimental results demonstrate that the proposed algorithm outperforms other decision tree algorithms in terms of accuracy, precision, recall, and F1-score.
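The quantity being re-weighted is the entropy used to choose splits. The sketch below shows only the unweighted baseline, Shannon entropy and information gain for a categorical feature; the adaptive weighting based on singular value decomposition described in the abstract is not reproduced, and the function names are assumptions.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def information_gain(feature, labels):
    """Entropy reduction from splitting on each distinct feature value."""
    feature = np.asarray(feature)
    labels = np.asarray(labels)
    gain = entropy(labels)
    for value in np.unique(feature):
        mask = feature == value
        # Subtract the child's entropy, weighted by its share of samples.
        gain -= mask.mean() * entropy(labels[mask])
    return gain

# Toy data: the first feature separates the classes, the second does not.
y = [0, 0, 1, 1]
gain_informative = information_gain([0, 0, 1, 1], y)
gain_useless = information_gain([0, 1, 0, 1], y)
```

A tree builder picks the feature with the largest gain at each node; a weighted-entropy variant changes how each term contributes, not this overall selection loop.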
Big Data Research, Volume 40, Article 100530.
Citations: 0
TE-PADN: A poisoning attack defense model based on temporal margin samples
IF 3.5; Zone 3 (Computer Science); Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2025-04-09; DOI: 10.1016/j.bdr.2025.100528
Haitao He, Ke Liu, Lei Zhang, Ke Xu, Jiazheng Li, Jiadong Ren
With the development of network security research, intrusion detection systems based on deep learning show great potential in network attack detection. As crucial tools for ensuring network information security, these systems are themselves vulnerable to poisoning attacks. Currently, most poisoning attack defense methods cannot effectively utilize network traffic characteristics and are effective only for specific models, with poor defense results on others. Furthermore, detection of poisoning attacks is often overlooked, leading to a lack of timely and effective defense. Therefore, we propose a data poisoning defense mechanism called TE-PADN. First, we introduce a temporal margin sample generation algorithm that integrates an attention mechanism. After mapping the original data time series into a latent feature space, the algorithm learns the temporal characteristics of the data and uses the attention mechanism to focus on information from different positions, generating temporal margin samples for repairing poisoned models. Second, we propose a multi-level poisoning attack detection method for real-time, accurate detection of poisoning attacks that would otherwise go unnoticed. By employing ensemble learning, this approach enhances model robustness, repairs classification boundaries that have shifted due to poisoning attacks, and achieves efficient defense. Finally, experimental validation shows promising results: under a 10% attack intensity, the average accuracy of TE-PADN in recovering poisoned models increased by 6.5% on the NSL-KDD dataset, 5.3% on the UNSW-NB15 dataset, and 5.9% on the CICIDS2017 dataset.
Citations: 0
Leveraging artificial intelligence for pandemic management: Case of COVID-19 in the United States
IF 3.5 Zone 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-04-08 DOI: 10.1016/j.bdr.2025.100529
Ehsan Ahmadi, Reza Maihami
The COVID-19 pandemic revealed significant limitations in traditional time-series approaches that rely on one-dimensional data such as historical infection rates. Such approaches do not capture the complex, multifactor influences on disease spread. This paper addresses these challenges by proposing a comprehensive methodology that integrates multiple data sources, including community mobility, census information, Google search trends, socioeconomic variables, vaccination coverage, and political data. In addition, this paper proposes a new cross-learning (CL) methodology that trains machine learning models on multiple related time series simultaneously, enabling more accurate and robust predictions. Applying the CL approach with four machine learning algorithms, we forecasted confirmed COVID-19 cases 30 days in advance with greater accuracy than the traditional ARIMAX model and the newer Transformer deep learning technique. Our findings identified daily hospital admissions as a significant predictor at the state level and vaccination status as a significant predictor at the national level. Random Forest with CL was very effective, performing best in 44 states, while ARIMAX performed better in seven larger states. These findings highlight the importance of advanced predictive modeling in resource optimization and response strategy development for future health emergencies.
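The cross-learning idea of fitting one model on several related series at once can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: a linear autoregressive model fitted by least squares stands in for the paper's machine learning algorithms, no external covariates are used, and the names `make_windows`, `fit_cross_learning`, and `predict` are hypothetical.

```python
import numpy as np


def make_windows(series, n_lags, horizon):
    """Turn one series into (lag window, value `horizon` steps ahead) pairs."""
    X, y = [], []
    for t in range(n_lags, len(series) - horizon + 1):
        X.append(series[t - n_lags:t])
        y.append(series[t + horizon - 1])
    return np.array(X, dtype=float), np.array(y, dtype=float)


def fit_cross_learning(series_by_region, n_lags=7, horizon=1):
    """Fit one shared linear model on windows pooled from every region.

    Assumes each series is longer than n_lags + horizon - 1 so that it
    contributes at least one training window.
    """
    parts = [make_windows(s, n_lags, horizon) for s in series_by_region.values()]
    X = np.vstack([p[0] for p in parts])
    y = np.concatenate([p[1] for p in parts])
    A = np.hstack([X, np.ones((len(X), 1))])  # append an intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict(coef, recent_values):
    """Forecast from the most recent n_lags values of any single region."""
    return float(np.dot(coef[:-1], recent_values) + coef[-1])


# Two toy regions with different levels and slopes share one model.
series = {"A": list(range(20)), "B": [10 + 2 * i for i in range(20)]}
coef = fit_cross_learning(series, n_lags=2, horizon=1)
forecast_a = predict(coef, [18, 19])  # next value for region A
```

Setting `horizon=30` on daily data would mirror the 30-day-ahead target mentioned in the abstract. Pooling windows gives the shared model more training examples than any single region provides, which is the motivation for cross-learning.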
Citations: 0