
Big Data Research: Latest Articles

Deep neural network modeling for financial time series analysis
IF 3.5 | CAS Zone 3 (Computer Science) | Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-08-28 | Epub Date: 2025-06-09 | Volume 41, Article 100553 | DOI: 10.1016/j.bdr.2025.100553
Zheng Fang, Toby Cai
Modeling stock returns has often relied on multivariate time series analysis, and constructing an accurate model remains a challenging goal for both market investors and academic researchers. Stock return prediction typically involves multiple variables and a combination of long-term and short-term time series patterns. In this paper, we propose a new deep learning network, named DLS-TS-Net, to model stock returns and address this challenge, and we apply it to multivariate time series forecasting. The network integrates a Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM) units, and Gated Recurrent Units (GRUs). DLS-TS-Net overcomes LSTM's insensitivity to linear components in stock market forecasting by incorporating a traditional autoregressive model. Experimental results demonstrate that DLS-TS-Net excels at capturing long-term trends in multivariate factors and short-term fluctuations in the stock market, outperforming traditional time series and machine learning models. Additionally, when combined with the investment strategies proposed in this paper, DLS-TS-Net shows superior performance in managing risk during extreme events.
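The abstract says DLS-TS-Net adds a traditional autoregressive model to compensate for LSTM's insensitivity to linear components. A minimal sketch of that idea, assuming a two-stage setup (an AR(p) part fitted separately by ordinary least squares and simply added to a nonlinear prediction, in the spirit of LSTNet's highway component), is shown below. The function names and the additive combination are assumptions for illustration, not the authors' implementation; the CNN/LSTM/GRU branch is stood in for by a plain number, `nonlinear_pred`.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_ar(series, p):
    """Least-squares fit of x_t = c + a_1 x_{t-1} + ... + a_p x_{t-p}; returns [c, a_1, ..., a_p]."""
    rows, targets = [], []
    for t in range(p, len(series)):
        rows.append([1.0] + [series[t - i] for i in range(1, p + 1)])
        targets.append(series[t])
    k = p + 1
    # Normal equations (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve_linear(xtx, xty)


def ar_forecast(series, coeffs):
    """One-step linear AR forecast; series[-1] is the most recent observation (lag 1)."""
    c, *a = coeffs
    return c + sum(a_i * series[-i] for i, a_i in enumerate(a, start=1))


def hybrid_forecast(nonlinear_pred, series, coeffs):
    # Hypothetical additive combination: nonlinear network output plus linear AR term.
    return nonlinear_pred + ar_forecast(series, coeffs)
```

Training the AR weights jointly with the network, as LSTNet does, would replace the separate least-squares fit; the two-stage version is used here only because it runs without a deep learning framework.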
Citations: 0
Exploring the impact of high schools, socioeconomic factors, and degree programs on higher education success in Italy
IF 3.5 | CAS Zone 3 (Computer Science) | Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-08-28 | Epub Date: 2025-05-15 | Volume 41, Article 100539 | DOI: 10.1016/j.bdr.2025.100539
Cristian Usala, Isabella Sulis, Mariano Porcu
This study investigates the determinants of tertiary education success in Italy, focusing on students' outcomes between the first and second years. We use population data on students enrolled between 2015 and 2019, integrating information on high school environments and degree program characteristics. This rich dataset is analyzed with a two-step approach: the first step defines indicators of high school quality and degree program difficulty; the second estimates a multinomial logit model to assess the determinants of students' probability of being classified as regulars, churners, at risk of dropout, or dropouts. Data on the 2019 cohort were further investigated using additional information on students' socioeconomic backgrounds and schools' self-assessed effectiveness evaluations. Results indicate that students' high school backgrounds, socioeconomic conditions, and post-graduation prospects (the net wages and employment rates of graduates in the chosen degree program) significantly influence academic success and persistence. Overall, the results offer a comprehensive view of the determinants of university success, with specific patterns observed across the different student categories.
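The second step estimates a multinomial logit over four outcome categories. A minimal sketch of how such a model maps covariates to category probabilities is below, assuming `regular` as the reference category and two hypothetical covariates (for example a high-school quality index and a degree-program difficulty index). The coefficient layout and names are illustrative assumptions, and any values you pass in are not estimates from the paper.

```python
import math

CATEGORIES = ("regular", "churner", "at_risk", "dropout")
REFERENCE = "regular"  # assumed baseline category; its coefficients are fixed at zero


def mnl_probabilities(x, betas):
    """Multinomial logit probabilities.

    x: covariate values, e.g. [high_school_quality, program_difficulty] (hypothetical).
    betas: dict category -> [intercept, slope_1, ..., slope_k] for every non-reference category.
    """
    scores = {REFERENCE: 0.0}
    for cat in CATEGORIES:
        if cat == REFERENCE:
            continue
        intercept, *slopes = betas[cat]
        scores[cat] = intercept + sum(b * xi for b, xi in zip(slopes, x))
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {cat: math.exp(s - top) for cat, s in scores.items()}
    total = sum(exps.values())
    return {cat: e / total for cat, e in exps.items()}
```

Because the reference score is zero, the ratio P(category) / P(regular) equals exp of that category's linear predictor, which is how multinomial logit coefficients are usually read as relative risk ratios.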
Citations: 0
Mortality and risk of cardiovascular diseases by age at retirement in three Italian cohorts
IF 3.5 | CAS Zone 3 (Computer Science) | Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-08-28 | Epub Date: 2025-05-23 | Volume 41, Article 100543 | DOI: 10.1016/j.bdr.2025.100543
Chiara Ardito, Roberto Leombruni, Giuseppe Costa, Angelo d’Errico
The relationship between age at retirement and subsequent physical health remains contradictory in the literature, with more recent studies suggesting possible adverse health effects linked to employment at later ages. The aim of this study was to assess the long-term risk of overall mortality and the incidence of cardiovascular diseases (CVDs) associated with age at retirement in three large Italian cohorts, using both survey and administrative data.
The risks of mortality and CVDs associated with age at retirement, treated as a continuous variable, were assessed separately by gender using age-adjusted Cox models, further adjusted for chronic morbidity, education, and socioeconomic and previous work characteristics. In a second analysis, age at retirement was treated as a dichotomous variable: in a set of analyses with cutoffs from 52 to 65 years, the incidence of the health outcomes among subjects who retired after a given age was compared with that among subjects who retired at or before that age.
Higher age at retirement was associated with significantly higher mortality among men in all three cohorts, whereas among women the association was not significant, although in the same direction as for men. The risk of CVDs was also significantly associated with higher age at retirement in all datasets among men, and in two of them among women. The analyses with dichotomized age at retirement confirmed the results based on continuous age at retirement for both genders. Several robustness analyses, including an instrumental-variable (IV) Poisson model, confirmed the validity of the results for men, whereas the results for women were less stable and robust.
Policy makers should be aware of the public health risks of policies that raise the retirement age.
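The dichotomized analysis compares, for each cutoff from 52 to 65, subjects who retired after the cutoff with those who retired at or before it. The sketch below reproduces only that splitting logic, using crude incidence-rate ratios (events per person-year); it is not a Cox model and applies none of the study's adjustments. The record fields `retirement_age`, `event`, and `follow_up_years` are hypothetical.

```python
def incidence_rate(group):
    """Crude incidence rate: events per person-year (assumes positive total follow-up)."""
    events = sum(s["event"] for s in group)
    person_years = sum(s["follow_up_years"] for s in group)
    return events / person_years


def dichotomized_rate_ratios(subjects, cutoffs=range(52, 66)):
    """For each cutoff age, rate among those retiring after it / rate among those retiring at or before it."""
    ratios = {}
    for cutoff in cutoffs:
        late = [s for s in subjects if s["retirement_age"] > cutoff]
        early = [s for s in subjects if s["retirement_age"] <= cutoff]
        if not late or not early:
            continue  # one side is empty, so no comparison at this cutoff
        early_rate = incidence_rate(early)
        if early_rate > 0:  # avoid dividing by a zero reference rate
            ratios[cutoff] = incidence_rate(late) / early_rate
    return ratios
```

In the actual study each cutoff would instead feed a Cox model with the dichotomized exposure and the adjustment covariates; the loop structure over cutoffs is the part this sketch is meant to show.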
Citations: 0
Bipartite graph partitioning and spatial bootstrapping methods: A case study of innovative startups
IF 3.5 | CAS Zone 3 (Computer Science) | Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-08-28 | Epub Date: 2025-05-16 | Volume 41, Article 100533 | DOI: 10.1016/j.bdr.2025.100533
Alessio Bumbea, Andrea Mazzitelli, Giuseppe Espa, Alessandro Rinaldi
Innovative startups are a source of innovation and technological development; therefore, understanding their behavior can help reveal the direction in which business organization is moving. This paper introduces a new method for clustering innovative startups that combines bipartite graph partitioning with spatial bootstrapping, improving cluster accuracy and interpretability. Recent advances in clustering have introduced ensemble (consensus) clustering methods, which aim to merge multiple clustering results into a better overall result. A key challenge in this field is effectively integrating diverse clusterings, and one promising solution uses graph formalisms and partitioning strategies. By leveraging advanced graph partitioning techniques, we recast the partitioning of the ensemble graph as a community detection problem. Our approach improves the traditional bipartite graph method used in cluster ensembles by implementing the state-of-the-art biLouvain algorithm. We also focused on techniques that increase the interpretability of the clusters themselves, and on how these can be used to extract insightful information from the data. The proposed methodology was applied to a dataset of technologically advanced new businesses located in the Lombardy region and registered as innovative startups in the special section of the Italian Chambers of Commerce's Business Register.
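In a bipartite cluster ensemble, one node set holds the objects (here, startups), the other holds the clusters produced by each bootstrap or clustering run, and an edge links an object to each cluster it was assigned to. The sketch below builds that graph and scores a candidate partition with Barber's bipartite modularity, one standard quality function for bipartite community detection. It does not implement biLouvain, and whether biLouvain optimizes exactly this objective is not asserted here; the function names are hypothetical.

```python
from collections import Counter


def ensemble_bipartite_edges(labelings):
    """labelings: one dict per run mapping object id -> cluster label.

    Returns edges (object, (run, label)); cluster nodes are tagged with the run index
    so that label 0 in run 1 and label 0 in run 2 stay distinct nodes.
    """
    edges = set()
    for run, labels in enumerate(labelings):
        for obj, label in labels.items():
            edges.add((obj, (run, label)))
    return edges


def barber_modularity(edges, community):
    """Q = (1/m) * sum over object u, cluster v in the same community of (A_uv - k_u d_v / m)."""
    m = len(edges)
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    objects = {u for u, _ in edges}
    clusters = {v for _, v in edges}
    q = 0.0
    for u in objects:
        for v in clusters:
            if community[u] == community[v]:
                a_uv = 1.0 if (u, v) in edges else 0.0
                q += a_uv - degree[u] * degree[v] / m
    return q / m
```

When all runs agree, grouping each set of co-clustered objects with its cluster nodes scores higher than lumping everything into one community, which scores zero; a community detection algorithm searches for the partition that maximizes such a score.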
Citations: 0
Business digitalization in Italy: A comprehensive analysis using supplementary fuzzy set approach
IF 3.5 | CAS Zone 3 (Computer Science) | Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-08-28 | Epub Date: 2025-05-15 | Volume 41, Article 100538 | DOI: 10.1016/j.bdr.2025.100538
Ilaria Benedetti, Federico Crescenzi, Tiziana Laureti, Niccolò Salvini
In an era where digital technologies such as AI, cloud computing and IoT are reshaping global business dynamics, the digital transformation of enterprises has become a pivotal factor for maintaining competitive advantage. This paper provides an in-depth analysis of the digitalization process among Italian firms, leveraging data from the ISTAT ICT survey. Using a fuzzy set approach, we develop a refined index to measure technological deprivation across multiple dimensions, providing a detailed understanding of how digitalization is adopted at the firm level. The results indicate a moderate level of technological development among firms. The dimension related to online sales emerges as the most underdeveloped, highlighting it as a critical area for improvement for Italian companies and underscoring the need for targeted policy interventions to bridge these digital gaps. Moreover, the analysis reveals significant disparities across sectors, geographic areas, and firm sizes, with smaller enterprises and those in certain regions exhibiting lower levels of digital adoption. Our study underscores the utility of the fuzzy set methodology for analyzing high-dimensional big data and provides actionable insights for enhancing digital adoption among firms in Italy.
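A fuzzy deprivation index typically gives each firm a membership degree in [0, 1] for lacking each technology, weights the items, and averages them within dimensions. The sketch below assumes log-inverse-prevalence item weights, ln(1/f_j) where f_j is the average deprivation on item j (a weighting used in the fuzzy poverty literature), so that rarer deprivations count more. The item and dimension names are hypothetical, and this is not claimed to be the paper's exact index.

```python
import math


def item_weights(firms, items):
    """Weight w_j = ln(1 / f_j), f_j = mean deprivation membership on item j (0 if nobody is deprived)."""
    weights = {}
    for item in items:
        share = sum(f[item] for f in firms) / len(firms)
        weights[item] = math.log(1.0 / share) if share > 0 else 0.0
    return weights


def deprivation_scores(firms, dimensions):
    """firms: list of dicts item -> membership in [0, 1] (1 = fully deprived).

    dimensions: dict dimension name -> list of item keys.
    Returns, per firm, the weighted mean deprivation in each dimension.
    """
    items = [i for dim_items in dimensions.values() for i in dim_items]
    w = item_weights(firms, items)
    scores = []
    for firm in firms:
        per_dim = {}
        for dim, dim_items in dimensions.items():
            total_w = sum(w[i] for i in dim_items)
            per_dim[dim] = (sum(w[i] * firm[i] for i in dim_items) / total_w
                            if total_w > 0 else 0.0)
        scores.append(per_dim)
    return scores
```

Averaging these per-firm scores by sector, region, or size class is one way to surface the kind of disparities the abstract reports.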
Citations: 0
Compression of big data collected in wind farm based on tensor train decomposition
IF 4.2 | CAS Zone 3 (Computer Science) | Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-08-28 | Epub Date: 2025-08-20 | Volume 41, Article 100554 | DOI: 10.1016/j.bdr.2025.100554
Keren Li, Wenqiang Zhang, Dandan Xiao, Peng Hou, Shuai Yan, Yang Wang, Xuerui Mao
To address the storage challenges stemming from large volumes of heterogeneous data in wind farms, we propose a data compression technique based on tensor train decomposition (TTD). Initially, we establish a tensor-based processing model to standardize the heterogeneous data originating from wind farms, which includes both structured SCADA (supervisory control and data acquisition) data and unstructured video and picture data. Subsequently, we introduce a TTD-based method designed to compress the heterogeneous data generated in wind farms while preserving the inherent spatial eigenstructure of the data. Finally, we validate the efficacy of the proposed method in alleviating data storage challenges by utilizing authentic wind farm datasets. Comparative analysis reveals that the TTD-based method outperforms previously proposed compression techniques, specifically the canonical polyadic (CP) and Tucker methods.
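Tensor train decomposition stores a d-way tensor of shape n_1 x ... x n_d as d cores G_k of shape r_{k-1} x n_k x r_k with r_0 = r_d = 1, so storage falls from the product of the n_k to the sum of r_{k-1} n_k r_k, and any entry is a product of matrices, A[i_1, ..., i_d] = G_1[:, i_1, :] G_2[:, i_2, :] ... G_d[:, i_d, :]. The sketch below shows only that storage count and element reconstruction with plain nested lists; computing the cores (for example with TT-SVD) and the paper's tensor model for SCADA and video data are not shown, and the function names are hypothetical.

```python
from math import prod


def tt_num_params(shape, ranks):
    """Number of stored values in the TT cores; ranks = (r_0, ..., r_d) with r_0 = r_d = 1."""
    if len(ranks) != len(shape) + 1 or ranks[0] != 1 or ranks[-1] != 1:
        raise ValueError("expected ranks of the form (1, r_1, ..., r_{d-1}, 1)")
    return sum(ranks[k] * shape[k] * ranks[k + 1] for k in range(len(shape)))


def compression_ratio(shape, ranks):
    """Full-tensor size divided by TT storage size."""
    return prod(shape) / tt_num_params(shape, ranks)


def tt_element(cores, index):
    """Evaluate A[i_1, ..., i_d] as the product G_1[:, i_1, :] ... G_d[:, i_d, :].

    Each core is a nested list indexed core[a][i][b] with shape r_{k-1} x n_k x r_k.
    """
    row = [1.0]  # 1 x r_0 row vector
    for core, i in zip(cores, index):
        r_next = len(core[0][i])
        # row (1 x r_{k-1}) times slice G_k[:, i, :] (r_{k-1} x r_k)
        row = [sum(row[a] * core[a][i][b] for a in range(len(row))) for b in range(r_next)]
    return row[0]
```

For a 10 x 10 x 10 tensor with TT ranks (1, 3, 3, 1), the cores hold 150 values instead of 1000; the achievable ranks for a given reconstruction error depend on the data.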
Citations: 0
Time-synchronized sentiment labeling via autonomous online comments data mining: A multimodal information fusion on large-scale multimedia data
IF 3.5 | CAS Zone 3 (Computer Science) | Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-08-28 | Epub Date: 2025-06-08 | Volume 41, Article 100552 | DOI: 10.1016/j.bdr.2025.100552
Jiachen Ma, Nazmus Sakib, Fahim Islam Anik, Sheikh Iqbal Ahamed
While temporal sentiment labels are invaluable for video tagging, segmentation, and labeling tasks in multimedia studies, large-scale manual annotation remains cost- and time-prohibitive. Emerging Online Time-Sync Comment (TSC) datasets offer promising alternatives for generating sentiment maps. However, the limited scope of existing TSC datasets and a lack of guidelines for creating such data under resource constraints hinder broader use. This study addresses these challenges by proposing a novel system for automated TSC generation that uses recent YouTube comments as a readily accessible source of time-synchronized data. The efficacy of our multi-platform data mining system is evaluated through extensive long-term trials, leading to the development and analysis of two large-scale TSC datasets. Benchmarking against original temporal Automatic Speech Recognition (ASR) sentiment annotations validates the accuracy of the generated data. This work establishes a promising method for automatic TSC generation, laying the groundwork for further advances in multimedia research and paving the way for novel sentiment analysis applications.
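One way to turn ordinary YouTube comments into time-synchronized comments is to extract the timestamps viewers type, such as `1:05` or `1:02:03`, and aggregate a per-comment sentiment score into fixed time bins. The sketch below assumes that approach; the regular expression, the 10-second bin width, and the `score` callable are illustrative assumptions, not the system described in the paper.

```python
import re
from collections import defaultdict

# m:ss or h:mm:ss as typed by viewers, e.g. "1:05" or "1:02:03" (assumed format).
TIMESTAMP = re.compile(r"\b(?:(\d{1,2}):)?(\d{1,2}):(\d{2})\b")


def extract_seconds(text):
    """Return video positions (in seconds) mentioned in a comment."""
    seconds = []
    for h, m, s in TIMESTAMP.findall(text):
        if int(s) >= 60 or (h and int(m) >= 60):
            continue  # not a plausible video position
        seconds.append(int(h or 0) * 3600 + int(m) * 60 + int(s))
    return seconds


def sentiment_timeline(comments, score, bin_seconds=10):
    """Map each bin start (seconds) to the mean sentiment of comments that reference it.

    score: any callable text -> polarity (hypothetical; e.g. a lexicon or a trained classifier).
    """
    bins = defaultdict(list)
    for text in comments:
        polarity = score(text)
        for t in extract_seconds(text):
            bins[t // bin_seconds * bin_seconds].append(polarity)
    return {start: sum(v) / len(v) for start, v in sorted(bins.items())}
```

A comment that mentions several timestamps contributes its score to each referenced bin; whether to split or discard such comments is a design choice this sketch leaves open.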
Citations: 0
A multimodal deep learning framework for constructing a market sentiment index from stock news
IF 3.5 | CAS Zone 3 (Computer Science) | Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-08-28 | Epub Date: 2025-05-17 | Volume 41, Article 100535 | DOI: 10.1016/j.bdr.2025.100535
Yunting Liu, Yirong Huang
Unimodal sentiment analysis often fails to capture the complexity of financial sentiment. This paper proposes a multimodal deep learning framework that integrates text, audio, and image data from CCTV news videos on TikTok to construct a multimodal sentiment indicator for the Chinese stock market. Empirical results show that multimodal fusion enhances sentiment analysis, with text outperforming audio and image modalities. The indicator correlates weakly with stock returns but significantly with market volatility, aligns with seasonal sentiment patterns, and reflects significant events like COVID-19. Additionally, weekly sentiment trends indicate the lowest sentiment on Thursdays and the highest on Fridays. This study advances financial sentiment analysis by demonstrating the efficacy of multimodal indicators in capturing market sentiment and informing volatility forecasts.
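A simple way to combine modalities into one index is late fusion: score each modality separately, take a weighted average, then aggregate by weekday to look for patterns such as the Thursday low and Friday high reported above. The weights below are hypothetical (text is weighted highest only because the abstract reports text as the strongest modality), and the paper's actual deep fusion architecture may differ.

```python
from collections import defaultdict
from datetime import date

# Hypothetical late-fusion weights; not taken from the paper.
WEIGHTS = {"text": 0.5, "audio": 0.25, "image": 0.25}
WEEKDAYS = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")


def fuse(scores, weights=WEIGHTS):
    """Weighted average over whichever modalities are present (weights renormalized)."""
    total = sum(weights[m] for m in scores)
    return sum(weights[m] * s for m, s in scores.items()) / total


def weekday_profile(daily):
    """daily: list of (date, {modality: score}). Returns weekday -> mean fused sentiment."""
    buckets = defaultdict(list)
    for day, scores in daily:
        buckets[WEEKDAYS[day.weekday()]].append(fuse(scores))
    return {name: sum(v) / len(v) for name, v in buckets.items()}


if __name__ == "__main__":
    # 2024-01-04 is a Thursday and 2024-01-05 a Friday.
    sample = [(date(2024, 1, 4), {"text": -0.2, "audio": 0.1}), (date(2024, 1, 5), {"text": 0.6})]
    print(weekday_profile(sample))
```

Renormalizing over the modalities present lets a day with a missing audio or image signal still receive an index value.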
Citations: 0
ImDMI: Improved Distributed M-Invariance model to achieve privacy continuous big data publishing using Apache Spark
IF 3.5 | Zone 3 (Computer Science) | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-05-28 | Epub Date: 2025-03-07 | DOI: 10.1016/j.bdr.2025.100519
Salheddine Kabou , Laid Gasmi , Abdelbaset Kabou , Sidi Mohammed Benslimane
One of the critical challenges in big data analytics is protecting individual privacy. Data anonymization models such as k-anonymity and l-diversity are used to balance privacy and data utility when data are published. However, these models consider only a single release of a dataset and provide a given level of privacy for that release alone. In practical big data applications, data publishing is more complicated: data are republished continuously as new records are collected, and privacy must be preserved across all releases. In this research, we propose a new distributed bottom-up approach on Apache Spark that achieves the m-invariance privacy model for continuous big data publishing. The proposed approach is the first study to address dynamic big data publishing. It is based on two processes: insertion and splitting. In the insertion process, data records collected from different workers are inserted into an improved bottom-up R-tree generalization in order to minimize information loss. The splitting process divides an overflowed node in accordance with the m-invariance requirement while minimizing the overlap between the resulting partitions. Experimental results show significant improvements in data utility, execution time and the number of counterfeit data records compared with existing techniques in the literature.
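The m-invariance model named in the abstract was introduced by Xiao and Tao, and its definition is compact. First, every release must be m-unique: each quasi-identifier (QI) group holds at least m records, all with distinct sensitive values. Second, a record that appears in consecutive releases must keep the same signature, meaning the set of sensitive values in its QI group. The Python sketch below checks both conditions on releases that have already been generalized. It does not reproduce the authors' Spark-based R-tree insertion and split procedure. The record layout `(record_id, group_id, sensitive_value)` and all function names are hypothetical. For simplicity, counterfeit records are treated as ordinary rows.

```python
from collections import defaultdict


def is_m_unique(release, m):
    """release: list of (record_id, group_id, sensitive_value) after generalization.
    m-unique: every QI group holds at least m records, all with distinct sensitive values."""
    groups = defaultdict(list)
    for _, gid, sv in release:
        groups[gid].append(sv)
    return all(len(v) >= m and len(set(v)) == len(v) for v in groups.values())


def signatures(release):
    """Map each record_id to its signature: the set of sensitive values in its QI group."""
    groups = defaultdict(set)
    for _, gid, sv in release:
        groups[gid].add(sv)
    return {rid: frozenset(groups[gid]) for rid, gid, _ in release}


def is_m_invariant(releases, m):
    """releases: releases in publication order. Each must be m-unique, and every record
    present in two consecutive releases must keep the same signature. Assumes each
    record's lifespan is contiguous, so pairwise checks cover the whole lifespan."""
    if not all(is_m_unique(r, m) for r in releases):
        return False
    for prev, curr in zip(releases, releases[1:]):
        sig_prev, sig_curr = signatures(prev), signatures(curr)
        for rid in sig_prev.keys() & sig_curr.keys():
            if sig_prev[rid] != sig_curr[rid]:
                return False
    return True


if __name__ == "__main__":
    r1 = [(1, "g1", "flu"), (2, "g1", "hiv"), (3, "g2", "flu"), (4, "g2", "cancer")]
    # Record 2 is deleted; counterfeit "c1" carries "hiv" so record 1 keeps {flu, hiv}.
    r2 = [(1, "a", "flu"), ("c1", "a", "hiv"), (3, "b", "flu"), (4, "b", "cancer")]
    print(is_m_invariant([r1, r2], m=2))  # True
```

The example shows why counterfeit records are needed. Record 2 is deleted between releases, so without `c1` record 1's new group could not reproduce its old signature. This is also why the abstract treats the number of counterfeit records as a cost to minimize.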
Citation: Salheddine Kabou, Laid Gasmi, Abdelbaset Kabou, Sidi Mohammed Benslimane. ImDMI: Improved Distributed M-Invariance model to achieve privacy continuous big data publishing using Apache Spark. Big Data Research, vol. 40, Article 100519, 2025. DOI: 10.1016/j.bdr.2025.100519
Citations: 0
Predicting option prices: From the Black-Scholes model to machine learning methods
IF 3.5 | Zone 3 (Computer Science) | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-05-28 | Epub Date: 2025-02-26 | DOI: 10.1016/j.bdr.2025.100518
Angela Maria D'Uggento, Marta Biancardi, Domenico Ciriello
In the ever-changing landscape of financial markets, accurate option pricing remains critical for investors, traders and financial institutions. Traditionally, the Black-Scholes (B&S) model has been the cornerstone of option pricing, providing a solid framework grounded in mathematical and physical principles. Nevertheless, the B&S model has limitations: among other restrictions, it applies only to European options and assumes no dividends and constant volatility. Research on applying machine learning models in finance is growing rapidly. The main objective of this paper is to provide a comprehensive comparative analysis of the traditional B&S model and the most commonly used machine learning algorithms, such as Artificial Neural Networks (ANNs). The rationale is twofold. First, we examine the assumptions of the B&S model, such as constant volatility and a perfectly efficient market, in light of real-world complexity, even though the model has been a pillar of option pricing for decades. Second, we emphasize that the proliferation of big data and advances in computing power have fuelled the rise of machine learning techniques in finance. These algorithms are remarkably good at discovering non-linear patterns and extracting information from large data sets, offering a compelling alternative to traditional quantitative methods. Machine learning thus provides a new way to capture and model complex financial dynamics, which can lead to more accurate pricing models. By comparing the B&S model with several machine learning approaches on real data, this paper aims to shed light on their respective strengths, weaknesses and applicability to option pricing. Through rigorous empirical analyses and performance metrics, our results show that machine learning techniques can outperform or complement the established B&S model, achieving higher accuracy in predicting option prices.
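As a reference point, the closed-form Black-Scholes prices that the abstract takes as its baseline can be computed in a few lines. The sketch below prices a European call and put on a non-dividend-paying asset with constant rate r and volatility sigma, which are the assumptions the paper questions. It also adds RMSE and MAPE as example accuracy metrics for comparing model prices with observed market prices. The paper's actual metrics and machine learning models are not listed here, so the function names and metric choice are illustrative assumptions.

```python
import math


def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call_put(S, K, T, r, sigma):
    """Black-Scholes prices of a European call and put (no dividends, constant r and sigma).
    S: spot, K: strike, T: years to expiry, r: continuously compounded rate, sigma: volatility."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    discount = math.exp(-r * T)
    call = S * norm_cdf(d1) - K * discount * norm_cdf(d2)
    put = K * discount * norm_cdf(-d2) - S * norm_cdf(-d1)
    return call, put


def rmse(pred, actual):
    """Root mean squared error between model prices and market prices."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual))


def mape(pred, actual):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((p - a) / a) for p, a in zip(pred, actual)) / len(actual)


if __name__ == "__main__":
    call, put = bs_call_put(S=100, K=100, T=1.0, r=0.05, sigma=0.2)
    print(round(call, 4), round(put, 4))  # 10.4506 5.5735
```

Put-call parity, C - P = S - K e^{-rT}, gives a quick sanity check on any implementation. Here 10.4506 - 5.5735 matches 100 - 100e^{-0.05} ≈ 4.8771. A machine learning pricer would be scored against the same market prices using `rmse` or `mape`.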
Citation: Angela Maria D'Uggento, Marta Biancardi, Domenico Ciriello. Predicting option prices: From the Black-Scholes model to machine learning methods. Big Data Research, vol. 40, Article 100518, 2025. DOI: 10.1016/j.bdr.2025.100518
Citations: 0