首页 > 最新文献

Big Data最新文献

英文 中文
Dual-Path Graph Neural Network with Adaptive Auxiliary Module for Link Prediction. 带自适应辅助模块的双路径图神经网络用于链路预测
IF 4.6 4区 计算机科学 Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date : 2024-03-25 DOI: 10.1089/big.2023.0130
Zhenzhen Yang, Zelong Lin, Yongpeng Yang, Jiaqi Li

Link prediction, which has important applications in many fields, predicts the possibility of the link between two nodes in a graph. Link prediction based on Graph Neural Network (GNN) obtains node representation and graph structure through GNN, which has attracted a growing amount of attention recently. However, the existing GNN-based link prediction approaches possess some shortcomings. On the one hand, because a graph contains different types of nodes, it leads to a great challenge for aggregating information and learning node representation from its neighbor nodes. On the other hand, the attention mechanism has been an effect instrument for enhancing the link prediction performance. However, the traditional attention mechanism is always monotonic for query nodes, which limits its influence on link prediction. To address these two problems, a Dual-Path Graph Neural Network (DPGNN) for link prediction is proposed in this study. First, we propose a novel Local Random Features Augmentation for Graph Convolution Network as a baseline of one path. Meanwhile, Graph Attention Network version 2 based on dynamic attention mechanism is adopted as a baseline of the other path. And then, we capture more meaningful node representation and more accurate link features by concatenating the information of these two paths. In addition, we propose an adaptive auxiliary module for better balancing the weight of auxiliary tasks, which brings more benefit to link prediction. Finally, extensive experiments verify the effectiveness and superiority of our proposed DPGNN for link prediction.

链接预测是指预测图中两个节点之间链接的可能性,在许多领域都有重要应用。基于图神经网络(GNN)的链接预测通过 GNN 获得节点表示和图结构,最近引起了越来越多的关注。然而,现有的基于 GNN 的链接预测方法存在一些缺陷。一方面,由于图中包含不同类型的节点,这给从相邻节点汇总信息和学习节点表示带来了巨大挑战。另一方面,注意力机制一直是提高链接预测性能的有效工具。然而,传统的注意力机制对于查询节点总是单调的,这限制了它对链接预测的影响。针对这两个问题,本研究提出了一种用于链接预测的双路径图神经网络(DPGNN)。首先,我们提出了一种新颖的局部随机特征增强图卷积网络(Local Random Features Augmentation for Graph Convolution Network),作为单路径的基线。同时,我们采用基于动态注意力机制的图注意力网络版本 2 作为另一条路径的基准。然后,我们通过串联这两条路径的信息来捕捉更有意义的节点表示和更准确的链接特征。此外,我们还提出了自适应辅助模块,以更好地平衡辅助任务的权重,从而为链接预测带来更多益处。最后,大量实验验证了我们提出的 DPGNN 在链接预测方面的有效性和优越性。
{"title":"Dual-Path Graph Neural Network with Adaptive Auxiliary Module for Link Prediction.","authors":"Zhenzhen Yang, Zelong Lin, Yongpeng Yang, Jiaqi Li","doi":"10.1089/big.2023.0130","DOIUrl":"https://doi.org/10.1089/big.2023.0130","url":null,"abstract":"<p><p>Link prediction, which has important applications in many fields, predicts the possibility of the link between two nodes in a graph. Link prediction based on Graph Neural Network (GNN) obtains node representation and graph structure through GNN, which has attracted a growing amount of attention recently. However, the existing GNN-based link prediction approaches possess some shortcomings. On the one hand, because a graph contains different types of nodes, it leads to a great challenge for aggregating information and learning node representation from its neighbor nodes. On the other hand, the attention mechanism has been an effect instrument for enhancing the link prediction performance. However, the traditional attention mechanism is always monotonic for query nodes, which limits its influence on link prediction. To address these two problems, a Dual-Path Graph Neural Network (DPGNN) for link prediction is proposed in this study. First, we propose a novel Local Random Features Augmentation for Graph Convolution Network as a baseline of one path. Meanwhile, Graph Attention Network version 2 based on dynamic attention mechanism is adopted as a baseline of the other path. And then, we capture more meaningful node representation and more accurate link features by concatenating the information of these two paths. In addition, we propose an adaptive auxiliary module for better balancing the weight of auxiliary tasks, which brings more benefit to link prediction. Finally, extensive experiments verify the effectiveness and superiority of our proposed DPGNN for link prediction.</p>","PeriodicalId":51314,"journal":{"name":"Big Data","volume":" ","pages":""},"PeriodicalIF":4.6,"publicationDate":"2024-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140289590","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Investigating the Co-Movement and Asymmetric Relationships of Oil Prices on the Shipping Stock Returns: Evidence from Three Shipping-Flagged Companies from Germany, South Korea, and Taiwan. 探究油价对航运股回报的共动和非对称关系:来自德国、韩国和台湾的三家航运滞后公司的证据。
IF 4.6 4区 计算机科学 Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date : 2024-02-13 DOI: 10.1089/big.2023.0026
Jumadil Saputra, Kasypi Mokhtar, Anuar Abu Bakar, Siti Marsila Mhd Ruslan

In the last 2 years, there has been a significant upswing in oil prices, leading to a decline in economic activity and demand. This trend holds substantial implications for the global economy, particularly within the emerging business landscape. Among the influential risk factors impacting the returns of shipping stocks, none looms larger than the volatility in oil prices. Yet, only a limited number of studies have explored the complex relationship between oil price shocks and the dynamics of the liner shipping industry, with specific focus on uncertainty linkages and potential diversification strategies. This study aims to investigate the co-movements and asymmetric associations between oil prices (specifically, West Texas Intermediate and Brent) and the stock returns of three prominent shipping companies from Germany, South Korea, and Taiwan. The results unequivocally highlight the indispensable role of oil prices in shaping both short-term and long-term shipping stock returns. In addition, the research underscores the statistical significance of exchange rates and interest rates in influencing these returns, with their effects varying across different time horizons. Notably, shipping stock prices exhibit heightened sensitivity to positive movements in oil prices, while exchange rates and interest rates exert contrasting impacts, one being positive and the other negative. These findings collectively illuminate the profound influence of market sentiment regarding crucial economic indicators within the global shipping sector.

在过去两年里,石油价格大幅上涨,导致经济活动和需求下降。这一趋势对全球经济,尤其是新兴商业领域产生了重大影响。在影响航运业股票收益的风险因素中,最重要的莫过于石油价格的波动。然而,只有为数有限的研究探讨了油价冲击与班轮航运业动态之间的复杂关系,并特别关注不确定性联系和潜在的多元化战略。本研究旨在探讨油价(特别是西德克萨斯中质油价和布伦特油价)与德国、韩国和台湾三家著名航运公司股票收益之间的共同变动和非对称关联。研究结果明确凸显了油价在影响短期和长期航运股票回报率方面不可或缺的作用。此外,研究还强调了汇率和利率在影响这些回报率方面的统计意义,它们在不同时间跨度上的影响也各不相同。值得注意的是,航运股票价格对石油价格的积极变动表现出更高的敏感性,而汇率和利率则产生了截然不同的影响,一个是积极的,另一个是消极的。这些发现共同揭示了市场情绪对全球航运业关键经济指标的深刻影响。
{"title":"Investigating the Co-Movement and Asymmetric Relationships of Oil Prices on the Shipping Stock Returns: Evidence from Three Shipping-Flagged Companies from Germany, South Korea, and Taiwan.","authors":"Jumadil Saputra, Kasypi Mokhtar, Anuar Abu Bakar, Siti Marsila Mhd Ruslan","doi":"10.1089/big.2023.0026","DOIUrl":"https://doi.org/10.1089/big.2023.0026","url":null,"abstract":"<p><p>In the last 2 years, there has been a significant upswing in oil prices, leading to a decline in economic activity and demand. This trend holds substantial implications for the global economy, particularly within the emerging business landscape. Among the influential risk factors impacting the returns of shipping stocks, none looms larger than the volatility in oil prices. Yet, only a limited number of studies have explored the complex relationship between oil price shocks and the dynamics of the liner shipping industry, with specific focus on uncertainty linkages and potential diversification strategies. This study aims to investigate the co-movements and asymmetric associations between oil prices (specifically, West Texas Intermediate and Brent) and the stock returns of three prominent shipping companies from Germany, South Korea, and Taiwan. The results unequivocally highlight the indispensable role of oil prices in shaping both short-term and long-term shipping stock returns. In addition, the research underscores the statistical significance of exchange rates and interest rates in influencing these returns, with their effects varying across different time horizons. Notably, shipping stock prices exhibit heightened sensitivity to positive movements in oil prices, while exchange rates and interest rates exert contrasting impacts, one being positive and the other negative. These findings collectively illuminate the profound influence of market sentiment regarding crucial economic indicators within the global shipping sector.</p>","PeriodicalId":51314,"journal":{"name":"Big Data","volume":" ","pages":""},"PeriodicalIF":4.6,"publicationDate":"2024-02-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139736755","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
An Autoregressive-Based Kalman Filter Approach for Daily PM2.5 Concentration Forecasting in Beijing, China. 基于自回归卡尔曼滤波器的中国北京 PM2.5 每日浓度预测方法。
IF 4.6 4区 计算机科学 Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date : 2024-02-01 Epub Date: 2023-05-03 DOI: 10.1089/big.2022.0082
Xinyue Zhang, Chen Ding, Guizhi Wang

With the acceleration of urbanization, air pollution, especially PM2.5, has seriously affected human health and reduced people's life quality. Accurate PM2.5 prediction is significant for environmental protection authorities to take actions and develop prevention countermeasures. In this article, an adapted Kalman filter (KF) approach is presented to remove the nonlinearity and stochastic uncertainty of time series, suffered by the autoregressive integrated moving average (ARIMA) model. To further improve the accuracy of PM2.5 forecasting, a hybrid model is proposed by introducing an autoregressive (AR) model, where the AR part is used to determine the state-space equation, whereas the KF part is used for state estimation on PM2.5 concentration series. A modified artificial neural network (ANN), called AR-ANN is introduced to compare with the AR-KF model. According to the results, the AR-KF model outperforms the AR-ANN model and the original ARIMA model on the predication accuracy; that is, the AR-ANN obtains 10.85 and 15.45 of mean absolute error and root mean square error, respectively, whereas the ARIMA gains 30.58 and 29.39 on the corresponding metrics. It, therefore, proves that the presented AR-KF model can be adopted for air pollutant concentration prediction.

随着城市化进程的加快,空气污染尤其是 PM2.5 严重影响了人类健康,降低了人们的生活质量。准确预测 PM2.5 对环保部门采取行动和制定预防对策意义重大。本文提出了一种改进的卡尔曼滤波器(KF)方法,以消除自回归积分移动平均(ARIMA)模型所带来的时间序列的非线性和随机不确定性。为了进一步提高 PM2.5 预测的准确性,提出了一种混合模型,即引入自回归(AR)模型,其中 AR 部分用于确定状态空间方程,而 KF 部分用于 PM2.5 浓度序列的状态估计。为了与 AR-KF 模型进行比较,引入了一个名为 AR-ANN 的改进型人工神经网络(ANN)。结果表明,AR-KF 模型的预测精度优于 AR-ANN 模型和原始 ARIMA 模型,即 AR-ANN 模型的平均绝对误差和均方根误差分别为 10.85 和 15.45,而 ARIMA 模型的相应指标分别为 30.58 和 29.39。因此,这证明所提出的 AR-KF 模型可用于空气污染物浓度预测。
{"title":"An Autoregressive-Based Kalman Filter Approach for Daily PM<sub>2.5</sub> Concentration Forecasting in Beijing, China.","authors":"Xinyue Zhang, Chen Ding, Guizhi Wang","doi":"10.1089/big.2022.0082","DOIUrl":"10.1089/big.2022.0082","url":null,"abstract":"<p><p>With the acceleration of urbanization, air pollution, especially PM<sub>2.5</sub>, has seriously affected human health and reduced people's life quality. Accurate PM<sub>2.5</sub> prediction is significant for environmental protection authorities to take actions and develop prevention countermeasures. In this article, an adapted Kalman filter (KF) approach is presented to remove the nonlinearity and stochastic uncertainty of time series, suffered by the autoregressive integrated moving average (ARIMA) model. To further improve the accuracy of PM<sub>2.5</sub> forecasting, a hybrid model is proposed by introducing an autoregressive (AR) model, where the AR part is used to determine the state-space equation, whereas the KF part is used for state estimation on PM<sub>2.5</sub> concentration series. A modified artificial neural network (ANN), called AR-ANN is introduced to compare with the AR-KF model. According to the results, the AR-KF model outperforms the AR-ANN model and the original ARIMA model on the predication accuracy; that is, the AR-ANN obtains 10.85 and 15.45 of mean absolute error and root mean square error, respectively, whereas the ARIMA gains 30.58 and 29.39 on the corresponding metrics. It, therefore, proves that the presented AR-KF model can be adopted for air pollutant concentration prediction.</p>","PeriodicalId":51314,"journal":{"name":"Big Data","volume":" ","pages":"19-29"},"PeriodicalIF":4.6,"publicationDate":"2024-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9757180","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Long- and Short-Term Memory Model of Cotton Price Index Volatility Risk Based on Explainable Artificial Intelligence. 基于可解释人工智能的棉花价格指数波动风险的长短期记忆模型。
IF 4.6 4区 计算机科学 Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date : 2024-02-01 Epub Date: 2023-11-17 DOI: 10.1089/big.2022.0287
Huosong Xia, Xiaoyu Hou, Justin Zuopeng Zhang

Market uncertainty greatly interferes with the decisions and plans of market participants, thus increasing the risk of decision-making, leading to compromised interests of decision-makers. Cotton price index (hereinafter referred to as cotton price) volatility is highly noisy, nonlinear, and stochastic and is susceptible to supply and demand, climate, substitutes, and other policy factors, which are subject to large uncertainties. To reduce decision risk and provide decision support for policymakers, this article integrates 13 factors affecting cotton price index volatility based on existing research and further divides them into transaction data and interaction data. A long- and short-term memory (LSTM) model is constructed, and a comparison experiment is implemented to analyze the cotton price index volatility. To make the constructed model explainable, we use explainable artificial intelligence (XAI) techniques to perform statistical analysis of the input features. The experimental results show that the LSTM model can accurately analyze the cotton price index fluctuation trend but cannot accurately predict the actual price of cotton; the transaction data plus interaction data are more sensitive than the transaction data in analyzing the cotton price fluctuation trend and can have a positive effect on the cotton price fluctuation analysis. This study can accurately reflect the fluctuation trend of the cotton market, provide reference to the state, enterprises, and cotton farmers for decision-making, and reduce the risk caused by frequent fluctuation of cotton prices. The analysis of the model using XAI techniques builds the confidence of decision-makers in the model.

市场的不确定性极大地干扰了市场参与者的决策和计划,从而增加了决策的风险,导致决策者的利益受损。棉花价格指数(以下简称棉价)波动具有高度的噪声、非线性和随机性,易受供需、气候、代用品等政策因素的影响,具有较大的不确定性。为了降低决策风险,为决策者提供决策支持,本文在已有研究的基础上,将影响棉花价格指数波动的13个因素进行整合,并进一步划分为交易数据和交互数据。构建了长短期记忆(LSTM)模型,并对棉花价格指数波动进行了对比实验分析。为了使构建的模型具有可解释性,我们使用可解释性人工智能(XAI)技术对输入特征进行统计分析。实验结果表明,LSTM模型能准确分析棉花价格指数波动趋势,但不能准确预测棉花实际价格;交易数据加交互数据在分析棉花价格波动趋势时比交易数据更敏感,可以对棉花价格波动分析产生积极的影响。本研究可以准确反映棉花市场的波动趋势,为国家、企业和棉农决策提供参考,降低棉花价格频繁波动带来的风险。使用XAI技术对模型进行分析,建立决策者对模型的信心。
{"title":"Long- and Short-Term Memory Model of Cotton Price Index Volatility Risk Based on Explainable Artificial Intelligence.","authors":"Huosong Xia, Xiaoyu Hou, Justin Zuopeng Zhang","doi":"10.1089/big.2022.0287","DOIUrl":"10.1089/big.2022.0287","url":null,"abstract":"<p><p>Market uncertainty greatly interferes with the decisions and plans of market participants, thus increasing the risk of decision-making, leading to compromised interests of decision-makers. Cotton price index (hereinafter referred to as cotton price) volatility is highly noisy, nonlinear, and stochastic and is susceptible to supply and demand, climate, substitutes, and other policy factors, which are subject to large uncertainties. To reduce decision risk and provide decision support for policymakers, this article integrates 13 factors affecting cotton price index volatility based on existing research and further divides them into transaction data and interaction data. A long- and short-term memory (LSTM) model is constructed, and a comparison experiment is implemented to analyze the cotton price index volatility. To make the constructed model explainable, we use explainable artificial intelligence (XAI) techniques to perform statistical analysis of the input features. The experimental results show that the LSTM model can accurately analyze the cotton price index fluctuation trend but cannot accurately predict the actual price of cotton; the transaction data plus interaction data are more sensitive than the transaction data in analyzing the cotton price fluctuation trend and can have a positive effect on the cotton price fluctuation analysis. This study can accurately reflect the fluctuation trend of the cotton market, provide reference to the state, enterprises, and cotton farmers for decision-making, and reduce the risk caused by frequent fluctuation of cotton prices. The analysis of the model using XAI techniques builds the confidence of decision-makers in the model.</p>","PeriodicalId":51314,"journal":{"name":"Big Data","volume":" ","pages":"49-62"},"PeriodicalIF":4.6,"publicationDate":"2024-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136400257","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Gaussian Adapted Markov Model with Overhauled Fluctuation Analysis-Based Big Data Streaming Model in Cloud. 基于高斯自适应马尔可夫模型和检修波动分析的云中大数据流模型。
IF 4.6 4区 计算机科学 Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date : 2024-02-01 Epub Date: 2023-10-30 DOI: 10.1089/big.2023.0035
M Ananthi, Annapoorani Gopal, K Ramalakshmi, P Mohan Kumar

An accurate resource usage prediction in the big data streaming applications still remains as one of the complex processes. In the existing works, various resource scaling techniques are developed for forecasting the resource usage in the big data streaming systems. However, the baseline streaming mechanisms limit with the issues of inefficient resource scaling, inaccurate forecasting, high latency, and running time. Therefore, the proposed work motivates to develop a new framework, named as Gaussian adapted Markov model (GAMM)-overhauled fluctuation analysis (OFA), for an efficient big data streaming in the cloud systems. The purpose of this work is to efficiently manage the time-bounded big data streaming applications with reduced error rate. In this study, the gating strategy is also used to extract the set of features for obtaining nonlinear distribution of data and fat convergence solution, used to perform the fluctuation analysis. Moreover, the layered architecture is developed for simplifying the process of resource forecasting in the streaming applications. During experimentation, the results of the proposed stream model GAMM-OFA are validated and compared by using different measures.

在大数据流应用中,准确的资源使用预测仍然是一个复杂的过程。在现有的工作中,开发了各种资源缩放技术来预测大数据流系统中的资源使用情况。然而,基线流机制由于资源扩展效率低、预测不准确、高延迟和运行时间等问题而受到限制。因此,所提出的工作旨在开发一种新的框架,称为高斯自适应马尔可夫模型(GAMM)-大修波动分析(OFA),用于云系统中高效的大数据流。这项工作的目的是有效地管理有时间限制的大数据流应用程序,降低错误率。在本研究中,门控策略还用于提取一组特征,以获得数据的非线性分布和脂肪收敛解,用于进行波动分析。此外,为了简化流应用程序中的资源预测过程,开发了分层体系结构。在实验过程中,通过使用不同的措施对所提出的流模型GAMM-OFA的结果进行了验证和比较。
{"title":"Gaussian Adapted Markov Model with Overhauled Fluctuation Analysis-Based Big Data Streaming Model in Cloud.","authors":"M Ananthi, Annapoorani Gopal, K Ramalakshmi, P Mohan Kumar","doi":"10.1089/big.2023.0035","DOIUrl":"10.1089/big.2023.0035","url":null,"abstract":"<p><p>An accurate resource usage prediction in the big data streaming applications still remains as one of the complex processes. In the existing works, various resource scaling techniques are developed for forecasting the resource usage in the big data streaming systems. However, the baseline streaming mechanisms limit with the issues of inefficient resource scaling, inaccurate forecasting, high latency, and running time. Therefore, the proposed work motivates to develop a new framework, named as Gaussian adapted Markov model (GAMM)-overhauled fluctuation analysis (OFA), for an efficient big data streaming in the cloud systems. The purpose of this work is to efficiently manage the time-bounded big data streaming applications with reduced error rate. In this study, the gating strategy is also used to extract the set of features for obtaining nonlinear distribution of data and fat convergence solution, used to perform the fluctuation analysis. Moreover, the layered architecture is developed for simplifying the process of resource forecasting in the streaming applications. During experimentation, the results of the proposed stream model GAMM-OFA are validated and compared by using different measures.</p>","PeriodicalId":51314,"journal":{"name":"Big Data","volume":" ","pages":"1-18"},"PeriodicalIF":4.6,"publicationDate":"2024-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"71415224","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Acknowledgment of Reviewers 2023. 鸣谢 2023 年审稿人。
IF 4.6 4区 计算机科学 Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date : 2024-02-01 Epub Date: 2023-12-19 DOI: 10.1089/big.2023.29063.ack
{"title":"Acknowledgment of Reviewers 2023.","authors":"","doi":"10.1089/big.2023.29063.ack","DOIUrl":"10.1089/big.2023.29063.ack","url":null,"abstract":"","PeriodicalId":51314,"journal":{"name":"Big Data","volume":"12 1","pages":"81-82"},"PeriodicalIF":4.6,"publicationDate":"2024-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139730992","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Impact of Cooperative Innovation on the Technological Innovation Performance of High-Tech Firms: A Dual Moderating Effect Model of Big Data Capabilities and Policy Support. 合作创新对高科技企业技术创新绩效的影响:大数据能力与政策支持的双重调节效应模型。
IF 4.6 4区 计算机科学 Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date : 2024-02-01 Epub Date: 2023-09-14 DOI: 10.1089/big.2022.0301
Xianglong Li, Qingjin Wang, Renbo Shi, Xueling Wang, Kaiyun Zhang, Xiao Liu

The mechanism of cooperative innovation (CI) for high-tech firms aims to improve their technological innovation performance. It is the effective integration of the internal and external innovation resources of these firms, along with the simultaneous reduction in the uncertainty of technological innovation and the maintenance of the comparative advantage of the firms in the competition. This study used 322 high-tech firms as our sample, which were located in 33 national innovation demonstration bases identified by the Chinese government. We implemented a multiple linear regression to test the impact of CI conducted by these high-tech firms at the level of their technological innovation performance. In addition, the study further examined the moderating effect of two boundary conditions-big data capabilities and policy support (PS)-on the main hypotheses. Our study found that high-tech firms carrying out CI can effectively improve their technological innovation performance, with big data capabilities and PS significantly enhancing the degree of this influence. The study reveals the intrinsic mechanism of the impact of CI on the technological innovation performance of high-tech firms, which, to a certain extent, expands the application context of CI and enriches the research perspective on the impact of CI on the innovation performance of firms. At the same time, the findings provide insight for how high-tech firms in the digital era can make reasonable use of data empowerment in the process of CI to achieve improved technological innovation performance.

高科技企业的合作创新(CI)机制旨在提高其技术创新绩效。它有效整合了企业内外部的创新资源,同时降低了技术创新的不确定性,保持了企业在竞争中的比较优势。本研究以中国政府认定的 33 个国家自主创新示范基地中的 322 家高科技企业为样本。我们采用多元线性回归的方法,检验了这些高科技企业开展的 CI 对其技术创新绩效水平的影响。此外,研究还进一步检验了两个边界条件--大数据能力和政策支持(PS)--对主要假设的调节作用。我们的研究发现,高科技企业开展 CI 能有效提高其技术创新绩效,而大数据能力和政策支持能显著提高这种影响程度。研究揭示了CI对高科技企业技术创新绩效影响的内在机理,在一定程度上拓展了CI的应用范围,丰富了CI对企业创新绩效影响的研究视角。同时,研究结果也为数字时代的高科技企业如何在CI过程中合理利用数据赋能实现技术创新绩效的提升提供了启示。
{"title":"Impact of Cooperative Innovation on the Technological Innovation Performance of High-Tech Firms: A Dual Moderating Effect Model of Big Data Capabilities and Policy Support.","authors":"Xianglong Li, Qingjin Wang, Renbo Shi, Xueling Wang, Kaiyun Zhang, Xiao Liu","doi":"10.1089/big.2022.0301","DOIUrl":"10.1089/big.2022.0301","url":null,"abstract":"<p><p>The mechanism of cooperative innovation (CI) for high-tech firms aims to improve their technological innovation performance. It is the effective integration of the internal and external innovation resources of these firms, along with the simultaneous reduction in the uncertainty of technological innovation and the maintenance of the comparative advantage of the firms in the competition. This study used 322 high-tech firms as our sample, which were located in 33 national innovation demonstration bases identified by the Chinese government. We implemented a multiple linear regression to test the impact of CI conducted by these high-tech firms at the level of their technological innovation performance. In addition, the study further examined the moderating effect of two boundary conditions-big data capabilities and policy support (PS)-on the main hypotheses. Our study found that high-tech firms carrying out CI can effectively improve their technological innovation performance, with big data capabilities and PS significantly enhancing the degree of this influence. The study reveals the intrinsic mechanism of the impact of CI on the technological innovation performance of high-tech firms, which, to a certain extent, expands the application context of CI and enriches the research perspective on the impact of CI on the innovation performance of firms. At the same time, the findings provide insight for how high-tech firms in the digital era can make reasonable use of data empowerment in the process of CI to achieve improved technological innovation performance.</p>","PeriodicalId":51314,"journal":{"name":"Big Data","volume":" ","pages":"63-80"},"PeriodicalIF":4.6,"publicationDate":"2024-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10243508","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Automated Natural Language Processing-Based Supplier Discovery for Financial Services. 基于自然语言处理的金融服务供应商自动发现。
IF 4.6 4区 计算机科学 Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date : 2024-02-01 Epub Date: 2023-07-07 DOI: 10.1089/big.2022.0215
Mauro Papa, Ioannis Chatzigiannakis, Aris Anagnostopoulos

Public procurement is viewed as a major market force that can be used to promote innovation and drive small and medium-sized enterprises growth. In such cases, procurement system design relies on intermediates that provide vertical linkages between suppliers and providers of innovative services and products. In this work we propose an innovative methodology for decision support in the process of supplier discovery, which precedes the final supplier selection. We focus on data gathered from community-based sources such as Reddit and Wikidata and avoid any use of historical open procurement datasets to identify small and medium sized suppliers of innovative products and services that own very little market shares. We look into a real-world procurement case study from the financial sector focusing on the Financial and Market Data offering and develop an interactive web-based support tool to address certain requirements of the Italian central bank. We demonstrate how a suitable selection of natural language processing models, such as a part-of-speech tagger and a word-embedding model, in combination with a novel named-entity-disambiguation algorithm, can efficiently analyze huge quantity of textual data, increasing the probability of a full coverage of the market.

公共采购被视为一种重要的市场力量,可用于促进创新和推动中小型企业的发展。在这种情况下,采购系统的设计依赖于在供应商与创新服务和产品提供商之间建立纵向联系的中介机构。在这项工作中,我们提出了一种创新方法,用于在最终选择供应商之前的发现供应商过程中提供决策支持。我们专注于从 Reddit 和 Wikidata 等基于社区的来源收集数据,避免使用任何历史公开采购数据集来识别市场份额极小的创新产品和服务的中小型供应商。我们研究了金融部门的一个真实采购案例,重点是金融和市场数据产品,并开发了一个基于网络的互动式支持工具,以满足意大利中央银行的某些要求。我们展示了如何选择合适的自然语言处理模型,如语音部分标记和词嵌入模型,并结合新颖的命名实体消歧义算法,高效地分析大量文本数据,从而提高全面覆盖市场的可能性。
{"title":"Automated Natural Language Processing-Based Supplier Discovery for Financial Services.","authors":"Mauro Papa, Ioannis Chatzigiannakis, Aris Anagnostopoulos","doi":"10.1089/big.2022.0215","DOIUrl":"10.1089/big.2022.0215","url":null,"abstract":"<p><p>Public procurement is viewed as a major market force that can be used to promote innovation and drive small and medium-sized enterprises growth. In such cases, procurement system design relies on intermediates that provide vertical linkages between suppliers and providers of innovative services and products. In this work we propose an innovative methodology for decision support in the process of supplier discovery, which precedes the final supplier selection. We focus on data gathered from community-based sources such as Reddit and Wikidata and avoid any use of historical open procurement datasets to identify small and medium sized suppliers of innovative products and services that own very little market shares. We look into a real-world procurement case study from the financial sector focusing on the Financial and Market Data offering and develop an interactive web-based support tool to address certain requirements of the Italian central bank. We demonstrate how a suitable selection of natural language processing models, such as a part-of-speech tagger and a word-embedding model, in combination with a novel named-entity-disambiguation algorithm, can efficiently analyze huge quantity of textual data, increasing the probability of a full coverage of the market.</p>","PeriodicalId":51314,"journal":{"name":"Big Data","volume":" ","pages":"30-48"},"PeriodicalIF":4.6,"publicationDate":"2024-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9749953","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
A MapReduce-Based Approach for Fast Connected Components Detection from Large-Scale Networks. 基于 MapReduce 的大规模网络连接组件快速检测方法。
IF 4.6 4区 计算机科学 Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date : 2024-01-29 DOI: 10.1089/big.2022.0264
Sajid Yousuf Bhat, Muhammad Abulaish

Owing to increasing size of the real-world networks, their processing using classical techniques has become infeasible. The amount of storage and central processing unit time required for processing large networks is far beyond the capabilities of a high-end computing machine. Moreover, real-world network data are generally distributed in nature because they are collected and stored on distributed platforms. This has popularized the use of the MapReduce, a distributed data processing framework, for analyzing real-world network data. Existing MapReduce-based methods for connected components detection mainly struggle to minimize the number of MapReduce rounds and the amount of data generated and forwarded to the subsequent rounds. This article presents an efficient MapReduce-based approach for finding connected components, which does not forward the complete set of connected components to the subsequent rounds; instead, it writes them to the Hadoop Distributed File System as soon as they are found to reduce the amount of data forwarded to the subsequent rounds. It also presents an application of the proposed method in contact tracing. The proposed method is evaluated on several network data sets and compared with two state-of-the-art methods. The empirical results reveal that the proposed method performs significantly better and is scalable to find connected components in large-scale networks.

由于现实世界的网络规模越来越大,使用传统技术处理这些网络已经变得不可行。处理大型网络所需的存储量和中央处理单元时间远远超出了高端计算机的能力。此外,现实世界的网络数据通常是分布式的,因为它们是在分布式平台上收集和存储的。因此,使用分布式数据处理框架 MapReduce 来分析现实世界的网络数据得到了普及。现有的基于 MapReduce 的连接组件检测方法主要致力于尽量减少 MapReduce 轮数以及生成并转发到后续轮的数据量。本文提出了一种高效的基于 MapReduce 的查找连接组件的方法,该方法不会将连接组件的完整集合转发给后续轮次,而是在找到连接组件后立即将其写入 Hadoop 分布式文件系统,以减少转发给后续轮次的数据量。报告还介绍了所提方法在接触追踪中的应用。本文在多个网络数据集上对所提出的方法进行了评估,并将其与两种最先进的方法进行了比较。实证结果表明,所提出的方法在大规模网络中寻找连接组件方面表现明显更好,并且具有可扩展性。
{"title":"A MapReduce-Based Approach for Fast Connected Components Detection from Large-Scale Networks.","authors":"Sajid Yousuf Bhat, Muhammad Abulaish","doi":"10.1089/big.2022.0264","DOIUrl":"https://doi.org/10.1089/big.2022.0264","url":null,"abstract":"<p><p>Owing to increasing size of the real-world networks, their processing using classical techniques has become infeasible. The amount of storage and central processing unit time required for processing large networks is far beyond the capabilities of a high-end computing machine. Moreover, real-world network data are generally distributed in nature because they are collected and stored on distributed platforms. This has popularized the use of the MapReduce, a distributed data processing framework, for analyzing real-world network data. Existing MapReduce-based methods for connected components detection mainly struggle to minimize the number of MapReduce rounds and the amount of data generated and forwarded to the subsequent rounds. This article presents an efficient MapReduce-based approach for finding connected components, which does not forward the complete set of connected components to the subsequent rounds; instead, it writes them to the Hadoop Distributed File System as soon as they are found to reduce the amount of data forwarded to the subsequent rounds. It also presents an application of the proposed method in contact tracing. The proposed method is evaluated on several network data sets and compared with two state-of-the-art methods. The empirical results reveal that the proposed method performs significantly better and is scalable to find connected components in large-scale networks.</p>","PeriodicalId":51314,"journal":{"name":"Big Data","volume":" ","pages":""},"PeriodicalIF":4.6,"publicationDate":"2024-01-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139571864","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Modeling of Machine Learning-Based Extreme Value Theory in Stock Investment Risk Prediction: A Systematic Literature Review. 基于机器学习的极值理论在股票投资风险预测中的建模:系统性文献综述。
IF 4.6 4区 计算机科学 Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date : 2024-01-17 DOI: 10.1089/big.2023.0004
Melina Melina, Sukono, Herlina Napitupulu, Norizan Mohamed

The stock market is heavily influenced by global sentiment, which is full of uncertainty and is characterized by extreme values and linear and nonlinear variables. High-frequency data generally refer to data that are collected at a very fast rate based on days, hours, minutes, and even seconds. Stock prices fluctuate rapidly and even at extremes along with changes in the variables that affect stock fluctuations. Research on investment risk estimation in the stock market that can identify extreme values is nonlinear, reliable in multivariate cases, and uses high-frequency data that are very important. The extreme value theory (EVT) approach can detect extreme values. This method is reliable in univariate cases and very complicated in multivariate cases. The purpose of this research was to collect, characterize, and analyze the investment risk estimation literature to identify research gaps. The literature used was selected by applying the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) and sourced from Sciencedirect.com and Scopus databases. A total of 1107 articles were produced from the search at the identification stage, reduced to 236 in the eligibility stage, and 90 articles in the included studies set. The bibliometric networks were visualized using the VOSviewer software, and the main keyword used as the search criteria is "VaR." The visualization showed that EVT, the Generalized Autoregressive Conditional Heteroskedasticity (GARCH) models, and historical simulation are models often used to estimate the investment risk; the application of the machine learning (ML)-based investment risk estimation model is low. There has been no research using a combination of EVT and ML to estimate the investment risk. The results showed that the hybrid model produced better Value-at-Risk (VaR) accuracy under uncertainty and nonlinear conditions. Generally, models only use daily return data as model input. Based on research gaps, a hybrid model framework for estimating risk measures is proposed using a combination of EVT and ML, using multivariable and high-frequency data to identify extreme values in the distribution of data. The goal is to produce an accurate and flexible estimated risk value against extreme changes and shocks in the stock market. Mathematics Subject Classification: 60G25; 62M20; 6245; 62P05; 91G70.

股票市场深受全球情绪的影响,而全球情绪充满了不确定性,其特点是极端值以及线性和非线性变量。高频数据一般是指以天、小时、分钟甚至秒为单位快速收集的数据。股票价格随着影响股票波动的变量的变化而快速波动,甚至出现极端波动。能够识别极值的股市投资风险评估研究是非线性的,在多变量情况下是可靠的,并且使用的是非常重要的高频数据。极值理论(EVT)方法可以检测极值。这种方法在单变量情况下是可靠的,而在多变量情况下则非常复杂。本研究的目的是收集、描述和分析投资风险估计文献,找出研究空白。所使用的文献是根据《系统综述和元分析首选报告项目》(Preferred Reporting Items for Systematic Reviews and Meta-Analyses,PRISMA)进行筛选的,来源于 Sciencedirect.com 和 Scopus 数据库。在识别阶段共搜索到 1107 篇文章,在资格审查阶段减少到 236 篇,在纳入研究集中有 90 篇文章。使用 VOSviewer 软件对文献计量学网络进行了可视化,搜索标准的主要关键词是 "VaR"。可视化结果显示,EVT、广义自回归条件异方差(GARCH)模型和历史模拟是常用的投资风险估计模型;基于机器学习(ML)的投资风险估计模型应用较少。目前还没有将 EVT 和 ML 结合起来估计投资风险的研究。研究结果表明,在不确定和非线性条件下,混合模型能产生更好的风险价值(VaR)精度。一般来说,模型仅使用每日收益数据作为模型输入。基于研究差距,我们提出了一个结合 EVT 和 ML 的混合模型框架来估算风险度量,使用多变量和高频数据来识别数据分布中的极端值。其目标是针对股票市场的极端变化和冲击,得出准确而灵活的估计风险值。数学学科分类:60G25; 62M20; 6245; 62P05; 91G70.
{"title":"Modeling of Machine Learning-Based Extreme Value Theory in Stock Investment Risk Prediction: A Systematic Literature Review.","authors":"Melina Melina, Sukono, Herlina Napitupulu, Norizan Mohamed","doi":"10.1089/big.2023.0004","DOIUrl":"https://doi.org/10.1089/big.2023.0004","url":null,"abstract":"<p><p>The stock market is heavily influenced by global sentiment, which is full of uncertainty and is characterized by extreme values and linear and nonlinear variables. High-frequency data generally refer to data that are collected at a very fast rate based on days, hours, minutes, and even seconds. Stock prices fluctuate rapidly and even at extremes along with changes in the variables that affect stock fluctuations. Research on investment risk estimation in the stock market that can identify extreme values is nonlinear, reliable in multivariate cases, and uses high-frequency data that are very important. The extreme value theory (EVT) approach can detect extreme values. This method is reliable in univariate cases and very complicated in multivariate cases. The purpose of this research was to collect, characterize, and analyze the investment risk estimation literature to identify research gaps. The literature used was selected by applying the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) and sourced from Sciencedirect.com and Scopus databases. A total of 1107 articles were produced from the search at the identification stage, reduced to 236 in the eligibility stage, and 90 articles in the included studies set. The bibliometric networks were visualized using the VOSviewer software, and the main keyword used as the search criteria is \"VaR.\" The visualization showed that EVT, the Generalized Autoregressive Conditional Heteroskedasticity (GARCH) models, and historical simulation are models often used to estimate the investment risk; the application of the machine learning (ML)-based investment risk estimation model is low. There has been no research using a combination of EVT and ML to estimate the investment risk. The results showed that the hybrid model produced better Value-at-Risk (VaR) accuracy under uncertainty and nonlinear conditions. Generally, models only use daily return data as model input. Based on research gaps, a hybrid model framework for estimating risk measures is proposed using a combination of EVT and ML, using multivariable and high-frequency data to identify extreme values in the distribution of data. The goal is to produce an accurate and flexible estimated risk value against extreme changes and shocks in the stock market. Mathematics Subject Classification: 60G25; 62M20; 6245; 62P05; 91G70.</p>","PeriodicalId":51314,"journal":{"name":"Big Data","volume":" ","pages":""},"PeriodicalIF":4.6,"publicationDate":"2024-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139486846","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
期刊
Big Data
全部 Acc. Chem. Res. ACS Applied Bio Materials ACS Appl. Electron. Mater. ACS Appl. Energy Mater. ACS Appl. Mater. Interfaces ACS Appl. Nano Mater. ACS Appl. Polym. Mater. ACS BIOMATER-SCI ENG ACS Catal. ACS Cent. Sci. ACS Chem. Biol. ACS Chemical Health & Safety ACS Chem. Neurosci. ACS Comb. Sci. ACS Earth Space Chem. ACS Energy Lett. ACS Infect. Dis. ACS Macro Lett. ACS Mater. Lett. ACS Med. Chem. Lett. ACS Nano ACS Omega ACS Photonics ACS Sens. ACS Sustainable Chem. Eng. ACS Synth. Biol. Anal. Chem. BIOCHEMISTRY-US Bioconjugate Chem. BIOMACROMOLECULES Chem. Res. Toxicol. Chem. Rev. Chem. Mater. CRYST GROWTH DES ENERG FUEL Environ. Sci. Technol. Environ. Sci. Technol. Lett. Eur. J. Inorg. Chem. IND ENG CHEM RES Inorg. Chem. J. Agric. Food. Chem. J. Chem. Eng. Data J. Chem. Educ. J. Chem. Inf. Model. J. Chem. Theory Comput. J. Med. Chem. J. Nat. Prod. J PROTEOME RES J. Am. Chem. Soc. LANGMUIR MACROMOLECULES Mol. Pharmaceutics Nano Lett. Org. Lett. ORG PROCESS RES DEV ORGANOMETALLICS J. Org. Chem. J. Phys. Chem. J. Phys. Chem. A J. Phys. Chem. B J. Phys. Chem. C J. Phys. Chem. Lett. Analyst Anal. Methods Biomater. Sci. Catal. Sci. Technol. Chem. Commun. Chem. Soc. Rev. CHEM EDUC RES PRACT CRYSTENGCOMM Dalton Trans. Energy Environ. Sci. ENVIRON SCI-NANO ENVIRON SCI-PROC IMP ENVIRON SCI-WAT RES Faraday Discuss. Food Funct. Green Chem. Inorg. Chem. Front. Integr. Biol. J. Anal. At. Spectrom. J. Mater. Chem. A J. Mater. Chem. B J. Mater. Chem. C Lab Chip Mater. Chem. Front. Mater. Horiz. MEDCHEMCOMM Metallomics Mol. Biosyst. Mol. Syst. Des. Eng. Nanoscale Nanoscale Horiz. Nat. Prod. Rep. New J. Chem. Org. Biomol. Chem. Org. Chem. Front. PHOTOCH PHOTOBIO SCI PCCP Polym. Chem.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1