
Latest Articles in Big Data

Long- and Short-Term Memory Model of Cotton Price Index Volatility Risk Based on Explainable Artificial Intelligence.
IF 4.6 CAS Zone 4 (Computer Science) Q1 Computer Science Pub Date: 2024-02-01 Epub Date: 2023-11-17 DOI: 10.1089/big.2022.0287
Huosong Xia, Xiaoyu Hou, Justin Zuopeng Zhang

Market uncertainty greatly interferes with the decisions and plans of market participants, increasing decision-making risk and compromising the interests of decision-makers. Cotton price index (hereinafter, cotton price) volatility is highly noisy, nonlinear, and stochastic, and is susceptible to supply and demand, climate, substitutes, and policy factors, all of which carry large uncertainties. To reduce decision risk and provide decision support for policymakers, this article integrates 13 factors affecting cotton price index volatility identified in existing research and further divides them into transaction data and interaction data. A long short-term memory (LSTM) model is constructed, and a comparison experiment is conducted to analyze cotton price index volatility. To make the model explainable, explainable artificial intelligence (XAI) techniques are used to perform statistical analysis of the input features. The experimental results show that the LSTM model can accurately analyze the cotton price index fluctuation trend but cannot accurately predict the actual price of cotton; transaction data plus interaction data are more sensitive than transaction data alone in analyzing the fluctuation trend and have a positive effect on the analysis. This study accurately reflects the fluctuation trend of the cotton market, provides a reference for the state, enterprises, and cotton farmers in decision-making, and reduces the risk caused by frequent fluctuations in cotton prices. Analyzing the model with XAI techniques builds decision-makers' confidence in the model.
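The abstract names the model family but gives no implementation details; a minimal, illustrative sketch of a single LSTM cell unrolled over a 30-day window of the 13 factors (NumPy only, random weights, not the authors' trained model):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step. W, U, b pack the input, forget, output, and cell gates."""
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b           # all four gate pre-activations, shape (4H,)
    i = sigmoid(z[0:H])                  # input gate
    f = sigmoid(z[H:2*H])                # forget gate
    o = sigmoid(z[2*H:3*H])              # output gate
    g = np.tanh(z[3*H:4*H])              # candidate cell state
    c = f * c_prev + i * g               # new cell state
    h = o * np.tanh(c)                   # new hidden state
    return h, c

rng = np.random.default_rng(0)
n_features, hidden = 13, 8               # 13 volatility factors per day, as in the paper
W = rng.normal(scale=0.1, size=(4 * hidden, n_features))
U = rng.normal(scale=0.1, size=(4 * hidden, hidden))
b = np.zeros(4 * hidden)

h, c = np.zeros(hidden), np.zeros(hidden)
window = rng.normal(size=(30, n_features))   # a 30-day window of factor values
for x in window:
    h, c = lstm_step(x, h, c, W, U, b)

trend_score = float(np.tanh(h).mean())       # toy read-out of the final hidden state
print(round(trend_score, 4))
```

In a real pipeline the final hidden state would feed a trained output layer rather than this toy read-out, and the weights would be learned by backpropagation through time.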

Citations: 0
Gaussian Adapted Markov Model with Overhauled Fluctuation Analysis-Based Big Data Streaming Model in Cloud.
IF 4.6 CAS Zone 4 (Computer Science) Q1 Computer Science Pub Date: 2024-02-01 Epub Date: 2023-10-30 DOI: 10.1089/big.2023.0035
M Ananthi, Annapoorani Gopal, K Ramalakshmi, P Mohan Kumar

Accurate resource-usage prediction for big data streaming applications remains a complex problem. Existing works have developed various resource-scaling techniques for forecasting resource usage in big data streaming systems; however, these baseline streaming mechanisms suffer from inefficient resource scaling, inaccurate forecasting, high latency, and long running times. The proposed work therefore develops a new framework, the Gaussian adapted Markov model (GAMM) with overhauled fluctuation analysis (OFA), for efficient big data streaming in cloud systems. Its purpose is to manage time-bounded big data streaming applications efficiently with a reduced error rate. A gating strategy is used to extract the feature set, yielding the nonlinear distribution of the data and a fat convergence solution used to perform the fluctuation analysis. Moreover, a layered architecture is developed to simplify resource forecasting in streaming applications. In experiments, the results of the proposed stream model GAMM-OFA are validated and compared using different measures.
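The GAMM-OFA internals are not specified in the abstract; as a rough illustration of the general idea behind a Gaussian-emission Markov model for resource-usage forecasting (not the authors' method), one can discretize a usage trace into states, estimate a transition matrix and per-state Gaussian emissions, and forecast one step ahead:

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic CPU-usage trace: a calm period around 30% followed by a bursty
# period around 80% (stand-in for a real streaming workload).
usage = np.concatenate([rng.normal(30, 5, 500), rng.normal(80, 5, 500)])

# Discretize into two states by a fixed threshold, then estimate the
# transition matrix and per-state Gaussian emission parameters from counts.
states = (usage > 55).astype(int)
T = np.zeros((2, 2))
for s0, s1 in zip(states[:-1], states[1:]):
    T[s0, s1] += 1
T /= T.sum(axis=1, keepdims=True)

means = [usage[states == s].mean() for s in (0, 1)]
stds = [usage[states == s].std() for s in (0, 1)]

# One-step forecast: expected usage given the current state.
cur = states[-1]
forecast = float(T[cur] @ means)
print(T.round(3), round(forecast, 1))
```

A real system would add more states, learn the threshold from data (or use a hidden Markov model), and feed the forecast into an autoscaler.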

Citations: 0
Acknowledgment of Reviewers 2023.
IF 4.6 CAS Zone 4 (Computer Science) Q1 Computer Science Pub Date: 2024-02-01 Epub Date: 2023-12-19 DOI: 10.1089/big.2023.29063.ack
Citations: 0
Automated Natural Language Processing-Based Supplier Discovery for Financial Services.
IF 4.6 CAS Zone 4 (Computer Science) Q1 Computer Science Pub Date: 2024-02-01 Epub Date: 2023-07-07 DOI: 10.1089/big.2022.0215
Mauro Papa, Ioannis Chatzigiannakis, Aris Anagnostopoulos

Public procurement is viewed as a major market force that can be used to promote innovation and drive the growth of small and medium-sized enterprises. In such cases, procurement system design relies on intermediaries that provide vertical linkages between suppliers and providers of innovative services and products. In this work we propose an innovative methodology for decision support in the process of supplier discovery, which precedes final supplier selection. We focus on data gathered from community-based sources such as Reddit and Wikidata and avoid any use of historical open procurement datasets, in order to identify small and medium-sized suppliers of innovative products and services that hold very small market shares. We examine a real-world procurement case study from the financial sector, focusing on the Financial and Market Data offering, and develop an interactive web-based support tool to address certain requirements of the Italian central bank. We demonstrate how a suitable selection of natural language processing models, such as a part-of-speech tagger and a word-embedding model, combined with a novel named-entity-disambiguation algorithm, can efficiently analyze huge quantities of textual data, increasing the probability of full coverage of the market.
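The paper's POS tagger, embedding model, and disambiguation algorithm are not reproduced in the abstract; a toy sketch of the embedding-similarity idea behind named-entity disambiguation, with hand-made three-dimensional vectors (and hypothetical entity names) standing in for a trained word-embedding model:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hand-made toy vectors standing in for a trained word-embedding model;
# the dimensions loosely mean [finance, technology, food].
emb = {
    "market-data feed": np.array([0.9, 0.4, 0.0]),   # mention context
    "Acme Analytics":   np.array([0.8, 0.5, 0.1]),   # candidate: data vendor
    "Acme Bakery":      np.array([0.0, 0.1, 0.9]),   # candidate: unrelated firm
}

def disambiguate(context, candidates):
    """Named-entity disambiguation: keep the candidate entity whose
    embedding is most similar to the mention's context embedding."""
    return max(candidates, key=lambda c: cosine(emb[context], emb[c]))

best = disambiguate("market-data feed", ["Acme Analytics", "Acme Bakery"])
print(best)
```

In practice the context vector would be built from the words surrounding the mention (e.g. averaged embeddings of a Reddit post), and candidates would come from a knowledge base such as Wikidata.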

Citations: 0
Impact of Cooperative Innovation on the Technological Innovation Performance of High-Tech Firms: A Dual Moderating Effect Model of Big Data Capabilities and Policy Support.
IF 4.6 CAS Zone 4 (Computer Science) Q1 Computer Science Pub Date: 2024-02-01 Epub Date: 2023-09-14 DOI: 10.1089/big.2022.0301
Xianglong Li, Qingjin Wang, Renbo Shi, Xueling Wang, Kaiyun Zhang, Xiao Liu

The mechanism of cooperative innovation (CI) for high-tech firms aims to improve their technological innovation performance. It effectively integrates the internal and external innovation resources of these firms while reducing the uncertainty of technological innovation and maintaining the firms' comparative advantage in competition. This study used a sample of 322 high-tech firms located in 33 national innovation demonstration bases identified by the Chinese government. We implemented a multiple linear regression to test the impact of CI conducted by these high-tech firms on their technological innovation performance. In addition, the study examined the moderating effect of two boundary conditions, big data capabilities and policy support (PS), on the main hypotheses. We found that high-tech firms carrying out CI can effectively improve their technological innovation performance, with big data capabilities and PS significantly enhancing the degree of this influence. The study reveals the intrinsic mechanism of the impact of CI on the technological innovation performance of high-tech firms, which, to a certain extent, expands the application context of CI and enriches the research perspective on the impact of CI on firms' innovation performance. The findings also provide insight into how high-tech firms in the digital era can make reasonable use of data empowerment in the process of CI to improve technological innovation performance.

Citations: 0
A MapReduce-Based Approach for Fast Connected Components Detection from Large-Scale Networks.
IF 4.6 CAS Zone 4 (Computer Science) Q1 Computer Science Pub Date: 2024-01-29 DOI: 10.1089/big.2022.0264
Sajid Yousuf Bhat, Muhammad Abulaish

Owing to the increasing size of real-world networks, processing them with classical techniques has become infeasible. The storage and central processing unit time required to process large networks are far beyond the capabilities of a high-end computing machine. Moreover, real-world network data are generally distributed in nature because they are collected and stored on distributed platforms. This has popularized the use of MapReduce, a distributed data-processing framework, for analyzing real-world network data. Existing MapReduce-based methods for connected-components detection mainly struggle to minimize the number of MapReduce rounds and the amount of data generated and forwarded to subsequent rounds. This article presents an efficient MapReduce-based approach for finding connected components that does not forward the complete set of connected components to subsequent rounds; instead, it writes them to the Hadoop Distributed File System as soon as they are found, reducing the amount of data forwarded. It also presents an application of the proposed method to contact tracing. The proposed method is evaluated on several network datasets and compared with two state-of-the-art methods. The empirical results reveal that the proposed method performs significantly better and scales to find connected components in large-scale networks.
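The core pattern such methods build on, iterative minimum-label propagation expressed as map and reduce phases, can be simulated in a few lines (this sketch is the baseline pattern only; it omits the paper's contribution of writing finished components to HDFS early):

```python
from itertools import groupby
from operator import itemgetter

def mapreduce_round(edges, labels):
    """One simulated MapReduce round of minimum-label propagation."""
    # Map: every edge (u, v) proposes each endpoint's current label to the
    # other endpoint; each node also re-emits its own label.
    pairs = []
    for u, v in edges:
        pairs.extend([(u, labels[v]), (v, labels[u]),
                      (u, labels[u]), (v, labels[v])])
    # Shuffle/sort by key, then Reduce: each node keeps the smallest label.
    pairs.sort(key=itemgetter(0))
    return {node: min(lab for _, lab in grp)
            for node, grp in groupby(pairs, key=itemgetter(0))}

edges = [(1, 2), (2, 3), (4, 5)]            # two components: {1, 2, 3} and {4, 5}
labels = {n: n for e in edges for n in e}   # start: every node labels itself
while True:
    new = mapreduce_round(edges, labels)
    if new == labels:                       # converged: labels are stable
        break
    labels = new

components = {}
for node, lab in labels.items():
    components.setdefault(lab, set()).add(node)
print(sorted(map(sorted, components.values())))   # → [[1, 2, 3], [4, 5]]
```

On a real cluster each `mapreduce_round` is a Hadoop job, and the number of rounds (here, the graph diameter in the worst case) is exactly what the surveyed methods try to minimize.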

Citations: 0
Modeling of Machine Learning-Based Extreme Value Theory in Stock Investment Risk Prediction: A Systematic Literature Review.
IF 4.6 CAS Zone 4 (Computer Science) Q1 Computer Science Pub Date: 2024-01-17 DOI: 10.1089/big.2023.0004
Melina Melina, Sukono, Herlina Napitupulu, Norizan Mohamed

The stock market is heavily influenced by global sentiment, which is full of uncertainty and characterized by extreme values and by linear and nonlinear variables. High-frequency data generally refer to data collected at a very fast rate, by day, hour, minute, or even second. Stock prices fluctuate rapidly, and even to extremes, along with changes in the variables that affect stock fluctuations. Research on stock-market investment risk estimation that can identify extreme values, handle nonlinearity, remain reliable in multivariate cases, and use high-frequency data is therefore very important. The extreme value theory (EVT) approach can detect extreme values; it is reliable in univariate cases but very complicated in multivariate cases. The purpose of this research was to collect, characterize, and analyze the investment risk estimation literature to identify research gaps. The literature was selected by applying the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) and sourced from the Sciencedirect.com and Scopus databases. The search produced 1107 articles at the identification stage, reduced to 236 at the eligibility stage and 90 articles in the included-studies set. The bibliometric networks were visualized using the VOSviewer software, with "VaR" as the main search keyword. The visualization showed that EVT, generalized autoregressive conditional heteroskedasticity (GARCH) models, and historical simulation are the models most often used to estimate investment risk, and that application of machine learning (ML)-based investment risk estimation models is low. No research has used a combination of EVT and ML to estimate investment risk. The results showed that hybrid models produce better Value-at-Risk (VaR) accuracy under uncertainty and nonlinear conditions. Generally, models use only daily return data as input. Based on the research gaps, a hybrid model framework for estimating risk measures is proposed using a combination of EVT and ML, with multivariable and high-frequency data to identify extreme values in the distribution of the data. The goal is to produce an accurate and flexible estimated risk value against extreme changes and shocks in the stock market. Mathematics Subject Classification: 60G25; 62M20; 6245; 62P05; 91G70.
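As a concrete illustration of the EVT peaks-over-threshold VaR estimator this review discusses, a minimal sketch using a method-of-moments generalized Pareto fit on synthetic heavy-tailed losses (illustrative only; production estimators typically use maximum likelihood):

```python
import numpy as np

def pot_var(losses, u, p=0.99):
    """Peaks-over-threshold VaR: fit a generalized Pareto distribution to
    excesses over threshold u by the method of moments, then invert the
    tail estimator at confidence level p."""
    losses = np.asarray(losses)
    exc = losses[losses > u] - u            # excesses over the threshold
    n, nu = len(losses), len(exc)
    m, v = exc.mean(), exc.var()
    xi = 0.5 * (1.0 - m * m / v)            # GPD shape (moment estimator)
    sigma = 0.5 * m * (m * m / v + 1.0)     # GPD scale (moment estimator)
    # VaR_p = u + (sigma/xi) * [((n/nu) * (1 - p))^(-xi) - 1], for xi != 0
    return u + (sigma / xi) * (((n / nu) * (1.0 - p)) ** (-xi) - 1.0)

rng = np.random.default_rng(7)
# Heavy-tailed synthetic daily losses (Student-t) standing in for returns.
losses = rng.standard_t(df=4, size=10_000)
u = float(np.quantile(losses, 0.95))        # threshold at the 95th percentile
var99 = float(pot_var(losses, u, p=0.99))
print(round(var99, 3))
```

The ML half of the hybrid framework the review proposes would sit on top of this, e.g. forecasting volatility or threshold exceedances from multivariate high-frequency features before the EVT tail fit is applied.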

Citations: 0
The Impact of Big Data Analytics on Decision-Making Within the Government Sector.
IF 4.6 CAS Zone 4 (Computer Science) Q1 Computer Science Pub Date: 2024-01-09 DOI: 10.1089/big.2023.0019
Laila Faridoon, Wei Liu, Crawford Spence

The government sector has started adopting big data analytics capability (BDAC) to enhance its service delivery. This study examines the relationship between BDAC and decision-making capability (DMC) in the government sector. Drawing on the resource-based view of the firm, it investigates the mediating role of decision-makers' cognitive style and of organizational culture in the relationship between BDAC and DMC. It further investigates the impact of BDAC on organizational performance (OP). The study attempts to extend existing research through significant findings and recommendations that enhance decision-making processes for successful utilization of BDAC in the government sector. A survey method was adopted to collect data from government organizations in the United Arab Emirates, and partial least-squares structural equation modeling was deployed to analyze the collected data. The results empirically validate the proposed theoretical framework and confirm that BDAC positively impacts DMC via cognitive style and organizational culture, in turn further positively impacting overall OP.

Citations: 0
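The mediation structure described in the abstract (BDAC → mediator → DMC) can be illustrated with a toy calculation. The sketch below runs a simple ordinary-least-squares mediation analysis on synthetic data — it is not the study's PLS-SEM model, and every variable name, coefficient, and sample here is invented for illustration.

```python
import numpy as np

# Synthetic data mimicking the hypothesized paths:
# BDAC -> organizational culture (mediator) -> decision-making capability.
rng = np.random.default_rng(42)
n = 500
bdac = rng.normal(size=n)                                 # "BDAC" score
culture = 0.6 * bdac + rng.normal(scale=0.5, size=n)      # mediator
dmc = 0.5 * culture + 0.2 * bdac + rng.normal(scale=0.5, size=n)

def ols_slope(x, y):
    """Slope of y ~ 1 + x via least squares."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]

# Path a: effect of BDAC on the mediator.
a = ols_slope(bdac, culture)

# Path b: effect of the mediator on DMC, controlling for BDAC.
X = np.column_stack([np.ones(n), culture, bdac])
beta, *_ = np.linalg.lstsq(X, dmc, rcond=None)
b = beta[1]

indirect = a * b  # mediated (indirect) effect of BDAC on DMC
print(f"a={a:.2f}, b={b:.2f}, indirect effect={indirect:.2f}")
```

With the true coefficients above, the estimated paths land near a ≈ 0.6 and b ≈ 0.5, so the indirect effect is clearly positive — the pattern the study reports for its mediators.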
Large-Scale Estimation and Analysis of Web Users' Mood from Web Search Query and Mobile Sensor Data.
IF 2.6 CAS Region 4 (Computer Science) Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2024-01-01 Epub Date: 2023-06-02 DOI: 10.1089/big.2022.0211
Wataru Sasaki, Satoki Hamanaka, Satoko Miyahara, Kota Tsubouchi, Jin Nakazawa, Tadashi Okoshi

The ability to estimate the current mood states of web users has considerable potential for realizing user-centric, opportune services in pervasive computing. However, it is difficult to determine which data type to use for such estimation and to collect the ground truth of those mood states. Therefore, we built a model that estimates mood states from search-query data in an easy-to-collect and non-invasive manner. We then built a second model that estimates mood states from mobile sensor data and used its output to supplement the ground-truth labels of the search-query model. This novel two-step model building boosted the performance of estimating the mood states of web users. Our system was also deployed in the commercial stack, and a large-scale data analysis with >11 million users was conducted. We proposed a nationwide mood score, which aggregates the mood values of users across the country. It shows the daily and weekly rhythms of people's moods and explains the ups and downs of moods during the COVID-19 pandemic, which were inversely synchronized with the number of new COVID-19 cases. It detects big news that simultaneously affects the mood states of many users, even at fine-grained time resolutions, on the order of hours. In addition, we identified a certain class of advertisements that indicated a clear tendency in the mood of the users who clicked them.

Citations: 0
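The two-step model building described in the abstract — train a sensor-based mood model on ground-truth labels, then use its predictions as supplementary labels for a search-query model — is essentially pseudo-labeling. The sketch below shows the idea on synthetic data with a plain least-squares linear classifier; the feature dimensions, classifier, and data are all assumptions for illustration, not the authors' pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 2000, 5
w_true = rng.normal(size=d)
sensor = rng.normal(size=(n, d))                        # mobile sensor features
mood = (sensor @ w_true + rng.normal(scale=0.5, size=n) > 0).astype(int)

def fit_linear(X, y01):
    """Least-squares linear classifier fit on {-1, +1} targets."""
    Xb = np.column_stack([np.ones(len(X)), X])
    w, *_ = np.linalg.lstsq(Xb, 2.0 * y01 - 1.0, rcond=None)
    return w

def predict(w, X):
    Xb = np.column_stack([np.ones(len(X)), X])
    return (Xb @ w > 0).astype(int)

# Step 1: train the sensor-based mood model on sessions with ground truth.
w_sensor = fit_linear(sensor[:1000], mood[:1000])

# Step 2: pseudo-label the remaining sessions, which (in the paper's setting)
# lack ground-truth moods but do have search-query features.
pseudo = predict(w_sensor, sensor[1000:])
query = sensor[1000:] + rng.normal(scale=0.3, size=(1000, d))  # query features

# Step 3: train the query-based mood model on the pseudo-labels.
w_query = fit_linear(query, pseudo)

acc = np.mean(predict(w_query, query) == mood[1000:])
print(f"query-model agreement with true moods: {acc:.2f}")
```

The query-based model never sees a true label directly, yet it inherits most of the sensor model's accuracy — the mechanism by which the paper's second model "supplements" labels for the first.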
Computational Efficient Approximations of the Concordance Probability in a Big Data Setting.
IF 2.6 CAS Region 4 (Computer Science) Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2024-01-01 Epub Date: 2023-06-07 DOI: 10.1089/big.2022.0107
Robin Van Oirbeek, Jolien Ponnet, Bart Baesens, Tim Verdonck

Performance measurement is an essential task once a statistical model is created. The area under the receiver operating characteristic curve (AUC) is the most popular measure for evaluating the quality of a binary classifier. In this case, the AUC equals the concordance probability, a frequently used measure of the discriminatory power of a model. Unlike the AUC, the concordance probability can also be extended to the situation with a continuous response variable. Given the staggering size of today's data sets, determining this discriminatory measure requires a tremendous amount of costly computation and is hence immensely time consuming, particularly in the case of a continuous response variable. We therefore propose two estimation methods that calculate the concordance probability quickly and accurately and that can be applied in both the discrete and the continuous setting. Extensive simulation studies show the excellent performance and fast computing times of both estimators. Finally, experiments on two real-life data sets confirm the conclusions of the artificial simulations.

Citations: 0
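To make the computational problem concrete: for a continuous response, the concordance probability is the chance that a randomly chosen pair of observations is ranked the same way by the predictions as by the outcomes, which naively costs O(n²) pair comparisons. The sketch below contrasts that exact calculation with a simple random-pair Monte Carlo approximation. This is a generic baseline, not the two estimators proposed in the article.

```python
import numpy as np

def concordance_exact(y, yhat):
    """Exact concordance probability over all O(n^2) pairs:
    P(predictions ordered like the outcomes), ties in y excluded."""
    conc = disc = 0
    n = len(y)
    for i in range(n):
        for j in range(i + 1, n):
            if y[i] == y[j]:
                continue  # tied responses carry no ordering information
            s = (y[i] - y[j]) * (yhat[i] - yhat[j])
            if s > 0:
                conc += 1
            elif s < 0:
                disc += 1
    return conc / (conc + disc)

def concordance_sampled(y, yhat, n_pairs=200_000, seed=0):
    """Cheap approximation: score a random sample of pairs, so the cost is
    O(n_pairs) instead of O(n^2). Expects NumPy arrays."""
    rng = np.random.default_rng(seed)
    i = rng.integers(0, len(y), n_pairs)
    j = rng.integers(0, len(y), n_pairs)
    s = (y[i] - y[j]) * (yhat[i] - yhat[j])
    conc = np.sum(s > 0)
    disc = np.sum(s < 0)
    return conc / (conc + disc)

# Synthetic continuous response correlated with the predictions.
rng = np.random.default_rng(1)
pred = rng.normal(size=300)
resp = pred + rng.normal(scale=0.5, size=300)
exact = concordance_exact(resp, pred)
approx = concordance_sampled(resp, pred)
print(f"exact={exact:.3f}, sampled={approx:.3f}")
```

Even this crude sampling scheme tracks the exact value closely at a fraction of the cost, which is the gap the paper's estimators close more carefully.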