
Big Data: Latest Articles

Dual-Path Graph Neural Network with Adaptive Auxiliary Module for Link Prediction.
IF 2.6, CAS Zone 4 (Computer Science), Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS. Pub Date: 2025-08-01; Epub Date: 2024-03-25; DOI: 10.1089/big.2023.0130
Zhenzhen Yang, Zelong Lin, Yongpeng Yang, Jiaqi Li

Link prediction, which has important applications in many fields, estimates the likelihood that a link exists between two nodes in a graph. Link prediction based on Graph Neural Networks (GNNs), which learn node representations and graph structure, has attracted growing attention recently. However, existing GNN-based link prediction approaches have two shortcomings. First, because a graph contains different types of nodes, aggregating information from neighbor nodes and learning node representations is challenging. Second, although the attention mechanism is an effective instrument for improving link prediction performance, the traditional attention mechanism is always monotonic with respect to the query node, which limits its benefit for link prediction. To address these two problems, this study proposes a Dual-Path Graph Neural Network (DPGNN) for link prediction. Specifically, we propose a novel Local Random Features Augmentation for Graph Convolution Network as the baseline of one path. Meanwhile, Graph Attention Network version 2 (GATv2), which is based on a dynamic attention mechanism, is adopted as the baseline of the other path. We then capture more meaningful node representations and more accurate link features by concatenating the outputs of the two paths. In addition, we propose an adaptive auxiliary module that better balances the weights of auxiliary tasks, which further benefits link prediction. Finally, extensive experiments verify the effectiveness and superiority of the proposed DPGNN for link prediction.
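To make the "dynamic attention" and "concatenating the two paths" steps concrete, here is a minimal, dependency-free sketch. The GATv2 scoring rule e_ij = a^T LeakyReLU(W [h_i || h_j]) follows the published GATv2 formulation, in which the nonlinearity sits between W and a so the ranking of neighbors can depend on the query node. The fusion-by-concatenation mirrors the abstract; the inner-product sigmoid decoder, all weights, and the placeholder GCN-path vector are assumptions for illustration only, not the authors' DPGNN (the random-feature GCN path and the adaptive auxiliary module are not reproduced).

```python
import math

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def matvec(W, v):
    # W is a list of rows, so this computes the product W @ v
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def gatv2_weights(h_i, neighbors, W, a):
    """GATv2 scores e_ij = a^T LeakyReLU(W [h_i || h_j]), softmax-normalized
    over the neighbors of i. W has shape (d_out, 2 * d_in); a has length d_out."""
    scores = []
    for h_j in neighbors:
        z = matvec(W, h_i + h_j)  # list '+' concatenates: [h_i || h_j]
        scores.append(sum(ak * leaky_relu(zk) for ak, zk in zip(a, z)))
    return softmax(scores)

def aggregate(weights, neighbors):
    # Attention-weighted sum of neighbor feature vectors
    dim = len(neighbors[0])
    return [sum(w * h[k] for w, h in zip(weights, neighbors)) for k in range(dim)]

def dual_path_embedding(z_path1, z_path2):
    # Fusion step described in the abstract: concatenate the two path outputs
    return z_path1 + z_path2

def link_probability(z_u, z_v):
    # Hypothetical decoder: numerically stable sigmoid of the inner product
    s = sum(x * y for x, y in zip(z_u, z_v))
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)

if __name__ == "__main__":
    neighbors = [[0.0, 1.0], [1.0, 1.0]]
    W = [[0.5, -0.3, 0.2, 0.4], [0.1, 0.6, -0.5, 0.3]]  # made-up weights
    a = [1.0, -1.0]
    alpha = gatv2_weights([1.0, 0.0], neighbors, W, a)
    z_gat = aggregate(alpha, neighbors)
    z_gcn = [0.2, 0.1]  # placeholder for the GCN-path output
    z_u = dual_path_embedding(z_gcn, z_gat)
    print(alpha, z_u, link_probability(z_u, z_u))
```

In a real model W, a, and the decoder would be learned jointly; concatenation (rather than summation) keeps the two paths' features in separate coordinates so the decoder can weight them independently.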

Journal: Big Data, pages 333-343.
Citations: 0
Content-Aware Human Mobility Pattern Extraction.
IF 2.6, CAS Zone 4 (Computer Science), Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS. Pub Date: 2025-08-01; Epub Date: 2024-07-10; DOI: 10.1089/big.2022.0281
Shengwen Li, Chaofan Fan, Tianci Li, Renyao Chen, Qingyuan Liu, Junfang Gong

Extracting meaningful patterns of human mobility from accumulated trajectories is essential for understanding human behavior. However, previous works identify human mobility patterns from the spatial co-occurrence of trajectories and ignore the effect of activity content, which makes it difficult to extract and interpret patterns effectively. To bridge this gap, this study incorporates the activity content of trajectories into the extraction of human mobility patterns and proposes a content-aware mobility pattern model. The model first embeds activity content in a distributed continuous vector space, using points of interest (POIs) as proxies, and then extracts representative and interpretable mobility patterns from sets of human trajectories with a derived topic model. To evaluate the proposed model, several evaluation metrics are developed, including pattern coherence, pattern similarity, and manual scoring. A real-world case study shows that the proposed model improves interpretability and helps in understanding mobility patterns. This study provides not only a novel solution and several evaluation metrics for human mobility patterns but also a methodological reference for fusing the content semantics of human activities into trajectory analysis and mining.
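The abstract lists "pattern coherence" as a metric without defining it here. One widely used coherence measure for topic models is UMass coherence, and the sketch below computes it under the assumption that each trajectory is treated as a "document" of POI categories and each mobility pattern as a ranked list of top categories. This is an illustrative stand-in, not necessarily the authors' exact metric; the trajectory data are made up.

```python
import math

def umass_coherence(top_words, documents):
    """UMass coherence of one topic (pattern):
    sum over l < m of log((D(w_m, w_l) + 1) / D(w_l)),
    where D(...) counts documents containing all the given words and
    top_words is ordered from most to least probable."""
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    for m in range(1, len(top_words)):
        for l in range(m):
            d_l = doc_freq(top_words[l])
            if d_l == 0:
                continue  # skip words that never occur in the corpus
            score += math.log((doc_freq(top_words[m], top_words[l]) + 1) / d_l)
    return score

if __name__ == "__main__":
    # Hypothetical trajectories expressed as sequences of POI categories
    trajectories = [
        ["home", "cafe", "office"],
        ["home", "office", "gym"],
        ["cafe", "park"],
        ["home", "office"],
    ]
    print(umass_coherence(["home", "office"], trajectories))  # co-visited often
    print(umass_coherence(["home", "park"], trajectories))    # never co-visited
```

Higher (less negative) scores indicate categories that tend to appear in the same trajectories, which is one operational reading of an "interpretable" pattern.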

Journal: Big Data, pages 269-284.
Citations: 0
Research on the Influence of Information Iterative Propagation on Complex Network Structure.
IF 2.6, CAS Zone 4 (Computer Science), Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS. Pub Date: 2025-08-01; Epub Date: 2024-07-27; DOI: 10.1089/big.2023.0016
Yinuo Qian, Fuzhong Nian, Zheming Wang, Yabing Yao

Dynamic propagation affects how network structure changes, and different networks are affected by the iterative propagation of information to different degrees. The iterative propagation of information changes the connection strength of the chain edges between nodes. Most studies on temporal networks build networks from time characteristics, and the iterative propagation of information in a network can also reflect the time characteristics of network evolution: the change of network structure is a macro-level manifestation of these characteristics, whereas the dynamics within the network are a micro-level manifestation. This article focuses on how to concretely visualize the changes in network structure driven by the characteristics of propagation dynamics. The appearance of chain edges is a micro-level change of network structure, and community division is a macro-level change. On this basis, node participation is proposed to quantify the influence of different users on information propagation in the network, and it is simulated on different types of networks. By analyzing the iterative propagation of information, weighted networks based on iterative propagation are constructed for the different networks. Finally, the chain edges and community division of these networks are analyzed to quantify the influence of network propagation on complex network structure.
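The idea of turning repeated propagation into edge weights and a per-node participation score can be sketched as follows. This is a hypothetical construction, not the authors' definitions: it assumes an independent-cascade process with a fixed transmission probability `p`, increments an edge's weight every time information actually crosses it, and defines participation as the fraction of rounds in which a node was reached. The names `propagate_once` and `iterative_propagation` are illustrative.

```python
import random
from collections import defaultdict

def propagate_once(adj, source, p, rng):
    """One independent-cascade run from `source`. Returns the set of informed
    nodes and the edges along which information actually travelled."""
    informed = {source}
    frontier = [source]
    used_edges = []
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in adj[u]:
                if v not in informed and rng.random() < p:
                    informed.add(v)
                    used_edges.append((u, v))
                    next_frontier.append(v)
        frontier = next_frontier
    return informed, used_edges

def iterative_propagation(adj, rounds=200, p=0.3, seed=42):
    """Repeat the cascade from random sources. Edge weight = number of times
    information crossed the (undirected) edge; participation = fraction of
    rounds in which a node was informed."""
    rng = random.Random(seed)
    nodes = list(adj)
    weights = defaultdict(int)
    reached = dict.fromkeys(nodes, 0)
    for _ in range(rounds):
        source = rng.choice(nodes)
        informed, edges = propagate_once(adj, source, p, rng)
        for n in informed:
            reached[n] += 1
        for u, v in edges:
            weights[frozenset((u, v))] += 1
    participation = {n: c / rounds for n, c in reached.items()}
    return dict(weights), participation

if __name__ == "__main__":
    # Small undirected graph as a symmetric adjacency list
    adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1, 4], 4: [3]}
    w, part = iterative_propagation(adj)
    print(sorted((tuple(sorted(e)), c) for e, c in w.items()))
    print(part)
```

The resulting weighted graph could then be handed to any community-detection routine to compare communities before and after weighting, which is the macro-level comparison the abstract describes.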

Journal: Big Data, pages 319-332.
Citations: 0
Deep Learning-Based Decision Support System for Nurse Staff in Hospitals.
IF 2.6, CAS Zone 4 (Computer Science), Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS. Pub Date: 2025-06-02; DOI: 10.1089/big.2024.0122
Jieyu Chen, Feilong He, Lihua Tang, Lingli Gu

This study aims to promote the information-based management of hospital human resources and to advance the application of hospital information technology. Deep learning (DL) technologies in health care, particularly in hospital settings, have shown significant promise for enhancing nurses' decision-making. A hospital management decision support system based on data warehouse theory and business intelligence technology is used to achieve multidimensional analysis and display of data. This research explores the development and implementation of a DL-Based Clinical Decision Support System (DL-CDSS) tailored for hospital nurses. DL-CDSS uses advanced neural network architectures to analyze complex clinical data, including patient records, vital signs, and diagnostic reports, to help nurses make informed decisions about patient care. By leveraging large-scale datasets from Hospital Information Systems, DL-CDSS provides real-time recommendations for treatment plans, medication administration, and patient monitoring. The system's effectiveness is demonstrated through improved accuracy in clinical decision-making, fewer medication errors, and more efficient workflows. The system analyzes and displays hospital nurse data in terms of quantity, distribution, structure, forecasting, analysis reports, and peer comparisons, providing head nurses with multilevel, multiperspective data mining results. Challenges such as data integration, model interpretability, and user interface design are addressed to ensure seamless integration into nursing practice. The study concludes with insights into the potential benefits of DL-CDSS for promoting patient safety, enhancing health care quality, and supporting nursing professionals in delivering optimal care.
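The "multidimensional analysis" of nurse data (quantity, distribution, structure) is the kind of roll-up a data warehouse performs along chosen dimensions. The sketch below shows that operation on a few in-memory records; the field names (`dept`, `title`) and the records themselves are hypothetical, and a production system would run the equivalent GROUP BY queries against the warehouse rather than Python lists.

```python
from collections import Counter

def roll_up(records, dims):
    """OLAP-style roll-up: count records grouped by the given dimensions.
    records: list of dicts; dims: tuple of field names."""
    return Counter(tuple(r[d] for d in dims) for r in records)

def distribution(records, dim):
    """Share of staff for each value of one dimension (e.g. professional title)."""
    counts = roll_up(records, (dim,))
    total = sum(counts.values())
    return {key[0]: n / total for key, n in counts.items()}

if __name__ == "__main__":
    # Hypothetical nurse records
    nurses = [
        {"dept": "ICU", "title": "senior"},
        {"dept": "ICU", "title": "junior"},
        {"dept": "ER", "title": "junior"},
        {"dept": "ER", "title": "junior"},
    ]
    print(roll_up(nurses, ("dept",)))           # quantity per department
    print(roll_up(nurses, ("dept", "title")))   # structure: department x title
    print(distribution(nurses, "title"))        # distribution of titles
```

Choosing different `dims` tuples corresponds to drilling down (more dimensions) or rolling up (fewer) in the cube.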

Journal: Big Data.
Citations: 0
The Impact of the COVID-19 Pandemic on Stock Market Performance in G20 Countries: Evidence from Long Short-Term Memory with a Recurrent Neural Network Approach.
IF 2.6, CAS Zone 4 (Computer Science), Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS. Pub Date: 2025-06-01; Epub Date: 2023-12-20; DOI: 10.1089/big.2023.0015
Pingkan Mayosi Fitriana, Jumadil Saputra, Zairihan Abdul Halim

Comprising both developing and industrialized nations, the G20 economies account for roughly two-thirds of the world's population and are the largest economies globally. The rapid global spread of COVID-19 gave rise to public emergencies that affected many people's lives, especially in G20 countries. This study therefore investigates the impact of the COVID-19 pandemic on stock market performance in G20 countries, using daily stock market data for G20 countries from January 1, 2019, to June 30, 2020. The stock market data were divided into G7 and non-G7 countries and analyzed using a Long Short-Term Memory Recurrent Neural Network (LSTM-RNN) approach. The results indicated a gap between the actual stock market index and a forecasted time series representing what would have happened without COVID-19. Owing to movement restrictions, the stock markets of six countries, namely Argentina, China, South Africa, Turkey, Saudi Arabia, and the United States, were affected negatively. In addition, movement restrictions in the G7 countries other than the United States, and in the non-G7 countries other than Argentina, China, South Africa, Turkey, and Saudi Arabia, significantly affected stock market performance. Generally, LSTM prediction estimates relative terms, except for stock market performance in the United Kingdom, the Republic of Korea, South Africa, and Spain. Stock market performance in the United Kingdom and Spain declined significantly during and after the onset of COVID-19. This indicates that the COVID-19 pandemic considerably influenced the stock markets of 14 G20 countries while affecting the remaining 6 countries less severely. In conclusion, our empirical evidence shows that the pandemic had limited effects on stock market performance in G20 countries.
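The core of the study design is a counterfactual comparison: fit a forecaster on pre-pandemic data, project it over the pandemic window, and measure the gap between the actual index and that projection. The sketch below illustrates only that gap computation. To stay dependency-free it deliberately swaps the paper's LSTM-RNN for a least-squares AR(1) model, so it demonstrates the comparison logic, not the authors' forecaster; all series are synthetic.

```python
def fit_ar1(series):
    """Least-squares fit of y_t = c + phi * y_{t-1}. Returns (c, phi)."""
    x, y = series[:-1], series[1:]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("series is constant; AR(1) slope is undefined")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    phi = sxy / sxx
    return my - phi * mx, phi

def counterfactual_forecast(pre_event, horizon):
    """Iterate the fitted model forward from the last pre-event value."""
    c, phi = fit_ar1(pre_event)
    last, path = pre_event[-1], []
    for _ in range(horizon):
        last = c + phi * last
        path.append(last)
    return path

def mean_relative_gap(actual, forecast):
    """Average of (actual - forecast) / forecast; a negative value means the
    index ran below its no-event counterfactual."""
    return sum((a - f) / f for a, f in zip(actual, forecast)) / len(actual)

if __name__ == "__main__":
    pre = [100.0 + t for t in range(20)]          # synthetic pre-event index
    during = [110.0, 105.0, 108.0]                # synthetic event-window index
    cf = counterfactual_forecast(pre, len(during))
    print(cf, mean_relative_gap(during, cf))
```

Replacing `fit_ar1`/`counterfactual_forecast` with a trained LSTM leaves `mean_relative_gap` unchanged, which is why the gap metric is kept separate from the forecaster.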

Journal: Big Data, pages 219-242.
Citations: 0
Investment Recommender System Model Based on the Potential Investors' Key Decision Factors.
IF 2.6, CAS Zone 4 (Computer Science), Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS. Pub Date: 2025-06-01; Epub Date: 2023-05-08; DOI: 10.1089/big.2022.0302
Asefeh Asemi, Adeleh Asemi, Andrea Ko

In this research, we propose an automatic recommender system that suggests investment types to investors. The system is based on a new intelligent approach that uses an adaptive neuro-fuzzy inference system (ANFIS) operating on four key decision factors (KDFs) of potential investors: system value, environmental awareness, the expectation of high return, and the expectation of low return. The proposed system provides a new model for investment recommender systems (IRSs) based on KDF data and data related to the type of investment. Fuzzy neural inference is used to choose the type of investment, providing advice that supports the investor's decision. The system also works with incomplete data, and expert opinions can be applied based on feedback from investors who use the system. The proposed system is a reliable way to suggest investment types: it can predict investors' decisions among different investment types from their KDFs. The system uses the K-means technique in JMP to preprocess the data and ANFIS to evaluate the data. We also compare the proposed system with other existing IRSs and evaluate its accuracy and effectiveness using the root mean squared error (RMSE). Overall, the proposed system is an effective and reliable IRS that potential investors can use to make better investment decisions.
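For readers unfamiliar with ANFIS, the sketch below shows the forward pass of a first-order Sugeno fuzzy system as ANFIS computes it (membership, rule firing strength, normalization, linear consequents, weighted sum), plus the RMSE used for evaluation. It is a minimal sketch under stated assumptions: two inputs and two rules with made-up parameters stand in for the paper's four KDF inputs, Gaussian membership functions are assumed, and the hybrid training procedure and the K-means preprocessing done in JMP are omitted.

```python
import math

def gauss(x, c, sigma):
    # Layer 1: Gaussian membership degree of x in a fuzzy set centred at c
    return math.exp(-((x - c) ** 2) / (2 * sigma ** 2))

def anfis_forward(x, rules):
    """First-order Sugeno inference as computed by ANFIS layers 1-5.
    Each rule is (mfs, coeffs): mfs is a list of (c, sigma) per input and
    coeffs is [p_1, ..., p_n, r] for the consequent f = sum(p_i * x_i) + r."""
    strengths, outputs = [], []
    for mfs, coeffs in rules:
        w = 1.0
        for xi, (c, s) in zip(x, mfs):
            w *= gauss(xi, c, s)  # layer 2: product T-norm gives firing strength
        strengths.append(w)
        outputs.append(sum(p * xi for p, xi in zip(coeffs[:-1], x)) + coeffs[-1])
    total = sum(strengths)
    if total == 0:
        raise ValueError("input lies outside the support of every rule")
    # Layers 3-5: normalize strengths and take the weighted sum of consequents
    return sum(w / total * f for w, f in zip(strengths, outputs))

def rmse(y_true, y_pred):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))

if __name__ == "__main__":
    # Hypothetical two-input, two-rule system (e.g. scaled factor scores in [0, 1])
    rules = [
        ([(0.0, 0.3), (0.0, 0.3)], [0.5, 0.2, 1.0]),
        ([(1.0, 0.3), (1.0, 0.3)], [1.0, -0.4, 2.0]),
    ]
    preds = [anfis_forward(x, rules) for x in ([0.1, 0.2], [0.9, 0.8])]
    print(preds, rmse([1.1, 2.6], preds))
```

In ANFIS training the consequent coefficients are typically fitted by least squares and the membership parameters by gradient descent; both would replace the hard-coded numbers above.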

Journal: Big Data, pages 197-218.
Citations: 0
Modeling of Machine Learning-Based Extreme Value Theory in Stock Investment Risk Prediction: A Systematic Literature Review.
IF 2.6, CAS Zone 4 (Computer Science), Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS. Pub Date: 2025-06-01; Epub Date: 2024-01-17; DOI: 10.1089/big.2023.0004
Melina Melina, Sukono, Herlina Napitupulu, Norizan Mohamed

The stock market is heavily influenced by global sentiment, which is full of uncertainty and is characterized by extreme values and by both linear and nonlinear variables. High-frequency data generally refer to data collected at short intervals, such as daily, hourly, by the minute, or even by the second. Stock prices fluctuate rapidly, sometimes to extremes, as the variables that drive them change. It is therefore very important to research stock market investment risk estimation methods that can identify extreme values, handle nonlinearity, remain reliable in multivariate cases, and use high-frequency data. The extreme value theory (EVT) approach can detect extreme values; it is reliable in univariate cases but very complicated in multivariate cases. The purpose of this research was to collect, characterize, and analyze the investment risk estimation literature to identify research gaps. The literature was selected by applying the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines and was sourced from the ScienceDirect and Scopus databases. The search produced 1107 articles at the identification stage, 236 at the eligibility stage, and 90 articles in the final set of included studies. The bibliometric networks were visualized using the VOSviewer software, and the main search keyword was "VaR." The visualization showed that EVT, Generalized Autoregressive Conditional Heteroskedasticity (GARCH) models, and historical simulation are often used to estimate investment risk, whereas machine learning (ML)-based investment risk estimation models are rarely applied. No research has yet combined EVT and ML to estimate investment risk. The results showed that hybrid models produced more accurate Value-at-Risk (VaR) estimates under uncertainty and nonlinear conditions. Generally, models use only daily return data as input. Based on these research gaps, a hybrid model framework for estimating risk measures is proposed that combines EVT and ML and uses multivariable, high-frequency data to identify extreme values in the data distribution. The goal is to produce accurate and flexible risk estimates in the face of extreme changes and shocks in the stock market. Mathematics Subject Classification: 60G25; 62M20; 6245; 62P05; 91G70.
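The EVT side of the proposed hybrid rests on the standard peaks-over-threshold (POT) estimate of VaR: fit a generalized Pareto distribution (GPD) to losses above a threshold u, then VaR_q = u + (σ/ξ)[((n/N_u)(1 − q))^(−ξ) − 1], where n is the sample size and N_u the number of exceedances. The sketch below implements that formula with a method-of-moments GPD fit (valid for ξ < 1/2) to stay dependency-free; maximum likelihood is the more common choice in practice, the threshold is assumed to be chosen by the user, and the ML component of the review's framework is not represented.

```python
import math

def gpd_moments_fit(excesses):
    """Method-of-moments GPD fit (valid for shape xi < 1/2):
    xi = 0.5 * (1 - m^2 / s^2), sigma = 0.5 * m * (m^2 / s^2 + 1),
    where m and s^2 are the sample mean and variance of the excesses."""
    n = len(excesses)
    m = sum(excesses) / n
    s2 = sum((e - m) ** 2 for e in excesses) / (n - 1)
    ratio = m * m / s2
    return 0.5 * (1 - ratio), 0.5 * m * (ratio + 1)

def pot_var(losses, u, q):
    """Peaks-over-threshold VaR at confidence level q (e.g. 0.99)."""
    excesses = [x - u for x in losses if x > u]
    n, n_u = len(losses), len(excesses)
    if n_u < 2:
        raise ValueError("need at least two exceedances of the threshold")
    xi, sigma = gpd_moments_fit(excesses)
    tail = (n / n_u) * (1 - q)
    if abs(xi) < 1e-12:
        # xi -> 0 limit: exponential tail
        return u - sigma * math.log(tail)
    return u + (sigma / xi) * (tail ** (-xi) - 1)

if __name__ == "__main__":
    # Synthetic loss sample; in practice these would be negated returns
    losses = [i / 10 for i in range(100)]
    for q in (0.95, 0.99):
        print(q, pot_var(losses, u=8.0, q=q))
```

A quick sanity check on the formula: at q = 1 − N_u/n the bracketed term vanishes and VaR equals the threshold u, which is exactly the empirical quantile the threshold represents.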

Modeling of Machine Learning-Based Extreme Value Theory in Stock Investment Risk Prediction: A Systematic Literature Review. Melina Melina, Sukono, Herlina Napitupulu, Norizan Mohamed. Big Data, pp. 161-180. Published 2025-06-01. DOI: 10.1089/big.2023.0004
Citations: 0
A MapReduce-Based Approach for Fast Connected Components Detection from Large-Scale Networks.
IF 2.6 | CAS Zone 4 (Computer Science) | Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2025-06-01 | Epub Date: 2024-01-29 | DOI: 10.1089/big.2022.0264
Sajid Yousuf Bhat, Muhammad Abulaish

Owing to the increasing size of real-world networks, processing them with classical techniques has become infeasible. The storage and central processing unit time required to process large networks far exceed the capabilities of even a high-end computing machine. Moreover, real-world network data are generally distributed in nature because they are collected and stored on distributed platforms. This has popularized the use of MapReduce, a distributed data processing framework, for analyzing real-world network data. The main difficulty for existing MapReduce-based connected components detection methods is minimizing the number of MapReduce rounds and the amount of data generated and forwarded to subsequent rounds. This article presents an efficient MapReduce-based approach for finding connected components that does not forward the complete set of connected components to subsequent rounds; instead, it writes each component to the Hadoop Distributed File System as soon as it is found, reducing the amount of data forwarded to later rounds. The article also presents an application of the proposed method to contact tracing. The proposed method is evaluated on several network data sets and compared with two state-of-the-art methods. The empirical results show that it performs significantly better than these methods and scales to finding connected components in large-scale networks.
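The abstract does not spell out the algorithm, so the following is only a sketch of the general idea, not the authors' method: a generic "hash-min" label-propagation scheme simulated in plain Python (no Hadoop), where each loop iteration stands for one MapReduce round. A component is moved to an output list, which stands in for an HDFS write, as soon as it is complete, so it is no longer forwarded to later rounds. The function names and the in-memory graph representation (an undirected adjacency dict listing every node as a key) are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Graph = Dict[int, Set[int]]  # undirected: v in g[u] iff u in g[v]


def map_phase(graph: Graph, labels: Dict[int, int]) -> List[Tuple[int, int]]:
    """Mapper: each node emits its current label to itself and to every neighbour."""
    pairs = []
    for node, nbrs in graph.items():
        pairs.append((node, labels[node]))
        for nbr in nbrs:
            pairs.append((nbr, labels[node]))
    return pairs


def reduce_phase(pairs: List[Tuple[int, int]]) -> Dict[int, int]:
    """Reducer: each node keeps the smallest label it received (hash-min)."""
    grouped: Dict[int, List[int]] = defaultdict(list)
    for node, label in pairs:
        grouped[node].append(label)
    return {node: min(received) for node, received in grouped.items()}


def connected_components(graph: Graph) -> Tuple[List[Set[int]], int]:
    """Return (components, rounds); finished components leave the working graph early."""
    graph = {n: set(nbrs) for n, nbrs in graph.items()}  # working copy
    labels = {n: n for n in graph}
    finished: List[Set[int]] = []  # stands in for components written to HDFS
    rounds = 0
    while graph:
        labels = reduce_phase(map_phase(graph, labels))
        rounds += 1
        groups: Dict[int, Set[int]] = defaultdict(set)
        for node, label in labels.items():
            groups[label].add(node)
        for label, members in groups.items():
            # A label group whose nodes only touch same-label nodes is closed under
            # adjacency, hence a complete connected component: emit it and drop it.
            if all(labels[nbr] == label for n in members for nbr in graph[n]):
                finished.append(members)
                for n in members:
                    del graph[n]
                    del labels[n]
    return finished, rounds
```

On a graph with the path 1-2-3, the edge 4-5, and the isolated node 6, the components {4, 5} and {6} are emitted after the first round, so only the path's three nodes are carried into the second round. Dropping finished components in this way shrinks the data each later round must shuffle, which is the effect the abstract describes.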

Big Data, pp. 243-268. Published 2025-06-01. DOI: 10.1089/big.2022.0264
Citations: 0
Investigating the Co-Movement and Asymmetric Relationships of Oil Prices on the Shipping Stock Returns: Evidence from Three Shipping-Flagged Companies from Germany, South Korea, and Taiwan.
IF 2.6 | CAS Zone 4 (Computer Science) | Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2025-06-01 | Epub Date: 2024-02-13 | DOI: 10.1089/big.2023.0026
Jumadil Saputra, Kasypi Mokhtar, Anuar Abu Bakar, Siti Marsila Mhd Ruslan

Over the last 2 years, oil prices have risen significantly, leading to a decline in economic activity and demand. This trend has substantial implications for the global economy, particularly for the emerging business landscape. Among the risk factors that affect shipping stock returns, none looms larger than oil price volatility. Yet only a limited number of studies have explored the complex relationship between oil price shocks and the dynamics of the liner shipping industry, with a specific focus on uncertainty linkages and potential diversification strategies. This study investigates the co-movements and asymmetric associations between oil prices (specifically, West Texas Intermediate and Brent) and the stock returns of three prominent shipping companies from Germany, South Korea, and Taiwan. The results highlight the key role of oil prices in shaping both short-term and long-term shipping stock returns. Exchange rates and interest rates also have statistically significant effects on these returns, and those effects vary across time horizons. Notably, shipping stock prices are especially sensitive to increases in oil prices, while exchange rates and interest rates have opposite effects, one positive and the other negative. Taken together, these findings show how strongly market sentiment about key economic indicators influences the global shipping sector.
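The abstract does not name the econometric model used. One common way to test for asymmetric effects is to split oil price changes into their positive and negative parts, as nonlinear ARDL (NARDL) models do, and estimate a separate coefficient for each. The sketch below is a deliberately simplified, short-run-only version of that idea, using ordinary least squares with numpy: it omits lags, the long-run levels terms, and the exchange rate and interest rate controls the study includes. The function names are hypothetical.

```python
import numpy as np


def split_changes(price) -> tuple:
    """Split first differences of a price series into positive and negative parts."""
    dx = np.diff(np.asarray(price, dtype=float))
    return np.maximum(dx, 0.0), np.minimum(dx, 0.0)


def fit_asymmetric(returns, oil_price) -> np.ndarray:
    """OLS of returns[t] on the positive and negative oil price changes at t.

    Returns [intercept, beta_pos, beta_neg]; beta_pos != beta_neg suggests asymmetry.
    """
    pos, neg = split_changes(oil_price)
    X = np.column_stack([np.ones_like(pos), pos, neg])
    y = np.asarray(returns, dtype=float)[1:]  # align with the differenced series
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta
```

In practice the difference between `beta_pos` and `beta_neg` would be assessed with a formal test (for example a Wald test) rather than by inspection, and the full NARDL specification adds lagged terms and cointegrating levels.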

Big Data, pp. 181-196. Published 2025-06-01. DOI: 10.1089/big.2023.0026
Citations: 0
Big Data Confidentiality: An Approach Toward Corporate Compliance Using a Rule-Based System.
IF 2.6 | CAS Zone 4 (Computer Science) | Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2025-04-01 | Epub Date: 2023-10-31 | DOI: 10.1089/big.2022.0201
Georgios Vranopoulos, Nathan Clarke, Shirley Atkinson

Organizations have been investing in analytics based on internal and external data to gain a competitive advantage. However, national and international laws and regulations have become a challenge, especially for highly regulated sectors such as health and finance/banking. Data handlers such as Facebook and Amazon have already incurred considerable fines or are under investigation for violations of data governance. By introducing the dimensions of Volume, Velocity, and Variety into confidentiality, the big data era has made it even harder to minimize the risk of data loss. Although Volume and Velocity have been extensively researched, Variety, "the ugly duckling" of big data, is often neglected and is difficult to address, which increases the risk of data exposure and data loss. To mitigate this risk, this article proposes a framework that uses algorithmic classification and workflow capabilities to provide a consistent approach to data evaluation across an organization. A rule-based system that implements the corporate data classification policy aims to minimize the risk of exposure by helping users identify the approved guidelines and enforce them quickly. The framework includes an exception-handling process that requires appropriate approval for extenuating circumstances. The system was implemented as a working proof-of-concept prototype to showcase its capabilities and provide hands-on experience. The information system was evaluated and accredited by a diverse audience of academics and senior business executives in the fields of security and data management. The evaluators had an average of ∼25 years of experience, amounting to almost three centuries (294 years) in total. The results confirmed that all three Vs are of concern and that Variety is the most troubling, according to 90% of the commentators. In addition, with an approximate average of 60%, the evaluation confirmed that appropriate classification policies, procedures, and prerequisites are in place, whereas implementation tools are lagging.
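The abstract does not describe the rule format, so the following is only a rough illustration of how a rule-based classifier with an approval-gated exception path might look. The sensitivity levels, rule names, and asset attributes are hypothetical placeholders, not the corporate policy used in the article. Each rule checks an attribute of a data asset; the most sensitive matching level wins, and any deviation from that result is treated as an exception that needs explicit approval.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical sensitivity levels, ordered from most to least sensitive.
LEVELS = ["Restricted", "Confidential", "Internal", "Public"]


@dataclass
class Rule:
    name: str
    applies: Callable[[Dict[str, object]], bool]
    level: str


# Hypothetical rules standing in for a corporate data classification policy.
RULES: List[Rule] = [
    Rule("health-data", lambda d: bool(d.get("contains_health_data")), "Restricted"),
    Rule("customer-pii", lambda d: bool(d.get("contains_pii")), "Confidential"),
    Rule("staff-only", lambda d: d.get("audience") == "staff", "Internal"),
]


def classify(asset: Dict[str, object], rules: List[Rule] = RULES) -> str:
    """Return the most sensitive level among matching rules, defaulting to Public."""
    matched = [rule.level for rule in rules if rule.applies(asset)]
    if not matched:
        return "Public"
    return min(matched, key=LEVELS.index)


def effective_level(asset: Dict[str, object], requested: Optional[str] = None,
                    approved: bool = False) -> str:
    """Exception handling: a level other than the rule-based one needs approval."""
    level = classify(asset)
    if requested is not None and requested != level:
        if not approved:
            raise PermissionError(f"Changing {level} to {requested} requires approval")
        return requested
    return level
```

For example, an asset flagged with `contains_pii` and `audience="staff"` matches two rules and is classified as `Confidential`; requesting `Internal` for it raises `PermissionError` unless `approved=True` is passed. Keeping the policy as data (the rule list) rather than code is one way to let the classification policy change without modifying the workflow logic.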

Big Data, pp. 90-110. Published 2025-04-01. DOI: 10.1089/big.2022.0201
Citations: 0