The stock market is heavily influenced by global sentiment, which is full of uncertainty and is characterized by extreme values and linear and nonlinear variables. High-frequency data generally refer to data collected at very short intervals: days, hours, minutes, or even seconds. Stock prices fluctuate rapidly, and sometimes to extremes, along with changes in the variables that affect stock fluctuations. Research on stock-market investment risk estimation therefore needs approaches that can identify extreme values, capture nonlinearity, remain reliable in multivariate cases, and use high-frequency data. The extreme value theory (EVT) approach can detect extreme values; it is reliable in univariate cases but becomes very complicated in multivariate ones. The purpose of this research was to collect, characterize, and analyze the investment risk estimation literature to identify research gaps. The literature was selected by applying the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) method and was sourced from the Sciencedirect.com and Scopus databases. The search produced 1107 articles at the identification stage, reduced to 236 at the eligibility stage and 90 articles in the included studies set. The bibliometric networks were visualized using the VOSviewer software, with "VaR" as the main search keyword. The visualization showed that EVT, Generalized Autoregressive Conditional Heteroskedasticity (GARCH) models, and historical simulation are the models most often used to estimate investment risk, while machine learning (ML)-based risk estimation models are rarely applied. No research has yet combined EVT and ML to estimate investment risk. The results showed that hybrid models produced better Value-at-Risk (VaR) accuracy under uncertainty and nonlinear conditions. Generally, models use only daily return data as input.
Based on these research gaps, a hybrid model framework for estimating risk measures is proposed that combines EVT and ML, using multivariate, high-frequency data to identify extreme values in the data distribution. The goal is an estimated risk value that remains accurate and flexible under extreme changes and shocks in the stock market. Mathematics Subject Classification: 60G25; 62M20; 6245; 62P05; 91G70.
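The contrast the review draws between historical simulation and EVT-based VaR can be made concrete. The sketch below is a minimal single-asset illustration, not taken from any of the reviewed studies: it computes historical-simulation VaR and a peaks-over-threshold (POT) VaR using a method-of-moments fit of the generalized Pareto distribution to tail losses, under the simplifying assumption of i.i.d. returns.

```python
import random
import statistics

def var_historical(returns, alpha=0.99):
    """Historical-simulation VaR: the empirical alpha-quantile of losses."""
    losses = sorted(-r for r in returns)
    k = int(alpha * len(losses))
    return losses[min(k, len(losses) - 1)]

def var_pot(returns, threshold_q=0.95, alpha=0.99):
    """Peaks-over-threshold VaR with a method-of-moments GPD fit to tail excesses."""
    losses = sorted(-r for r in returns)
    n = len(losses)
    u = losses[int(threshold_q * n)]                 # tail threshold
    excesses = [x - u for x in losses if x > u]
    m = statistics.mean(excesses)
    v = statistics.variance(excesses)
    xi = 0.5 * (1 - m * m / v)                       # GPD shape (moment estimator)
    beta = 0.5 * m * (m * m / v + 1)                 # GPD scale (moment estimator)
    n_u = len(excesses)
    # McNeil-Frey style tail quantile of the fitted GPD model
    return u + (beta / xi) * (((n / n_u) * (1 - alpha)) ** (-xi) - 1)

# Demo on simulated Gaussian returns (illustration only; real returns are fat-tailed)
random.seed(0)
returns = [random.gauss(0.0, 0.01) for _ in range(5000)]
```

On heavy-tailed data the POT estimate typically extrapolates beyond the largest observed loss, which is the practical appeal of EVT that the reviewed literature emphasizes.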
Melina Melina, Sukono, Herlina Napitupulu, Norizan Mohamed. Modeling of Machine Learning-Based Extreme Value Theory in Stock Investment Risk Prediction: A Systematic Literature Review. Big Data, pp. 161-180. Published 2025-06-01. DOI: 10.1089/big.2023.0004.
Pub Date: 2025-06-01. Epub Date: 2024-01-29. DOI: 10.1089/big.2022.0264
Sajid Yousuf Bhat, Muhammad Abulaish
Owing to the increasing size of real-world networks, processing them with classical techniques has become infeasible. The storage and central processing unit time required to process large networks are far beyond the capabilities of a high-end computing machine. Moreover, real-world network data are generally distributed in nature because they are collected and stored on distributed platforms. This has popularized the use of MapReduce, a distributed data processing framework, for analyzing real-world network data. Existing MapReduce-based methods for connected components detection mainly struggle to minimize the number of MapReduce rounds and the amount of data generated and forwarded to subsequent rounds. This article presents an efficient MapReduce-based approach for finding connected components that does not forward the complete set of connected components to subsequent rounds; instead, it writes them to the Hadoop Distributed File System as soon as they are found, reducing the amount of data forwarded. It also presents an application of the proposed method to contact tracing. The proposed method is evaluated on several network data sets and compared with two state-of-the-art methods. The empirical results reveal that it performs significantly better and scales to finding connected components in large-scale networks.
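The round-by-round structure of MapReduce connected components can be sketched with a toy label-propagation version. This is a simplified single-machine illustration of the general technique, not the authors' implementation; in particular, it omits their early write-out of finished components to HDFS.

```python
from collections import defaultdict

def map_phase(labels, edges):
    """Map: for every edge, emit each endpoint's current label to the other endpoint."""
    emitted = defaultdict(list)
    for u, v in edges:
        emitted[u].append(labels[v])
        emitted[v].append(labels[u])
    return emitted

def reduce_phase(labels, emitted):
    """Reduce: each node keeps the minimum label it has seen; report whether anything changed."""
    changed = False
    for node, incoming in emitted.items():
        best = min(min(incoming), labels[node])
        if best < labels[node]:
            labels[node] = best
            changed = True
    return changed

def connected_components(nodes, edges):
    """Run map/reduce rounds until labels stabilize, then group nodes by final label."""
    labels = {n: n for n in nodes}
    while reduce_phase(labels, map_phase(labels, edges)):
        pass
    comps = defaultdict(set)
    for n, lab in labels.items():
        comps[lab].add(n)
    return sorted(comps.values(), key=min)
```

The number of rounds grows with the diameter of the largest component, which is exactly the cost the article's approach tries to cut by shrinking the data carried between rounds.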
A MapReduce-Based Approach for Fast Connected Components Detection from Large-Scale Networks. Big Data, pp. 243-268.
Pub Date: 2025-06-01. Epub Date: 2024-02-13. DOI: 10.1089/big.2023.0026
Jumadil Saputra, Kasypi Mokhtar, Anuar Abu Bakar, Siti Marsila Mhd Ruslan
In the last two years, there has been a significant upswing in oil prices, leading to a decline in economic activity and demand. This trend holds substantial implications for the global economy, particularly within the emerging business landscape. Among the risk factors affecting the returns of shipping stocks, none looms larger than oil price volatility. Yet only a limited number of studies have explored the complex relationship between oil price shocks and the dynamics of the liner shipping industry, with a specific focus on uncertainty linkages and potential diversification strategies. This study investigates the co-movements and asymmetric associations between oil prices (specifically, West Texas Intermediate and Brent) and the stock returns of three prominent shipping companies from Germany, South Korea, and Taiwan. The results highlight the central role of oil prices in shaping both short-term and long-term shipping stock returns. In addition, the research underscores the statistical significance of exchange rates and interest rates in influencing these returns, with their effects varying across time horizons. Notably, shipping stock prices exhibit heightened sensitivity to positive movements in oil prices, while exchange rates and interest rates exert contrasting impacts, one positive and the other negative. These findings collectively illuminate the profound influence of market sentiment regarding crucial economic indicators within the global shipping sector.
Investigating the Co-Movement and Asymmetric Relationships of Oil Prices on the Shipping Stock Returns: Evidence from Three Shipping-Flagged Companies from Germany, South Korea, and Taiwan. Big Data, pp. 181-196.
Organizations have been investing in analytics relying on internal and external data to gain a competitive advantage. However, the legal and regulatory acts imposed nationally and internationally have become a challenge, especially for highly regulated sectors such as health or finance/banking. Data handlers such as Facebook and Amazon have already sustained considerable fines or are under investigation due to violations of data governance. The era of big data has further intensified the challenges of minimizing the risk of data loss by introducing the dimensions of Volume, Velocity, and Variety into confidentiality. Although Volume and Velocity have been extensively researched, Variety, "the ugly duckling" of big data, is often neglected and difficult to solve, increasing the risk of data exposure and data loss. To mitigate the risk of data exposure and data loss, this article proposes a framework that uses algorithmic classification and workflow capabilities to provide a consistent approach to data evaluation across organizations. A rule-based system implementing the corporate data classification policy minimizes the risk of exposure by helping users identify the approved guidelines and enforce them quickly. The framework includes an exception-handling process with appropriate approval for extenuating circumstances. The system was implemented in a proof-of-concept working prototype to showcase the capabilities and provide hands-on experience. The information system was evaluated and accredited by a diverse audience of academics and senior business executives in the fields of security and data management. The audience had an average experience of about 25 years and a total experience of almost three centuries (294 years). The results confirmed that the 3Vs are of concern and that Variety, cited by 90% of the commentators, is the most troubling. In addition, with an approximate average of 60%, it was confirmed that appropriate policies, procedures, and prerequisites for classification are in place while implementation tools are lagging.
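A rule-based classification policy of the kind described can be sketched as an ordered rule list with first-match-wins semantics plus an exception path. The field names, labels, and rules below are hypothetical illustrations, not the article's actual corporate policy.

```python
# Ordered policy rules: the first matching predicate wins (hypothetical example policy).
RULES = [
    (lambda rec: "ssn" in rec or "diagnosis" in rec, "restricted"),
    (lambda rec: "email" in rec or "full_name" in rec, "confidential"),
    (lambda rec: True, "internal"),  # default when no sensitive field matches
]

def classify(record, approved_exceptions=frozenset()):
    """Return the classification label for a record dict.

    Records whose id appears in approved_exceptions are downgraded, modeling
    the exception-handling process with documented approval.
    """
    if record.get("record_id") in approved_exceptions:
        return "internal"
    for predicate, label in RULES:
        if predicate(record):
            return label
```

Keeping the rules as data rather than scattered conditionals is what lets a compliance team update the policy without touching the enforcement code.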
Georgios Vranopoulos, Nathan Clarke, Shirley Atkinson. Big Data Confidentiality: An Approach Toward Corporate Compliance Using a Rule-Based System. Big Data, pp. 90-110. Published 2025-04-01. DOI: 10.1089/big.2022.0201.
Pub Date: 2025-04-01. Epub Date: 2024-01-09. DOI: 10.1089/big.2023.0019
Laila Faridoon, Wei Liu, Crawford Spence
The government sector has started adopting big data analytics capability (BDAC) to enhance its service delivery. This study examines the relationship between BDAC and decision-making capability (DMC) in the government sector. Drawing on the resource-based view of the firm, it investigates the mediating role of decision makers' cognitive style and of organizational culture in the relationship between BDAC and DMC. It further investigates the impact of BDAC on organizational performance (OP). The study attempts to extend existing research through significant findings and recommendations for enhancing decision-making processes toward successful utilization of BDAC in the government sector. A survey method was adopted to collect data from government organizations in the United Arab Emirates, and partial least-squares structural equation modeling was used to analyze the collected data. The results empirically validate the proposed theoretical framework and confirm that BDAC positively impacts DMC via cognitive style and organizational culture, which in turn positively impacts OP.
The Impact of Big Data Analytics on Decision-Making Within the Government Sector. Big Data, pp. 73-89.
Pub Date: 2025-04-01. Epub Date: 2024-12-17. DOI: 10.1089/big.2023.0134
Zhiyong Wu, Zhida Huang, Nianhua Tang, Kai Wang, Chuanjie Bian, Dandan Li, Vumika Kuraki, Felix Schmid
Physical therapists specializing in sports rehabilitation help injured athletes recover from their injuries and avoid further harm. Sports rehabilitators treat not just commonplace sports injuries but also work-related musculoskeletal injuries, discomfort, and disorders. Sensor-equipped Internet of Things (IoT) devices monitor the real-time location of medical equipment such as scooters, cardioverters, nebulizers, oxygenation pumps, and other monitoring gear, and the deployment of medicines across sites can be analyzed in real time. Health care delivery based on digital technology to improve the access, affordability, and sustainability of medical treatment is known as digital health care. The challenging characteristics of sports injury rehabilitation for digital health care are playing position, game strategies, and cybersecurity. Hence, in this research, health care IoT-enabled body area networks (HIoT-BAN) have been designed to improve sports injury rehabilitation detection for digital health care. The health care sector may benefit significantly from IoT adoption since it allows for enhanced patient safety; health care investment management includes controlling a hospital's pharmaceutical stock and monitoring heat and humidity levels. Digital health describes a group of programs designed to aid health care delivery, whether by assisting with clinical decision-making or streamlining back-end operations in health care institutions. The HIoT-BAN effectively predicts improvements in sports injury rehabilitation detection with faster IoT-based digital health care. The research concludes that the HIoT-BAN effectively supports sports injury rehabilitation detection for digital health care, and the experimental analysis shows that it outperforms the baseline IoT method in terms of performance, accuracy, prediction ratio, and mean square error rate.
Research on Sports Injury Rehabilitation Detection Based on IoT Models for Digital Health Care. Big Data, pp. 144-160.
Pub Date: 2025-04-01. Epub Date: 2023-10-30. DOI: 10.1089/big.2022.0307
Fatemeh Ehsani, Monireh Hosseini
Consumer segmentation is an electronic marketing practice that involves dividing consumers into groups with similar features to discover their preferences. In the business-to-customer (B2C) retailing industry, marketers explore big data to segment consumers based on various dimensions. However, among these dimensions, the motives of location and time of shopping have received relatively less attention. In this study, we use the recency, frequency, monetary, and tenure (RFMT) method to segment consumers into 10 groups based on their time and geographical features. To explore location, we investigate market distribution, revenue distribution, and consumer distribution. Geographical coordinates and peculiarities are estimated based on consumer density. Regarding time exploration, we evaluate the accuracy of product delivery and the timing of promotions. To pinpoint the target consumers, we display the main hotspots on the distribution heatmap. Furthermore, we identify the optimal time for purchase and the most densely populated locations of beneficial consumers. In addition, we evaluate product distribution to determine the most popular product categories. Based on the RFMT segmentation and product popularity, we have developed a product recommender system to assist marketers in attracting and engaging potential consumers. Through a case study using data from massive B2C retailing, we conclude that the proposed segmentation provides superior insights into consumer behavior and improves product recommendation performance.
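The RFMT features themselves are straightforward to derive from a transaction log before any segmentation step. The sketch below assumes a simple (consumer_id, date, amount) row layout, which is an illustrative assumption rather than the article's actual data schema.

```python
from datetime import date

def rfmt(transactions, today):
    """Compute recency, frequency, monetary, and tenure per consumer.

    transactions: iterable of (consumer_id, purchase_date, amount) rows.
    Recency = days since last purchase; tenure = days since first purchase.
    """
    stats = {}
    for cid, d, amount in transactions:
        s = stats.setdefault(cid, {"first": d, "last": d, "freq": 0, "monetary": 0.0})
        s["first"] = min(s["first"], d)
        s["last"] = max(s["last"], d)
        s["freq"] += 1
        s["monetary"] += amount
    return {
        cid: {
            "recency": (today - s["last"]).days,
            "frequency": s["freq"],
            "monetary": s["monetary"],
            "tenure": (today - s["first"]).days,
        }
        for cid, s in stats.items()
    }
```

Segmenting into the article's 10 groups would then amount to binning these four scores (e.g., by quantiles) together with the location features.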
Consumer Segmentation Based on Location and Timing Dimensions Using Big Data from Business-to-Customer Retailing Marketplaces. Big Data, pp. 111-126.
Pub Date: 2025-02-01. Epub Date: 2023-04-24. DOI: 10.1089/big.2022.0269
Gergely Kocsis, Imre Varga
Mass transportation networks of cities and regions are worth studying to understand what properties a better transportation topology and system might have. One way to do this is based on spatial information about stations and routes. As we show, however, interesting findings can also be gained by studying the abstract network topologies of these systems. To obtain these abstract networks, we have developed a tool that extracts a network of connected stops from General Transit Feed Specification (GTFS) feeds. As we found during development, service providers do not follow the specification coherently, so as a postprocessing step we introduce virtual stations into the abstract networks that gather nearby stops together. We also analyze the effect of these new stations on the abstract map.
{"title":"gtfs2net: Extraction of General Transit Feed Specification Data Sets to Abstract Networks and Their Analysis.","authors":"Gergely Kocsis, Imre Varga","doi":"10.1089/big.2022.0269","DOIUrl":"10.1089/big.2022.0269","url":null,"abstract":"<p><p>Mass transportation networks of cities or regions are interesting and important to be studied to get a picture of the properties of a somehow better topology and system of transportation. One way to do this lies on the basis of spatial information of stations and routes. As we show however interesting findings can be gained also if one studies the abstract network topologies of these systems. To get these abstract types of networks, we have developed a tool that can extract a network of connected stops from General Transit Feed Specification feeds. As we found during the development, service providers do not follow the specification in coherent ways, so as a kind of postprocessing we have introduced virtual stations to the abstract networks that gather close stops together. We analyze the effect of these new stations on the abstract map as well.</p>","PeriodicalId":51314,"journal":{"name":"Big Data","volume":" ","pages":"30-41"},"PeriodicalIF":2.6,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9446347","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Haitao Xie, Chengkai Li, Zhiwei Ye, Tao Zhao, Hui Xu, Jiangyi Du, Wanfang Bai
Cloud resource scheduling is one of the most significant tasks in the field of big data and is, in essence, a combinatorial optimization problem. Scheduling strategies based on meta-heuristic algorithms (MAs) are often chosen for it. However, MAs are prone to falling into local optima, degrading the quality of the allocation scheme; algorithms with good global search ability are needed to map available cloud resources to task requirements. The Honey Badger Algorithm (HBA) is a recently proposed algorithm with strong search ability. To further improve scheduling performance, this article proposes an Improved Honey Badger Algorithm (IHBA) that combines two local search strategies with a new fitness function. IHBA is compared with six MAs on load tasks of four scales. The comparative simulation results reveal that the proposed algorithm outperforms the other algorithms considered. IHBA enhances the diversity of the algorithm's population, expands each individual's random search range, and prevents the algorithm from falling into local optima while effectively achieving resource load balancing.
{"title":"Cloud Resource Scheduling Using Multi-Strategy Fused Honey Badger Algorithm.","authors":"Haitao Xie, Chengkai Li, Zhiwei Ye, Tao Zhao, Hui Xu, Jiangyi Du, Wanfang Bai","doi":"10.1089/big.2023.0146","DOIUrl":"10.1089/big.2023.0146","url":null,"abstract":"<p><p>Cloud resource scheduling is one of the most significant tasks in the field of big data, which is a combinatorial optimization problem in essence. Scheduling strategies based on meta-heuristic algorithms (MAs) are often chosen to deal with this topic. However, MAs are prone to falling into local optima leading to decreasing quality of the allocation scheme. Algorithms with good global search ability are needed to map available cloud resources to the requirements of the task. Honey Badger Algorithm (HBA) is a newly proposed algorithm with strong search ability. In order to further improve scheduling performance, an Improved Honey Badger Algorithm (IHBA), which combines two local search strategies and a new fitness function, is proposed in this article. IHBA is compared with 6 MAs in four scale load tasks. The comparative simulation results obtained reveal that the proposed algorithm performs better than other algorithms involved in the article. 
IHBA enhances the diversity of algorithm populations, expands the individual's random search range, and prevents the algorithm from falling into local optima while effectively achieving resource load balancing.</p>","PeriodicalId":51314,"journal":{"name":"Big Data","volume":"13 1","pages":"59-72"},"PeriodicalIF":2.6,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143450642","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
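The paper's IHBA is not reproduced here, but its core ingredients — a fitness function over task-to-VM assignments and a local-search refinement step — can be sketched generically. The fitness below combines makespan with a load-imbalance penalty as an illustrative stand-in for the paper's "new fitness function", and hill climbing stands in for one of the local search strategies; all numbers are made up:

```python
import random

# Toy cloud-scheduling instance: assign each task to a virtual machine (VM).
task_lengths = [4.0, 2.0, 7.0, 1.0, 5.0, 3.0]   # workload per task (arbitrary units)
vm_speeds = [2.0, 1.0, 1.5]                      # processing speed per VM

def fitness(assignment):
    """Makespan plus a load-imbalance penalty; lower is better.
    assignment[i] is the VM index that task i runs on."""
    loads = [0.0] * len(vm_speeds)
    for task, vm in enumerate(assignment):
        loads[vm] += task_lengths[task] / vm_speeds[vm]
    makespan = max(loads)
    imbalance = makespan - min(loads)
    return makespan + 0.5 * imbalance

def local_search(assignment, iters=200, seed=0):
    """Hill climbing: move one random task to a random VM, keep improvements."""
    rng = random.Random(seed)
    best, best_fit = list(assignment), fitness(assignment)
    for _ in range(iters):
        cand = list(best)
        cand[rng.randrange(len(cand))] = rng.randrange(len(vm_speeds))
        f = fitness(cand)
        if f < best_fit:
            best, best_fit = cand, f
    return best, best_fit

start = [0] * len(task_lengths)          # naive start: everything on VM 0
best, best_fit = local_search(start)
```

In IHBA, such a refinement step would run inside the HBA population loop rather than on a single solution, which is what guards the global search against the local optima mentioned above.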
Pub Date: 2025-02-01 Epub Date: 2023-04-19 DOI: 10.1089/big.2022.0260
Zhengyang Hu, Weiwei Lin, Xiaoying Ye, Haojun Xu, Haocheng Zhong, Huikang Huang, Xinyang Wang
Recommender systems (RSs) play an important role in Big Data research. Their main idea is to process huge amounts of data to accurately recommend items to users, and the recommendation method is the core research content of any RS. Existing recommendation methods still have two shortcomings: (1) most use only one kind of user-item interaction (such as Browse or Purchase), which makes it difficult to model complete user preferences; and (2) most mainstream methods consider only the final consistency of recommendation (e.g., user preferences) while ignoring process consistency (e.g., user behavior), which biases the final result. In this article, we propose a recommendation method based on the Entity Interaction Knowledge Graph (EIKG), which draws on the idea of collaborative filtering and innovatively uses the similarity of user behaviors to recommend items. The method first extracts fact triples containing interaction relations from the relevant data sets to generate the EIKG; it then embeds the entities and relations of the EIKG and finally uses link prediction techniques to recommend items to users. The proposed method is compared with other recommendation methods on two publicly available data sets, Scholat and Lizhi, and the experimental results show that it exceeds the state of the art in most metrics, verifying its effectiveness.
{"title":"Generic User Behavior: A User Behavior Similarity-Based Recommendation Method.","authors":"Zhengyang Hu, Weiwei Lin, Xiaoying Ye, Haojun Xu, Haocheng Zhong, Huikang Huang, Xinyang Wang","doi":"10.1089/big.2022.0260","DOIUrl":"10.1089/big.2022.0260","url":null,"abstract":"<p><p>Recommender system (RS) plays an important role in Big Data research. Its main idea is to handle huge amounts of data to accurately recommend items to users. The recommendation method is the core research content of the whole RS. However, the existing recommendation methods still have the following two shortcomings: (1) Most recommendation methods use only one kind of information about the user's interaction with items (such as Browse or Purchase), which makes it difficult to model complete user preference. (2) Most mainstream recommendation methods only consider the final consistency of recommendation (e.g., user preferences) but ignore the process consistency (e.g., user behavior), which leads to the biased final result. In this article, we propose a recommendation method based on the Entity Interaction Knowledge Graph (EIKG), which draws on the idea of collaborative filtering and innovatively uses the similarity of user behaviors to recommend items. The method first extracts fact triples containing interaction relations from relevant data sets to generate the EIKG; then embeds the entities and relations in the EIKG; finally, uses link prediction techniques to recommend items for users. 
The proposed method is compared with other recommendation methods on two publicly available data sets, Scholat and Lizhi, and the experimental result shows that it exceeds the state of the art in most metrics, verifying the effectiveness of the proposed method.</p>","PeriodicalId":51314,"journal":{"name":"Big Data","volume":" ","pages":"3-15"},"PeriodicalIF":2.6,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9477294","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
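The embed-then-link-predict pipeline described above can be illustrated with a minimal TransE-style sketch: entities and relations share one vector space, a triple (h, r, t) is plausible when h + r ≈ t, and recommendation ranks candidate items by that score. This is a generic stand-in, not the paper's EIKG model; the entities, relations, and triples are invented for illustration:

```python
import math
import random

random.seed(0)
DIM = 8
entities = ["user1", "user2", "itemA", "itemB"]
relations = ["browsed", "purchased"]
# Random initial embeddings for every entity and relation.
emb = {x: [random.uniform(-0.5, 0.5) for _ in range(DIM)]
       for x in entities + relations}

# Observed interaction triples (the EIKG would supply these).
triples = [("user1", "browsed", "itemA"),
           ("user1", "purchased", "itemA"),
           ("user2", "browsed", "itemA")]

def score(h, r, t):
    """Negative L2 distance ||h + r - t||: higher means more plausible."""
    return -math.sqrt(sum((emb[h][i] + emb[r][i] - emb[t][i]) ** 2
                          for i in range(DIM)))

# A few gradient steps pulling observed triples toward h + r = t
# (a bare-bones stand-in for real KG-embedding training with negatives).
LR = 0.05
for _ in range(200):
    for h, r, t in triples:
        for i in range(DIM):
            g = emb[h][i] + emb[r][i] - emb[t][i]
            emb[h][i] -= LR * g
            emb[r][i] -= LR * g
            emb[t][i] += LR * g

# Link prediction as recommendation: rank items for user2 under "purchased".
ranked = sorted(["itemA", "itemB"],
                key=lambda it: score("user2", "purchased", it),
                reverse=True)
```

Because user1 and user2 share browsing behavior, their embeddings are pulled together, so the purchase observed for user1 transfers to user2 — the behavior-similarity intuition the method builds on.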