Big Data Confidentiality: An Approach Toward Corporate Compliance Using a Rule-Based System.
Pub Date : 2025-04-01. DOI: 10.1089/big.2022.0201
Georgios Vranopoulos, Nathan Clarke, Shirley Atkinson
Organizations have been investing in analytics that rely on internal and external data to gain a competitive advantage. However, legal and regulatory acts imposed nationally and internationally have become a challenge, especially for highly regulated sectors such as health and finance/banking. Data handlers such as Facebook and Amazon have already sustained considerable fines or are under investigation for violations of data governance. The era of big data has further intensified the challenge of minimizing the risk of data loss by introducing the dimensions of Volume, Velocity, and Variety into confidentiality. Although Volume and Velocity have been extensively researched, Variety, "the ugly duckling" of big data, is often neglected and difficult to solve, increasing the risk of data exposure and data loss. To mitigate that risk, this article proposes a framework that uses algorithmic classification and workflow capabilities to provide a consistent approach to data evaluation across organizations. A rule-based system implementing the corporate data classification policy minimizes the risk of exposure by helping users identify the approved guidelines and enforce them quickly. The framework includes an exception-handling process with appropriate approvals for extenuating circumstances. The system was implemented in a proof-of-concept working prototype to showcase its capabilities and provide hands-on experience. The information system was evaluated and accredited by a diverse audience of academics and senior business executives in the fields of security and data management, with an average of ∼25 years of experience each and a combined experience of almost three centuries (294 years). The results confirmed that the 3Vs are of concern and that Variety, cited by 90% of the commentators, is the most troubling. In addition, approximately 60% confirmed that appropriate policies, procedures, and prerequisites for classification are in place while implementation tools are lagging.
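The abstract does not include the rule engine itself; as a rough illustration of the pattern it describes, the Python sketch below evaluates prioritized policy rules against a data asset's attributes. The `Rule` type, the attribute names, and the classification labels are invented for this example and are not the authors' implementation.

```python
# Minimal sketch of a rule-based classification step; the Rule type, the
# asset attributes, and the labels are hypothetical, not the authors' system.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    name: str
    predicate: Callable[[dict], bool]  # tests attributes of a data asset
    label: str                         # classification the rule assigns
    priority: int                      # lower value = evaluated first

RULES = [
    Rule("pii", lambda a: a.get("contains_pii", False), "Restricted", 0),
    Rule("financial", lambda a: a.get("domain") == "finance", "Confidential", 1),
    Rule("non_public", lambda a: not a.get("public", False), "Internal", 2),
]

def classify(asset: dict, default: str = "Public") -> str:
    """Return the label of the highest-priority rule that matches the asset."""
    for rule in sorted(RULES, key=lambda r: r.priority):
        if rule.predicate(asset):
            return rule.label
    return default  # no rule fired: fall back to the default label

# A finance data set holding personal data is caught first by the PII rule.
print(classify({"contains_pii": True, "domain": "finance"}))  # -> Restricted
```

An exception-handling workflow, as described in the framework, would wrap a call like this and route any contested label to an approver instead of applying it directly.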
{"title":"Big Data Confidentiality: An Approach Toward Corporate Compliance Using a Rule-Based System.","authors":"Georgios Vranopoulos, Nathan Clarke, Shirley Atkinson","doi":"10.1089/big.2022.0201","DOIUrl":"10.1089/big.2022.0201","url":null,"abstract":"<p><p>Organizations have been investing in analytics relying on internal and external data to gain a competitive advantage. However, the legal and regulatory acts imposed nationally and internationally have become a challenge, especially for highly regulated sectors such as health or finance/banking. Data handlers such as Facebook and Amazon have already sustained considerable fines or are under investigation due to violations of data governance. The era of big data has further intensified the challenges of minimizing the risk of data loss by introducing the dimensions of Volume, Velocity, and Variety into confidentiality. Although Volume and Velocity have been extensively researched, Variety, \"the ugly duckling\" of big data, is often neglected and difficult to solve, thus increasing the risk of data exposure and data loss. In mitigating the risk of data exposure and data loss in this article, a framework is proposed to utilize algorithmic classification and workflow capabilities to provide a consistent approach toward data evaluations across the organizations. A rule-based system, implementing the corporate data classification policy, will minimize the risk of exposure by facilitating users to identify the approved guidelines and enforce them quickly. The framework includes an exception handling process with appropriate approval for extenuating circumstances. The system was implemented in a proof of concept working prototype to showcase the capabilities and provide a hands-on experience. The information system was evaluated and accredited by a diverse audience of academics and senior business executives in the fields of security and data management. The audience had an average experience of ∼25 years and amasses a total experience of almost three centuries (294 years). The results confirmed that the 3Vs are of concern and that Variety, with a majority of 90% of the commentators, is the most troubling. In addition to that, with an approximate average of 60%, it was confirmed that appropriate policies, procedure, and prerequisites for classification are in place while implementation tools are lagging.</p>","PeriodicalId":51314,"journal":{"name":"Big Data","volume":" ","pages":"90-110"},"PeriodicalIF":2.6,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"71415222","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The Impact of Big Data Analytics on Decision-Making Within the Government Sector.
Pub Date : 2025-04-01. Epub Date: 2024-01-09. DOI: 10.1089/big.2023.0019
Laila Faridoon, Wei Liu, Crawford Spence
The government sector has started adopting big data analytics capability (BDAC) to enhance its service delivery. This study examines the relationship between BDAC and decision-making capability (DMC) in the government sector. It investigates the mediating role of decision makers' cognitive style and of organizational culture in the relationship between BDAC and DMC, drawing on the resource-based view of the firm. It further investigates the impact of BDAC on organizational performance (OP). The study aims to extend existing research through significant findings and recommendations that can enhance decision-making processes for successful utilization of BDAC in the government sector. A survey method was adopted to collect data from government organizations in the United Arab Emirates, and partial least-squares structural equation modeling (PLS-SEM) was deployed to analyze the collected data. The results empirically validate the proposed theoretical framework and confirm that BDAC positively impacts DMC via cognitive style and organizational culture, which in turn positively impacts OP.
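The study's PLS-SEM analysis has no single canonical Python implementation; as a hedged stand-in, the sketch below shows the simpler regression-based mediation logic (indirect effect = path a * path b) on synthetic scores. Every variable name and coefficient here is invented for illustration.

```python
# Hypothetical mediation check on synthetic data; this is plain OLS, not the
# study's PLS-SEM, and every name and coefficient here is invented.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 300
bdac = rng.normal(size=n)                               # analytics capability
culture = 0.5 * bdac + rng.normal(size=n)               # hypothesized mediator
dmc = 0.3 * bdac + 0.4 * culture + rng.normal(size=n)   # decision-making capability

a = sm.OLS(culture, sm.add_constant(bdac)).fit()        # path a: BDAC -> mediator
b = sm.OLS(dmc, sm.add_constant(np.column_stack([bdac, culture]))).fit()

indirect = a.params[1] * b.params[2]   # a * b: effect routed through culture
direct = b.params[1]                   # c': BDAC -> DMC controlling for culture
print(f"indirect effect: {indirect:.3f}, direct effect: {direct:.3f}")
```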
{"title":"The Impact of Big Data Analytics on Decision-Making Within the Government Sector.","authors":"Laila Faridoon, Wei Liu, Crawford Spence","doi":"10.1089/big.2023.0019","DOIUrl":"10.1089/big.2023.0019","url":null,"abstract":"<p><p>The government sector has started adopting big data analytics capability (BDAC) to enhance its service delivery. This study examines the relationship between BDAC and decision-making capability (DMC) in the government sector. It investigates the mediation role of the cognitive style of decision makers and organizational culture in the relationship between BDAC and DMC utilizing the resource-based view of the firm theory. It further investigates the impact of BDAC on organizational performance (OP). This study attempts to extend existing research through significant findings and recommendations to enhance decision-making processes for a successful utilization of BDAC in the government sector. A survey method was adopted to collect data from government organizations in the United Arab Emirates, and partial least-squares structural equation modeling was deployed to analyze the collected data. The results empirically validate the proposed theoretical framework and confirm that BDAC positively impacts DMC via cognitive style and organizational culture, and in turn further positively impacting OP overall.</p>","PeriodicalId":51314,"journal":{"name":"Big Data","volume":" ","pages":"73-89"},"PeriodicalIF":2.6,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139405170","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Research on Sports Injury Rehabilitation Detection Based on IoT Models for Digital Health Care.
Pub Date : 2025-04-01. Epub Date: 2024-12-17. DOI: 10.1089/big.2023.0134
Zhiyong Wu, Zhida Huang, Nianhua Tang, Kai Wang, Chuanjie Bian, Dandan Li, Vumika Kuraki, Felix Schmid
Physical therapists specializing in sports rehabilitation help injured athletes recover from their injuries and avoid further harm. Sports rehabilitators treat not just commonplace sports injuries but also work-related musculoskeletal injuries, discomfort, and disorders. Sensor-equipped Internet of Things (IoT) devices monitor the real-time location of medical equipment such as scooters, cardioverters, nebulizers, oxygenation pumps, and other monitoring gear, and the deployment of medicine across sites can be analyzed in real time. Health care delivery that uses digital technology to improve the access, affordability, and sustainability of medical treatment is known as digital health care. Challenging characteristics of sports injury rehabilitation for digital health care include playing position, game strategies, and cybersecurity. Hence, in this research, health care IoT-enabled body area networks (HIoT-BAN) have been designed to improve sports injury rehabilitation detection for digital health care. The health care sector may benefit significantly from IoT adoption since it allows for enhanced patient safety; health care investment management includes controlling the hospital's pharmaceutical stock and monitoring heat and humidity levels. Digital health describes a group of programs made to aid health care delivery, whether by assisting with clinical decision-making or streamlining back-end operations in health care institutions. The HIoT-BAN effectively predicts sports injury rehabilitation needs and supports faster IoT-based digital health care. The research concludes that the HIoT-BAN effectively detects sports injury rehabilitation needs for digital health care, and the experimental analysis shows that it outperforms the IoT method in terms of performance, accuracy, prediction ratio, and mean square error rate.
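The HIoT-BAN design is not specified in implementable detail in the abstract; purely as an illustration of body-area-network style monitoring, the sketch below flags sensor readings that fall outside assumed safe ranges. The field names and limits are hypothetical.

```python
# Generic body-area-network monitoring sketch with assumed fields and limits;
# not the paper's HIoT-BAN, which is not specified in implementable detail.
from dataclasses import dataclass

@dataclass
class Reading:
    sensor_id: str
    heart_rate_bpm: float
    skin_temp_c: float

SAFE_RANGES = {"heart_rate_bpm": (40.0, 180.0), "skin_temp_c": (35.0, 38.5)}

def check(reading: Reading) -> list:
    """Return alert messages for values outside the assumed safe ranges."""
    alerts = []
    for field, (lo, hi) in SAFE_RANGES.items():
        value = getattr(reading, field)
        if not lo <= value <= hi:
            alerts.append(f"{reading.sensor_id}: {field}={value} outside [{lo}, {hi}]")
    return alerts

print(check(Reading("ban-07", heart_rate_bpm=193.0, skin_temp_c=36.9)))
```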
{"title":"Research on Sports Injury Rehabilitation Detection Based on IoT Models for Digital Health Care.","authors":"Zhiyong Wu, Zhida Huang, Nianhua Tang, Kai Wang, Chuanjie Bian, Dandan Li, Vumika Kuraki, Felix Schmid","doi":"10.1089/big.2023.0134","DOIUrl":"10.1089/big.2023.0134","url":null,"abstract":"<p><p>Physical therapists specializing in sports rehabilitation detection help injured athletes recover from their wounds and avoid further harm. Sports rehabilitators treat not just commonplace sports injuries but also work-related musculoskeletal injuries, discomfort, and disorders. Sensor-equipped Internet of Things (IoT) monitors the real-time location of medical equipment such as scooters, cardioverters, nebulizer treatments, oxygenation pumps, or other monitor gear. Analysis of medicine deployment across sites is possible in real time. Health care delivery based on digital technology to improve access, affordability, and sustainability of medical treatment is known as digital health care. The challenging characteristics of such sports injury rehabilitation for digital health care are playing position, game strategies, and cybersecurity. Hence, in this research, <i>health care IoT-enabled body area networks (HIoT-BAN)</i> have been designed to improve sports injury rehabilitation detection for digital health care. The health care sector may benefit significantly from IoT adoption since it allows for enhanced patient safety; health care investment management includes controlling the hospital's pharmaceutical stock and monitoring the heat and humidity levels. Digital health describes a group of programmers made to aid health care delivery, whether by assisting with clinical decision-making or streamlining back-end operations in health care institutions. A <i>HIoT-BAN</i> effectively predicts the rise in sports injury rehabilitation detection with faster digital health care based on IoT. The research concludes that the <i>HIoT-BAN</i> effectively indicates sports injury rehabilitation detection for digital health care. The experimental analysis of <i>HIoT-BAN</i> outperforms the IoT method in terms of performance, accuracy, prediction ratio, and mean square error rate.</p>","PeriodicalId":51314,"journal":{"name":"Big Data","volume":" ","pages":"144-160"},"PeriodicalIF":2.6,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142848096","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Consumer Segmentation Based on Location and Timing Dimensions Using Big Data from Business-to-Customer Retailing Marketplaces.
Pub Date : 2025-04-01. Epub Date: 2023-10-30. DOI: 10.1089/big.2022.0307
Fatemeh Ehsani, Monireh Hosseini
Consumer segmentation is an electronic marketing practice that involves dividing consumers into groups with similar features to discover their preferences. In the business-to-customer (B2C) retailing industry, marketers explore big data to segment consumers along various dimensions. Among these dimensions, however, the location and time of shopping have received relatively little attention. In this study, we use the recency, frequency, monetary, and tenure (RFMT) method to segment consumers into 10 groups based on their temporal and geographical features. To explore location, we investigate market distribution, revenue distribution, and consumer distribution, and estimate geographical coordinates and peculiarities from consumer density. Regarding time, we evaluate the accuracy of product delivery and the timing of promotions. To pinpoint the target consumers, we display the main hotspots on the distribution heatmap, and we identify the optimal time for purchase and the most densely populated locations of beneficial consumers. In addition, we evaluate product distribution to determine the most popular product categories. Based on the RFMT segmentation and product popularity, we have developed a product recommender system to assist marketers in attracting and engaging potential consumers. Through a case study using data from massive B2C retailing, we conclude that the proposed segmentation provides superior insights into consumer behavior and improves product recommendation performance.
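As a concrete illustration of the RFMT computation (not the authors' exact scoring scheme), the pandas sketch below derives recency, frequency, monetary, and tenure features from a toy order log and bins each into quantile scores; the column names and the three-bin scheme are assumptions.

```python
# RFMT features from a toy order log; the columns and the three-bin quantile
# scores are assumptions for illustration, not the authors' exact scheme.
import pandas as pd

orders = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 2, 3],
    "order_date": pd.to_datetime(["2024-01-05", "2024-03-01", "2023-06-10",
                                  "2024-02-20", "2024-03-10", "2023-12-01"]),
    "amount": [50, 80, 20, 35, 40, 200],
})
now = orders["order_date"].max()

rfmt = orders.groupby("customer_id").agg(
    recency=("order_date", lambda d: (now - d.max()).days),   # days since last order
    frequency=("order_date", "count"),                        # number of orders
    monetary=("amount", "sum"),                               # total spend
    tenure=("order_date", lambda d: (now - d.min()).days),    # days since first order
)

for col in ["recency", "frequency", "monetary", "tenure"]:
    rfmt[col + "_score"] = pd.qcut(rfmt[col], q=3, labels=False,
                                   duplicates="drop") + 1
# In practice the recency score is inverted: recent buyers should score high.
print(rfmt)
```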
{"title":"Consumer Segmentation Based on Location and Timing Dimensions Using Big Data from Business-to-Customer Retailing Marketplaces.","authors":"Fatemeh Ehsani, Monireh Hosseini","doi":"10.1089/big.2022.0307","DOIUrl":"10.1089/big.2022.0307","url":null,"abstract":"<p><p>Consumer segmentation is an electronic marketing practice that involves dividing consumers into groups with similar features to discover their preferences. In the business-to-customer (B2C) retailing industry, marketers explore big data to segment consumers based on various dimensions. However, among these dimensions, the motives of location and time of shopping have received relatively less attention. In this study, we use the recency, frequency, monetary, and tenure (RFMT) method to segment consumers into 10 groups based on their time and geographical features. To explore location, we investigate market distribution, revenue distribution, and consumer distribution. Geographical coordinates and peculiarities are estimated based on consumer density. Regarding time exploration, we evaluate the accuracy of product delivery and the timing of promotions. To pinpoint the target consumers, we display the main hotspots on the distribution heatmap. Furthermore, we identify the optimal time for purchase and the most densely populated locations of beneficial consumers. In addition, we evaluate product distribution to determine the most popular product categories. Based on the RFMT segmentation and product popularity, we have developed a product recommender system to assist marketers in attracting and engaging potential consumers. Through a case study using data from massive B2C retailing, we conclude that the proposed segmentation provides superior insights into consumer behavior and improves product recommendation performance.</p>","PeriodicalId":51314,"journal":{"name":"Big Data","volume":" ","pages":"111-126"},"PeriodicalIF":2.6,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"71415223","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Evolutionary Trends in Decision Sciences Education Research from Simulation and Games to Big Data Analytics and Generative Artificial Intelligence.
Pub Date : 2025-02-28. DOI: 10.1089/big.2024.0128
Ikpe Justice Akpan, Rouzbeh Razavi, Asuama A Akpan
Decision sciences (DSC) involves studying complex dynamic systems and processes to aid informed choices subject to constraints under uncertain conditions. It integrates multidisciplinary methods and strategies to evaluate decision engineering processes, identifying alternatives and providing insights toward more prudent decision-making. This study analyzes evolutionary trends and innovation in DSC education and research over the past 25 years. Using metadata from bibliographic records and employing the science mapping method and text analytics, we map and evaluate the thematic, intellectual, and social structures of DSC research. The results identify "knowledge management," "decision support systems," "data envelopment analysis," "simulation," and "artificial intelligence" (AI) as some of the prominent critical skills and knowledge requirements for problem-solving in DSC throughout the period (2000-2024). However, these technologies are evolving significantly in the recent wave of digital transformation, with data analytics frameworks (including techniques such as big data analytics, machine learning, business intelligence, data mining, and information visualization) becoming crucial. DSC education and research continue to mirror developments in practice, with sustainable education through virtual/online learning becoming prominent. Innovative pedagogical approaches and strategies also include computer simulation and games ("play and learn" or "role-playing"). The current era witnesses AI adoption in different forms, such as conversational chatbot agents and generative AI (GenAI), for example, the chat generative pretrained transformer (ChatGPT), in teaching, learning, and scholarly activities, amidst challenges including academic integrity, plagiarism, intellectual property violations, and other ethical and legal issues. Future DSC education must innovatively integrate GenAI and address the resulting challenges.
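Science mapping of bibliographic records typically rests on keyword co-occurrence counts; the sketch below shows that core step on invented records, with the heaviest pairs forming the edges of a thematic map. It is a generic illustration, not the study's pipeline.

```python
# Keyword co-occurrence counting, the core step of science mapping; the
# records below are invented examples, not the study's bibliographic data.
from collections import Counter
from itertools import combinations

records = [
    ["knowledge management", "decision support systems", "simulation"],
    ["artificial intelligence", "simulation", "big data analytics"],
    ["decision support systems", "artificial intelligence"],
]

cooc = Counter()
for keywords in records:
    for a, b in combinations(sorted(set(keywords)), 2):
        cooc[(a, b)] += 1  # each pair sharing a record adds one link

# The heaviest pairs become the edges of the thematic map.
for pair, weight in cooc.most_common(3):
    print(weight, pair)
```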
{"title":"Evolutionary Trends in Decision Sciences Education Research from Simulation and Games to Big Data Analytics and Generative Artificial Intelligence.","authors":"Ikpe Justice Akpan, Rouzbeh Razavi, Asuama A Akpan","doi":"10.1089/big.2024.0128","DOIUrl":"https://doi.org/10.1089/big.2024.0128","url":null,"abstract":"<p><p>Decision sciences (DSC) involves studying complex dynamic systems and processes to aid informed choices subject to constraints in uncertain conditions. It integrates multidisciplinary methods and strategies to evaluate decision engineering processes, identifying alternatives and providing insights toward enhancing prudent decision-making. This study analyzes the evolutionary trends and innovation in DSC education and research trends over the past 25 years. Using metadata from bibliographic records and employing the science mapping method and text analytics, we map and evaluate the thematic, intellectual, and social structures of DSC research. The results identify \"knowledge management,\" \"decision support systems,\" \"data envelopment analysis,\" \"simulation,\" and \"artificial intelligence\" (AI) as some of the prominent critical skills and knowledge requirements for problem-solving in DSC before and during the period (2000-2024). However, these technologies are evolving significantly in the recent wave of digital transformation, with data analytics frameworks (including techniques such as big data analytics, machine learning, business intelligence, data mining, and information visualization) becoming crucial. DSC education and research continue to mirror the development in practice, with sustainable education through virtual/online learning becoming prominent. Innovative pedagogical approaches/strategies also include computer simulation and games (\"play and learn\" or \"role-playing\"). The current era witnesses AI adoption in different forms as conversational Chatbot agent and generative AI (GenAI), such as chat generative pretrained transformer in teaching, learning, and scholarly activities amidst challenges (academic integrity, plagiarism, intellectual property violations, and other ethical and legal issues). Future DSC education must innovatively integrate GenAI into DSC education and address the resulting challenges.</p>","PeriodicalId":51314,"journal":{"name":"Big Data","volume":" ","pages":""},"PeriodicalIF":2.6,"publicationDate":"2025-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143527974","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
gtfs2net: Extraction of General Transit Feed Specification Data Sets to Abstract Networks and Their Analysis.
Pub Date : 2025-02-01. Epub Date: 2023-04-24. DOI: 10.1089/big.2022.0269
Gergely Kocsis, Imre Varga
Mass transportation networks of cities and regions are interesting and important to study in order to understand the properties of a better transportation topology and system. One way to do this relies on the spatial information of stations and routes. As we show, however, interesting findings can also be gained by studying the abstract network topologies of these systems. To obtain these abstract networks, we have developed a tool that extracts a network of connected stops from General Transit Feed Specification (GTFS) feeds. As we found during development, service providers do not follow the specification in coherent ways, so as a postprocessing step we have introduced virtual stations into the abstract networks that gather close stops together. We analyze the effect of these new stations on the abstract map as well.
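The gtfs2net tool itself is described rather than listed; the pandas sketch below shows one plausible way to derive a connected-stop network from a standard GTFS stop_times.txt file by linking consecutive stops within each trip. The file name and columns follow the GTFS specification, but the code is an assumption, not the authors' implementation.

```python
# One plausible reading of the gtfs2net idea: build a directed connected-stop
# network from the standard GTFS stop_times.txt; this is not the authors' code.
import pandas as pd

stop_times = pd.read_csv("stop_times.txt",          # standard GTFS file name
                         usecols=["trip_id", "stop_id", "stop_sequence"])
stop_times = stop_times.sort_values(["trip_id", "stop_sequence"])

# An edge connects each stop to the next stop within the same trip.
stop_times["next_stop"] = stop_times.groupby("trip_id")["stop_id"].shift(-1)
edges = (stop_times.dropna(subset=["next_stop"])
         [["stop_id", "next_stop"]].drop_duplicates())

print(f"{edges['stop_id'].nunique()} stops, {len(edges)} directed edges")
```

Merging nearby stops into the paper's "virtual stations" would be a postprocessing pass over this edge list, for example by clustering stop coordinates from stops.txt.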
{"title":"gtfs2net: Extraction of General Transit Feed Specification Data Sets to Abstract Networks and Their Analysis.","authors":"Gergely Kocsis, Imre Varga","doi":"10.1089/big.2022.0269","DOIUrl":"10.1089/big.2022.0269","url":null,"abstract":"<p><p>Mass transportation networks of cities or regions are interesting and important to be studied to get a picture of the properties of a somehow better topology and system of transportation. One way to do this lies on the basis of spatial information of stations and routes. As we show however interesting findings can be gained also if one studies the abstract network topologies of these systems. To get these abstract types of networks, we have developed a tool that can extract a network of connected stops from General Transit Feed Specification feeds. As we found during the development, service providers do not follow the specification in coherent ways, so as a kind of postprocessing we have introduced virtual stations to the abstract networks that gather close stops together. We analyze the effect of these new stations on the abstract map as well.</p>","PeriodicalId":51314,"journal":{"name":"Big Data","volume":" ","pages":"30-41"},"PeriodicalIF":2.6,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9446347","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cloud Resource Scheduling Using Multi-Strategy Fused Honey Badger Algorithm.
Pub Date : 2025-02-01. DOI: 10.1089/big.2023.0146
Haitao Xie, Chengkai Li, Zhiwei Ye, Tao Zhao, Hui Xu, Jiangyi Du, Wanfang Bai
Cloud resource scheduling is one of the most significant tasks in the field of big data and is, in essence, a combinatorial optimization problem. Scheduling strategies based on meta-heuristic algorithms (MAs) are often chosen for this task. However, MAs are prone to falling into local optima, which degrades the quality of the allocation scheme. Algorithms with good global search ability are needed to map available cloud resources to the requirements of the tasks. The Honey Badger Algorithm (HBA) is a recently proposed algorithm with strong search ability. To further improve scheduling performance, this article proposes an Improved Honey Badger Algorithm (IHBA) that combines two local search strategies and a new fitness function. IHBA is compared with six MAs on load tasks of four scales. The comparative simulation results reveal that the proposed algorithm performs better than the other algorithms considered. IHBA enhances the diversity of the algorithm's population, expands each individual's random search range, and prevents the algorithm from falling into local optima while effectively achieving resource load balancing.
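The IHBA update rules are not reproduced in the abstract; to make the optimization target concrete, the sketch below evaluates a makespan fitness over random task-to-VM assignments, the kind of objective a metaheuristic such as HBA searches far more efficiently than random sampling. Task lengths and VM speeds are invented.

```python
# Random-search baseline over task-to-VM assignments with a makespan fitness;
# task lengths and VM speeds are invented, and this is not the authors' IHBA,
# only the kind of objective such a metaheuristic optimizes.
import random

task_lengths = [400, 250, 900, 120, 600, 300]   # workload per task (assumed units)
vm_speeds = [1000, 500, 750]                    # processing speed per VM

def makespan(assignment):
    """Finish time of the busiest VM under a task -> VM assignment."""
    load = [0.0] * len(vm_speeds)
    for task, vm in zip(task_lengths, assignment):
        load[vm] += task / vm_speeds[vm]
    return max(load)

random.seed(1)
best = min((random.choices(range(len(vm_speeds)), k=len(task_lengths))
            for _ in range(2000)), key=makespan)
print("assignment:", best, "makespan:", round(makespan(best), 3))
```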
{"title":"Cloud Resource Scheduling Using Multi-Strategy Fused Honey Badger Algorithm.","authors":"Haitao Xie, Chengkai Li, Zhiwei Ye, Tao Zhao, Hui Xu, Jiangyi Du, Wanfang Bai","doi":"10.1089/big.2023.0146","DOIUrl":"https://doi.org/10.1089/big.2023.0146","url":null,"abstract":"<p><p>Cloud resource scheduling is one of the most significant tasks in the field of big data, which is a combinatorial optimization problem in essence. Scheduling strategies based on meta-heuristic algorithms (MAs) are often chosen to deal with this topic. However, MAs are prone to falling into local optima leading to decreasing quality of the allocation scheme. Algorithms with good global search ability are needed to map available cloud resources to the requirements of the task. Honey Badger Algorithm (HBA) is a newly proposed algorithm with strong search ability. In order to further improve scheduling performance, an Improved Honey Badger Algorithm (IHBA), which combines two local search strategies and a new fitness function, is proposed in this article. IHBA is compared with 6 MAs in four scale load tasks. The comparative simulation results obtained reveal that the proposed algorithm performs better than other algorithms involved in the article. IHBA enhances the diversity of algorithm populations, expands the individual's random search range, and prevents the algorithm from falling into local optima while effectively achieving resource load balancing.</p>","PeriodicalId":51314,"journal":{"name":"Big Data","volume":"13 1","pages":"59-72"},"PeriodicalIF":2.6,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143450642","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Generic User Behavior: A User Behavior Similarity-Based Recommendation Method.
Pub Date : 2025-02-01. Epub Date: 2023-04-19. DOI: 10.1089/big.2022.0260
Zhengyang Hu, Weiwei Lin, Xiaoying Ye, Haojun Xu, Haocheng Zhong, Huikang Huang, Xinyang Wang
Recommender systems (RSs) play an important role in Big Data research. Their main idea is to handle huge amounts of data to accurately recommend items to users, and the recommendation method is the core research content of the whole RS. However, existing recommendation methods still have two shortcomings: (1) most use only one kind of information about the user's interaction with items (such as Browse or Purchase), which makes it difficult to model complete user preferences; and (2) most mainstream methods consider only the final consistency of the recommendation (e.g., user preferences) and ignore process consistency (e.g., user behavior), which biases the final result. In this article, we propose a recommendation method based on the Entity Interaction Knowledge Graph (EIKG), which draws on the idea of collaborative filtering and innovatively uses the similarity of user behaviors to recommend items. The method first extracts fact triples containing interaction relations from the relevant data sets to generate the EIKG; it then embeds the entities and relations of the EIKG and finally uses link prediction techniques to recommend items to users. The proposed method is compared with other recommendation methods on two publicly available data sets, Scholat and Lizhi, and the experimental results show that it exceeds the state of the art on most metrics, verifying its effectiveness.
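The abstract does not name the embedding model; as a familiar stand-in for knowledge graph embedding with link prediction, the sketch below scores triples TransE-style (plausible triples have small ||h + r - t||) on untrained random embeddings. The entity and relation names are invented.

```python
# TransE-style triple scoring on untrained random embeddings, as a familiar
# stand-in for the paper's (unspecified) knowledge graph embedding model.
import numpy as np

rng = np.random.default_rng(0)
entities = {"user_a": 0, "user_b": 1, "item_x": 2, "item_y": 3}
relations = {"purchased": 0}
E = rng.normal(size=(len(entities), 16))    # entity embeddings
R = rng.normal(size=(len(relations), 16))   # relation embeddings

def score(h, r, t):
    """TransE score: a smaller ||h + r - t|| marks a more plausible triple."""
    return float(np.linalg.norm(E[entities[h]] + R[relations[r]] - E[entities[t]]))

# After training, ranking candidate tails by score yields recommendations.
ranked = sorted(["item_x", "item_y"], key=lambda i: score("user_a", "purchased", i))
print(ranked)
```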
{"title":"Generic User Behavior: A User Behavior Similarity-Based Recommendation Method.","authors":"Zhengyang Hu, Weiwei Lin, Xiaoying Ye, Haojun Xu, Haocheng Zhong, Huikang Huang, Xinyang Wang","doi":"10.1089/big.2022.0260","DOIUrl":"10.1089/big.2022.0260","url":null,"abstract":"<p><p>Recommender system (RS) plays an important role in Big Data research. Its main idea is to handle huge amounts of data to accurately recommend items to users. The recommendation method is the core research content of the whole RS. However, the existing recommendation methods still have the following two shortcomings: (1) Most recommendation methods use only one kind of information about the user's interaction with items (such as Browse or Purchase), which makes it difficult to model complete user preference. (2) Most mainstream recommendation methods only consider the final consistency of recommendation (e.g., user preferences) but ignore the process consistency (e.g., user behavior), which leads to the biased final result. In this article, we propose a recommendation method based on the Entity Interaction Knowledge Graph (EIKG), which draws on the idea of collaborative filtering and innovatively uses the similarity of user behaviors to recommend items. The method first extracts fact triples containing interaction relations from relevant data sets to generate the EIKG; then embeds the entities and relations in the EIKG; finally, uses link prediction techniques to recommend items for users. The proposed method is compared with other recommendation methods on two publicly available data sets, Scholat and Lizhi, and the experimental result shows that it exceeds the state of the art in most metrics, verifying the effectiveness of the proposed method.</p>","PeriodicalId":51314,"journal":{"name":"Big Data","volume":" ","pages":"3-15"},"PeriodicalIF":2.6,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9477294","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pneumonia Detection Using Enhanced Convolutional Neural Network Model on Chest X-Ray Images.
Pub Date : 2025-02-01. Epub Date: 2023-04-17. DOI: 10.1089/big.2022.0261
Shadi A Aljawarneh, Romesaa Al-Quraan
Pneumonia, caused by microorganisms, is a severely contagious disease that damages one or both lungs. Early detection and treatment are typically favored since untreated pneumonia can lead to major complications in the elderly (>65 years) and children (<5 years). The objectives of this work are to develop several models that evaluate big chest X-ray images (XRIs), determine whether the images show signs of pneumonia, and compare the models based on their accuracy, precision, recall, loss, and area under the receiver operating characteristic (ROC) curve scores. Enhanced convolutional neural network (CNN), VGG-19, ResNet-50, and ResNet-50 with fine-tuning are the deep learning (DL) algorithms employed in this study. These techniques identify pneumonia by training transfer learning models and an enhanced CNN model on a big data set. The data set for the study was obtained from Kaggle and has been expanded to include further records. It comprises 5863 chest XRIs categorized into three folders (train, val, test). Such data are produced every day from personnel records and Internet of Medical Things devices. According to the experimental findings, the ResNet-50 model showed the lowest accuracy, 82.8%, while the enhanced CNN model showed the highest accuracy, 92.4%. Owing to its high accuracy, the enhanced CNN was regarded as the best model in this study. The techniques developed here outperformed popular ensemble techniques, and the models showed better results than those generated by cutting-edge methods. Our study implies that DL models can detect the progression of pneumonia, improving overall diagnostic accuracy and giving patients new hope for speedy treatment. Since the enhanced CNN and ResNet-50 (after fine-tuning) showed the highest accuracy compared with the other algorithms, it was concluded that these techniques can be used effectively to identify pneumonia.
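A hedged sketch of the ResNet-50 transfer learning and fine-tuning setup the study describes, written with Keras: the directory layout matches the Kaggle chest X-ray data set's train/val folders, while the image size, learning rates, and epoch counts are assumptions, not the study's exact configuration.

```python
# Two-stage ResNet-50 transfer learning in Keras; hyperparameters and paths
# are assumptions, and this sketch is not the study's exact configuration.
import tensorflow as tf

base = tf.keras.applications.ResNet50(weights="imagenet", include_top=False,
                                      input_shape=(224, 224, 3))
base.trainable = False  # stage 1: train only the new classification head

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # pneumonia vs. normal
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
              loss="binary_crossentropy",
              metrics=["accuracy", tf.keras.metrics.AUC()])

train = tf.keras.utils.image_dataset_from_directory(
    "chest_xray/train", label_mode="binary", image_size=(224, 224), batch_size=32)
val = tf.keras.utils.image_dataset_from_directory(
    "chest_xray/val", label_mode="binary", image_size=(224, 224), batch_size=32)
model.fit(train, validation_data=val, epochs=3)

# Stage 2 (fine-tuning): unfreeze the backbone at a much smaller learning rate.
base.trainable = True
model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),
              loss="binary_crossentropy",
              metrics=["accuracy", tf.keras.metrics.AUC()])
model.fit(train, validation_data=val, epochs=2)
```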
{"title":"Pneumonia Detection Using Enhanced Convolutional Neural Network Model on Chest X-Ray Images.","authors":"Shadi A Aljawarneh, Romesaa Al-Quraan","doi":"10.1089/big.2022.0261","DOIUrl":"10.1089/big.2022.0261","url":null,"abstract":"<p><p>Pneumonia, caused by microorganisms, is a severely contagious disease that damages one or both the lungs of the patients. Early detection and treatment are typically favored to recover infected patients since untreated pneumonia can lead to major complications in the elderly (>65 years) and children (<5 years). The objectives of this work are to develop several models to evaluate big X-ray images (XRIs) of the chest, to determine whether the images show/do not show signs of pneumonia, and to compare the models based on their accuracy, precision, recall, loss, and receiver operating characteristic area under the ROC curve scores. Enhanced convolutional neural network (CNN), VGG-19, ResNet-50, and ResNet-50 with fine-tuning are some of the deep learning (DL) algorithms employed in this study. By training the transfer learning model and enhanced CNN model using a big data set, these techniques are used to identify pneumonia. The data set for the study was obtained from Kaggle. It should be noted that the data set has been expanded to include further records. This data set included 5863 chest XRIs, which were categorized into 3 different folders (i.e., train, val, test). These data are produced every day from personnel records and Internet of Medical Things devices. According to the experimental findings, the ResNet-50 model showed the lowest accuracy, that is, 82.8%, while the enhanced CNN model showed the highest accuracy of 92.4%. Owing to its high accuracy, enhanced CNN was regarded as the best model in this study. The techniques developed in this study outperformed the popular ensemble techniques, and the models showed better results than those generated by cutting-edge methods. Our study implication is that a DL models can detect the progression of pneumonia, which improves the general diagnostic accuracy and gives patients new hope for speedy treatment. Since enhanced CNN and ResNet-50 showed the highest accuracy compared with other algorithms, it was concluded that these techniques could be effectively used to identify pneumonia after performing fine-tuning.</p>","PeriodicalId":51314,"journal":{"name":"Big Data","volume":" ","pages":"16-29"},"PeriodicalIF":2.6,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9737399","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Data-Driven Analysis Method for the Trajectory of Power Carbon Emission in the Urban Area.
Pub Date : 2025-02-01. Epub Date: 2023-06-16. DOI: 10.1089/big.2022.0299
Yi Gao, Dawei Yan, Xiangyu Kong, Ning Liu, Zhiyu Zou, Bixuan Gao, Yang Wang, Yue Chen, Shuai Luo
"Industry 4.0" aims to build a highly versatile, individualized digital production model for goods and services. The carbon emission (CE) issue needs to be addressed by changing from centralized control to decentralized and enhanced control. Based on a solid CE monitoring, reporting, and verification system, it is necessary to study future power system CE dynamics simulation technology. In this article, a data-driven approach is proposed to analyzing the trajectory of urban electricity CEs based on empirical mode decomposition, which suggests combining macro-energy thinking and big data thinking by removing the barriers among power systems and related technological, economic, and environmental domains. Based on multisource heterogeneous mass data acquisition, effective secondary data can be extracted through the integration of statistical analysis, causal analysis, and behavior analysis, which can help construct a simulation environment supporting the dynamic interaction among mathematical models, multi-agents, and human participants.
{"title":"A Data-Driven Analysis Method for the Trajectory of Power Carbon Emission in the Urban Area.","authors":"Yi Gao, Dawei Yan, Xiangyu Kong, Ning Liu, Zhiyu Zou, Bixuan Gao, Yang Wang, Yue Chen, Shuai Luo","doi":"10.1089/big.2022.0299","DOIUrl":"10.1089/big.2022.0299","url":null,"abstract":"<p><p>\"Industry 4.0\" aims to build a highly versatile, individualized digital production model for goods and services. The carbon emission (CE) issue needs to be addressed by changing from centralized control to decentralized and enhanced control. Based on a solid CE monitoring, reporting, and verification system, it is necessary to study future power system CE dynamics simulation technology. In this article, a data-driven approach is proposed to analyzing the trajectory of urban electricity CEs based on empirical mode decomposition, which suggests combining macro-energy thinking and big data thinking by removing the barriers among power systems and related technological, economic, and environmental domains. Based on multisource heterogeneous mass data acquisition, effective secondary data can be extracted through the integration of statistical analysis, causal analysis, and behavior analysis, which can help construct a simulation environment supporting the dynamic interaction among mathematical models, multi-agents, and human participants.</p>","PeriodicalId":51314,"journal":{"name":"Big Data","volume":" ","pages":"42-58"},"PeriodicalIF":2.6,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9634989","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}