Pub Date: 2024-09-11 DOI: 10.1186/s40537-024-00960-3
Xisong Liang, Jie Wen, Chunrun Qu, Nan Zhang, Ziyu Dai, Hao Zhang, Peng Luo, Ming Meng, Zhixiong Liu, Fan Fan, Quan Cheng
Psychiatric disorders are severe health challenges that exert a heavy public burden. Air pollution has been widely reported to be related to psychiatric disorder risk, but the causal association and pathological mechanism remain unclear. Herein, we systematically investigated large genome-wide association study (6 cohorts with 1,357,645 samples), single-cell RNA (26 samples with 157,488 cells), and bulk RNA-seq (1595 samples) datasets to reveal the genetic causality and biological link between four air pollutants and nine psychiatric disorders. We identified ten positive genetic correlations between air pollution and psychiatric disorders. PM2.5 and NO2 showed significant causal effects on schizophrenia risk, which remained robust after adjustment for potential confounders. Transcriptome-wide association studies then identified genes shared between PM2.5/NO2 and schizophrenia. Through scRNA analyses, we discovered a schizophrenia-derived inhibitory neuron subtype with high expression of the shared genes and abnormal synaptic and metabolic pathways, and we confirmed its abnormal abundance and its correlations with the shared genes in schizophrenia patients in a large RNA-seq cohort. Together, we established robust genetic causality between PM2.5, NO2, and schizophrenia and identified an abnormal inhibitory neuron subtype that links schizophrenia pathology to PM2.5/NO2 exposure. These findings highlight the schizophrenia risk associated with air pollutant exposure and provide novel mechanistic insights into schizophrenia pathology, contributing to pollutant-related schizophrenia risk control and the development of therapeutic strategies.
{"title":"Inhibitory neuron links the causal relationship from air pollution to psychiatric disorders: a large multi-omics analysis","authors":"Xisong Liang, Jie Wen, Chunrun Qu, Nan Zhang, Ziyu Dai, Hao Zhang, Peng Luo, Ming Meng, Zhixiong Liu, Fan Fan, Quan Cheng","doi":"10.1186/s40537-024-00960-3","DOIUrl":"https://doi.org/10.1186/s40537-024-00960-3","url":null,"abstract":"<p>Psychiatric disorders are severe health challenges that exert a heavy public burden. Air pollution has been widely reported as related to psychiatric disorder risk, but their casual association and pathological mechanism remained unclear. Herein, we systematically investigated the large genome-wide association studies (6 cohorts with 1,357,645 samples), single-cell RNA (26 samples with 157,488 cells), and bulk-RNAseq (1595 samples) datasets to reveal the genetic causality and biological link between four air pollutants and nine psychiatric disorders. As a result, we identified ten positive genetic correlations between air pollution and psychiatric disorders. Besides, PM2.5 and NO<sub>2</sub> presented significant causal effects on schizophrenia risk which was robust with adjustment of potential confounders. Besides, transcriptome-wide association studies identified the shared genes between PM2.5/NO2 and schizophrenia. We then discovered a schizophrenia-derived inhibitory neuron subtype with highly expressed shared genes and abnormal synaptic and metabolic pathways by scRNA analyses and confirmed their abnormal level and correlations with the shared genes in schizophrenia patients in a large RNA-seq cohort. Comprehensively, we discovered robust genetic causality between PM2.5, NO<sub>2</sub>, and schizophrenia and identified an abnormal inhibitory neuron subtype that links schizophrenia pathology and PM2.5/NO2 exposure. These discoveries highlight the schizophrenia risk under air pollutants exposure and provide novel mechanical insights into schizophrenia pathology, contributing to pollutant-related schizophrenia risk control and therapeutic strategies development.</p><h3 data-test=\"abstract-sub-heading\">Graphical Abstract</h3>","PeriodicalId":15158,"journal":{"name":"Journal of Big Data","volume":"58 1","pages":""},"PeriodicalIF":8.1,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142186333","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-09-08 DOI: 10.1186/s40537-024-00995-6
Chin-Tsu Chen, Asif Khan, Shih-Chih Chen
Data has evolved into one of the principal resources for contemporary businesses. Moreover, corporations have undergone digitalization; consequently, their supply chains generate substantial amounts of data. The theoretical framework of this investigation was built on the direct impacts of big data analytics–artificial intelligence (BDA-AI) and supply chain ambidexterity (SCA) on sustainable supply chain management (SSCM), and their indirect impacts on sustainable innovation ambidexterity (SIA) and environmental performance (EP). Employees of manufacturing industries were surveyed as respondents on environmental performance, sustainable supply chain management, big data analytics, artificial intelligence, and supply chain ambidexterity. The results show that BDA-AI and SCA significantly affect SSCM, that SSCM has significant associations with SIA and EP, and that SIA has a significant impact on EP. Regarding indirect impacts, BDA-AI has significant indirect relationships with SIA and EP with SSCM as the mediating variable, and SCA likewise has significant indirect associations with SIA and EP through SSCM. Both BDA-AI and SCA also have significant indirect associations with EP with SIA and SSCM as mediating variables, and SSCM has an indirect association with EP with SIA as a mediating variable. The findings provide several theoretical contributions to research in the sustainability and big data analytics–artificial intelligence fields. Based on the suggested framework, the study also offers a number of practical implications for decision-makers seeking significant improvements in the supply chain and in BDA-AI adoption. For instance, it provides insight for logistics and supply chain managers, supporting them in implementing BDA-AI solutions to strengthen SSCM and enhance EP.
{"title":"Modeling the impact of BDA-AI on sustainable innovation ambidexterity and environmental performance","authors":"Chin-Tsu Chen, Asif Khan, Shih-Chih Chen","doi":"10.1186/s40537-024-00995-6","DOIUrl":"https://doi.org/10.1186/s40537-024-00995-6","url":null,"abstract":"<p>Data has evolved into one of the principal resources for contemporary businesses. Moreover, corporations have undergone digitalization; consequently, their supply chains generate substantial amounts of data. The theoretical framework of this investigation was built on novel concepts like big data analytics—artificial intelligence (BDA-AI) and supply chain ambidexterity’s (SCA) direct impacts on sustainable supply chain management (SSCM) and indirect impacts on sustainable innovation ambidexterity (SIA) and environmental performance (EP). This study selected employees of manufacturing industries as respondents for environmental performance, sustainable supply chain management, big data analytics, artificial intelligence, and supply chain ambidexterity. The results from this study show that BDA-AI and SCA significantly affect SSCM. SSCM has significant associations with SIA and EP. Finally, SIA has a significant impact on EP. According to the results indicating the indirect impacts, BDA-AI has significant indirect relationships with SIA and EP by having SSCM as the mediating variable. Furthermore, SCA has significant indirect associations with SIA and EP, with SSCM as the mediating variable. Additionally, both BDA-AI and SCA have significant indirect associations with EP, while SIA and SSCM are mediating variables. Finally, SSCM has an indirect association with EP while having SIA as a mediating variable. The findings of this paper provide several theoretical contributions to the research in sustainability and big data analytics artificial intelligence field. Furthermore, based on the suggested framework, this study offers a number of practical implications for decision-makers to improve significantly in the supply chain and BDA-AI. For instance, this paper provides significant insight for logistics and supply chain managers, supporting them in implementing BDA-AI solutions to help SSCM and enhance EP.</p>","PeriodicalId":15158,"journal":{"name":"Journal of Big Data","volume":"13 1","pages":""},"PeriodicalIF":8.1,"publicationDate":"2024-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142186334","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-09-08 DOI: 10.1186/s40537-024-00990-x
Qi Bin Kwong, Yee Thung Kon, Wan Rusydiah W. Rusik, Mohd Nor Azizi Shabudin, Shahirah Shazana A. Rahman, Harikrishna Kulaveerasingam, David Ross Appleton
In digital agriculture, accurate crop detection is fundamental to developing automated systems for efficient plantation management. For oil palm, the main challenge lies in developing robust models that perform well under different environmental conditions. This study addresses the feasibility of using GAN augmentation methods to improve palm detection models. For this purpose, drone images of young palms (< 5 years old) from eight different estates were collected, annotated, and used to build a baseline detection model based on DETR. StyleGAN2 was trained on the extracted palms and then used to generate a series of synthetic palms, which were inserted into tiles representing different environments. CycleGAN networks were trained for bidirectional translation between synthetic and real tiles and subsequently used to enhance the realism of the synthetic tiles. Both synthetic and real tiles were used to train the GAN-based detection model. The baseline model achieved precision and recall values of 95.8% and 97.2%. The GAN-based model achieved comparable results, with precision and recall values of 98.5% and 98.6%. On challenge dataset 1, consisting of older palms (> 5 years old), both models also achieved similar accuracies, with the baseline model reaching precision and recall of 93.1% and 99.4% and the GAN-based model reaching 95.7% and 99.4%. On challenge dataset 2, consisting of storm-affected palms, the baseline model achieved a precision of 100% but a recall of only 13%, whereas the GAN-based model achieved a significantly better result, with precision and recall values of 98.7% and 95.3%. These results demonstrate that images generated by GANs have the potential to enhance the accuracy of palm detection models.
{"title":"Enhancing oil palm segmentation model with GAN-based augmentation","authors":"Qi Bin Kwong, Yee Thung Kon, Wan Rusydiah W. Rusik, Mohd Nor Azizi Shabudin, Shahirah Shazana A. Rahman, Harikrishna Kulaveerasingam, David Ross Appleton","doi":"10.1186/s40537-024-00990-x","DOIUrl":"https://doi.org/10.1186/s40537-024-00990-x","url":null,"abstract":"<p>In digital agriculture, accurate crop detection is fundamental to developing automated systems for efficient plantation management. For oil palm, the main challenge lies in developing robust models that perform well in different environmental conditions. This study addresses the feasibility of using GAN augmentation methods to improve palm detection models. For this purpose, drone images of young palms (< 5 year-old) from eight different estates were collected, annotated, and used to build a baseline detection model based on DETR. StyleGAN2 was trained on the extracted palms and then used to generate a series of synthetic palms, which were then inserted into tiles representing different environments. CycleGAN networks were trained for bidirectional translation between synthetic and real tiles, subsequently utilized to augment the authenticity of synthetic tiles. Both synthetic and real tiles were used to train the GAN-based detection model. The baseline model achieved precision and recall values of 95.8% and 97.2%. The GAN-based model achieved comparable result, with precision and recall values of 98.5% and 98.6%. In the challenge dataset 1 consisting older palms (> 5 year-old), both models also achieved similar accuracies, with baseline model achieving precision and recall of 93.1% and 99.4%, and GAN-based model achieving 95.7% and 99.4%. As for the challenge dataset 2 consisting of storm affected palms, the baseline model achieved precision of 100% but recall was only 13%. The GAN-based model achieved a significantly better result, with a precision and recall values of 98.7% and 95.3%. This result demonstrates that images generated by GANs have the potential to enhance the accuracies of palm detection models.</p>","PeriodicalId":15158,"journal":{"name":"Journal of Big Data","volume":"25 1","pages":""},"PeriodicalIF":8.1,"publicationDate":"2024-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142186336","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-09-08 DOI: 10.1186/s40537-024-00989-4
Yong Tang, Zhenghua Lin, Linjing Zhou, Weijia Wang, Longbo Wen, Yongli Zhou, Zongyuan Ge, Zhao Chen, Weiwei Dai, Zhikuan Yang, He Tang, Weizhong Lan
The question of whether artificial intelligence (AI) can surpass human capabilities is crucial to the application of AI in clinical medicine. To explore this, an interpretable deep learning (DL) model was developed to assess myopia status using retinal refraction maps obtained with a novel peripheral refractor. The DL model demonstrated promising performance, achieving an AUC of 0.9074 (95% CI 0.83–0.97), an accuracy of 0.8140 (95% CI 0.70–0.93), a sensitivity of 0.7500 (95% CI 0.51–0.90), and a specificity of 0.8519 (95% CI 0.68–0.94). Grad-CAM analysis provided interpretable visualization of the DL model's attention and revealed that the model utilized information from the central retina, similar to human readers. Additionally, the model considered information from vertical regions across the central retina, which human readers had overlooked. This finding suggests that AI can indeed surpass human capabilities, bolstering our confidence in the use of AI in clinical practice, especially in new scenarios where prior human knowledge is limited.
{"title":"AI sees beyond humans: automated diagnosis of myopia based on peripheral refraction map using interpretable deep learning","authors":"Yong Tang, Zhenghua Lin, Linjing Zhou, Weijia Wang, Longbo Wen, Yongli Zhou, Zongyuan Ge, Zhao Chen, Weiwei Dai, Zhikuan Yang, He Tang, Weizhong Lan","doi":"10.1186/s40537-024-00989-4","DOIUrl":"https://doi.org/10.1186/s40537-024-00989-4","url":null,"abstract":"<p>The question of whether artificial intelligence (AI) can surpass human capabilities is crucial in the application of AI in clinical medicine. To explore this, an interpretable deep learning (DL) model was developed to assess myopia status using retinal refraction maps obtained with a novel peripheral refractor. The DL model demonstrated promising performance, achieving an AUC of 0.9074 (95% CI 0.83–0.97), an accuracy of 0.8140 (95% CI 0.70–0.93), a sensitivity of 0.7500 (95% CI 0.51–0.90), and a specificity of 0.8519 (95% CI 0.68–0.94). Grad-CAM analysis provided interpretable visualization of the attention of DL model and revealed that the DL model utilized information from the central retina, similar to human readers. Additionally, the model considered information from vertical regions across the central retina, which human readers had overlooked. This finding suggests that AI can indeed surpass human capabilities, bolstering our confidence in the use of AI in clinical practice, especially in new scenarios where prior human knowledge is limited.</p>","PeriodicalId":15158,"journal":{"name":"Journal of Big Data","volume":"23 1","pages":""},"PeriodicalIF":8.1,"publicationDate":"2024-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142186335","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-09-04 DOI: 10.1186/s40537-024-00975-w
Abdul Rasheed Mahesar, Xiaoping Li, Dileep Kumar Sajnani
In recent years, mobile applications have proliferated across domains such as E-banking, Augmented Reality, E-Transportation, and E-Healthcare. These applications are often built using microservices, an architectural style where the application is composed of independently deployable services focusing on specific functionalities. Mobile devices cannot process these microservices locally, so traditionally, cloud-based frameworks using cost-efficient Virtual Machines (VMs) and edge servers have been used to offload these tasks. However, cloud frameworks suffer from extended boot times and high transmission overhead, while edge servers have limited computational resources. To overcome these challenges, this study introduces a Microservices Container-Based Mobile Edge Cloud Computing (MCBMEC) environment and proposes an innovative framework, Optimization Task Scheduling and Computational Offloading with Cost Awareness (OTSCOCA). This framework addresses Resource Matching, Task Sequencing, and Task Scheduling to enhance server utilization, reduce service latency, and improve service bootup times. Empirical results validate the efficacy of MCBMEC and OTSCOCA, demonstrating significant improvements in server efficiency, reduced service latency, faster service bootup times, and notable cost savings. These outcomes underscore the pivotal role of these methodologies in advancing mobile edge computing applications amidst the challenges of edge server limitations and traditional cloud-based approaches.
{"title":"Efficient microservices offloading for cost optimization in diverse MEC cloud networks","authors":"Abdul Rasheed Mahesar, Xiaoping Li, Dileep Kumar Sajnani","doi":"10.1186/s40537-024-00975-w","DOIUrl":"https://doi.org/10.1186/s40537-024-00975-w","url":null,"abstract":"<p>In recent years, mobile applications have proliferated across domains such as E-banking, Augmented Reality, E-Transportation, and E-Healthcare. These applications are often built using microservices, an architectural style where the application is composed of independently deployable services focusing on specific functionalities. Mobile devices cannot process these microservices locally, so traditionally, cloud-based frameworks using cost-efficient Virtual Machines (VMs) and edge servers have been used to offload these tasks. However, cloud frameworks suffer from extended boot times and high transmission overhead, while edge servers have limited computational resources. To overcome these challenges, this study introduces a Microservices Container-Based Mobile Edge Cloud Computing (MCBMEC) environment and proposes an innovative framework, Optimization Task Scheduling and Computational Offloading with Cost Awareness (OTSCOCA). This framework addresses Resource Matching, Task Sequencing, and Task Scheduling to enhance server utilization, reduce service latency, and improve service bootup times. Empirical results validate the efficacy of MCBMEC and OTSCOCA, demonstrating significant improvements in server efficiency, reduced service latency, faster service bootup times, and notable cost savings. These outcomes underscore the pivotal role of these methodologies in advancing mobile edge computing applications amidst the challenges of edge server limitations and traditional cloud-based approaches.</p>","PeriodicalId":15158,"journal":{"name":"Journal of Big Data","volume":"1 1","pages":""},"PeriodicalIF":8.1,"publicationDate":"2024-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142186337","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-09-03 DOI: 10.1186/s40537-024-00993-8
Jungryeol Park, Saesol Choi, Yituo Feng
The success of newly established companies holds significant implications for community development and economic growth. However, startups often grapple with heightened vulnerability to market volatility, which can lead to early-stage failures. This study aims to predict startup success by addressing biases in existing predictive models. Previous research has examined external factors such as market dynamics and internal elements like founder characteristics. While such efforts have contributed to understanding success mechanisms, challenges persist, including predictor and learning-data biases. This study proposes a novel approach that constructs independent variables from early-stage information, incorporates founder attributes, and mitigates class imbalance through generative adversarial networks (GANs). The proposed model aims to enhance the efficiency and effectiveness of investment decision-making, offering a valuable decision support system for venture capital funds.
{"title":"Predicting startup success using two bias-free machine learning: resolving data imbalance using generative adversarial networks","authors":"Jungryeol Park, Saesol Choi, Yituo Feng","doi":"10.1186/s40537-024-00993-8","DOIUrl":"https://doi.org/10.1186/s40537-024-00993-8","url":null,"abstract":"<p>The success of newly established companies holds significant implications for community development and economic growth. However, startups often grapple with heightened vulnerability to market volatility, which can lead to early-stage failures. This study aims to predict startup success by addressing biases in existing predictive models. Previous research has examined external factors such as market dynamics and internal elements like founder characteristics.While such efforts have contributed to understanding success mechanisms, challenges persist, including predictor and learning data biases. This study proposes a novel approach by constructing independent variables using early-stage information, incorporating founder attributes, and mitigating class imbalance through generative adversarial networks (GAN). Our proposed model aims to enhance investment decision-making efficiency and effectiveness, offering a valuable decision support system for various venture capital funds.</p>","PeriodicalId":15158,"journal":{"name":"Journal of Big Data","volume":"4 1","pages":""},"PeriodicalIF":8.1,"publicationDate":"2024-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142186339","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-09-02 DOI: 10.1186/s40537-024-00982-x
I Nyoman Mahayasa Adiputra, Paweena Wanchai
Class imbalance is one of many problems in customer churn datasets. A related common problem is class overlap, where instances from different classes have very similar feature values. The prediction task of customer churn becomes more challenging when class overlap is present in the training data. In this research, we propose a hybrid method based on tabular GANs, called CTGAN-ENN, to address class overlap and imbalanced data in customer churn datasets. We used five different customer churn datasets from an open platform. CTGAN is a tabular GAN-based oversampling method that addresses class imbalance but suffers from class overlap; we therefore combined CTGAN with the ENN under-sampling technique to remove overlapping instances. CTGAN-ENN reduced the degree of class overlap for each feature in all datasets. We investigated how effective CTGAN-ENN is with each machine learning technique. Based on our experiments, CTGAN-ENN achieved satisfactory customer churn prediction performance with KNN, GBM, XGB, and LGB models. We compared CTGAN-ENN with common over-sampling and hybrid sampling methods, and it outperformed these sampling methods as well as algorithm-level methods based on cost-sensitive learning across several machine learning algorithms. We also provide a time-consumption comparison between CTGAN and CTGAN-ENN; CTGAN-ENN required less time than CTGAN. Our work provides a new framework for handling customer churn prediction problems on several types of imbalanced datasets and can be useful for real-world customer churn prediction data.
{"title":"CTGAN-ENN: a tabular GAN-based hybrid sampling method for imbalanced and overlapped data in customer churn prediction","authors":"I Nyoman Mahayasa Adiputra, Paweena Wanchai","doi":"10.1186/s40537-024-00982-x","DOIUrl":"https://doi.org/10.1186/s40537-024-00982-x","url":null,"abstract":"<p>Class imbalance is one of many problems of customer churn datasets. One of the common problems is class overlap, where the data have a similar instance between classes. The prediction task of customer churn becomes more challenging when there is class overlap in the data training. In this research, we suggested a hybrid method based on tabular GANs, called CTGAN-ENN, to address class overlap and imbalanced data in datasets of customers that churn. We used five different customer churn datasets from an open platform. CTGAN is a tabular GAN-based oversampling to address class imbalance but has a class overlap problem. We combined CTGAN with the ENN under-sampling technique to overcome the class overlap. CTGAN-ENN reduced the number of class overlaps by each feature in all datasets. We investigated how effective CTGAN-ENN is in each machine learning technique. Based on our experiments, CTGAN-ENN achieved satisfactory results in KNN, GBM, XGB and LGB machine learning performance for customer churn predictions. We compared CTGAN-ENN with common over-sampling and hybrid sampling methods, and CTGAN-ENN achieved outperform results compared with other sampling methods and algorithm-level methods with cost-sensitive learning in several machine learning algorithms. We provide a time consumption algorithm between CTGAN and CTGAN-ENN. CTGAN-ENN achieved less time consumption than CTGAN. Our research work provides a new framework to handle customer churn prediction problems with several types of imbalanced datasets and can be useful in real-world data from customer churn prediction.</p>","PeriodicalId":15158,"journal":{"name":"Journal of Big Data","volume":"78 1","pages":""},"PeriodicalIF":8.1,"publicationDate":"2024-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142186338","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-08-29 DOI: 10.1186/s40537-024-00962-1
Monica L. Smith, Connor Newton
Some of the most notable human behavioral palimpsests result from warfare and its durable traces in the form of defensive architecture and strategic infrastructure. For premodern periods, this architecture is often understudied at the large scale, resulting in a lack of appreciation for the enormity of the costs and impacts of military spending over the course of human history. In this article, we compare the information gleaned from the study of the fortified cities of the Early Historic period of the Indian subcontinent (c. 3rd century BCE to 4th century CE) with the precolonial medieval era (9th–17th centuries CE). Utilizing in-depth archaeological and historical studies, along with local sightings and citizen-science blogs, we created a comprehensive data set and map series in a “big-data” approach that makes use of heterogeneous data sets and presence-absence criteria. We discuss how the architecture of warfare shifted from an emphasis on urban defense in the Early Historic period to an emphasis on territorial offense and defense in the medieval period. Many medieval fortifications are known only from local reports and have minimal identifying information but can still be studied in the aggregate using a least-shared-denominator approach to quantification and mapping.
{"title":"Cartographies of warfare in the Indian subcontinent: Contextualizing archaeological and historical analysis through big data approaches","authors":"Monica L. Smith, Connor Newton","doi":"10.1186/s40537-024-00962-1","DOIUrl":"https://doi.org/10.1186/s40537-024-00962-1","url":null,"abstract":"<p>Some of the most notable human behavioral palimpsests result from warfare and its durable traces in the form of defensive architecture and strategic infrastructure. For premodern periods, this architecture is often understudied at the large scale, resulting in a lack of appreciation for the enormity of the costs and impacts of military spending over the course of human history. In this article, we compare the information gleaned from the study of the fortified cities of the Early Historic period of the Indian subcontinent (c. 3rd century BCE to 4th century CE) with the precolonial medieval era (9-17th centuries CE). Utilizing in-depth archaeological and historical studies along with local sightings and citizen-science blogs to create a comprehensive data set and map series in a “big-data” approach that makes use of heterogeneous data sets and presence-absence criteria, we discuss how the architecture of warfare shifted from an emphasis on urban defense in the Early Historic period to an emphasis on territorial offense and defense in the medieval period. Many medieval fortifications are known from only local reports and have minimal identifying information but can still be studied in the aggregate using a least-shared denominator approach to quantification and mapping.</p>","PeriodicalId":15158,"journal":{"name":"Journal of Big Data","volume":"14 1","pages":""},"PeriodicalIF":8.1,"publicationDate":"2024-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142186358","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-08-29 DOI: 10.1186/s40537-024-00941-6
Junfeng An, Mengmeng Lu, Gang Li, Jiqiang Liu, Chongqing Wang
Subway button detection is paramount for passenger safety, yet inadvertent touches pose operational threats. Camera-based detection is indispensable for identifying touch occurrences, ascertaining person identity, and implementing scientific countermeasures. Existing methods suffer from inaccuracies due to the small size of buttons, complex environments, and challenges such as occlusion. We present YOLOv8-DETR-P2-DCNv2-Dynamic-NWD-DA, which enhances occlusion awareness, reduces redundant annotations, and improves contextual feature extraction. The model integrates the RTDETRDecoder, a P2 small-target detection layer, the DCNv2-Dynamic algorithm, and the NWD loss function for multiscale feature extraction. Dataset augmentation and the GAN algorithm refine the model, aligning feature distributions and improving precision, recall, and mAP50 by 6.5%, 5%, and 5.8%, respectively. These advancements denote significant improvements in key performance indicators.
{"title":"Automated subway touch button detection using image process","authors":"Junfeng An, Mengmeng Lu, Gang Li, Jiqiang Liu, Chongqing Wang","doi":"10.1186/s40537-024-00941-6","DOIUrl":"https://doi.org/10.1186/s40537-024-00941-6","url":null,"abstract":"<p>Subway button detection is paramount for passenger safety, yet the occurrence of inadvertent touches poses operational threats. Camera-based detection is indispensable for identifying touch occurrences, ascertaining person identity, and implementing scientific measures. Existing methods suffer from inaccuracies due to the small size of buttons, complex environments, and challenges such as occlusion. We present YOLOv8-DETR-P2-DCNv2-Dynamic-NWD-DA, which enhances occlusion awareness, reduces redundant annotations, and improves contextual feature extraction. The model integrates the RTDETRDecoder, P2 small target detection layer, DCNv2-Dynamic algorithm, and the NWD loss function for multiscale feature extraction. Dataset augmentation and the GAN algorithm refine the model, aligning feature distributions and enhancing precision by 6.5%, 5%, and 5.8% in precision, recall, and mAP50, respectively. These advancements denote significant improvements in key performance indicators.</p>","PeriodicalId":15158,"journal":{"name":"Journal of Big Data","volume":"9 1","pages":""},"PeriodicalIF":8.1,"publicationDate":"2024-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142224577","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-08-23 DOI: 10.1186/s40537-024-00980-z
Ali Yimam Eshetu, Endris Abdu Mohammed, Ayodeji Olalekan Salau
This study investigates the causes of and countermeasures for cybersecurity vulnerabilities, focusing on 16 selected Ethiopian university websites. It uses a cybersecurity awareness survey together with automated vulnerability assessment and penetration testing (VAPT) tools, namely Nmap, Nessus, and Vega, to identify potential security threats and vulnerabilities. The assessment was performed according to the ISO/IEC 27001 series of standards, ensuring a comprehensive and globally recognized approach to information security. The results provide valuable insights into the current state of cybersecurity in Ethiopian universities and reveal a range of issues, from outdated software and poor password management to a lack of encryption and inadequate access control. The Vega vulnerability assessment reported 11,286 findings in total, and Nessus identified 1749 vulnerabilities across all the institutional websites examined. Based on these findings, the study proposes counteractive measures tailored to the specific needs of each identified defect. These recommendations aim to strengthen the security posture of the university websites, thereby protecting sensitive data and maintaining the trust of students, staff, and other stakeholders. The study emphasizes the need for proactive cybersecurity measures in higher education and presents a strategic plan for universities to improve their digital security.
{"title":"Cybersecurity vulnerabilities and solutions in Ethiopian university websites","authors":"Ali Yimam Eshetu, Endris Abdu Mohammed, Ayodeji Olalekan Salau","doi":"10.1186/s40537-024-00980-z","DOIUrl":"https://doi.org/10.1186/s40537-024-00980-z","url":null,"abstract":"<p>This study investigates the causes and countermeasures of cybercrime vulnerabilities, specifically focusing on selected 16 Ethiopian university websites. This study uses a cybersecurity awareness survey, and automated vulnerability assessment and penetration testing (VAPT) technique tools, namely, Nmap, Nessus, and Vega, to identify potential security threats and vulnerabilities. The assessment was performed according to the ISO/IEC 27001 series of standards, ensuring a comprehensive and globally recognized approach to information security. The results of this study provide valuable insights into the current state of cybersecurity in Ethiopian universities and reveals a range of issues, from outdated software and poor password management to a lack of encryption and inadequate access control. Vega vulnerability assessment reports 11,286 total findings, and Nessus identified a total of 1749 vulnerabilities across all the websites of the institutions examined. Based on these findings, the study proposes counteractive measures tailored to the specific needs of each identified defect. These recommendations aim to strengthen the security posture of the university websites, thereby protecting sensitive data and maintaining the trust of students, staff, and other stakeholders. The study emphasizes the need for proactive cybersecurity measures in the realm of higher education and presents a strategic plan for universities to improve their digital security.</p>","PeriodicalId":15158,"journal":{"name":"Journal of Big Data","volume":"9 1","pages":""},"PeriodicalIF":8.1,"publicationDate":"2024-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142186360","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}