Pub Date : 2024-09-08 DOI: 10.1186/s40537-024-00990-x
Qi Bin Kwong, Yee Thung Kon, Wan Rusydiah W. Rusik, Mohd Nor Azizi Shabudin, Shahirah Shazana A. Rahman, Harikrishna Kulaveerasingam, David Ross Appleton
In digital agriculture, accurate crop detection is fundamental to developing automated systems for efficient plantation management. For oil palm, the main challenge lies in developing robust models that perform well under different environmental conditions. This study addresses the feasibility of using GAN augmentation methods to improve palm detection models. For this purpose, drone images of young palms (< 5 years old) from eight different estates were collected, annotated, and used to build a baseline detection model based on DETR. StyleGAN2 was trained on the extracted palms and then used to generate a series of synthetic palms, which were inserted into tiles representing different environments. CycleGAN networks were trained for bidirectional translation between synthetic and real tiles and subsequently used to make the synthetic tiles more realistic. Both synthetic and real tiles were used to train the GAN-based detection model. The baseline model achieved precision and recall values of 95.8% and 97.2%. The GAN-based model achieved comparable results, with precision and recall values of 98.5% and 98.6%. On challenge dataset 1, consisting of older palms (> 5 years old), both models also achieved similar accuracies, with the baseline model achieving precision and recall of 93.1% and 99.4%, and the GAN-based model achieving 95.7% and 99.4%. On challenge dataset 2, consisting of storm-affected palms, the baseline model achieved a precision of 100% but a recall of only 13%. The GAN-based model achieved a significantly better result, with precision and recall values of 98.7% and 95.3%. These results demonstrate that images generated by GANs have the potential to enhance the accuracy of palm detection models.
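The reported rates reduce to simple count ratios. The sketch below recomputes precision and recall from detection counts; the counts themselves are hypothetical (the abstract reports only the final percentages), chosen to reproduce the GAN-based model's challenge dataset 2 figures:

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from true-positive, false-positive, and false-negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall

# Hypothetical counts: 1000 storm-affected palms, 953 detected, 13 false detections.
p, r = precision_recall(tp=953, fp=13, fn=47)
# p ≈ 0.987, r = 0.953 -- matching the reported 98.7% and 95.3%.
```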
Title: Enhancing oil palm segmentation model with GAN-based augmentation
The question of whether artificial intelligence (AI) can surpass human capabilities is crucial to the application of AI in clinical medicine. To explore this, an interpretable deep learning (DL) model was developed to assess myopia status using retinal refraction maps obtained with a novel peripheral refractor. The DL model demonstrated promising performance, achieving an AUC of 0.9074 (95% CI 0.83–0.97), an accuracy of 0.8140 (95% CI 0.70–0.93), a sensitivity of 0.7500 (95% CI 0.51–0.90), and a specificity of 0.8519 (95% CI 0.68–0.94). Grad-CAM analysis provided interpretable visualization of the DL model's attention and revealed that the model utilized information from the central retina, similar to human readers. Additionally, the model considered information from vertical regions across the central retina, which human readers had overlooked. This finding suggests that AI can indeed surpass human capabilities, bolstering our confidence in the use of AI in clinical practice, especially in new scenarios where prior human knowledge is limited.
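For reference, the reported sensitivity, specificity, and accuracy are plain confusion-matrix ratios. The counts below are hypothetical but consistent with the reported point estimates:

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Standard diagnostic ratios from a binary confusion matrix."""
    sensitivity = tp / (tp + fn)            # true-positive rate
    specificity = tn / (tn + fp)            # true-negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy

# Hypothetical test set of 43 eyes: 16 myopic (12 flagged), 27 non-myopic (23 cleared).
sens, spec, acc = diagnostic_metrics(tp=12, fn=4, tn=23, fp=4)
# sens = 0.75, spec ≈ 0.8519, acc ≈ 0.814 -- matching the reported values.
```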
Title: AI sees beyond humans: automated diagnosis of myopia based on peripheral refraction map using interpretable deep learning. Authors: Yong Tang, Zhenghua Lin, Linjing Zhou, Weijia Wang, Longbo Wen, Yongli Zhou, Zongyuan Ge, Zhao Chen, Weiwei Dai, Zhikuan Yang, He Tang, Weizhong Lan. Pub Date : 2024-09-08 DOI: 10.1186/s40537-024-00989-4
Pub Date : 2024-09-04 DOI: 10.1186/s40537-024-00975-w
Abdul Rasheed Mahesar, Xiaoping Li, Dileep Kumar Sajnani
In recent years, mobile applications have proliferated across domains such as E-banking, Augmented Reality, E-Transportation, and E-Healthcare. These applications are often built using microservices, an architectural style where the application is composed of independently deployable services focusing on specific functionalities. Mobile devices cannot process these microservices locally, so traditionally, cloud-based frameworks using cost-efficient Virtual Machines (VMs) and edge servers have been used to offload these tasks. However, cloud frameworks suffer from extended boot times and high transmission overhead, while edge servers have limited computational resources. To overcome these challenges, this study introduces a Microservices Container-Based Mobile Edge Cloud Computing (MCBMEC) environment and proposes an innovative framework, Optimization Task Scheduling and Computational Offloading with Cost Awareness (OTSCOCA). This framework addresses Resource Matching, Task Sequencing, and Task Scheduling to enhance server utilization, reduce service latency, and improve service bootup times. Empirical results validate the efficacy of MCBMEC and OTSCOCA, demonstrating significant improvements in server efficiency, reduced service latency, faster service bootup times, and notable cost savings. These outcomes underscore the pivotal role of these methodologies in advancing mobile edge computing applications amidst the challenges of edge server limitations and traditional cloud-based approaches.
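The OTSCOCA framework itself is not detailed in this abstract. As a rough illustration of cost-aware resource matching, the greedy sketch below assigns each task to the feasible server with the lowest estimated completion cost; the server names, capacities, and cost model are all assumptions for the example, not the paper's algorithm:

```python
def greedy_offload(tasks, servers):
    """Greedy cost-aware matching. tasks: list of CPU demands; servers maps a
    server name to (capacity, cost_per_unit). Assumes every task fits somewhere."""
    load = {name: 0.0 for name in servers}
    plan = []
    for demand in sorted(tasks, reverse=True):  # schedule the largest tasks first
        best = min(
            (n for n, (cap, _) in servers.items() if load[n] + demand <= cap),
            key=lambda n: (load[n] + demand) * servers[n][1],
        )
        load[best] += demand
        plan.append((demand, best))
    return plan, load

# Two tiers: a cheap but small edge server and a costly but large cloud VM.
plan, load = greedy_offload([4, 4, 4], {"edge": (10, 1.0), "cloud": (100, 3.0)})
# The third task no longer fits on the edge server and spills over to the cloud.
```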
Title: Efficient microservices offloading for cost optimization in diverse MEC cloud networks
Pub Date : 2024-09-03 DOI: 10.1186/s40537-024-00993-8
Jungryeol Park, Saesol Choi, Yituo Feng
The success of newly established companies holds significant implications for community development and economic growth. However, startups often grapple with heightened vulnerability to market volatility, which can lead to early-stage failures. This study aims to predict startup success by addressing biases in existing predictive models. Previous research has examined external factors such as market dynamics and internal elements such as founder characteristics. While such efforts have contributed to understanding success mechanisms, challenges persist, including predictor and learning data biases. This study proposes a novel approach by constructing independent variables using early-stage information, incorporating founder attributes, and mitigating class imbalance through generative adversarial networks (GAN). Our proposed model aims to enhance investment decision-making efficiency and effectiveness, offering a valuable decision support system for various venture capital funds.
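The class-imbalance step above uses a GAN to synthesize minority-class samples. To make the rebalancing idea concrete, the sketch below shows only the simpler duplication baseline that GAN-based oversampling replaces (all data and labels are illustrative):

```python
import random

def oversample_minority(rows, labels, seed=0):
    """Duplicate minority-class rows until all classes match the majority count.
    (A GAN-based approach synthesizes new samples instead of duplicating.)"""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for label, group in by_class.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        balanced.extend((row, label) for row in group + extra)
    return balanced

# 8 samples of one class vs. 2 of the other -> balanced to 8 vs. 8.
balanced = oversample_minority(list(range(10)), [1] * 8 + [0] * 2)
```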
Title: Predicting startup success using two bias-free machine learning: resolving data imbalance using generative adversarial networks
Pub Date : 2024-09-02 DOI: 10.1186/s40537-024-00982-x
I Nyoman Mahayasa Adiputra, Paweena Wanchai
Class imbalance is one of many problems in customer churn datasets. Another common problem is class overlap, where the data contain similar instances across classes. The prediction task of customer churn becomes more challenging when there is class overlap in the training data. In this research, we propose a hybrid method based on tabular GANs, called CTGAN-ENN, to address class overlap and imbalanced data in customer churn datasets. We used five different customer churn datasets from an open platform. CTGAN is a tabular GAN-based oversampling method that addresses class imbalance but still suffers from class overlap. We combined CTGAN with the ENN under-sampling technique to overcome the class overlap. CTGAN-ENN reduced the number of class overlaps for each feature in all datasets. We investigated how effective CTGAN-ENN is with each machine learning technique. Based on our experiments, CTGAN-ENN achieved satisfactory results with KNN, GBM, XGB, and LGB classifiers for customer churn prediction. We compared CTGAN-ENN with common over-sampling and hybrid sampling methods; it outperformed those sampling methods, as well as algorithm-level methods with cost-sensitive learning, across several machine learning algorithms. We also compared the time consumption of CTGAN and CTGAN-ENN; CTGAN-ENN required less time than CTGAN. Our work provides a new framework for handling customer churn prediction across several types of imbalanced datasets and can be useful for real-world customer churn prediction data.
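ENN (edited nearest neighbours) is the standard Wilson rule: a sample is dropped when most of its k nearest neighbours carry a different label, which thins out exactly the overlap region described above. A minimal pure-Python sketch of that under-sampling step (CTGAN-ENN applies it after CTGAN oversampling):

```python
def edited_nearest_neighbours(X, y, k=3):
    """Return indices of samples kept by Wilson's ENN rule: a sample survives
    only if a strict majority of its k nearest neighbours share its label."""
    def sq_dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    keep = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        neighbours = sorted(
            (j for j in range(len(X)) if j != i),
            key=lambda j: sq_dist(xi, X[j]),
        )[:k]
        agree = sum(1 for j in neighbours if y[j] == yi)
        if agree * 2 > k:  # strict majority agrees -> keep the sample
            keep.append(i)
    return keep

# The last sample sits inside the other class's cluster and is removed.
X = [(0, 0), (0.1, 0), (0.2, 0), (5, 5), (5.1, 5), (0.15, 0.05)]
y = [0, 0, 0, 1, 1, 1]
kept = edited_nearest_neighbours(X, y)
# kept == [0, 1, 2, 3, 4]
```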
Title: CTGAN-ENN: a tabular GAN-based hybrid sampling method for imbalanced and overlapped data in customer churn prediction
Pub Date : 2024-08-29 DOI: 10.1186/s40537-024-00962-1
Monica L. Smith, Connor Newton
Some of the most notable human behavioral palimpsests result from warfare and its durable traces in the form of defensive architecture and strategic infrastructure. For premodern periods, this architecture is often understudied at the large scale, resulting in a lack of appreciation for the enormity of the costs and impacts of military spending over the course of human history. In this article, we compare the information gleaned from the study of the fortified cities of the Early Historic period of the Indian subcontinent (c. 3rd century BCE to 4th century CE) with the precolonial medieval era (9th to 17th centuries CE). Utilizing in-depth archaeological and historical studies, along with local sightings and citizen-science blogs, to create a comprehensive data set and map series in a "big-data" approach that makes use of heterogeneous data sets and presence-absence criteria, we discuss how the architecture of warfare shifted from an emphasis on urban defense in the Early Historic period to an emphasis on territorial offense and defense in the medieval period. Many medieval fortifications are known only from local reports and have minimal identifying information, but they can still be studied in the aggregate using a least-shared-denominator approach to quantification and mapping.
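The "least-shared denominator" reduction can be made concrete: heterogeneous records are collapsed to the one field every source can supply, namely whether a fortification is reported present at a site. A minimal sketch, where the record fields and site names are illustrative rather than the authors' actual schema:

```python
def presence_absence(records):
    """Collapse heterogeneous records to per-site presence/absence.
    records: iterable of (site, source, has_fortification) tuples; a site is
    marked present if any source, however detailed, reports a fortification."""
    sites = {}
    for site, _source, present in records:
        sites[site] = sites.get(site, False) or bool(present)
    return sites

# Mixed sources (a survey report, a blog post, a local sighting) merge cleanly.
records = [
    ("Golconda", "excavation report", True),
    ("Golconda", "citizen-science blog", False),
    ("SiteX", "local sighting", False),
]
sites = presence_absence(records)
# sites == {"Golconda": True, "SiteX": False}
```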
Title: Cartographies of warfare in the Indian subcontinent: Contextualizing archaeological and historical analysis through big data approaches
Pub Date : 2024-08-29 DOI: 10.1186/s40537-024-00941-6
Junfeng An, Mengmeng Lu, Gang Li, Jiqiang Liu, Chongqing Wang
Subway button detection is paramount for passenger safety, yet inadvertent touches pose operational threats. Camera-based detection is indispensable for identifying touch occurrences, ascertaining person identity, and implementing scientific measures. Existing methods suffer from inaccuracies due to the small size of the buttons, complex environments, and challenges such as occlusion. We present YOLOv8-DETR-P2-DCNv2-Dynamic-NWD-DA, which enhances occlusion awareness, reduces redundant annotations, and improves contextual feature extraction. The model integrates the RTDETRDecoder, a P2 small-target detection layer, the DCNv2-Dynamic algorithm, and the NWD loss function for multiscale feature extraction. Dataset augmentation and the GAN algorithm refine the model, aligning feature distributions and improving precision, recall, and mAP50 by 6.5%, 5%, and 5.8%, respectively. These advancements denote significant improvements in key performance indicators.
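The NWD loss mentioned above builds on the normalized Gaussian Wasserstein distance proposed for tiny-object detection, where each box is modelled as a 2-D Gaussian and the Wasserstein distance between the Gaussians is exponentially normalized. A sketch of that metric (the constant c is dataset-dependent; 12.8 here is an assumed value, not the paper's):

```python
import math

def nwd(box_a, box_b, c=12.8):
    """Normalized Wasserstein distance between (cx, cy, w, h) boxes, each
    modelled as a Gaussian N((cx, cy), diag(w^2/4, h^2/4))."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Squared 2-Wasserstein distance between the two Gaussians.
    w2 = (ax - bx) ** 2 + (ay - by) ** 2 + ((aw - bw) / 2) ** 2 + ((ah - bh) / 2) ** 2
    return math.exp(-math.sqrt(w2) / c)

# Identical boxes score 1.0; unlike IoU, non-overlapping small boxes still get
# a smoothly decaying score instead of a flat zero.
```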
Title: Automated subway touch button detection using image process
Pub Date : 2024-08-23 DOI: 10.1186/s40537-024-00980-z
Ali Yimam Eshetu, Endris Abdu Mohammed, Ayodeji Olalekan Salau
This study investigates the causes of and countermeasures to cybercrime vulnerabilities, focusing on 16 selected Ethiopian university websites. It uses a cybersecurity awareness survey and automated vulnerability assessment and penetration testing (VAPT) tools, namely Nmap, Nessus, and Vega, to identify potential security threats and vulnerabilities. The assessment was performed according to the ISO/IEC 27001 series of standards, ensuring a comprehensive and globally recognized approach to information security. The results provide valuable insights into the current state of cybersecurity in Ethiopian universities and reveal a range of issues, from outdated software and poor password management to a lack of encryption and inadequate access control. The Vega vulnerability assessment reported 11,286 total findings, and Nessus identified a total of 1,749 vulnerabilities across the websites of the institutions examined. Based on these findings, the study proposes counteractive measures tailored to the specific needs of each identified defect. These recommendations aim to strengthen the security posture of the university websites, thereby protecting sensitive data and maintaining the trust of students, staff, and other stakeholders. The study emphasizes the need for proactive cybersecurity measures in the realm of higher education and presents a strategic plan for universities to improve their digital security.
Title: Cybersecurity vulnerabilities and solutions in Ethiopian university websites
Crude oil is an essential energy source that affects international trade, transportation, and manufacturing, highlighting its importance to the economy. Predicting its future price affects consumer prices and the energy markets, and it shapes the development of sustainable energy. Such prediction is essential for financial planning, economic stability, and investment decisions. However, reliable future prediction remains an open issue because of the high volatility of oil prices. Furthermore, many state-of-the-art methods rely on signal decomposition techniques, which can increase prediction time. In this paper, a model called K-means-dense-sparse-dense long short-term memory (K-means-DSD-LSTM) is proposed, which has three main training phases for crude oil price forecasting. In the first phase, the DSD-LSTM model is trained. Afterwards, the training portion of the data is clustered using the K-means algorithm. Finally, a copy of the trained DSD-LSTM model is fine-tuned for each obtained cluster. This helps each model predict its cluster better while still generalizing well to the whole dataset, which diminishes overfitting. The proposed model is evaluated on two well-known crude oil benchmarks: West Texas Intermediate (WTI) and Brent. Empirical evaluations demonstrated the superiority of the DSD-LSTM model over the K-means-LSTM model. Furthermore, the K-means-DSD-LSTM model exhibited even stronger performance.
{"title":"Crude oil price forecasting using K-means clustering and LSTM model enhanced by dense-sparse-dense strategy","authors":"Alireza Jahandoost, Farhad Abedinzadeh Torghabeh, Seyyed Abed Hosseini, Mahboobeh Houshmand","doi":"10.1186/s40537-024-00977-8","DOIUrl":"https://doi.org/10.1186/s40537-024-00977-8","url":null,"abstract":"<p>Crude oil is an essential energy source that affects international trade, transportation, and manufacturing, highlighting its importance to the economy. Its future price prediction affects consumer prices and the energy markets, and it shapes the development of sustainable energy. It is essential for financial planning, economic stability, and investment decisions. However, reaching a reliable future prediction is an open issue because of its high volatility. Furthermore, many state-of-the-art methods utilize signal decomposition techniques, which can lead to increased prediction time. In this paper, a model called K-means-dense-sparse-dense long short-term memory (K-means-DSD-LSTM) is proposed, which has three main training phrases for crude oil price forecasting. In the first phase, the DSD-LSTM model is trained. Afterwards, the training part of the data is clustered using the K-means algorithm. Finally, a copy of the trained DSD-LSTM model is fine-tuned for each obtained cluster. It helps the models predict that cluster better while they are generalizing the whole dataset quite well, which diminishes overfitting. The proposed model is evaluated on two famous crude oil benchmarks: West Texas Intermediate (WTI) and Brent. Empirical evaluations demonstrated the superiority of the DSD-LSTM model over the K-means-LSTM model. Furthermore, the K-means-DSD-LSTM model exhibited even stronger performance. 
Notably, the proposed method yielded promising results across diverse datasets, achieving competitive performance in comparison to existing methods, even without employing signal decomposition techniques.</p>","PeriodicalId":15158,"journal":{"name":"Journal of Big Data","volume":"5 1","pages":""},"PeriodicalIF":8.1,"publicationDate":"2024-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142186362","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
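The cluster-then-fine-tune routing described above can be sketched in a few lines. This is a hypothetical stand-in, not the authors' code: a toy K-means implementation assigns each price window to a cluster, and a simple per-cluster mean predictor plays the role of the fine-tuned DSD-LSTM copy.

```python
# Sketch of K-means routing with one specialized "model" per cluster.
# The per-cluster mean predictor below is a placeholder for the
# fine-tuned DSD-LSTM copies described in the paper.
import random

def kmeans(points, k, iters=50, seed=0):
    """Plain K-means on lists of equal-length vectors."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            clusters[i].append(p)
        centers = [
            [sum(dim) / len(cl) for dim in zip(*cl)] if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
    return centers

def assign(p, centers):
    """Index of the nearest cluster center."""
    return min(range(len(centers)),
               key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))

# Toy price series: sliding windows are the inputs, next price is the target.
prices = [70, 72, 71, 75, 80, 78, 82, 85, 84, 88, 90, 87]
W = 3
windows = [prices[i:i + W] for i in range(len(prices) - W)]
targets = [prices[i + W] for i in range(len(prices) - W)]

centers = kmeans(windows, k=2)

# "Fine-tune" one model copy per cluster: here, just the mean target.
cluster_models = {}
for c in range(2):
    ys = [t for w, t in zip(windows, targets) if assign(w, centers) == c]
    cluster_models[c] = sum(ys) / len(ys) if ys else sum(targets) / len(targets)

# Inference routes a new window to its cluster's specialized model.
new_window = [85, 84, 88]
pred = cluster_models[assign(new_window, centers)]
print(round(pred, 2))
```

The design point is the routing step: training data is partitioned once, each model copy sees mostly its own regime, and at inference time only the matching specialist is queried.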
Pub Date : 2024-08-17DOI: 10.1186/s40537-024-00981-y
Luqman Ali, Hamad AlJassmi, Mohammed Swavaf, Wasif Khan, Fady Alnajjar
U-Net, a fully convolutional network-based image segmentation method, has demonstrated widespread adaptability in the crack segmentation task. Combining the semantically dissimilar features of the encoder (shallow layers) and the decoder (deep layers) in the skip connections produces blurry feature maps and leads to undesirable over- or under-segmentation of target regions. Additionally, the shallow architecture of the U-Net model prevents the extraction of more discriminative information from input images. This paper proposes a Residual Sharp U-Net (RS-Net) architecture for crack segmentation and severity assessment in pavement surfaces to address these limitations. The proposed architecture uses residual blocks in the U-Net model to extract a more insightful representation of features. In addition, a sharpening kernel filter is applied in place of plain skip connections to produce a fine-tuned encoder feature map before it is combined with the decoder feature maps, reducing the dissimilarity between them and smoothing artifacts in the network layers during early training. The proposed architecture is also integrated with various morphological operations to assess the severity of cracks and categorize them into hairline, medium, and severe labels. Experimental results demonstrated that the RS-Net model has promising segmentation performance, outperforming earlier U-Net variants on testing data for crack segmentation and severity assessment, with accuracy above 0.97.
Rs-net: Residual Sharp U-Net architecture for pavement crack segmentation and severity assessment
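The sharpening-kernel skip connection can be illustrated with plain 2-D convolution. This is an illustrative sketch, not the RS-Net implementation: a standard 3x3 sharpening kernel is applied to a toy encoder feature map of the kind that would then be merged with the decoder features.

```python
# Illustrative sketch: filtering an encoder feature map with a 3x3
# sharpening kernel before the skip-connection merge, as described
# for RS-Net. The feature map values here are toy numbers.
SHARPEN = [[ 0, -1,  0],
           [-1,  5, -1],
           [ 0, -1,  0]]

def convolve2d(fmap, kernel):
    """'Same' convolution with zero padding on a 2-D list of lists."""
    h, w = len(fmap), len(fmap[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            acc = 0.0
            for kr in range(3):
                for kc in range(3):
                    rr, cc = r + kr - 1, c + kc - 1
                    if 0 <= rr < h and 0 <= cc < w:
                        acc += fmap[rr][cc] * kernel[kr][kc]
            out[r][c] = acc
    return out

# A flat background with a bright "crack-like" region in the middle.
feature_map = [[1, 1, 1, 1],
               [1, 4, 4, 1],
               [1, 4, 4, 1],
               [1, 1, 1, 1]]

sharpened = convolve2d(feature_map, SHARPEN)
print(sharpened[1][1])  # → 10.0 (edge response amplified vs. original 4)
```

The effect to note is that the kernel boosts each pixel relative to its neighbors, so edges in the encoder map become more pronounced before being concatenated with the semantically deeper decoder features.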