Pub Date: 2025-08-28 | Epub Date: 2025-06-09 | DOI: 10.1016/j.bdr.2025.100553
Zheng Fang , Toby Cai
Modeling stock returns has often relied on multivariate time series analysis, and constructing an accurate model remains a challenging goal for both market investors and academic researchers. Stock return prediction typically involves multiple variables and a combination of long-term and short-term time series patterns. In this paper, we propose a new deep learning network, named DLS-TS-Net, to model stock returns and address this challenge. We apply DLS-TS-Net to multivariate time series forecasting. The network integrates a Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM) units, and Gated Recurrent Units (GRUs). DLS-TS-Net overcomes LSTM's insensitivity to linear components in stock market forecasting by incorporating a traditional autoregressive model. Experimental results demonstrate that DLS-TS-Net excels at capturing long-term trends in multivariate factors and short-term fluctuations in the stock market, outperforming traditional time series and machine learning models. Additionally, when combined with the investment strategies proposed in this paper, DLS-TS-Net shows superior performance in managing risk during extreme events.
Title: Deep neural network modeling for financial time series analysis (Big Data Research, vol. 41, Article 100553)
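The abstract does not reproduce DLS-TS-Net's architecture. As an illustrative sketch only (not the authors' code), the core idea of adding a traditional autoregressive component to a nonlinear forecaster's output can be expressed in a few lines; here the nonlinear (CNN/LSTM/GRU) prediction is a stubbed placeholder:

```python
import numpy as np

def ar_component(series, p):
    """One-step-ahead forecast from an order-p autoregressive model
    fitted by ordinary least squares."""
    X = np.column_stack([series[i:len(series) - p + i] for i in range(p)])
    y = series[p:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(series[-p:] @ coef)

def combined_forecast(series, nonlinear_pred, p=3):
    """Hybrid forecast: a nonlinear part (standing in for the
    CNN/LSTM/GRU output) plus the traditional AR linear part."""
    return nonlinear_pred + ar_component(series, p)

# On a noiseless linear trend, the AR part alone recovers the next value.
series = np.arange(50, dtype=float)
print(round(combined_forecast(series, nonlinear_pred=0.0), 2))  # -> 50.0
```

The AR term supplies exactly the linear sensitivity that, per the abstract, pure LSTM models lack.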
Pub Date: 2025-08-28 | Epub Date: 2025-05-15 | DOI: 10.1016/j.bdr.2025.100539
Cristian Usala, Isabella Sulis, Mariano Porcu
This study investigates the determinants of tertiary education success in Italy, focusing on students' outcomes between the first and second years. We use population data on students enrolled between 2015 and 2019, integrating information on high school environments and degree program characteristics. We exploit this rich dataset with a two-step approach: the first step defines indicators of high school quality and degree program difficulty; the second estimates a multinomial logit to assess the determinants of students' probability of being classified as regular, churner, at risk of dropout, or dropout. The 2019 cohort is examined further, exploiting additional information on students' socioeconomic backgrounds and schools' self-assessed effectiveness evaluations. Results indicate that students' high school backgrounds, socioeconomic conditions, and post-graduation prospects (the net wages and employment rates of graduates in the chosen degree program) significantly influence academic success and persistence. Overall, the results offer a comprehensive view of the determinants of university success, with specific patterns observed across the different student categories.
Title: Exploring the impact of high schools, socioeconomic factors, and degree programs on higher education success in Italy (Big Data Research, vol. 41, Article 100539)
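As a hedged illustration (not the authors' estimates), the second-step multinomial logit maps student covariates to probabilities over the four outcome categories via a softmax; the coefficients below are invented purely for the example:

```python
import math

def multinomial_logit_probs(x, betas):
    """Class probabilities from a multinomial logit:
    P(k | x) = exp(x . beta_k) / sum_j exp(x . beta_j)."""
    scores = [sum(xi * bi for xi, bi in zip(x, b)) for b in betas]
    m = max(scores)                      # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

# Outcomes: regular, churner, at risk of dropout, dropout.
# Coefficients are illustrative only, not the paper's estimates.
betas = [[0.8, 0.5], [0.1, 0.2], [-0.2, 0.1], [-0.7, -0.8]]
probs = multinomial_logit_probs([1.0, 1.0], betas)
print(round(sum(probs), 6), probs.index(max(probs)))  # -> 1.0 0
```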
Pub Date: 2025-08-28 | Epub Date: 2025-05-23 | DOI: 10.1016/j.bdr.2025.100543
Chiara Ardito , Roberto Leombruni , Giuseppe Costa , Angelo d’Errico
The relationship between age at retirement and subsequent physical health remains contradictory in the literature, with more recent studies suggesting possible adverse health effects of employment at later ages. The aim of this study was to assess the long-term risk of overall mortality and incidence of cardiovascular diseases (CVDs) associated with age at retirement in three large Italian cohorts, using both survey and administrative data.
The risk of mortality and CVDs associated with age at retirement, treated as continuous, was assessed separately by gender using age-adjusted Cox models, further adjusted for chronic morbidity, education, and socioeconomic and prior employment characteristics. In a second set of analyses, age at retirement was dichotomized: for each cutoff from 52 to 65 years, the incidence of the health outcomes among subjects who retired after that age was compared with the incidence among those who retired at or before it.
Higher age at retirement was associated with significantly higher mortality among men in all three cohorts, while among women the association was in the same direction but not significant. The risk of CVDs was also significantly associated with higher age at retirement among men in all the datasets, and among women in two of them. The dichotomized analyses confirmed the results based on continuous age at retirement for both genders. Several robustness analyses, including an instrumental-variable (IV) Poisson model, confirmed the validity of the results for men, whereas the results for women were less stable.
Policy makers should be aware of the public health risks of policies that increase the retirement age.
Title: Mortality and risk of cardiovascular diseases by age at retirement in three Italian cohorts (Big Data Research, vol. 41, Article 100543)
Pub Date: 2025-08-28 | Epub Date: 2025-05-16 | DOI: 10.1016/j.bdr.2025.100533
Alessio Bumbea , Andrea Mazzitelli , Giuseppe Espa , Alessandro Rinaldi
Innovative startups are a key source of innovation and technological development; understanding their behavior can therefore help clarify the direction in which business organization is evolving. This paper introduces a new method for clustering innovative startups that combines bipartite graph partitioning with spatial bootstrapping, improving the accuracy and interpretability of the clusters. Recent advancements in clustering have introduced ensemble, or consensus, clustering methods, which aim to merge multiple clustering results into a superior outcome. A key challenge in this field is effectively integrating diverse clusterings, and one promising solution relies on graph formalism and partitioning strategies. By leveraging advanced graph partitioning techniques, we transform the task of partitioning the ensemble graph into a community detection problem. Our approach improves the traditional bipartite-graph method used in cluster ensembles by implementing the state-of-the-art biLouvain algorithm. We also focus on techniques that increase the interpretability of the clusters and show how they can be used to extract insightful information from the data. The proposed methodology was applied to a dataset of technologically advanced new businesses located in the Lombardy region and recorded as innovative startups in the special section of the Italian Chambers of Commerce's Business Register.
Title: Bipartite graph partitioning and spatial bootstrapping methods: A case study of innovative startups (Big Data Research, vol. 41, Article 100533)
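The authors' code is not included in the abstract. As a sketch under stated assumptions, the bipartite object-cluster graph that a cluster ensemble feeds to a community-detection algorithm such as biLouvain can be built like this (the partition labels are invented for illustration):

```python
from collections import defaultdict

def build_ensemble_bipartite(partitions):
    """Object-cluster bipartite graph for a cluster ensemble:
    one node per object, one node per (run, cluster-id) pair,
    and an edge whenever an object falls in that cluster."""
    edges = defaultdict(set)
    for run, labels in enumerate(partitions):
        for obj, cluster in enumerate(labels):
            edges[obj].add((run, cluster))
    return dict(edges)

# Two base clusterings of four objects (toy labels).
partitions = [[0, 0, 1, 1],
              [0, 1, 1, 1]]
graph = build_ensemble_bipartite(partitions)
print(sorted(graph[0]))  # object 0 links to cluster 0 of both runs
```

Running community detection over this graph then yields the consensus partition: objects that co-occur in many cluster nodes tend to land in the same community.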
In an era where digital technologies such as AI, cloud computing and IoT are reshaping global business dynamics, the digital transformation of enterprises has become a pivotal factor for maintaining competitive advantage. This paper provides an in-depth analysis of the digitalization process among Italian firms, leveraging data from the ISTAT ICT survey. Using a fuzzy set approach, we develop a refined index to measure technological deprivation across multiple dimensions, providing a detailed understanding of how digitalization is adopted at the firm level. The results indicate a moderate level of technological development among firms. The dimension related to online sales emerges as the most underdeveloped, highlighting it as a critical area for improvement for Italian companies and underscoring the need for targeted policy interventions to bridge these digital gaps. Moreover, the analysis reveals significant disparities across sectors, geographic areas, and firm sizes, with smaller enterprises and those in certain regions exhibiting lower levels of digital adoption. Our study underscores the utility of the fuzzy set methodology for analyzing high-dimensional big data and provides actionable insights for enhancing digital adoption among firms in Italy.
Title: Business digitalization in Italy: A comprehensive analysis using supplementary fuzzy set approach, by Ilaria Benedetti, Federico Crescenzi, Tiziana Laureti, Niccolò Salvini (Big Data Research, vol. 41, Article 100538; DOI: 10.1016/j.bdr.2025.100538; pub date 2025-08-28)
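The paper's exact index construction is not given in the abstract. A minimal sketch of a common fuzzy-set composite (Cerioli-Zani style weighting, which may differ from the authors' formulation; the firm data are toy values) is:

```python
import math

def fuzzy_deprivation_index(memberships):
    """Composite fuzzy deprivation score per firm.
    memberships[i][j] in [0, 1] is firm i's deprivation on dimension j.
    Dimensions get weight w_j = ln(1 / mean_j), so rarer deprivations
    weigh more (Cerioli-Zani style weighting)."""
    n, k = len(memberships), len(memberships[0])
    means = [sum(row[j] for row in memberships) / n for j in range(k)]
    weights = [math.log(1 / m) if 0 < m < 1 else 0.0 for m in means]
    total = sum(weights)
    return [sum(w * row[j] for j, w in enumerate(weights)) / total
            for row in memberships]

# Four firms, two dimensions (e.g. cloud adoption, online sales); toy data.
firms = [[1, 0], [1, 1], [0, 1], [0, 0]]
print(fuzzy_deprivation_index(firms))  # -> [0.5, 1.0, 0.5, 0.0]
```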
Pub Date: 2025-08-28 | Epub Date: 2025-08-20 | DOI: 10.1016/j.bdr.2025.100554
Keren Li , Wenqiang Zhang , Dandan Xiao , Peng Hou , Shuai Yan , Yang Wang , Xuerui Mao
To address the storage challenges stemming from large volumes of heterogeneous data in wind farms, we propose a data compression technique based on tensor train decomposition (TTD). Initially, we establish a tensor-based processing model to standardize the heterogeneous data originating from wind farms, which includes both structured SCADA (supervisory control and data acquisition) data and unstructured video and picture data. Subsequently, we introduce a TTD-based method designed to compress the heterogeneous data generated in wind farms while preserving the inherent spatial eigenstructure of the data. Finally, we validate the efficacy of the proposed method in alleviating data storage challenges by utilizing authentic wind farm datasets. Comparative analysis reveals that the TTD-based method outperforms previously proposed compression techniques, specifically the canonical polyadic (CP) and Tucker methods.
Title: Compression of big data collected in wind farm based on tensor train decomposition (Big Data Research, vol. 41, Article 100554)
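The standard TT-SVD scheme underlying tensor train decomposition can be sketched as follows; this is a generic illustration, not the authors' implementation, and the rank-1 test tensor is invented for the example:

```python
import numpy as np

def tt_decompose(tensor, eps=1e-10):
    """Tensor-train decomposition via sequential truncated SVDs
    (the standard TT-SVD scheme)."""
    shape = tensor.shape
    d = len(shape)
    cores, rank = [], 1
    mat = tensor.reshape(rank * shape[0], -1)
    for k in range(d - 1):
        U, S, Vt = np.linalg.svd(mat, full_matrices=False)
        keep = max(1, int(np.sum(S > eps * S[0])))  # truncate tiny modes
        cores.append(U[:, :keep].reshape(rank, shape[k], keep))
        rank = keep
        mat = (S[:keep, None] * Vt[:keep]).reshape(rank * shape[k + 1], -1)
    cores.append(mat.reshape(rank, shape[-1], 1))
    return cores

def tt_reconstruct(cores):
    """Contract the TT cores back into a full tensor."""
    out = cores[0]
    for core in cores[1:]:
        out = np.tensordot(out, core, axes=([-1], [0]))
    return out.reshape([c.shape[1] for c in cores])

rng = np.random.default_rng(0)
# A rank-1 (outer-product) tensor compresses to TT-ranks of 1.
t = np.einsum('i,j,k->ijk', rng.random(4), rng.random(5), rng.random(6))
cores = tt_decompose(t)
print([c.shape for c in cores], np.allclose(tt_reconstruct(cores), t))
```

Here 120 entries are stored in 4 + 5 + 6 = 15 core entries, which is the storage saving the paper exploits on wind farm data.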
Pub Date: 2025-08-28 | Epub Date: 2025-06-08 | DOI: 10.1016/j.bdr.2025.100552
Jiachen Ma , Nazmus Sakib , Fahim Islam Anik , Sheikh Iqbal Ahamed
While temporal sentiment labels prove invaluable for video tagging, segmentation, and labeling tasks in multimedia studies, large-scale manual annotation remains cost- and time-prohibitive. Emerging online Time-Sync Comment (TSC) datasets offer a promising alternative for generating sentiment maps. However, the limited scope of existing TSC datasets and the lack of guidelines for resource-constrained data creation hinder broader use. This study addresses these challenges by proposing a novel system for automated TSC generation that utilizes recent YouTube comments as a readily accessible source of time-synchronized data. The efficacy of our multi-platform data mining system is evaluated through extensive long-term trials, leading to the development and analysis of two large-scale TSC datasets. Benchmarking against original temporal Automatic Speech Recognition (ASR) sentiment annotations validates the accuracy of the generated data. This work establishes a promising method for automatic TSC generation, laying the groundwork for further advancements in multimedia research and paving the way for novel sentiment analysis applications.
Title: Time-synchronized sentiment labeling via autonomous online comments data mining: A multimodal information fusion on large-scale multimedia data (Big Data Research, vol. 41, Article 100552)
Pub Date: 2025-08-28 | Epub Date: 2025-05-17 | DOI: 10.1016/j.bdr.2025.100535
Yunting Liu, Yirong Huang
Unimodal sentiment analysis often fails to capture the complexity of financial sentiment. This paper proposes a multimodal deep learning framework that integrates text, audio, and image data from CCTV news videos on TikTok to construct a multimodal sentiment indicator for the Chinese stock market. Empirical results show that multimodal fusion enhances sentiment analysis, with text outperforming audio and image modalities. The indicator correlates weakly with stock returns but significantly with market volatility, aligns with seasonal sentiment patterns, and reflects significant events like COVID-19. Additionally, weekly sentiment trends indicate the lowest sentiment on Thursdays and the highest on Fridays. This study advances financial sentiment analysis by demonstrating the efficacy of multimodal indicators in capturing market sentiment and informing volatility forecasts.
Title: A multimodal deep learning framework for constructing a market sentiment index from stock news (Big Data Research, vol. 41, Article 100535)
Pub Date: 2025-05-28 | Epub Date: 2025-03-07 | DOI: 10.1016/j.bdr.2025.100519
Salheddine Kabou , Laid Gasmi , Abdelbaset Kabou , Sidi Mohammed Benslimane
One of the critical challenges in big data analytics is protecting individual privacy. Data anonymization models such as k-anonymity and l-diversity are used to balance privacy and data utility when publishing data. However, these models address only a single release of a dataset and provide a fixed level of privacy. In practical big data applications, publishing is more complicated: data is released continuously as new data is collected, and privacy must be preserved across releases. In this research, we propose a new distributed bottom-up approach on Apache Spark that achieves the m-invariance privacy model in the continuous big data context. The proposed approach, the first to address dynamic big data publishing, is based on an insertion process and a split process. In the first, data records collected from different workers are inserted into an improved bottom-up R-tree generalization so as to minimize information loss. The second splits any overflowed node in accordance with the m-invariance requirement, minimizing the overlap between the resulting partitions. The experimental results show significant improvements in data utility, execution time, and counterfeit data records compared to existing techniques in the literature.
Title: ImDMI: Improved Distributed M-Invariance model to achieve privacy continuous big data publishing using Apache Spark (Big Data Research, vol. 40, Article 100519)
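The m-invariance condition itself — each QI-group contains at least m distinct sensitive values, and each individual's group signature stays fixed across releases — can be checked with a small sketch; this is illustrative only, not the ImDMI Spark implementation, and the release/group names are invented:

```python
def is_m_invariant(releases, m):
    """Check m-invariance across serial releases. Each release maps
    person_id -> (group_id, sensitive_value). Requires every group's
    signature (set of sensitive values) to have >= m distinct values,
    and each person's signature to be identical in every release."""
    signatures = {}  # person_id -> frozenset of sensitive values
    for release in releases:
        groups = {}
        for pid, (gid, sv) in release.items():
            groups.setdefault(gid, set()).add(sv)
        for pid, (gid, sv) in release.items():
            sig = frozenset(groups[gid])
            if len(sig) < m:
                return False                     # group too small
            if signatures.setdefault(pid, sig) != sig:
                return False                     # signature changed
    return True

r1 = {1: ('g1', 'flu'), 2: ('g1', 'hiv')}
r2 = {1: ('a', 'flu'), 2: ('a', 'hiv'), 3: ('b', 'cold'), 4: ('b', 'hiv')}
print(is_m_invariant([r1, r2], 2))                        # -> True
print(is_m_invariant([r1, {1: ('a', 'flu'), 2: ('b', 'hiv')}], 2))  # -> False
```

In ImDMI's setting the checker would run over partitions produced by the insertion and split processes; counterfeit records are the usual device for groups that cannot otherwise satisfy the condition.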
Pub Date: 2025-05-28 | Epub Date: 2025-02-26 | DOI: 10.1016/j.bdr.2025.100518
Angela Maria D'Uggento, Marta Biancardi, Domenico Ciriello
In the ever-changing landscape of financial markets, accurate option pricing remains critical for investors, traders and financial institutions. Traditionally, the Black-Scholes (B&S) model has been the cornerstone for option pricing, providing a solid framework based on mathematical and physical principles. Nevertheless, the B&S model has some limitations, such as the restriction to European options, the absence of dividends, and constant volatility. Studies and academic literature on the application of machine learning models in the financial sector are rapidly increasing. The main objective of this paper is to provide a comprehensive comparative analysis between the traditional B&S model and widely used machine learning algorithms, such as Artificial Neural Networks (ANNs). The rationale is twofold. First, to examine the assumptions of the B&S model, such as constant volatility and a perfectly efficient market, in light of the complexity of the real world, while recognizing that the model has stood as a pillar of the field for decades. Secondly, to emphasize that the proliferation of big data and advances in computing power have fuelled the rise of machine learning techniques in finance. These algorithms have remarkable capabilities in discovering non-linear patterns and extracting information from large data sets, providing a compelling alternative to traditional quantitative methods. Machine learning offers a new way to capture and model such complex financial dynamics, which can lead to more accurate pricing models. By comparing the B&S model and some machine learning approaches, this paper aims to shed light on their respective strengths, weaknesses and applicability in the context of options pricing using real data.
Through rigorous empirical analyses and performance metrics, our results demonstrate that machine learning techniques can achieve higher prediction accuracy, outperforming or complementing the established B&S model in predicting option prices.
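As a concrete point of reference for the comparison above, the closed-form B&S benchmark can be sketched as follows. This is a minimal, standard-textbook implementation of the European call formula (the abstract does not provide the authors' code, so all function names here are illustrative); the bisection-based implied-volatility inversion illustrates why the constant-volatility assumption is a limitation — markets quote different implied volatilities across strikes and maturities (the volatility "smile"), which a single constant sigma cannot reproduce.

```python
from math import exp, log, sqrt
from statistics import NormalDist

N = NormalDist().cdf  # standard normal CDF


def bs_call(S, K, T, r, sigma):
    """Black-Scholes price of a European call on a non-dividend-paying asset."""
    d1 = (log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return S * N(d1) - K * exp(-r * T) * N(d2)


def implied_vol(price, S, K, T, r, lo=1e-6, hi=5.0):
    """Recover the volatility implied by an observed price via bisection.

    Works because the call price is strictly increasing in sigma (positive vega).
    In real data, implied vols vary across strikes, contradicting the model's
    constant-volatility assumption.
    """
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if bs_call(S, K, T, r, mid) < price:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For example, an at-the-money one-year call with S = K = 100, r = 5%, and sigma = 20% prices at roughly 10.45, and feeding that price back into `implied_vol` recovers sigma ≈ 0.20. An ML model fitted to market quotes sidesteps the inversion by learning the price surface directly.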
{"title":"Predicting option prices: From the Black-Scholes model to machine learning methods","authors":"Angela Maria D'Uggento, Marta Biancardi, Domenico Ciriello","doi":"10.1016/j.bdr.2025.100518","DOIUrl":"10.1016/j.bdr.2025.100518","url":null,"abstract":"<div><div>In the ever-changing landscape of financial markets, accurate option pricing remains critical for investors, traders and financial institutions. Traditionally, the Black-Scholes (B&S) model has been the cornerstone for option pricing, providing a solid framework based on mathematical and physical principles. Nevertheless, the B&S model has some limitations, such as the restriction to European options, the absence of dividends, constant volatility, etc. Studies and academic literature on the application of machine learning models in the financial sector are rapidly increasing. The main objective of this paper is to provide a comprehensive comparative analysis between the traditional B&S model and the most commonly used machine learning algorithms such as Artificial Neural Networks (ANNs). The rationale is twofold. First, to examine the assumptions of the B&S model, such as constant volatility and a perfectly efficient market, in light of the complexity of the real world, even though it is recognized that the model has been known as a pillar for decades. Secondly, to emphasize that the proliferation of big data and advances in computing power have fuelled the rise of machine learning techniques in finance. These algorithms have remarkable capabilities in discovering non-linear patterns and extracting information from large data sets, providing a compelling alternative to traditional quantitative methods. Machine learning offers a new way to capture and model such complex financial dynamics, which can lead to more accurate pricing models. 
By comparing the B&S model and some machine learning approaches, this paper aims to shed light on their respective strengths, weaknesses and applicability in the context of options pricing using real data. Through rigorous empirical analyses and performance metrics, our results demonstrate the importance of using machine learning techniques that can outperform or complement the established B&S model in predicting option prices by achieving higher prediction accuracy.</div></div>","PeriodicalId":56017,"journal":{"name":"Big Data Research","volume":"40 ","pages":"Article 100518"},"PeriodicalIF":3.5,"publicationDate":"2025-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143520058","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}