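Big Data Research: latest articles, page 9

Similarity Measurement for Graph Data: An Improved Centrality and Geometric Perspective-Based Approach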

IF 3.3 Zone 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Big Data Research

Pub Date : 2024-04-30 DOI: 10.1016/j.bdr.2024.100462

Li Deng , Shihu Liu , Weihua Xu , Xianghong Lin

Precise similarity measurement for graph data is an important research topic in many fields. Here, graph data means a collection of patterns together with the edges that connect them. Taking both pattern information and edge information into account, this paper introduces an improved centrality- and geometric-perspective-based approach to measuring the similarity between any two graphs. Once the two graphs are projected onto a plane, the pattern distance can be calculated with the Euclidean metric. The area-based edge distance is computed from an area defined by the length of each edge and the angle between the edge and the positive X-axis. To improve the measurement, a position-based edge distance is used to correct the edge distance. The global distance between any two graphs is then determined by combining these two distance results. Finally, the Letter dataset is used in experiments to examine the proposed similarity approach. The experimental results show that the proposed approach captures the similarity of graph data well and achieves a trade-off between time and precision.
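
The geometric ingredients named in this abstract can be illustrated with a small sketch. It is not the paper's algorithm: the function names, the assumption that the patterns of the two graphs are 2-D points already matched by index, and the plain averaging are all hypothetical, and the centrality weighting, area-based edge distance and position-based correction are not reproduced.

```python
import math

def node_distance(coords_a, coords_b):
    """Mean Euclidean distance between index-matched 2-D points (hypothetical helper)."""
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("this sketch assumes two non-empty, equally sized point lists")
    return sum(math.dist(p, q) for p, q in zip(coords_a, coords_b)) / len(coords_a)

def edge_length_and_angle(p, q):
    """Length of the edge p -> q and its angle from the positive X-axis, in [0, 2*pi)."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    return math.hypot(dx, dy), math.atan2(dy, dx) % (2 * math.pi)
```

The length and angle pair is the raw material the abstract mentions for the edge distance; how the paper turns it into an area is not stated in the abstract, so that step is left out here.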

Citations: 0

On the Sea Surface Temperature Forecasting Problem with Deep Dilation-Erosion-Linear Models

Pub Date : 2024-04-26 DOI: 10.1016/j.bdr.2024.100455

Ricardo de A. Araújo , Paulo S.G. de Mattos Neto , Nadia Nedjah , Sergio C.B. Soares

The sea surface temperature (SST) is considered an important measure for detecting changes in climate and marine ecosystems. So, its forecasting is essential for supporting governmental strategies to avoid side effects on the global population. In this paper, we analyze the SST time series and suggest that a combination between a linear component and a nonlinear component with long-term dependency can better represent it. Based on this assumption, we propose a deep neural network architecture with dilation-erosion-linear (DEL) processing units to deal with this particular kind of time series. An empirical analysis is performed in this work using three SST time series, where we explore three statistical measures. The experimental results demonstrate that the proposed model outperformed recent and classical literature forecasting techniques according to well-known performance metrics.
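
To make the idea of a dilation-erosion-linear unit concrete, the sketch below uses one common formulation from the mathematical-morphology literature: dilation as the maximum of input plus weight, erosion as the minimum, and convex combinations of the morphological and linear parts. The abstract does not give the paper's equations, so the mixing parameters `theta` and `lam`, the function names and the exact form of the combination are assumptions.

```python
def dilation(x, a):
    """Morphological dilation: max over i of (x_i + a_i)."""
    return max(xi + ai for xi, ai in zip(x, a))

def erosion(x, b):
    """Morphological erosion: min over i of (x_i + b_i)."""
    return min(xi + bi for xi, bi in zip(x, b))

def linear(x, w, bias=0.0):
    """Ordinary linear combination of the inputs."""
    return sum(xi * wi for xi, wi in zip(x, w)) + bias

def del_unit(x, a, b, w, bias, theta, lam):
    """Assumed DEL unit: mix dilation and erosion with theta, then mix with the linear part using lam."""
    morphological = theta * dilation(x, a) + (1 - theta) * erosion(x, b)
    return lam * morphological + (1 - lam) * linear(x, w, bias)
```

In a trained network the weights `a`, `b`, `w` and the mixing parameters would be learned; here they are plain arguments so the arithmetic is easy to follow.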

Citations: 0

A Cross-Chain Mechanism for Agricultural Engineering Document Management Blockchain in the Context of Big Data

Pub Date : 2024-04-25 DOI: 10.1016/j.bdr.2024.100459

Lei Shi , Yimin Zhou , Wei Wang , Juan Wang , Yang Bai , Chengzong Peng , Ding Chen , Zuli Wang

Cross-chain mechanisms are a typical approach to information exchange between different blockchains, addressing the problem of information silos in the big data era. Most existing cross-chain mechanisms target virtual-currency blockchains in the financial sector. As the development of modern smart farming produces more and more engineering documents, the need for engineering document management and for cross-chain interaction between blockchains has become increasingly urgent. This paper proposes a novel, practical cross-chain mechanism for agricultural engineering document management blockchains that takes into account the unique structure and operating principles of this domain. The methodology integrates the characteristics of agricultural engineering document management with a notary scheme built from highly credible government supervision nodes. Authentication technology and cryptographic algorithms are combined within the mechanism: the former solves the authentication problem of cross-chain documents and the latter protects the cross-chain information, which ensures the integrity and security of both the file attribute information and the file ontology data during the cross-chain process. Security proofs and experiments show that the mechanism is feasible and guarantees the authenticity of the cross-chain parties and the integrity and reliability of the document information, thus meeting the cross-chain performance requirements of blockchains in agricultural engineering document management.

Citations: 0

Tree parameter extraction method based on new remote sensing technology and terrestrial laser scanning technology

Pub Date : 2024-04-23 DOI: 10.1016/j.bdr.2024.100460

Aiguo Wang , Jun Wang , Haiming Li , Jian Hu , Haiyuan Zhou , Xinyu Zhang , Xuan Liu , Wanying Wang , Wenjin Zhang , Siting Wu , Ningyang Jiao , Yihao Wang

Ground LiDAR is a terrestrial LiDAR system that is often used for terrain and geomorphic mapping. Ground-based LiDAR can collect local, short-range data, making it well suited to mapping smaller areas with high precision. To enable rapid extraction of tree parameters in the national public welfare forest survey, ground-based LiDAR was used to obtain point clouds of trees; the point cloud data were registered, denoised, normalized and sliced, and the parameters of individual trees in the forest were then extracted. The Bland-Altman consistency test is used to check whether extracting tree parameters from point clouds agrees with the traditional measurement method. The experimental results show that tree parameters can be extracted quickly, conveniently and accurately from the point cloud data obtained by ground-based LiDAR, and that the results are consistent with the traditional tree parameter extraction method. Compared with traditional tree parameter measurement, the approach offers additional advantages such as point clouds, imagery and traceability, and it has a unique advantage in establishing a tree database. It is suggested that LiDAR be used for forest surveys in the future.
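
The Bland-Altman check mentioned in this abstract compares two measurement methods through the differences of paired readings: the mean difference (bias) and the 95% limits of agreement, bias plus or minus 1.96 standard deviations. The sketch below shows that standard calculation; the function name and the example framing (LiDAR-derived versus manually measured tree parameters) are illustrative assumptions, and no data from the paper is used.

```python
import statistics

def bland_altman(method_a, method_b):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    if len(method_a) != len(method_b) or len(method_a) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = statistics.mean(diffs)      # mean difference between the methods
    spread = statistics.stdev(diffs)   # sample standard deviation of the differences
    return bias, bias - 1.96 * spread, bias + 1.96 * spread
```

Two methods are usually judged consistent when nearly all paired differences fall inside the limits and the limits are narrow enough for the application; that acceptance threshold is a domain decision, not something the formula provides.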

Citations: 0

A multiscale electricity theft detection model based on feature engineering

Pub Date : 2024-04-23 DOI: 10.1016/j.bdr.2024.100457

Wei Zhang, Yu Dai

With the widespread adoption of smart meters and the growing availability of data mining and machine learning algorithms, there is a pressing demand for methods that are both accurate and explainable in identifying electricity theft patterns among end-users. To address this need, this study proposes a multi-scale anomaly detection model based on feature engineering. Specifically, tsfresh is used to extract electricity consumption features from the raw data, and XGBoost is employed to select the features most highly correlated with anomalous behavior, which have clear physical interpretations. Multi-scale convolutional neural networks then analyze and process the data at different temporal and frequency scales. Attention mechanisms assign weights to the different feature channels, and all of the extracted information is fused for anomaly detection. Combining feature engineering with multi-scale convolutional neural networks enhances the interpretability of the model and also improves its performance: the experimental results show that the proposed method outperforms traditional anomaly detection approaches across multiple evaluation metrics.
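
Two ideas in this abstract, viewing a consumption series at several time scales and weighting feature channels with attention before fusing them, can be sketched in plain Python. This is only an illustration under stated assumptions: average pooling stands in for the paper's multi-scale convolutions, softmax stands in for its attention mechanism, and every function name is hypothetical. The tsfresh and XGBoost steps are not shown.

```python
import math

def average_pool(series, window):
    """Non-overlapping mean over `window` readings, giving a coarser time scale."""
    return [sum(series[i:i + window]) / window
            for i in range(0, len(series) - window + 1, window)]

def softmax(scores):
    """Turn raw channel scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse(channel_features, channel_scores):
    """Attention-weighted sum of one feature value per channel."""
    weights = softmax(channel_scores)
    return sum(w * f for w, f in zip(weights, channel_features))
```

For example, pooling half-hourly readings with windows of 2 and 48 would give hourly and daily views of the same meter; in a trained model the channel scores would be learned rather than supplied by hand.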

Citations: 0

Quantitative analysis of big data for land resource classification and zoning at the township level in Northern Shaanxi

Pub Date : 2024-04-23 DOI: 10.1016/j.bdr.2024.100458

Hongkun Xie , Minghua Huang , Wentao Lei , Yang Wang , Lu Ou

This study analyzes and evaluates the conditions and distribution characteristics of rural land resources in northern Shaanxi. The experiment extracts two terrain feature values, slope and undulation, which are highly correlated with land resources. The extraction results for all 302 township-level administrative regions in northern Shaanxi are then processed, and the scores of all township-level units are ranked. On this basis, optimization and adjustment are carried out to form a classification result. The experimental results show that land resources are scarcest in first-level townships, which are mainly distributed in the central and western regions of northern Shaanxi, with 53 in Yan'an and 7 in Yulin. Land resources in second-level townships are relatively scarce; these townships are mainly distributed along the Yellow River in the central and southern parts of northern Shaanxi, with 40 in Yan'an and 53 in Yulin. Third-level townships have relatively abundant land resources, are generally distributed along the Great Wall, and belong to the transitional zone between the windblown sand and grassland areas and the hilly and gully areas; one third-level township is located in Yan'an and the other 22 are in Yulin. Fourth-level townships have abundant land resources and are located in the loess plateau landform area in the southern part of northern Shaanxi; they belong to Luochuan in Yan'an and three surrounding counties and total 17 townships. Fifth- and sixth-level townships have flat terrain and the most abundant land resources. They lie in the sandy and grassy terrain north of the Great Wall in northern Shaanxi; a total of 56 such townships are located in 7 county-level administrative regions of Yulin City. The experimental results lay the foundation for research on optimizing the spatial pattern of rural life in northern Shaanxi, and can also support classified guidance and precise policy implementation for rural revitalization, agricultural industry policy formulation, human settlement environment construction, and ecological environment protection.
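
The two terrain features named in this abstract can be computed from a gridded elevation model in several ways; the abstract does not say which one the authors used. The sketch below shows one conventional choice, assuming a small square grid of elevations: undulation as local relief (highest minus lowest elevation in the window) and slope from central differences. The function names and the grid layout are hypothetical.

```python
import math

def relief(grid):
    """Undulation as local relief: highest minus lowest elevation in the window."""
    values = [z for row in grid for z in row]
    return max(values) - min(values)

def slope_degrees(grid, row, col, cell_size):
    """Slope at an interior cell, from central differences, in degrees."""
    dz_dx = (grid[row][col + 1] - grid[row][col - 1]) / (2 * cell_size)
    dz_dy = (grid[row + 1][col] - grid[row - 1][col]) / (2 * cell_size)
    return math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))
```

A township-level score could then be built from statistics of these values over all cells inside the township boundary; how the paper weights and ranks them is not described in the abstract.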

Citations: 0

Big Data in organizations: Exploring the adoption of Big Data applications and their impact on organizations in China and the Netherlands

Pub Date : 2024-04-16 DOI: 10.1016/j.bdr.2024.100454

Jörg Raab , Yuting Pang , Joan Baaijens , Honggeng Zhou

Digital technology has rapidly been transforming how organizations operate. However, the literature in management studies has only just started to problematize the fundamental inter-relation of digital technology and organizing and we lack sound data about the actual breadth and depth of these changes. This study therefore explores the state of the implementation of Big Data applications in a wide range of organizations in China and the Netherlands and the impact on organizational structures and processes. Our findings show that most organizations are still in an experimental phase at best. We can therefore observe an evolutionary model of technology adoption

Citations: 0

Machine Learning for Tsunami Waves Forecasting Using Regression Trees

Pub Date : 2024-04-16 DOI: 10.1016/j.bdr.2024.100452

Eugenio Cesario , Salvatore Giampá , Enrico Baglione , Louise Cordrie , Jacopo Selva , Domenico Talia

After a seismic event, tsunami early warning systems (TEWSs) try to accurately forecast the maximum height of incident waves at specific target points in front of the coast, so that early warnings can be issued for locations where the impact of tsunami waves may be destructive and aid can be delivered to these locations during immediate post-event management. The uncertainty of the forecast can be quantified with ensembles of alternative scenarios. Similarly, probabilistic tsunami hazard analysis (PTHA) requires a large number of simulations to cover the natural variability of the source process at each location. To improve the accuracy and computational efficiency of tsunami forecasting methods, scientists have recently started to exploit machine learning techniques to process pre-computed simulation data. However, the approaches proposed in the literature, mainly based on neural networks, suffer from long training times and limited model explainability. To overcome these issues, this paper describes a machine learning approach based on regression trees to model and forecast tsunami evolution. The algorithm takes as input a set of simulations forming an ensemble that describes the potential regional impact of tsunami source scenarios in a given source area, and it provides predictive models to forecast the tsunami waves for other potential tsunami sources in the same area. The experimental evaluation, performed on simulation data for the 2003 M6.8 Zemmouri-Boumerdes earthquake and tsunami, shows that regression trees achieve high forecasting accuracy. Moreover, they provide domain experts with fully explainable and interpretable models, which are a valuable support for environmental scientists because they describe the rules and patterns underlying the models and allow explicit inspection of how they work. This can enable a full and trustworthy exploration of source uncertainty in tsunami early-warning and urgent computing scenarios, with large ensembles of computationally light tsunami simulations.
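
The appeal of regression trees described here is that the fitted model is a set of readable threshold rules. The sketch below fits a tiny single-feature regression tree by minimising squared error, purely to show the mechanism. It is an assumption-laden toy: the paper's features, tree library and hyperparameters are not given in the abstract, and the dictionary-based tree and function names are hypothetical.

```python
def _sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(xs, ys):
    """Threshold on the single feature that minimises total squared error, or None."""
    best = None
    for t in sorted(set(xs))[1:]:  # skip the smallest value so both sides are non-empty
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        score = _sse(left) + _sse(right)
        if best is None or score < best[1]:
            best = (t, score)
    return best

def fit_tree(xs, ys, depth=2):
    """Recursively build a tree of {'threshold', 'left', 'right'} or {'value'} nodes."""
    split = best_split(xs, ys) if depth > 0 else None
    if split is None:
        return {"value": sum(ys) / len(ys)}
    t = split[0]
    left = [(x, y) for x, y in zip(xs, ys) if x < t]
    right = [(x, y) for x, y in zip(xs, ys) if x >= t]
    return {"threshold": t,
            "left": fit_tree([x for x, _ in left], [y for _, y in left], depth - 1),
            "right": fit_tree([x for x, _ in right], [y for _, y in right], depth - 1)}

def predict(tree, x):
    """Follow the threshold rules down to a leaf and return its mean value."""
    while "value" not in tree:
        tree = tree["left"] if x < tree["threshold"] else tree["right"]
    return tree["value"]
```

Each internal node reads as a rule such as "if the input is below the threshold, go left", which is the kind of explicit inspection the abstract highlights; a real forecasting model would use many source parameters and a tested library implementation.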

Citations: 0

Scheduling critical periodic jobs with selective partial computations along with gang jobs

Pub Date : 2024-04-04 DOI: 10.1016/j.bdr.2024.100453

Helen Karatza

One of the main issues with distributed systems, like clouds, is scheduling complex workloads, which are made up of various job types with distinct features. Gang jobs are one kind of parallel applications that these systems support. This paper examines the scheduling of workloads that comprise gangs and critical periodic jobs that can allow for partial computations when necessary to overcome gang job execution. The simulation's results shed important light on how gang performance is impacted by partial computations of critical jobs. The results also reveal that, under the proposed scheduling scheme, partial computations which take into account gangs’ degree of parallelism, might lower the average response time of gang jobs, resulting in an acceptable level of the average results precision of the critical jobs. Additionally, it is observed that as the deviation from the average partial computation increases, the performance improvement due to partial computations increases with the aforementioned tradeoff remaining significant.

Citations: 0

Explanation-Guided Adversarial Example Attacks

IF 3.3 Zone 3 Computer Science Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Big Data Research

Pub Date : 2024-03-26 DOI: 10.1016/j.bdr.2024.100451

Anli Yan , Xiaozhang Liu , Wanman Li , Hongwei Ye , Lang Li

Neural network-based classifiers are vulnerable to adversarial example attacks even in a black-box setting. Existing adversarial example generation techniques mainly rely on optimization-based attacks, which optimize an objective function by iteratively perturbing the input. Although these techniques can craft adversarial examples, they require large query budgets. Recent transfer-based attacks need only a limited number of queries but suffer from a low attack success rate. In this paper, we propose an adversarial example attack method called MEAttack, which uses model-agnostic explanation technology to generate adversarial examples more efficiently in the black-box setting with limited queries. The core idea is to design a novel model-agnostic explanation method for target models and to generate adversarial examples based on the resulting model explanations. We experimentally demonstrate that MEAttack outperforms the state-of-the-art attack technology, i.e., AutoZOOM. The success rate of MEAttack is 4.54%-47.42% higher than that of AutoZOOM, and its query cost is 2.6-4.2 times lower. Experimental results show that MEAttack is efficient in terms of both attack success rate and query efficiency.
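
The general idea of letting a model-agnostic explanation decide which inputs to perturb can be sketched as follows. This is not MEAttack itself, since the abstract does not specify its explanation method or perturbation rule. The sketch swaps in a simple occlusion-based importance score and a greedy sign search, and runs them against a hypothetical toy linear classifier. All function names, the weights, and the parameters `k`, `eps`, and `max_steps` are assumptions made for illustration.

```python
import numpy as np


def make_toy_model():
    """Hypothetical black-box classifier: logistic model that also counts queries."""
    w = np.array([3.0, -2.0, 0.1, 0.0])
    calls = {"n": 0}

    def predict(x):
        calls["n"] += 1
        p1 = 1.0 / (1.0 + np.exp(-float(w @ x)))
        return np.array([1.0 - p1, p1])  # class probabilities

    return predict, calls


def occlusion_importance(predict, x, label, baseline=0.0):
    """Score each feature by the drop in the label's probability when it is occluded."""
    base = predict(x)[label]
    scores = np.empty(x.size)
    for i in range(x.size):
        x_occ = x.copy()
        x_occ[i] = baseline
        scores[i] = base - predict(x_occ)[label]
    return scores


def explanation_guided_attack(predict, x, k=2, eps=0.5, max_steps=20):
    """Perturb only the k most important features until the predicted label changes."""
    label = int(np.argmax(predict(x)))
    scores = occlusion_importance(predict, x, label)
    top = np.argsort(-np.abs(scores))[:k]  # indices of the k largest |score|
    x_adv = x.copy()
    for _ in range(max_steps):
        for i in top:
            best, best_p = x_adv, predict(x_adv)[label]
            for delta in (eps, -eps):  # direction is unknown, so try both signs
                cand = x_adv.copy()
                cand[i] += delta
                p = predict(cand)[label]
                if p < best_p:
                    best, best_p = cand, p
            x_adv = best
        if int(np.argmax(predict(x_adv))) != label:
            return x_adv, True
    return x_adv, False


predict, calls = make_toy_model()
x = np.array([1.0, 0.5, 1.0, 1.0])
x_adv, success = explanation_guided_attack(predict, x)
print(success, x_adv, "queries:", calls["n"])
```

Restricting the search to the highest-scoring features is what keeps the number of queries small in this sketch: the features the explanation marks as unimportant are never queried after the initial scoring pass.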

Big Data Research, Volume 36, Article 100451

Citations: 0