DataCentric Engineering: latest articles

Semantic 3D city interfaces—Intelligent interactions on dynamic geospatial knowledge graphs

Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

DataCentric Engineering

Pub Date : 2023-09-06 DOI: 10.1017/dce.2023.14

A. Chadzynski, Shiying Li, Ayda Grisiute, Jefferson Chua, Markus Hofmeister, Jingya Yan, Huay Yi Tai, Emily Lloyd, Yi Kai Tsai, Mehal Agarwal, J. Akroyd, P. Herthogs, Markus Kraft

Abstract This article presents a system architecture and a set of interfaces that can build scalable information systems capable of large city modeling based on dynamic geospatial knowledge graphs to avoid pitfalls of Web 2.0 applications while blending artificial and human intelligence during the knowledge enhancement processes. We designed and developed a GeoSpatial Processor, an SQL2SPARQL Transformer, and a geospatial tiles ordering task and integrated them into a City Export Agent to visualize and interact with city models on an augmented 3D web client. We designed a Thematic Surface Discovery Agent to automatically upgrade the model’s level of detail to interact with thematic parts of city objects by other agents. We developed a City Information Agent to help retrieve contextual information, provide data concerning city regulations, and work with a City Energy Analyst Agent that automatically estimates the energy demands for city model members. We designed a Distance Agent to track the interactions with the model members on the web, calculate distances between objects of interest, and add new knowledge to the Cities Knowledge Graph. The logical foundations and CityGML-based conceptual schema used to describe cities in terms of the OntoCityGML ontology, together with the system of intelligent autonomous agents based on the J-Park Simulator Agent Framework, make such systems capable of assessing and maintaining ground truths with certainty. This new era of GeoWeb 2.5 systems lowers the risk of deliberate misinformation within geography web systems used for modeling critical infrastructures.

Citations: 2

Optical network physical layer parameter optimization for digital backpropagation using Gaussian processes

Pub Date : 2023-08-10 DOI: 10.1017/dce.2023.15

Josh W. Nevin, E. Sillekens, Ronit Sohanpal, L. Galdino, Sam Nallaperuma, P. Bayvel, S. Savory

Abstract We present a novel methodology for optimizing fiber optic network performance by determining the ideal values for attenuation, nonlinearity, and dispersion parameters in terms of achieved signal-to-noise ratio (SNR) gain from digital backpropagation (DBP). Our approach uses Gaussian process regression, a probabilistic machine learning technique, to create a computationally efficient model for mapping these parameters to the resulting SNR after applying DBP. We then use simplicial homology global optimization to find the parameter values that yield maximum SNR for the Gaussian process model within a set of a priori bounds. This approach optimizes the parameters in terms of the DBP gain at the receiver. We demonstrate the effectiveness of our method through simulation and experimental testing, achieving optimal estimates of the dispersion, nonlinearity, and attenuation parameters. Our approach also highlights the limitations of traditional one-at-a-time grid search methods and emphasizes the interpretability of the technique. This methodology has broad applications in engineering and can be used to optimize performance in various systems beyond optical networks.
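
The surrogate-then-optimize workflow described in this abstract can be sketched in a few lines. What follows is a minimal illustration under loud assumptions, not the authors' implementation: it tunes a single parameter instead of three, uses a hypothetical `toy_snr` function in place of a DBP simulation or experiment, fits a hand-rolled Gaussian process with a fixed RBF kernel, and uses a dense grid search inside the a priori bounds as a stand-in for simplicial homology global optimization.

```python
import math


def rbf(a, b, length_scale):
    """Squared-exponential (RBF) kernel between two scalar inputs."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - partial)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower


def solve_cholesky(lower, rhs):
    """Solve (L L^T) x = rhs by forward then backward substitution."""
    n = len(rhs)
    y = [0.0] * n
    for i in range(n):
        y[i] = (rhs[i] - sum(lower[i][k] * y[k] for k in range(i))) / lower[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(lower[k][i] * x[k] for k in range(i + 1, n))) / lower[i][i]
    return x


def fit_gp(xs, ys, length_scale=1.0, noise=1e-6):
    """Return the GP posterior-mean function for 1-D training data."""
    mean_y = sum(ys) / len(ys)
    gram = [[rbf(a, b, length_scale) + (noise if i == j else 0.0)
             for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    weights = solve_cholesky(cholesky(gram), [y - mean_y for y in ys])

    def predict(x):
        return mean_y + sum(w * rbf(x, xi, length_scale) for w, xi in zip(weights, xs))

    return predict


def argmax_on_grid(predict, lower, upper, steps=1000):
    """Grid search inside the bounds (a simple stand-in for SHGO)."""
    grid = [lower + (upper - lower) * i / steps for i in range(steps + 1)]
    return max(grid, key=predict)


def toy_snr(x):
    """Hypothetical SNR response with its peak at x = 1.3 (illustration only)."""
    return 10.0 - (x - 1.3) ** 2


train_x = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
train_y = [toy_snr(x) for x in train_x]
surrogate = fit_gp(train_x, train_y)
best_x = argmax_on_grid(surrogate, lower=0.0, upper=2.5)
print(f"surrogate optimum near x = {best_x:.2f}")
```

The point of the surrogate is that each call to `surrogate` is cheap, so the optimizer can query it many times where a full DBP evaluation would be expensive. In practice one would likely reach for a library Gaussian process and a true global optimizer rather than the hand-written pieces above.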

Citations: 0

Finite element model updating with quantified uncertainties using point cloud data

Pub Date : 2023-06-23 DOI: 10.1017/dce.2023.7

W. Graves, K. Nahshon, K. Aminfar, D. Lattanzi

Abstract While finite element (FE) modeling is widely used for ultimate strength assessments of structural systems, incorporating complex distortions and imperfections into FE models remains a challenge. Conventional methods typically rely on assumptions about the periodicity of distortions through spectral or modal methods. However, these approaches are not viable under the many realistic scenarios where these assumptions are invalid. Research efforts have consistently demonstrated the ability of point cloud data, generated through laser scanning or photogrammetry-based methods, to accurately capture structural deformations at the millimeter scale. This enables the updating of numerical models to capture the exact structural configuration and initial imperfections without the need for unrealistic assumptions. This research article investigates the use of point cloud data for updating the initial distortions in a FE model of a stiffened ship deck panel, for the purposes of ultimate strength estimation. The presented approach has the additional benefit of being able to explicitly account for measurement uncertainty in the analysis. Calculations using the updated FE models are compared against ground truth test data as well as FE models updated using standard spectral methods. The results demonstrate strength estimation that is comparable to existing approaches, with the additional advantages of uncertainty quantification and applicability to a wider range of application scenarios.

Citations: 0

Evaluating probabilistic forecasts for maritime engineering operations

Pub Date : 2023-06-09 DOI: 10.1017/dce.2023.11

L. Astfalck, Michael Bertolacci, E. Cripps

Abstract Maritime engineering relies on model forecasts for many different processes, including meteorological and oceanographic forcings, structural responses, and energy demands. Understanding the performance and evaluation of such forecasting models is crucial in instilling reliability in maritime operations. Evaluation metrics that assess the point accuracy of the forecast (such as root-mean-squared error) are commonplace, but with the increased uptake of probabilistic forecasting methods such evaluation metrics may not consider the full forecasting distribution. The statistical theory of proper scoring rules provides a framework in which to score and compare competing probabilistic forecasts, but it is seldom appealed to in applications. This translational paper presents the underlying theory and principles of proper scoring rules, develops a simple panel of rules that may be used to robustly evaluate the performance of competing probabilistic forecasts, and demonstrates this with an application to forecasting surface winds at an asset on Australia’s North–West Shelf. Where appropriate, we relate the statistical theory to common requirements of the maritime engineering industry. The case study is from a body of work that was undertaken to quantify the value resulting from an operational forecasting product and is a clear demonstration of the downstream impacts that statistical and data science methods can have in maritime engineering operations.
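
To make the idea of scoring a full forecast distribution concrete, here is a small sketch of two standard proper scoring rules for Gaussian forecasts: the continuous ranked probability score (CRPS), using its known closed form for a normal distribution, and the logarithmic score. The abstract does not say which rules make up the paper's panel, so this choice is an assumption, as are the function names and the toy wind-speed numbers. Both scores are negatively oriented (lower is better).

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def crps_gaussian(mu, sigma, y):
    """Closed-form CRPS of a N(mu, sigma^2) forecast for observation y."""
    z = (y - mu) / sigma
    return sigma * (z * (2.0 * normal_cdf(z) - 1.0)
                    + 2.0 * normal_pdf(z)
                    - 1.0 / math.sqrt(math.pi))


def log_score_gaussian(mu, sigma, y):
    """Negative log predictive density of a N(mu, sigma^2) forecast at y."""
    z = (y - mu) / sigma
    return 0.5 * math.log(2.0 * math.pi) + math.log(sigma) + 0.5 * z * z


def mean_score(score, forecasts, observations):
    """Average a scoring rule over (mu, sigma) forecasts and matching observations."""
    total = sum(score(mu, sigma, y) for (mu, sigma), y in zip(forecasts, observations))
    return total / len(observations)


# Hypothetical wind-speed observations (m/s) and two competing forecasters
# that share the same means but differ in how confident they claim to be.
observations = [8.0, 11.5, 6.0, 9.0]
overconfident = [(9.0, 0.2), (9.0, 0.2), (9.0, 0.2), (9.0, 0.2)]
calibrated = [(9.0, 2.0), (9.0, 2.0), (9.0, 2.0), (9.0, 2.0)]

for name, forecasts in [("overconfident", overconfident), ("calibrated", calibrated)]:
    crps = mean_score(crps_gaussian, forecasts, observations)
    logs = mean_score(log_score_gaussian, forecasts, observations)
    print(f"{name}: mean CRPS = {crps:.3f}, mean log score = {logs:.3f}")
```

A point metric such as root-mean-squared error cannot separate these two forecasters because their means are identical; the scoring rules can, because they also penalize a spread that does not match the observed variability.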

Citations: 0

Bottom-up forecasting: Applications and limitations in load forecasting using smart-meter data

Pub Date : 2023-06-07 DOI: 10.1017/dce.2023.10

Harsh Anand, R. Nateghi, Negin Alemazkoor

Abstract Reliable short-term load forecasting is vital for the planning and operation of electric power systems. Short-term load forecasting is a critical component used in purchasing and generating electric power, dispatching, and load switching, which is essential for balancing supply and demand and mitigating the risk of power shortages. This is becoming even more critical given the transition to carbon-neutral technologies in the energy sector. Specifically, since renewable sources are inherently uncertain, a distributed energy system with renewable generation units is more heavily dependent on accurate load forecasts for demand-response management than traditional energy sectors. Despite extensive literature on forecasting electricity demand, most studies focus on predicting the total demand solely based on the previous-step observations of aggregate demand. With advances in smart-metering technology and the availability of high-resolution consumption data, harnessing fine-resolution smart-meter data in load forecasting has attracted increasing attention. Studies using smart-meter data mainly involve a “bottom-up” approach that develops separate forecast models at sub-aggregate levels and aggregates the forecasts to estimate the total demand. While this approach is conducive to incorporating fine-resolution data for load forecasting, it has several shortcomings that can result in sub-optimal forecasts. However, these shortcomings are hardly acknowledged in the load forecasting literature. This work demonstrates how limitations imposed by such a bottom-up load forecasting approach can lead to misleading results, which could hamper efficient load management within a carbon-neutral grid.
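
The difference between the two routes named in this abstract, forecasting each meter and summing versus forecasting the aggregate directly, can be shown with a toy sketch. Everything here is an assumption made for illustration: the three short meter series, the function names, and the two deliberately simple forecasters (mean or median of the last few readings). It does not reproduce the models or the specific shortcomings analyzed in the paper.

```python
from statistics import mean, median


def bottom_up_forecast(meter_series, forecaster):
    """Forecast each meter separately, then sum the per-meter forecasts."""
    return sum(forecaster(series) for series in meter_series)


def direct_forecast(meter_series, forecaster):
    """Sum the meters into one aggregate series, then forecast that series."""
    aggregate = [sum(readings) for readings in zip(*meter_series)]
    return forecaster(aggregate)


def last_k_mean(series, k=3):
    """Linear forecaster: mean of the last k readings."""
    return mean(series[-k:])


def last_k_median(series, k=3):
    """Nonlinear forecaster: median of the last k readings."""
    return median(series[-k:])


# Hypothetical smart-meter readings (kWh per interval) for three households.
meters = [
    [1.0, 1.2, 0.9, 1.1],
    [0.2, 3.0, 0.1, 0.3],  # one household with a single large spike
    [2.0, 2.1, 1.9, 2.2],
]

for name, forecaster in [("mean", last_k_mean), ("median", last_k_median)]:
    bottom_up = bottom_up_forecast(meters, forecaster)
    direct = direct_forecast(meters, forecaster)
    print(f"{name}: bottom-up = {bottom_up:.3f}, direct = {direct:.3f}")
```

With the linear forecaster the two routes agree, because summing and averaging commute. With the median they differ (about 3.5 for bottom-up versus about 3.6 for direct on this toy data), which shows that once sub-aggregate models are nonlinear, the aggregated forecast is not the same object as a forecast of the aggregate and needs to be evaluated separately.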

Citations: 0

Machine learning approaches for the prediction of serious fluid leakage from hydrocarbon wells

Pub Date : 2023-05-19 DOI: 10.1017/dce.2023.9

Mehdi Rezvandehy, B. Mayer

Abstract The exploitation of hydrocarbon reservoirs may lead to contamination of soils and shallow water resources and to greenhouse gas emissions. Fluids such as methane or CO2 may in some cases migrate toward the groundwater zone and atmosphere through and along imperfectly sealed hydrocarbon wells. Field tests in hydrocarbon-producing regions are routinely conducted for detecting serious leakage to prevent environmental pollution. The challenge is that testing is costly, time-consuming, and sometimes labor-intensive. In this study, machine learning approaches were applied to predict serious leakage with uncertainty quantification for wells that have not been field tested in Alberta, Canada. An improved imputation technique was developed by Cholesky factorization of the covariance matrix between features, where missing data are imputed via conditioning of available values. The uncertainty in imputed values was quantified and incorporated into the final prediction to improve decision-making. Next, a wide range of predictive algorithms and various performance metrics were considered to achieve the most reliable classifier. However, a highly skewed distribution of field tests toward the negative class (nonserious leakage) forces predictive models to unrealistically underestimate the minority class (serious leakage). To address this issue, a combination of oversampling, undersampling, and ensemble learning was applied. By investigating all the models on never-before-seen data, an optimum classifier with minimal false negative prediction was determined. The developed methodology can be applied to identify the wells with the highest likelihood for serious fluid leakage within producing fields. This information is of key importance for optimizing field test operations to achieve economic and environmental benefits.
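
The imputation step can be illustrated with the textbook two-feature case. The sketch below assumes the two features are jointly Gaussian with known means and covariance, factors the 2x2 covariance matrix by Cholesky, and conditions the missing feature on the observed one. This is a generic illustration of conditioning through a Cholesky factor; the abstract does not give the paper's actual algorithm, and all names and numbers here are hypothetical.

```python
import math
import random


def cholesky_2x2(var1, cov12, var2):
    """Cholesky factor [[l11, 0], [l21, l22]] of [[var1, cov12], [cov12, var2]]."""
    l11 = math.sqrt(var1)
    l21 = cov12 / l11
    l22 = math.sqrt(var2 - l21 * l21)
    return l11, l21, l22


def conditional_of_missing(x1, mean1, mean2, var1, cov12, var2):
    """Mean and standard deviation of feature 2 given an observed feature 1.

    Writing x1 = mean1 + l11*z1 and x2 = mean2 + l21*z1 + l22*z2 with
    independent standard normal z1, z2, observing x1 fixes z1 and leaves
    z2 as the only remaining randomness.
    """
    l11, l21, l22 = cholesky_2x2(var1, cov12, var2)
    z1 = (x1 - mean1) / l11
    return mean2 + l21 * z1, l22


# Hypothetical well record: feature 1 observed, feature 2 missing.
cond_mean, cond_sd = conditional_of_missing(
    x1=3.0, mean1=1.0, mean2=10.0, var1=4.0, cov12=3.0, var2=9.0
)

# Drawing several imputations instead of one keeps the imputation
# uncertainty visible to whatever model consumes the completed data.
rng = random.Random(0)
imputations = [rng.gauss(cond_mean, cond_sd) for _ in range(5)]
print(f"conditional mean = {cond_mean:.2f}, conditional sd = {cond_sd:.3f}")
```

The conditional standard deviation `l22` is what lets the uncertainty of an imputed value be carried forward: a strongly correlated observed feature shrinks it, and an uncorrelated one leaves it at the marginal spread.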

Citations: 0

Probabilistic selection and design of concrete using machine learning

Pub Date : 2023-04-20 DOI: 10.1017/dce.2023.5

Jessica C. Forsdyke, Bahdan Zviazhynski, J. Lees, G. Conduit

Abstract Development of robust concrete mixes with a lower environmental impact is challenging due to natural variability in constituent materials and a multitude of possible combinations of mix proportions. Making reliable property predictions with machine learning can facilitate performance-based specification of concrete, reducing material inefficiencies and improving the sustainability of concrete construction. In this work, we develop a machine learning algorithm that can utilize intermediate target variables and their associated noise to predict the final target variable. We apply the methodology to specify a concrete mix that has high resistance to carbonation, and another concrete mix that has low environmental impact. Both mixes also fulfill targets on the strength, density, and cost. The specified mixes are experimentally validated against their predictions. Our generic methodology enables the exploitation of noise in machine learning, which has a broad range of applications in structural engineering and beyond.

Citations: 1

Leveraging Big Data in port state control: An analysis of port state control data and its potential for governance and transparency in the shipping industry

Pub Date : 2023-04-14 DOI: 10.1017/dce.2023.6

D. Ampatzidis

Abstract In 1982, the International Maritime Organization, along with a couple of European countries (Paris MoU), introduced port state control (PSC) inspections of vessels in national ports to evaluate their compliance with safety and security regulations. This study discusses how PSC data share common characteristics with Big Data fundamental theories and argues that, by interpreting them as Big Data, we can treat their governance and transparency as a Big Data challenge and gain value from their use. Thus, from the scope of Big Data, PSC data should exhibit volume, velocity, variety, value, and complexity to support, in the best possible way, officers both ashore and on board in maintaining the vessel in the best possible condition for sailing. For this purpose, the paper employs Big Data theories broadly used within the academic and business environment on dataset characteristics and on how to extract value from Big Data and Analytics. The research concludes that PSC data provide valid information to the shipping industry. However, the inability of PSC data to present the complete picture of PSC regimes and ports challenges the maritime community’s attempts to build a safer and more sustainable industry.

Citations: 0

Modeling the wall shear stress in large-eddy simulation using graph neural networks

Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

DataCentric Engineering

Pub Date : 2023-03-09 DOI: 10.1017/dce.2023.2

D. Dupuy, N. Odier, C. Lapeyre, D. Papadogiannis

Abstract As the Reynolds number increases, the large-eddy simulation (LES) of complex flows becomes increasingly intractable because near-wall turbulent structures become increasingly small. Wall modeling reduces the computational requirements of LES by enabling the use of coarser cells at the walls. This paper presents a machine-learning methodology to develop data-driven wall-shear-stress models that can directly operate, a posteriori, on the unstructured grid of the simulation. The model architecture is based on graph neural networks. The model is trained on a database which includes fully developed boundary layers, adverse pressure gradients, separated boundary layers, and laminar–turbulent transition. The relevance of the trained model is verified a posteriori for the simulation of a channel flow, a backward-facing step and a linear blade cascade.
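To make the idea of a model that operates directly on an unstructured grid more concrete, the following is a minimal sketch of one generic message-passing update, the basic building block of a graph neural network. It is not the authors' architecture: the mean aggregation, the ReLU activation, and the names `message_passing_step`, `w_self`, and `w_neigh` are assumptions chosen for illustration. Nodes stand for grid points and edges for mesh connectivity.

```python
def message_passing_step(features, edges, w_self, w_neigh):
    """One mean-aggregation message-passing update on an undirected graph.

    features: list of per-node feature vectors (lists of floats)
    edges:    list of (i, j) node-index pairs, e.g. taken from mesh connectivity
    w_self, w_neigh: hypothetical weight matrices (lists of rows), out_dim x in_dim
    """
    n = len(features)
    dim = len(features[0])

    # Build the neighbour list of every node from the edge list.
    neighbours = [[] for _ in range(n)]
    for i, j in edges:
        neighbours[i].append(j)
        neighbours[j].append(i)

    def matvec(w, x):
        return [sum(wv * xv for wv, xv in zip(row, x)) for row in w]

    updated = []
    for i in range(n):
        if neighbours[i]:
            # Average the neighbours' features component by component.
            mean = [
                sum(features[j][d] for j in neighbours[i]) / len(neighbours[i])
                for d in range(dim)
            ]
        else:
            mean = [0.0] * dim
        combined = [
            a + b
            for a, b in zip(matvec(w_self, features[i]), matvec(w_neigh, mean))
        ]
        updated.append([max(0.0, c) for c in combined])  # ReLU activation
    return updated


# Three nodes on a line: 0 - 1 - 2, one scalar feature per node.
out = message_passing_step(
    features=[[1.0], [2.0], [3.0]],
    edges=[(0, 1), (1, 2)],
    w_self=[[1.0]],
    w_neigh=[[1.0]],
)
print(out)
```

Because the update only relies on the neighbour list, the same function applies to any mesh topology, which is the property that lets graph-based models work on unstructured grids.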


Citations: 0

A switching Gaussian process latent force model for the identification of mechanical systems with a discontinuous nonlinearity

Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

DataCentric Engineering

Pub Date : 2023-03-07 DOI: 10.1017/dce.2023.12

L. Marino, A. Cicirello

Abstract An approach is presented for the identification of discontinuous and nonsmooth nonlinear forces, such as those generated by frictional contacts, in mechanical systems that can be approximated by a single-degree-of-freedom model. To handle the sharp variations and multiple motion regimes introduced by these nonlinearities in the dynamic response, the partially known physics-based model and noisy measurements of the system’s response to a known input force are combined within a switching Gaussian process latent force model (GPLFM). In this grey-box framework, multiple Gaussian processes are used to model the unknown nonlinear force across different motion regimes, and a resetting model enables the generation of discontinuities. The states of the system, the nonlinear force, and the regime transitions are inferred by using filtering and smoothing techniques for switching linear dynamical systems. The proposed switching GPLFM is applied to a simulated dry friction oscillator and to an experimental setup consisting of a single-storey frame with a brass-to-steel contact. Excellent results are obtained in terms of the identified nonlinear and discontinuous friction force for varying (i) normal load amplitudes in the contact, (ii) measurement noise levels, and (iii) numbers of samples in the datasets. Moreover, the identified states, friction force, and sequence of motion regimes are used for evaluating (1) uncertain system parameters, (2) the friction force–velocity relationship, and (3) the static friction force. The correct identification of the discontinuous nonlinear force and the quantification of any remaining uncertainty in its prediction enable the implementation of an accurate forward model able to predict the system’s response to different input forces.
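The kind of system the abstract refers to can be illustrated with a small forward simulation of a single-degree-of-freedom oscillator with dry friction. This is only a sketch of the physics that produces the stick and slip regimes, not the switching GPLFM itself. It assumes ideal Coulomb friction with equal static and kinetic levels, a semi-implicit Euler integrator, and illustrative parameter values; the function name `simulate_dry_friction` and all of its arguments are hypothetical.

```python
import math


def simulate_dry_friction(m=1.0, k=100.0, c=0.0, friction=2.0,
                          force=lambda t: 0.0, x0=0.0, v0=0.0,
                          dt=1e-4, steps=20000, v_tol=1e-6):
    """Integrate m*x'' + c*x' + k*x + F_friction = force(t).

    Assumed Coulomb model: the friction force has magnitude `friction`
    while sliding and balances the applied force while sticking.
    Returns a list of (t, x, v, regime), regime being "stick" or "slip".
    """
    x, v = x0, v0
    history = []
    for i in range(steps):
        t = i * dt
        applied = force(t) - k * x - c * v  # net force excluding friction
        if abs(v) < v_tol and abs(applied) <= friction:
            # Stick: friction cancels the applied force, the mass stays at rest.
            v = 0.0
            regime = "stick"
        else:
            # Slip: friction opposes the velocity, or the applied force
            # when the mass is just starting to move from rest.
            if abs(v) >= v_tol:
                direction = math.copysign(1.0, v)
            else:
                direction = math.copysign(1.0, applied)
            a = (applied - friction * direction) / m
            v_new = v + a * dt
            if v * v_new < 0.0:
                # Velocity reversal inside the step: stop at zero so the
                # stick condition can be checked on the next step.
                v_new = 0.0
            v = v_new
            x += v * dt
            regime = "slip"
        history.append((t, x, v, regime))
    return history


# Free response from an initial displacement of 0.09 with no input force.
trajectory = simulate_dry_friction(x0=0.09)
t_end, x_end, v_end, regime_end = trajectory[-1]
print(regime_end, round(x_end, 3))
```

With these values the mass slides for two half-cycles and then sticks inside the band |x| <= friction / k, where the spring force can no longer overcome friction. The jump in friction force at each velocity reversal is the discontinuity that a smooth model struggles to capture.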



Citations: 2