On off-line and on-line Bayesian filtering for uncertainty quantification of structural deterioration
Antonios Kamariotis, Luca Sardi, I. Papaioannou, E. Chatzi, D. Straub
Data-Centric Engineering, doi:10.1017/dce.2023.13 (published 2022-05-06)
Abstract Data-informed predictive maintenance planning largely relies on stochastic deterioration models. Monitoring information can be utilized to sequentially update knowledge of the model parameters. In this context, on-line (recursive) Bayesian filtering algorithms typically fail to properly quantify the full posterior uncertainty of time-invariant model parameters. Off-line (batch) algorithms are—in principle—better suited for the uncertainty quantification task, yet they are computationally prohibitive in sequential settings. In this work, we adapt and investigate selected Bayesian filters for parameter estimation: an on-line particle filter; an on-line iterated batch importance sampling filter, which performs Markov chain Monte Carlo (MCMC) move steps; and an off-line MCMC-based sequential Monte Carlo filter. A Gaussian mixture model approximates the posterior distribution within the resampling process in all three filters. Two numerical examples provide the basis for a comparative assessment. The first example considers a low-dimensional, nonlinear, non-Gaussian probabilistic fatigue crack growth model that is updated with sequential monitoring measurements. The second, high-dimensional, linear, Gaussian example employs a random field to model corrosion deterioration across a beam, which is updated with sequential sensor measurements. The numerical investigations provide insights into the performance of off-line and on-line filters in terms of the accuracy of posterior estimates and the computational cost when applied to problems of different nature, increasing dimensionality, and varying amounts of sensor information. Importantly, they show that a tailored implementation of the on-line particle filter proves competitive with the computationally demanding MCMC-based filters. Suggestions on the choice of the appropriate method as a function of problem characteristics are provided.
{"title":"On off-line and on-line Bayesian filtering for uncertainty quantification of structural deterioration","authors":"Antonios Kamariotis, Luca Sardi, I. Papaioannou, E. Chatzi, D. Štraub","doi":"10.1017/dce.2023.13","DOIUrl":"https://doi.org/10.1017/dce.2023.13","url":null,"abstract":"Abstract Data-informed predictive maintenance planning largely relies on stochastic deterioration models. Monitoring information can be utilized to update sequentially the knowledge on model parameters. In this context, on-line (recursive) Bayesian filtering algorithms typically fail to properly quantify the full posterior uncertainty of time-invariant model parameters. Off-line (batch) algorithms are—in principle—better suited for the uncertainty quantification task, yet they are computationally prohibitive in sequential settings. In this work, we adapt and investigate selected Bayesian filters for parameter estimation: an on-line particle filter, an on-line iterated batch importance sampling filter, which performs Markov Chain Monte Carlo (MCMC) move steps, and an off-line MCMC-based sequential Monte Carlo filter. A Gaussian mixture model approximates the posterior distribution within the resampling process in all three filters. Two numerical examples provide the basis for a comparative assessment. The first example considers a low-dimensional, nonlinear, non-Gaussian probabilistic fatigue crack growth model that is updated with sequential monitoring measurements. The second high-dimensional, linear, Gaussian example employs a random field to model corrosion deterioration across a beam, which is updated with sequential sensor measurements. The numerical investigations provide insights into the performance of off-line and on-line filters in terms of the accuracy of posterior estimates and the computational cost, when applied to problems of different nature, increasing dimensionality and varying sensor information amount. Importantly, they show that a tailored implementation of the on-line particle filter proves competitive with the computationally demanding MCMC-based filters. Suggestions on the choice of the appropriate method in function of problem characteristics are provided.","PeriodicalId":34169,"journal":{"name":"DataCentric Engineering","volume":"4 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42149087","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Data mining and knowledge discovery in chemical processes: Effect of alternative processing techniques
L. Briceno-Mena, M. Nnadili, M. G. Benton, J. Romagnoli
Data-Centric Engineering, doi:10.1017/dce.2022.21 (published 2022-04-26)
Abstract Data mining and knowledge discovery (DMKD) focuses on extracting useful information from data. In the chemical process industry, tasks such as process monitoring, fault detection, process control, and optimization can be accomplished using DMKD. However, selecting the appropriate method for each step of the DMKD process, namely data cleaning, sampling, scaling, dimensionality reduction (DR), clustering, cluster analysis, and data visualization, to obtain meaningful insights is far from trivial. In this contribution, a computational environment (FastMan) is introduced and used to illustrate how method selection affects DMKD in chemical process data. Two case studies, using data from a simulated natural gas liquid plant and real data from an industrial pyrolysis unit, demonstrate the applicability of these methodologies in real-life scenarios. Sampling and normalization methods were found to have a great impact on the quality of the DMKD results. In addition, t-distributed stochastic neighbor embedding, a neighbor-graph method for DR, outperformed principal component analysis, a matrix factorization method frequently used in the chemical process industry, at identifying both local and global changes.
{"title":"Data mining and knowledge discovery in chemical processes: Effect of alternative processing techniques","authors":"L. Briceno-Mena, M. Nnadili, M. G. Benton, J. Romagnoli","doi":"10.1017/dce.2022.21","DOIUrl":"https://doi.org/10.1017/dce.2022.21","url":null,"abstract":"Abstract Data mining and knowledge discovery (DMKD) focuses on extracting useful information from data. In the chemical process industry, tasks such as process monitoring, fault detection, process control, optimization, etc., can be achieved using DMKD. However, the selection of the appropriate method for each step in the DMKD process, namely data cleaning, sampling, scaling, dimensionality reduction (DR), clustering, clustering analysis and data visualization to obtain meaningful insights is far from trivial. In this contribution, a computational environment (FastMan) is introduced and used to illustrate how method selection affects DMKD in chemical process data. Two case studies, using data from a simulated natural gas liquid plant and real data from an industrial pyrolysis unit, were conducted to demonstrate the applicability of these methodologies in real-life scenarios. Sampling and normalization methods were found to have a great impact on the quality of the DMKD results. Also, a neighbor graphs method for DR, t-distributed stochastic neighbor embedding, outperformed principal component analysis, a matrix factorization method frequently used in the chemical process industry for identifying both local and global changes.","PeriodicalId":34169,"journal":{"name":"DataCentric Engineering","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46308246","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Thermoacoustic stability prediction using classification algorithms
R. Gaudron, A. Morgans
Data-Centric Engineering, doi:10.1017/dce.2022.17 (published 2022-04-25)
Abstract Predicting the occurrence of thermoacoustic instabilities is of major interest in a variety of engineering applications such as aircraft propulsion, power generation, and industrial heating. Predictive methodologies based on a physical approach have been developed over the past decades, but they incur a moderate-to-high computational cost when exploring a large number of designs. In this study, the stability prediction capabilities and computational cost of four well-established classification algorithms—the K-Nearest Neighbors, Decision Tree (DT), Random Forest (RF), and Multilayer Perceptron (MLP) algorithms—are investigated. These algorithms are trained using an in-house physics-based low-order network model tool called OSCILOS. All four algorithms predict which configurations are thermoacoustically unstable with very high accuracy and very low runtime. Furthermore, the frequency intervals containing unstable modes for a given configuration are also accurately predicted using multilabel classification. The RF algorithm correctly predicts the overall stability and finds all frequency intervals containing unstable modes for 99.6% and 98.3% of all configurations, respectively, making it the most accurate algorithm when a large number of training examples is available. For smaller training sets, the MLP algorithm becomes the most accurate. The DT algorithm is slightly less accurate, but can be trained extremely quickly and runs about a million times faster than a traditional physics-based low-order network model tool. These findings could be used to devise a new generation of combustor optimization tools that run much faster than existing codes while retaining similar accuracy.
{"title":"Thermoacoustic stability prediction using classification algorithms","authors":"R. Gaudron, A. Morgans","doi":"10.1017/dce.2022.17","DOIUrl":"https://doi.org/10.1017/dce.2022.17","url":null,"abstract":"Abstract Predicting the occurrence of thermoacoustic instabilities is of major interest in a variety of engineering applications such as aircraft propulsion, power generation, and industrial heating. Predictive methodologies based on a physical approach have been developed in the past decades, but have a moderate-to-high computational cost when exploring a large number of designs. In this study, the stability prediction capabilities and computational cost of four well-established classification algorithms—the K-Nearest Neighbors, Decision Tree (DT), Random Forest (RF), and Multilayer Perceptron (MLP) algorithms—are investigated. These algorithms are trained using an in-house physics-based low-order network model tool called OSCILOS. All four algorithms are able to predict which configurations are thermoacoustically unstable with a very high accuracy and a very low runtime. Furthermore, the frequency intervals containing unstable modes for a given configuration are also accurately predicted using multilabel classification. The RF algorithm correctly predicts the overall stability and finds all frequency intervals containing unstable modes for 99.6 and 98.3% of all configurations, respectively, which makes it the most accurate algorithm when a large number of training examples is available. For smaller training sets, the MLP algorithm becomes the most accurate algorithm. The DT algorithm is found to be slightly less accurate, but can be trained extremely quickly and runs about a million times faster than a traditional physics-based low-order network model tool. These findings could be used to devise a new generation of combustor optimization tools that would run much faster than existing codes while retaining a similar accuracy.","PeriodicalId":34169,"journal":{"name":"DataCentric Engineering","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-04-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45914005","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

A data-driven method for automated data superposition with applications in soft matter science
Kyle R. Lennon, G. McKinley, J. Swan
Data-Centric Engineering, doi:10.1017/dce.2023.3 (published 2022-04-20)
Abstract The superposition of data sets with internal parametric self-similarity is a longstanding and widespread technique for the analysis of many types of experimental data across the physical sciences. Typically, this superposition is performed manually or, more recently, through one of a few automated algorithms. However, these methods are often heuristic in nature, are prone to user bias via manual data shifting or parameterization, and lack a native framework for handling uncertainty in both the data and the resulting model of the superposed data. In this work, we develop a data-driven, nonparametric method for superposing experimental data with arbitrary coordinate transformations, which employs Gaussian process regression to learn statistical models that describe the data and then uses maximum a posteriori estimation to optimally superpose the data sets. This statistical framework is robust to experimental noise and automatically produces uncertainty estimates for the learned coordinate transformations. Moreover, it is distinguished from black-box machine learning by its interpretability—specifically, it produces a model that may itself be interrogated to gain insight into the system under study. We demonstrate these salient features of our method through its application to four representative data sets characterizing the mechanics of soft materials. In every case, our method replicates results obtained using other approaches, but with reduced bias and the addition of uncertainty estimates. This method enables a standardized, statistical treatment of self-similar data across many fields, producing interpretable data-driven models that may inform applications such as materials classification, design, and discovery.
{"title":"A data-driven method for automated data superposition with applications in soft matter science","authors":"Kyle R. Lennon, G. McKinley, J. Swan","doi":"10.1017/dce.2023.3","DOIUrl":"https://doi.org/10.1017/dce.2023.3","url":null,"abstract":"Abstract The superposition of data sets with internal parametric self-similarity is a longstanding and widespread technique for the analysis of many types of experimental data across the physical sciences. Typically, this superposition is performed manually, or recently through the application of one of a few automated algorithms. However, these methods are often heuristic in nature, are prone to user bias via manual data shifting or parameterization, and lack a native framework for handling uncertainty in both the data and the resulting model of the superposed data. In this work, we develop a data-driven, nonparametric method for superposing experimental data with arbitrary coordinate transformations, which employs Gaussian process regression to learn statistical models that describe the data, and then uses maximum a posteriori estimation to optimally superpose the data sets. This statistical framework is robust to experimental noise and automatically produces uncertainty estimates for the learned coordinate transformations. Moreover, it is distinguished from black-box machine learning in its interpretability—specifically, it produces a model that may itself be interrogated to gain insight into the system under study. We demonstrate these salient features of our method through its application to four representative data sets characterizing the mechanics of soft materials. In every case, our method replicates results obtained using other approaches, but with reduced bias and the addition of uncertainty estimates. This method enables a standardized, statistical treatment of self-similar data across many fields, producing interpretable data-driven models that may inform applications such as materials classification, design, and discovery.","PeriodicalId":34169,"journal":{"name":"DataCentric Engineering","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46638783","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Universal Digital Twin: Land Use – ADDENDUM
J. Akroyd, Zachary S. Harper, David Soutar, Feroz Farazi, A. Bhave, S. Mosbach, Markus Kraft
Data-Centric Engineering, doi:10.1017/dce.2022.8 (published 2022-04-08)
{"title":"Universal Digital Twin: Land Use – ADDENDUM","authors":"J. Akroyd, Zachary S. Harper, David Soutar, Feroz Farazi, A. Bhave, S. Mosbach, Markus Kraft","doi":"10.1017/dce.2022.8","DOIUrl":"https://doi.org/10.1017/dce.2022.8","url":null,"abstract":"","PeriodicalId":34169,"journal":{"name":"DataCentric Engineering","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43044418","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Intelligent vehicle drive mode which predicts the driver behavior vector to augment the engine performance in real-time
Srikanth Kolachalama, Hafiz Abid Mahmood Malik
Data-Centric Engineering, doi:10.1017/dce.2022.15 (published 2022-04-07)
Abstract In this article, a novel drive mode, the “intelligent vehicle drive mode” (IVDM), was proposed, which augments vehicle engine performance in real time. This drive mode predicts the driver behavior vector (DBV) that optimizes vehicle engine performance, where the metric of optimal engine performance was defined using elements of the engine operating point (EOP) and the heating, ventilation, and air conditioning (HVAC) system. Deep learning (DL) models were developed by mapping the vehicle-level vectors (VLV) to the EOP and HVAC parameters, and the trained functions were utilized to predict the future states of the DBV reflecting augmented vehicle engine performance. An iterative analysis was performed by empirically estimating the future states of the VLV over the allowable range of the DBV and feeding them into the DL model to predict the performance vectors. The defined engine performance metric was applied to the predicted vectors, and thus the optimal DBV is the instantaneous output of the IVDM. The analytical and validation techniques were developed using field data obtained from General Motors Inc., Warren, Michigan. Finally, the proposed concept was quantified by analyzing the instantaneous engine efficiency (IEE) and a smoothness measure of the instantaneous engine map (IEM).
{"title":"Intelligent vehicle drive mode which predicts the driver behavior vector to augment the engine performance in real-time","authors":"Srikanth Kolachalama, Hafiz Abid Mahmood Malik","doi":"10.1017/dce.2022.15","DOIUrl":"https://doi.org/10.1017/dce.2022.15","url":null,"abstract":"Abstract In this article, a novel drive mode, “intelligent vehicle drive mode” (IVDM), was proposed, which augments the vehicle engine performance in real-time. This drive mode predicts the driver behavior vector (DBV), which optimizes the vehicle engine performance, and the metric of optimal vehicle engine performance was defined using the elements of engine operating point (EOP) and heating ventilation and air conditioning system (HVAC). Deep learning (DL) models were developed by mapping the vehicle level vectors (VLV) with EOP and HVAC parameters, and the trained functions were utilized to predict the future states of DBV reflecting augmented vehicle engine performance. The iterative analysis was performed by empirically estimating the future states of VLV in the allowable range of DBV and was fed into the DL model to predict the performance vectors. The defined vehicle engine performance metric was applied to the predicted vectors, and thus optimal DBV is the instantaneous output of the IVDM. The analytical and validation techniques were developed using field data obtained from General Motors Inc., Warren, Michigan. Finally, the proposed concept was quantified by analyzing the instantaneous engine efficiency (IEE) and smoothness measure of the instantaneous engine map (IEM).","PeriodicalId":34169,"journal":{"name":"DataCentric Engineering","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-04-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42830087","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Managing streamed sensor data for mobile equipment prognostics
T. Griffiths, Débora C. Corrêa, M. Hodkiewicz, A. Polpo
Data-Centric Engineering, doi:10.1017/dce.2022.4 (published 2022-04-06)
Abstract The ability to wirelessly stream data from sensors on heavy mobile equipment provides opportunities to proactively assess asset condition. However, data analysis methods are challenging to apply due to the size and structure of the data, which contain inconsistent and asynchronous entries and large periods of missing data. Current methods usually require expertise from site engineers to inform variable selection. In this work, we develop a data preparation method to clean and arrange this streaming data for analysis, including a data-driven variable selection. Data are drawn from a mining industry case study, with sensor data from a primary production excavator over a period of 9 months. Variables include 58 numerical sensors and 40 binary indicators captured in 45 million rows of data describing the conditions and status of different subsystems of the machine. A total of 57% of time stamps contain missing values for at least one sensor. The response variable is drawn from fault codes selected by the operator and stored in the fleet management system. Application to the hydraulic system, for 21 failure events identified by the operator, shows that the data-driven selection contains variables consistent with subject matter expert expectations, as well as some sensors on other systems of the excavator that are less easy to explain from an engineering perspective. Our contribution is to demonstrate a compressed data representation using open-high-low-close (OHLC) aggregation and variable selection to visualize data and support the identification of potential indicators of failure events in multivariate streamed data.
{"title":"Managing streamed sensor data for mobile equipment prognostics","authors":"T. Griffiths, Débora C. Corrêa, M. Hodkiewicz, A. Polpo","doi":"10.1017/dce.2022.4","DOIUrl":"https://doi.org/10.1017/dce.2022.4","url":null,"abstract":"Abstract The ability to wirelessly stream data from sensors on heavy mobile equipment provides opportunities to proactively assess asset condition. However, data analysis methods are challenging to apply due to the size and structure of the data, which contain inconsistent and asynchronous entries, and large periods of missing data. Current methods usually require expertise from site engineers to inform variable selection. In this work, we develop a data preparation method to clean and arrange this streaming data for analysis, including a data-driven variable selection. Data are drawn from a mining industry case study, with sensor data from a primary production excavator over a period of 9 months. Variables include 58 numerical sensors and 40 binary indicators captured in 45-million rows of data describing the conditions and status of different subsystems of the machine. A total of 57% of time stamps contain missing values for at least one sensor. The response variable is drawn from fault codes selected by the operator and stored in the fleet management system. Application to the hydraulic system, for 21 failure events identified by the operator, shows that the data-driven selection contains variables consistent with subject matter expert expectations, as well as some sensors on other systems on the excavator that are less easy to explain from an engineering perspective. Our contribution is to demonstrate a compressed data representation using open-high-low-close and variable selection to visualize data and support identification of potential indicators of failure events from multivariate streamed data.","PeriodicalId":34169,"journal":{"name":"DataCentric Engineering","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-04-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43392846","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Feature extraction and artificial neural networks for the on-the-fly classification of high-dimensional thermochemical spaces in adaptive-chemistry simulations—ADDENDUM","authors":"G. D’Alessio, A. Cuoci, A. Parente","doi":"10.1017/dce.2022.12","DOIUrl":"https://doi.org/10.1017/dce.2022.12","url":null,"abstract":"","PeriodicalId":34169,"journal":{"name":"DataCentric Engineering","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-04-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42651786","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Development of a digital twin operational platform using Python Flask—ADDENDUM
M. Bonney, M. de Angelis, M. Dal Borgo, Luis Andrade, S. Beregi, N. Jamia, D. Wagg
Data-Centric Engineering, doi:10.1017/dce.2022.13 (published 2022-04-06)
{"title":"Development of a digital twin operational platform using Python Flask—ADDENDUM","authors":"M. Bonney, M. de Angelis, M. Dal Borgo, Luis Andrade, S. Beregi, N. Jamia, D. Wagg","doi":"10.1017/dce.2022.13","DOIUrl":"https://doi.org/10.1017/dce.2022.13","url":null,"abstract":"","PeriodicalId":34169,"journal":{"name":"DataCentric Engineering","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-04-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43677106","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Emulating computer experiments of transport infrastructure slope stability using Gaussian processes and Bayesian inference—ADDENDUM
A. Svalova, P. Helm, D. Prangle, M. Rouainia, S. Glendinning, D. Wilkinson
Data-Centric Engineering, doi:10.1017/dce.2022.14 (published 2022-04-04)
The editors and publisher of Data-Centric Engineering have awarded the Open Data and Open Materials badges to the article by Svalova et al. (2021). The Open Data badge indicates that the data necessary to reproduce the reported results are available in an open access repository, under an open licence, with an accompanying description of the data. The Open Materials badge indicates that any infrastructure, instruments, or equipment related to the reported methodology are available in an open access repository and are described in sufficient detail to allow a researcher to reproduce the procedure. The original article has been updated to include the badges. Please refer to the Data Availability Statement to find the identifier linking to the open data or open materials.
{"title":"Emulating computer experiments of transport infrastructure slope stability using Gaussian processes and Bayesian inference—ADDENDUM","authors":"A. Svalova, P. Helm, D. Prangle, M. Rouainia, S. Glendinning, D. Wilkinson","doi":"10.1017/dce.2022.14","DOIUrl":"https://doi.org/10.1017/dce.2022.14","url":null,"abstract":"The editors and publisher ofData-Centric Engineering have awarded the Open Data and OpenMaterials badges to this article Svalova A, et al. (2021). Open Data Badge—indicates that data necessary to reproduce the reported results are available in an open access repository, under an open licence, with an accompanying description of the data. Open Materials Badge—indicates that any infrastructure, instruments, or equipment related to the reported methodology are available in an open access repository and are described in sufficient detail to allow a researcher to reproduce the procedure. The original article has been updated to include the badges. Please refer to the Data Availability Statement to find the identifier linking to the open data or open materials.","PeriodicalId":34169,"journal":{"name":"DataCentric Engineering","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47000068","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}