Abstract In various scenarios, the motion of a tracked object, for example, a pointing apparatus, pedestrian, animal, or vehicle, is driven by achieving a premeditated goal such as reaching a destination, despite the many possible trajectories to this endpoint. This paper presents a generic Bayesian framework that utilizes stochastic models capturing the influence of intent (viz., destination) on the object's behavior. It leads to simple algorithms that infer, as early as possible, the intended endpoint from noisy sensory observations, with relatively low computational and training data requirements. This framework is introduced in the context of the novel predictive touch technology for intelligent user interfaces and touchless interactions. It can determine, early in the interaction task or pointing gesture, the interface item the user intends to select on the display (e.g., a touchscreen) and accordingly simplify and expedite the selection task. This is shown to significantly improve the usability of displays in vehicles, especially under perturbations due to road and driving conditions, and to enable intuitive contact-free interactions. Data collected in instrumented vehicles demonstrate the effectiveness of the proposed intent prediction approach.
Impact Statement The presented Bayesian framework facilitates automated decision-making, resource allocation, and future action planning, with applications in various fields such as human–computer interaction (HCI), surveillance, and robotics, to name a few. It led to the introduction of the patented HCI technology predictive touch, developed as part of a collaboration with Jaguar Land Rover and set for commercialization; it won a Jaguar Land Rover TATA Innovista Award 2020 ("Dare To Try" category). Predictive touch not only offers an intuitive approach to touchless interaction (i.e., no physical contact with the display is required), but also significantly improves the usability of interactive displays in vehicles or any moving platform, reduces the attention they require, and enhances input accuracy, including under perturbations due to road and driving conditions. This has been demonstrated in various on-road trials. This touchless interaction technology can have widespread applications in a post-COVID-19 world by minimizing the risk of transmitting pathogens via touch surfaces, for instance, when using ticketing or self-checkout machines, control panels, and interactive displays in public spaces, kiosks, or workplaces. It also offers a means to interact easily with emerging display technologies that lack a physical surface, such as 2D/3D projections and virtual or augmented reality, and provides additional design flexibility to support inclusive design practices.
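The core inference step described above, computing a posterior over candidate endpoints from a partial, noisy trajectory, can be illustrated with a deliberately simplified sketch. This is a toy illustration, not the paper's actual stochastic-model-based filters: the angular-deviation likelihood and the function name are our own assumptions.

```python
import numpy as np

def destination_posterior(trajectory, destinations, sigma=0.05, prior=None):
    """Toy Bayesian intent inference: score each candidate destination by how
    well successive displacements point toward it, then normalize into a
    posterior. A simplified illustration only, not the paper's algorithm."""
    trajectory = np.asarray(trajectory, dtype=float)
    destinations = np.asarray(destinations, dtype=float)
    n_dest = len(destinations)
    log_lik = np.zeros(n_dest)
    for k, d in enumerate(destinations):
        for t in range(len(trajectory) - 1):
            step = trajectory[t + 1] - trajectory[t]
            to_dest = d - trajectory[t]
            norm = np.linalg.norm(step) * np.linalg.norm(to_dest)
            if norm == 0:
                continue
            # Gaussian-like penalty on angular deviation from the straight
            # path toward the candidate destination
            cos_dev = 1.0 - np.dot(step, to_dest) / norm
            log_lik[k] += -cos_dev / (2 * sigma**2)
    prior = np.full(n_dest, 1.0 / n_dest) if prior is None else np.asarray(prior)
    post = prior * np.exp(log_lik - log_lik.max())  # subtract max for stability
    return post / post.sum()
```

In a predictive-touch setting, the candidate destinations would be the selectable interface items, and the posterior could be recomputed as each new pointing sample arrives, allowing early selection once one item dominates.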
Gan, R., Liang, J., Ahmad, B. I., & Godsill, S. (2020, October 27). Modeling intent and destination prediction within a Bayesian framework: Predictive touch as a usecase. Data-Centric Engineering. doi:10.1017/dce.2020.11
Abstract We develop a model that successfully learns social and organizational human network structure using ambient sensing data from distributed plug load energy sensors in commercial buildings. A key goal for the design and operation of commercial buildings is to support the success of the organizations within them. In modern workspaces, a particularly important goal is collaboration, which relies on physical interactions among individuals. Learning the true socio-organizational relational ties among workers can therefore help managers of buildings and organizations make decisions that improve collaboration. In this paper, we introduce the Interaction Model, a method for inferring human network structure that leverages data from distributed plug load energy sensors. In a case study, we benchmark our method against network data obtained through a survey and compare its performance to other data-driven tools. We find that, unlike previous methods, our method infers a network that is correlated with the survey network to a statistically significant degree (graph correlation of 0.46, significant at the 0.01 level). We additionally find that our method requires only 10 weeks of sensing data, enabling dynamic network measurement. Learning human network structure through data-driven means can enable the design and operation of spaces that encourage, rather than inhibit, the success of organizations.
Impact Statement The structure of social and organizational relationships in commercial building workplaces is a key component of work processes. Understanding this structure—typically described as a network of relational ties—can help designers of workspaces and managers of workplaces make decisions that promote the success of organizations. These networks are complex, and as a result, our traditional means of measuring them are time- and cost-intensive.
In this paper, we present a novel method, the Interaction Model, for learning these network structures automatically through sensing data. When we compare the learned network to network data obtained through a survey, we find statistically significant correlation, demonstrating the success of our method. Two key strengths of our proposed method are, first, that it uncovers network patterns quickly, requiring just 10 weeks of data, and, second, that it is interpretable, relying on intuitive opportunities for social interaction. Data-driven inference of the structure of human systems within our built environment will enable the design and operation of engineered built spaces that promote our human-centered objectives.
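The inference idea, turning co-occurring plug-load activity into relational ties, can be sketched as follows. This is a hypothetical simplification under our own assumptions (binarized activity, co-activity counts as tie weights); the paper's Interaction Model is more refined.

```python
import numpy as np

def interaction_network(energy, threshold):
    """Hypothetical simplification of the Interaction Model idea: binarize
    each occupant's plug-load series into active/inactive, then weight ties
    by how often pairs are simultaneously active (co-presence as an
    opportunity to interact)."""
    # energy: array-like of shape (n_people, n_times)
    active = (np.asarray(energy, dtype=float) > threshold).astype(float)
    co_active = active @ active.T          # pairwise counts of joint activity
    ties = co_active / active.shape[1]     # normalize by observation length
    np.fill_diagonal(ties, 0.0)            # no self-ties
    return ties
```

The resulting weighted adjacency matrix could then be compared against survey-derived ties, as the case study above does.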
Sonta, A., & Jain, R. K. (2020, October 2). Learning socio-organizational network structure in buildings with ambient sensing data. Data-Centric Engineering. doi:10.1017/dce.2020.9
Abstract The search space for new thermoelectric oxides has been limited to the alloys of a few known systems, such as ZnO, SrTiO3, and CaMnO3. Notwithstanding their high power factor, their high thermal conductivity is a roadblock to achieving higher efficiency. In this paper, we apply machine learning (ML) models to discover novel transition metal oxides with low lattice thermal conductivity ( $ {k}_L $ ). A two-step process is proposed to address the problem of small datasets frequently encountered in materials informatics. First, a gradient-boosted tree classifier is learnt to categorize unknown compounds into three categories of $ {k}_L $ : low, medium, and high. In the second step, we fit regression models on the targeted class (i.e., low $ {k}_L $ ) to estimate $ {k}_L $ with $ {R}^2>0.9 $ . The gradient-boosted tree model was also used to identify the key material properties influencing the classification of $ {k}_L $ , namely lattice energy per atom, atom density, band gap, mass density, and the ratio of oxygen to transition metal atoms. Only fundamental materials properties describing the crystal symmetry, compound chemistry, and interatomic bonding were used in the classification process, and these can be readily used in the initial phases of materials design. The proposed two-step process addresses the problem of small datasets and improves predictive accuracy. The ML approach adopted in the present work is generic in nature and can be combined with high-throughput computing for the rapid discovery of new materials for specific applications.
Impact Statement Discovery of new materials is a complex and challenging task. The sequential nature of the experimental route to investigating new materials makes it tedious and resource-intensive. The application of data-centric methods has shown much promise recently in the rapid discovery of new materials. Machine learning (ML) algorithms not only predict the properties of interest, but also provide insight into the complex correlations between the properties of materials. However, the large materials databases usually required for these methods to attain high predictive accuracy are not always available. In this work, a two-step ML process is proposed to overcome this challenge. The proposed method is demonstrated on a dataset of transition metal oxides, predicting their lattice thermal conductivity. Transition metal oxides with low thermal conductivity are especially attractive for high-temperature thermoelectric applications because they exhibit excellent high-temperature stability and have tunable electrical properties. The proposed method identified the most influential fundamental materials properties, which can be readily used as design parameters in the early stages of materials selection. The method can be combined with high-throughput computations to discover novel materials for specific applications.
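The two-step strategy, a coarse classifier followed by a regressor trained only on the targeted class, can be sketched with scikit-learn's gradient-boosted ensembles. The band cutoffs, feature construction, and function names below are illustrative assumptions, not the authors' values.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor

def two_step_kl_model(X, kL, low_cut=2.5, high_cut=10.0):
    """Sketch of the two-step strategy: (1) classify compounds into
    low/medium/high lattice-thermal-conductivity bands, (2) fit a regressor
    only on the low-kL class, where accurate values matter most.
    Cutoff values here are illustrative, not the paper's."""
    bands = np.digitize(kL, [low_cut, high_cut])        # 0=low, 1=medium, 2=high
    clf = GradientBoostingClassifier().fit(X, bands)
    low = bands == 0
    reg = GradientBoostingRegressor().fit(X[low], kL[low])
    return clf, reg

def predict_low_kl(clf, reg, X_new):
    """Estimate kL only for compounds classified as low-kL; NaN otherwise."""
    pred_band = clf.predict(X_new)
    out = np.full(len(X_new), np.nan)
    mask = pred_band == 0
    if mask.any():
        out[mask] = reg.predict(X_new[mask])
    return out
```

Restricting the regressor to one band keeps the regression problem small and homogeneous, which is one way the two-step design mitigates the small-dataset issue the abstract highlights.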
Tewari, A., Dixit, S., Sahni, N., & Bordas, S. (2020, September 9). Machine learning approaches to identify and design low thermal conductivity oxides for thermoelectric applications. Data-Centric Engineering. doi:10.1017/dce.2020.7
Abstract This paper presents the Parallel World Framework as a solution for simulations of complex systems within a time-varying knowledge graph, and its application to the electric grid of Jurong Island in Singapore. The underlying modeling system is based on the Semantic Web Stack. Its linked data layer is described by means of ontologies, which span multiple domains. The framework is designed to allow what-if scenarios to be simulated generically, even for complex, inter-linked, cross-domain applications, as well as to conduct multi-scale optimizations of complex superstructures within the system. Parallel world containers, introduced by the framework, ensure data separation and versioning of structures crossing various domain boundaries. Separation of operations belonging to a particular version of the world is handled by a scenario agent, which encapsulates the functionality of operations on data and acts as a parallel-world proxy to all of the other agents operating on the knowledge graph. Electric network optimization for carbon tax is demonstrated as a use case. The framework makes it possible to model and evaluate electrical networks corresponding to set carbon tax values by retrofitting different types of power generators and optimizing the grid accordingly. The use case shows that, owing to its distributed architecture, this solution can serve as a tool for CO2-reduction modeling and planning at scale.
Impact Statement The methodology developed in this paper allows simulation of complex systems that consist of many interdependent parts, such as an industrial park, as well as variations thereof, referred to as parallel worlds. In addition to the ability to consider different scenarios, a key distinguishing feature of our approach, which is based on a generic all-purpose design that enables interoperability between heterogeneous software and, as a consequence, cross-domain applications, is its employment of knowledge graphs and autonomous software agents.
As such, the methodology presented here allows city planners and policy makers to ask what-if questions or explore alternatives—a process that can play an important role in decision-making. As an example, optimizing the electrical grid of Jurong Island in Singapore is considered, for two different levels of carbon tax, thus demonstrating how the methodology can assist planning for carbon footprint reduction.
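The parallel-world idea, scenario-local edits layered over a shared knowledge graph so that what-if simulations never mutate the base data, can be sketched minimally. The class and method names below are our own; the actual framework operates on Semantic Web ontologies through autonomous software agents.

```python
class ScenarioGraph:
    """Toy illustration of a parallel-world container: a scenario records its
    edits in an overlay, leaving the shared base knowledge graph untouched.
    Names here are illustrative, not the framework's API."""

    def __init__(self, base):
        self.base = base          # dict: subject -> {predicate: object}
        self.overlay = {}         # scenario-local modifications only

    def set(self, subject, predicate, obj):
        """Record a what-if edit in this scenario's overlay."""
        self.overlay.setdefault(subject, {})[predicate] = obj

    def get(self, subject, predicate):
        """Resolve a value: scenario overlay first, then the base graph."""
        if subject in self.overlay and predicate in self.overlay[subject]:
            return self.overlay[subject][predicate]
        return self.base.get(subject, {}).get(predicate)
```

For example, a carbon-tax scenario could retrofit a generator's fuel type in its overlay, run the grid optimization against the scenario view, and leave the base graph available to every other agent unchanged.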
Eibeck, A., Chadzynski, A., Lim, M. Q., Aditya, K., Ong, L., Devanand, A., Karmakar, G., Mosbach, S., Lau, R., Karimi, I., Foo, E. Y. S., & Kraft, M. (2020, July 14). A Parallel World Framework for scenario analysis in knowledge graphs. Data-Centric Engineering. doi:10.1017/dce.2020.6
Abstract Multicomponent polymer systems are of interest in organic photovoltaic and drug delivery applications, among others in which diverse morphologies influence performance. An improved understanding of morphology classification, driven by composition-informed prediction tools, will aid polymer engineering practice. We use a modified Cahn–Hilliard model to simulate polymer precipitation. Such physics-based models require high-performance computing, which prevents rapid prototyping and iteration in engineering settings. To reduce the required computational costs, we apply machine learning (ML) techniques for clustering and consequent prediction of the simulated polymer-blend images in conjunction with simulations. Integrating ML and simulations in this manner reduces the number of simulations needed to map out the morphology of polymer blends as a function of input parameters, and also generates a data set that others can use to this end. We explore dimensionality reduction, via principal component analysis and autoencoder techniques, and analyze the resulting morphology clusters. Supervised ML using Gaussian process classification was subsequently used to predict morphology clusters according to species molar fraction and interaction parameter inputs.
Manual pattern clustering yielded the best results, but ML techniques were able to predict the morphology of polymer blends with ≥90% accuracy.
Inguva, P. K., Mason, L., Pan, I., Hengardi, M., & Matar, O. (2020, July 10). Numerical simulation, clustering, and prediction of multicomponent polymer precipitation. Data-Centric Engineering. doi:10.1017/dce.2020.14
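The phase-separation dynamics underlying the simulations above can be sketched with a minimal 1D Cahn–Hilliard integrator. This is the standard single-order-parameter textbook form on a periodic domain, not the paper's modified multicomponent model, and all numerical parameters below are illustrative.

```python
import numpy as np

def cahn_hilliard_1d(c, steps=20000, dt=1e-3, dx=0.5, kappa=0.5):
    """Minimal explicit 1D Cahn-Hilliard integration: evolve composition c
    toward phase-separated domains via dc/dt = laplacian(mu), with
    mu = f'(c) - kappa * laplacian(c) for f(c) = (c^2 - 1)^2 / 4.
    Periodic boundaries via np.roll; dt chosen below the explicit
    stability limit dt <= dx^4 / (8 * kappa)."""
    c = np.asarray(c, dtype=float).copy()
    lap = lambda u: (np.roll(u, 1) - 2 * u + np.roll(u, -1)) / dx**2
    for _ in range(steps):
        mu = c**3 - c - kappa * lap(c)   # chemical potential
        c += dt * lap(mu)                # conservative (mass-preserving) flow
    return c
```

Starting from small random fluctuations, the field coarsens into domains near c = ±1; sweeping parameters of such a solver generates the kind of morphology image set that the ML clustering and Gaussian process classification steps then operate on.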
Data-Centric Engineering is a peer-reviewed, open access journal for work that promotes the use of experimental and observational data — and new methods of sensing, measurement, and data capture — in all areas of engineering in order to design systems and products that are more reliable, resilient, efficient, and safe. For more details see cambridge.org/dce.
Girolami, M. (2020, June 18). Introducing Data-Centric Engineering: An open access journal dedicated to the transformation of engineering design and practice. Data-Centric Engineering. doi:10.1017/dce.2020.5
Abstract This paper introduces a set of principles that articulate a shared vision for increasing access to data in the engineering and related sectors. The principles are intended to help guide progress toward a data ecosystem that provides sustainable access to data, in ways that will help a variety of stakeholders in maximizing its value while mitigating potential harms. In addition to being a manifesto for change, the principles can also be viewed as a means for understanding the alignment, overlaps and gaps between a range of existing research programs, policy initiatives, and related work on data governance and sharing. After providing background on the growing data economy and relevant recent policy initiatives in the United Kingdom and European Union, we then introduce the nine key principles of the manifesto. For each principle, we provide some additional rationale and links to related work. We invite feedback on the manifesto and endorsements from a range of stakeholders.
Dodds, L., L'Henaff, P., Maddison, J., & Yates, D. (2020, June 18). A manifesto for increasing access to data in engineering. Data-Centric Engineering. doi:10.1017/dce.2020.3
Abstract Data-Centric Engineering is an emerging branch of science that will certainly take on a leading role in data-driven research. We live in the Big Data era, with huge amounts of available data and unprecedented computing power, and therefore a skillful combination of Statistics (or, in more modern terms, Data Science), Computer Science, and Engineering is required to filter out the most important information, master the ever more difficult challenges of a changing world, and open new paths. In this paper, we highlight some of these aspects from the combined perspective of a statistician, an engineer, and a software developer. In particular, we focus on sound data handling and analysis, computational science in Structural Engineering, data care, security and monitoring, and conclude with an outlook on future developments.
Christophe Ley, Mike Tibolt, and Dirk Fromme, "Data-Centric Engineering in modern science from the perspective of a statistician, an engineer, and a software developer," Data-Centric Engineering, 18 June 2020. doi:10.1017/dce.2020.2
Callum Webb, J. Sikorska, R. N. Khan, M. Hodkiewicz
Abstract Conveyor belt wear is an important consideration in the bulk materials handling industry. We define four belt wear rate metrics and develop a model to predict wear rates of new conveyor configurations using an industry dataset that includes ultrasonic thickness measurements, conveyor attributes, and conveyor throughput. All variables are expected to contribute in some way to explaining wear rate and are included in modeling. One specific metric, the maximum throughput-based wear rate, is selected as the prediction target, and cross-validation is used to evaluate the out-of-sample performance of random forest and linear regression algorithms. The random forest approach achieves the lower error of the two, 0.152 mm/megaton (standard deviation [SD] = 0.0648). Permutation importance and partial dependence plots are computed to provide insights into the relationship between conveyor parameters and wear rate. This work demonstrates how belt wear rate can be quantified from imprecise thickness testing methods and provides a transparent modeling framework applicable to other supervised learning problems in risk and reliability.
"Developing and evaluating predictive conveyor belt wear models," Data-Centric Engineering, 18 June 2020. doi:10.1017/dce.2020.1
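The evaluation protocol this abstract describes, cross-validated out-of-sample error for a wear-rate predictor, can be sketched in miniature. The wear-rate values and the predict-the-training-mean baseline below are invented purely for illustration; the paper itself compares random forest and linear regression models on a real industry dataset.

```python
# Minimal k-fold cross-validation sketch: hold out one fold at a time,
# fit on the rest, and pool the held-out squared errors into one RMSE.
# The "model" here is a trivial baseline that predicts the training mean.
import math

def kfold_rmse(ys, k=3):
    """Out-of-sample RMSE of a predict-the-training-mean baseline."""
    folds = [ys[i::k] for i in range(k)]
    sq_errs = []
    for i, test in enumerate(folds):
        train = [y for j, fold in enumerate(folds) if j != i for y in fold]
        pred = sum(train) / len(train)          # fit: just the mean
        sq_errs.extend((y - pred) ** 2 for y in test)
    return math.sqrt(sum(sq_errs) / len(sq_errs))

# Hypothetical wear rates in mm/megaton (invented for the sketch).
wear_rates = [0.11, 0.15, 0.09, 0.21, 0.13, 0.18]
rmse = kfold_rmse(wear_rates)
```

A real study would replace the mean baseline with the fitted regressors and repeat the same pooling of held-out errors, which is what makes the reported 0.152 mm/megaton an out-of-sample figure.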
Changmin Yu, M. Seslija, George Brownbridge, S. Mosbach, M. Kraft, M. Parsi, Mark Davis, Vivian J. Page, A. Bhave
Abstract We apply deep kernel learning (DKL), which can be viewed as a combination of a Gaussian process (GP) and a deep neural network (DNN), to compression ignition engine emissions, and compare its performance to a selection of other surrogate models on the same dataset. Surrogate models are a class of computationally cheaper alternatives to physics-based models. High-dimensional model representation (HDMR) is also briefly discussed and acts as a benchmark model for comparison. We apply the considered methods to a dataset obtained from a compression ignition engine, which includes as outputs soot and NOx emissions as functions of 14 engine operating condition variables. We combine a quasi-random global search with a conventional grid-optimization method to identify suitable values for several DKL hyperparameters, including the network architecture, kernel, and learning parameters. The performance of DKL, HDMR, plain GPs, and plain DNNs is compared in terms of the root mean squared error (RMSE) of the predictions, as well as the computational expense of training and evaluation. It is shown that DKL performs best in terms of prediction RMSE whilst keeping the computational cost at a reasonable level, and DKL predictions are in good agreement with the experimental emissions data.
"Deep kernel learning approach to engine emissions modeling," Data-Centric Engineering, 18 June 2020. doi:10.1017/dce.2020.4
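The core idea of deep kernel learning, a base kernel evaluated on the outputs of a neural-network feature extractor rather than on the raw inputs, can be illustrated with a minimal sketch. The toy two-layer "network" below uses fixed, invented weights; in actual DKL the network parameters are learned jointly with the GP hyperparameters, for example by maximizing the marginal likelihood.

```python
# Sketch of the deep-kernel construction: k(x, x') = rbf(g(x), g(x')),
# where g is a feature network. Weights are fixed toy values here.
import math

def g(x):
    # Toy two-layer network: tanh hidden units, linear readout.
    hidden = [math.tanh(0.5 * x + b) for b in (-1.0, 0.0, 1.0)]
    return sum(0.3 * h for h in hidden)

def rbf(a, b, length_scale=1.0):
    # Standard RBF (squared-exponential) base kernel.
    return math.exp(-((a - b) ** 2) / (2 * length_scale ** 2))

def deep_kernel(x1, x2):
    # The GP covariance is computed in the learned feature space.
    return rbf(g(x1), g(x2))

xs = [-2.0, 0.0, 1.5]
K = [[deep_kernel(a, b) for b in xs] for a in xs]
```

The resulting Gram matrix K would be plugged into standard GP regression; the benefit reported in the paper comes from g warping the 14-dimensional operating-condition space so that the stationary base kernel fits the emissions response better.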