Probabilistically Extended Ontologies: A Basis for Systematic Testing of ML-Based Systems
H. Wiesbrock, Jürgen Grossmann
SAE Technical Paper Series, published 2024-07-02
DOI: 10.4271/2024-01-3002 (https://doi.org/10.4271/2024-01-3002)
Citations: 1
Abstract
Machine learning techniques are typically used to realise autonomous driving, whether as part of environment recognition or, ultimately, in making driving decisions. Machine learning generally involves the use of stochastic methods to provide statistical inference. Because of this statistical nature, failures and wrong decisions are unavoidable, and they are often directly related to root causes that cannot be easily eliminated. The quality of these systems is normally expressed by statistical indicators such as accuracy and precision. Providing evidence that the accuracy and precision of these systems are sufficient to guarantee safe operation is key to the acceptance of autonomous driving. Usually, tests and simulations are used extensively to provide this kind of evidence. However, the basis of all descriptive statistics is a random selection from a probability space. A major challenge in testing, or in constructing the training and test data sets, is that this probability space is usually not well defined. To address this shortcoming systematically, ontologies have been and are being developed to capture the various concepts and properties of the operational design domain. They serve as a basis for specifying appropriate tests in different approaches. However, the information about the realistic frequency of the inferred test cases that is needed to make statistical statements about the system is still missing. Related to this problem is the proof of completeness and balance of the training data. While an ontology may make it possible to check the completeness of the training data, it lacks any information to prove its representativeness. In this article, we propose extending ontologies to include probabilistic information. This makes it possible to evaluate the completeness and balance of training sets. Moreover, it serves as a basis for random sampling of test cases, which allows mathematically sound statistical proofs of the quality of the ML system.
We demonstrate our approach by adding probabilistic information to published ontologies that capture typical scenarios of autonomous driving systems.
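
The idea of attaching probabilities to ontology concepts can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy `ONTOLOGY` with its concept names and probability values, the helper names `sample_scenario`, `missing_values` and `balance_gap`, the simplification that concepts are statistically independent, and the choice of total variation distance as the balance measure.

```python
import random
from collections import Counter

# Hypothetical toy ontology: each concept of the operational design domain
# lists its possible values with an assumed occurrence probability.
# Names and numbers are illustrative only, not taken from the paper.
ONTOLOGY = {
    "weather": {"clear": 0.6, "rain": 0.3, "fog": 0.1},
    "road_type": {"urban": 0.5, "rural": 0.3, "highway": 0.2},
    "time_of_day": {"day": 0.7, "night": 0.3},
}


def sample_scenario(ontology, rng):
    """Draw one scenario, treating concepts as independent (a simplification)."""
    scenario = {}
    for concept, dist in ontology.items():
        values = list(dist)
        weights = [dist[v] for v in values]
        scenario[concept] = rng.choices(values, weights=weights, k=1)[0]
    return scenario


def missing_values(ontology, dataset):
    """Completeness check: values the ontology defines but the dataset lacks."""
    missing = {}
    for concept, dist in ontology.items():
        seen = {s[concept] for s in dataset}
        gaps = sorted(set(dist) - seen)
        if gaps:
            missing[concept] = gaps
    return missing


def balance_gap(ontology, dataset, concept):
    """Balance check: total variation distance between observed and expected
    frequencies. Assumes every dataset value appears in the ontology."""
    counts = Counter(s[concept] for s in dataset)
    n = len(dataset)
    return 0.5 * sum(abs(counts[v] / n - p) for v, p in ontology[concept].items())


# Usage: sample random test cases, then inspect a deliberately skewed data set.
rng = random.Random(0)
tests = [sample_scenario(ONTOLOGY, rng) for _ in range(1000)]
print("gap of sampled tests:", round(balance_gap(ONTOLOGY, tests, "weather"), 3))

skewed = [{"weather": "clear", "road_type": "urban", "time_of_day": "day"}] * 10
print("missing in skewed set:", missing_values(ONTOLOGY, skewed))
print("gap of skewed set:", round(balance_gap(ONTOLOGY, skewed, "weather"), 3))
```

In this sketch the same probability annotations serve both purposes named in the abstract: they drive the random sampling of test cases, and they give the reference distribution against which a training set's balance is measured. A real ontology would also need to express dependencies between concepts (for example, fog being more likely at certain times of day), which the independence assumption here ignores.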