{"title":"A Multi-layered Collaborative Framework for Evidence-driven Data Requirements Engineering for Machine Learning-based Safety-critical Systems","authors":"Sangeeta Dey, Seok-Won Lee","doi":"10.1145/3555776.3577647","DOIUrl":null,"url":null,"abstract":"In the days of AI, data-centric machine learning (ML) models are increasingly used in various complex systems. While many researchers are focusing on specifying ML-specific performance requirements, not enough guideline is provided to engineer the data requirements systematically involving diverse stakeholders. Lack of written agreement about the training data, collaboration bottlenecks, lack of data validation framework, etc. are posing new challenges to ensuring training data fitness for safety-critical ML components. To reduce these gaps, we propose a multi-layered framework that helps to perceive and elicit data requirements. We provide a template for verifiable data requirements specifications. Moreover, we show how such requirements can facilitate an evidence-driven assessment of the training data quality based on the experts' judgments about the satisfaction of the requirements. We use Dempster Shafer's theory to combine experts' subjective opinions in the process. A preliminary case study on the CityPersons dataset for the pedestrian detection feature of autonomous cars shows the usefulness of the proposed framework for data requirements understanding and the confidence assessment of the dataset.","PeriodicalId":42971,"journal":{"name":"Applied Computing Review","volume":null,"pages":null},"PeriodicalIF":0.4000,"publicationDate":"2023-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Applied Computing Review","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3555776.3577647","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 0
Abstract
In the days of AI, data-centric machine learning (ML) models are increasingly used in various complex systems. While many researchers are focusing on specifying ML-specific performance requirements, not enough guideline is provided to engineer the data requirements systematically involving diverse stakeholders. Lack of written agreement about the training data, collaboration bottlenecks, lack of data validation framework, etc. are posing new challenges to ensuring training data fitness for safety-critical ML components. To reduce these gaps, we propose a multi-layered framework that helps to perceive and elicit data requirements. We provide a template for verifiable data requirements specifications. Moreover, we show how such requirements can facilitate an evidence-driven assessment of the training data quality based on the experts' judgments about the satisfaction of the requirements. We use Dempster Shafer's theory to combine experts' subjective opinions in the process. A preliminary case study on the CityPersons dataset for the pedestrian detection feature of autonomous cars shows the usefulness of the proposed framework for data requirements understanding and the confidence assessment of the dataset.