Authors: Xiwei Xu, Chen Wang, Zhen Wang, Q. Lu, Liming Zhu
Venue: 2022 IEEE/ACM 44th International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP)
Published: 2022-05-01
DOI: https://doi.org/10.1145/3510457.3513058
Dependency Tracking for Risk Mitigation in Machine Learning (ML) Systems
In a Machine Learning (ML) system, the characteristics of the ML components create new challenges for software system design and development. Data-dependent behavior introduces risks in ML systems. Dealing with such risks in the development phase incurs non-trivial costs because the data generation process cannot be controlled in the test phase. In addition, ML systems often need continuous monitoring and validation at runtime. In this paper, we propose an integrated dependency tracking system that balances cost and risk across the development and operation stages. Our solution uses a blockchain (an immutable data store) to track the co-evolution of models and their corresponding datasets. The provenance of data and models provides a trustworthy trace of the dependencies between datasets and models in the development phase, and of predictions in the operation phase. A graph database supports visualization and querying of the provenance information and enables explainability of the model-data co-evolution.
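The abstract does not describe the implementation, so the following is only a minimal sketch of the general idea, not the authors' system. It assumes an in-memory hash-chained log as a stand-in for the blockchain and a plain parent map as a stand-in for the graph database. The class `ProvenanceLedger`, its methods, and artifact names such as `dataset-v1` and `model-v2` are all hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass, field


def digest(payload: dict) -> str:
    """Return a deterministic SHA-256 hex digest of a JSON-serializable dict."""
    encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


@dataclass
class ProvenanceLedger:
    """Append-only, hash-chained log (a toy stand-in for a blockchain)."""

    blocks: list = field(default_factory=list)

    def append(self, artifact_id: str, kind: str, derived_from: list) -> dict:
        """Record an artifact and the artifacts it was derived from."""
        prev_hash = self.blocks[-1]["hash"] if self.blocks else "0" * 64
        body = {
            "artifact_id": artifact_id,
            "kind": kind,  # e.g. "dataset", "model", or "prediction"
            "derived_from": sorted(derived_from),
            "prev_hash": prev_hash,
        }
        block = {**body, "hash": digest(body)}
        self.blocks.append(block)
        return block

    def verify(self) -> bool:
        """Check that no block was altered and that the chain is unbroken."""
        prev_hash = "0" * 64
        for block in self.blocks:
            body = {k: v for k, v in block.items() if k != "hash"}
            if block["prev_hash"] != prev_hash or block["hash"] != digest(body):
                return False
            prev_hash = block["hash"]
        return True

    def lineage(self, artifact_id: str) -> set:
        """Return every upstream artifact reachable from artifact_id."""
        parents = {b["artifact_id"]: b["derived_from"] for b in self.blocks}
        seen, stack = set(), list(parents.get(artifact_id, []))
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(parents.get(node, []))
        return seen


# Hypothetical co-evolution: a dataset is revised, the model is retrained,
# and a prediction is served by the retrained model.
ledger = ProvenanceLedger()
ledger.append("dataset-v1", "dataset", [])
ledger.append("model-v1", "model", ["dataset-v1"])
ledger.append("dataset-v2", "dataset", ["dataset-v1"])
ledger.append("model-v2", "model", ["model-v1", "dataset-v2"])
ledger.append("prediction-001", "prediction", ["model-v2"])

print(ledger.verify())
print(sorted(ledger.lineage("prediction-001")))
```

In this sketch, `lineage` plays the role of a provenance query (which datasets and models a prediction depends on), while `verify` illustrates why an immutable store makes the trace trustworthy: changing any recorded dependency after the fact invalidates the stored hash.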