Overcoming poor data quality: Optimizing validation of precedence relation data
Authors: Benedikt Finnah, Jochen Gönsch, Alena Otto
Journal: European Journal of Operational Research, Volume 322, Issue 3, Pages 740-752 (published 2025-05-01; Epub 2024-11-14)
DOI: 10.1016/j.ejor.2024.11.009
URL: https://www.sciencedirect.com/science/article/pii/S0377221724008609
Impact factor: 6.0; JCR: Q1 (Operations Research & Management Science)
Insufficient data quality prevents decision support systems (DSS) from using data in many areas of business. This is the case for data on precedence relations between tasks, which are relevant, for instance, in project scheduling and assembly line balancing. Inaccurate data on unnecessary precedence relations cannot be used; otherwise, the recommendations of the DSS may turn out to be infeasible. As a result, unnecessary relations must be satisfied, which shrinks the baseline problem’s solution space and worsens the business result. Experts can validate the data, but their time is limited.
We apply an optimization lens and formulate the data validation problem (DVP). Restricted by the available time budget, an expert dynamically receives queries about specific data entries and corrects or validates them. The DVP searches for an interview policy that poses queries to the expert, each using up some of the time budget, in a way that maximizes the (weighted) number of removed precedence relations. We model the DVP as a dynamic program, derive optimal policies for several important special cases, and design a heuristic interview policy, LSTD. In a case study at an automobile manufacturer, this policy substantially reduces the stations’ idle time after selectively addressing about 8% of the data entries.
We show theoretically and numerically that data validation by experts can lead to significant savings. The number of queries required to validate the data exhaustively is much smaller than naive estimates suggest. Additionally, the probability of removing an unnecessary precedence relation per query in a series of queries is high, even for simple interview policies.
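The interview setting described in the abstract can be illustrated with a small sketch. It is not the paper's dynamic program and not the LSTD policy, neither of which is specified here. It is a simple greedy rule under assumptions made only for illustration: each stored relation carries a hypothetical prior probability of being unnecessary (`p_unnecessary`), a weight, and a positive query cost, and the expert is stood in for by a callback (`expert_says_unnecessary`). All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Relation:
    """A stored precedence relation 'pred must finish before succ'."""
    pred: str
    succ: str
    p_unnecessary: float  # assumed prior that the relation is unnecessary
    weight: float = 1.0   # assumed value of removing the relation
    cost: float = 1.0     # assumed expert time per query, must be > 0


def greedy_interview(
    relations: List[Relation],
    budget: float,
    expert_says_unnecessary: Callable[[Relation], bool],
) -> Tuple[List[Relation], List[Relation]]:
    """Query relations in order of expected removed weight per unit of time.

    Returns (removed, queried). Relations whose cost exceeds the
    remaining budget are skipped and stay in the data unvalidated.
    """
    ranked = sorted(
        relations,
        key=lambda r: r.p_unnecessary * r.weight / r.cost,
        reverse=True,
    )
    removed: List[Relation] = []
    queried: List[Relation] = []
    for rel in ranked:
        if rel.cost > budget:
            continue  # not enough expert time left for this query
        budget -= rel.cost
        queried.append(rel)
        if expert_says_unnecessary(rel):
            removed.append(rel)
    return removed, queried


if __name__ == "__main__":
    data = [
        Relation("a", "b", p_unnecessary=0.9),
        Relation("b", "c", p_unnecessary=0.2),
        Relation("a", "c", p_unnecessary=0.6, cost=2.0),
    ]
    # Simulated ground truth, known only to the expert.
    truly_unnecessary = {("a", "b"), ("a", "c")}
    removed, queried = greedy_interview(
        data,
        budget=2.0,
        expert_says_unnecessary=lambda r: (r.pred, r.succ) in truly_unnecessary,
    )
    print("removed weight:", sum(r.weight for r in removed))
    print("queries used:", len(queried))
```

This rule ranks the queries once and never revises the order. In the setting the abstract describes, the expert receives queries dynamically, so a policy may adapt to earlier answers, which this sketch does not do.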
About the journal:
The European Journal of Operational Research (EJOR) publishes high-quality, original papers that contribute to the methodology of operational research (OR) and to the practice of decision making.