Learning State-Dependent Policy Parametrizations for Dynamic Technician Routing with Rework

arXiv - CS - Artificial Intelligence | Pub Date: 2024-09-03 | DOI: arxiv-2409.01815

Jonas Stein, Florentin D Hildebrandt, Barrett W Thomas, Marlin W Ulmer



Abstract

Home repair and installation services require technicians to visit customers and resolve tasks of varying complexity. Technicians often differ in skills and work experience. Because customers are geographically dispersed, it is impractical to insist on only perfect matches between technician skills and task requirements. Additionally, technicians are regularly absent due to sickness. When the technician's skill does not perfectly match the task requirement, some tasks may remain unresolved and require a revisit and rework. Companies seek to minimize customer inconvenience due to delay. We model the problem as a sequential decision process where, over a number of service days, customers request service while heterogeneously skilled technicians are routed to serve customers in the system. Each day, our policy iteratively builds tours by adding "important" customers. Importance is based on analytical considerations and is measured by jointly accounting for routing efficiency, urgency of service, and risk of rework. We propose a state-dependent balance of these factors via reinforcement learning. A comprehensive study shows that accepting a few non-perfect assignments can be quite beneficial for the overall service quality. We further demonstrate the value provided by a state-dependent parametrization.
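
The tour-building idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the scoring formula, the rework-risk function, the class and function names, and all numeric weights are assumptions made for illustration. In particular, the paper learns the state-dependent weights via reinforcement learning, whereas `weights_for_state` below is a hand-written placeholder rule.

```python
import math
from dataclasses import dataclass


@dataclass
class Customer:
    name: str
    x: float
    y: float
    days_waiting: int    # hypothetical urgency proxy
    required_skill: int  # hypothetical task requirement level


@dataclass
class Technician:
    x: float
    y: float
    skill: int


def importance(pos, skill, cust, weights):
    """Illustrative importance score; higher means more important."""
    w_dist, w_urgency, w_risk = weights
    dist = math.hypot(cust.x - pos[0], cust.y - pos[1])
    skill_gap = max(0, cust.required_skill - skill)
    # Assumed risk model: 0 for a perfect match, approaching 1 as the gap grows.
    rework_risk = 1.0 - math.exp(-skill_gap)
    return w_urgency * cust.days_waiting - w_dist * dist - w_risk * rework_risk


def build_tour(tech, customers, weights, max_stops):
    """Greedily append the most important remaining customer to the tour."""
    pos = (tech.x, tech.y)
    remaining = list(customers)
    tour = []
    while remaining and len(tour) < max_stops:
        best = max(remaining, key=lambda c: importance(pos, tech.skill, c, weights))
        tour.append(best)
        remaining.remove(best)
        pos = (best.x, best.y)  # the next choice is scored from the new location
    return tour


def weights_for_state(backlog_size, techs_available):
    """Placeholder for the learned mapping from state to weights."""
    load = backlog_size / max(1, techs_available)
    if load > 5:
        # Heavy load: stress urgency and tolerate more rework risk.
        return (1.0, 2.0, 0.5)
    return (1.0, 1.0, 2.0)


if __name__ == "__main__":
    tech = Technician(0.0, 0.0, skill=1)
    customers = [
        Customer("C", 1.0, 0.0, days_waiting=0, required_skill=3),
        Customer("D", 2.0, 0.0, days_waiting=0, required_skill=1),
    ]
    weights = weights_for_state(backlog_size=len(customers), techs_available=1)
    print([c.name for c in build_tour(tech, customers, weights, max_stops=2)])
```

In this toy setup, changing only the weight on rework risk changes which customer is visited first, which is the kind of trade-off a state-dependent parametrization is meant to tune.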


