Harmonizing recurring patterns and non-recurring trends in traffic datasets for enhanced estimation of missing information

Transportation Research Part C-Emerging Technologies, Volume 174, Article 105083
Published: 2025-05-01 (online: 2025-03-22)
DOI: 10.1016/j.trc.2025.105083
Journal: Impact factor 7.6; Zone 1, Engineering & Technology; JCR Q1, Transportation Science & Technology

Shubham Sharma, Richi Nayak, Ashish Bhaskar

Article page: https://www.sciencedirect.com/science/article/pii/S0968090X25000877


Abstract

Traffic datasets commonly contain missing values caused by sensor malfunctions, environmental conditions, security concerns, and technical or data-quality issues. These challenges are inherent in real-world traffic data collection systems. Although numerous imputation algorithms have been proposed in the literature, it remains difficult to select a reliable algorithm that performs consistently well across diverse missing-data scenarios. This matters for two main reasons. First, real-world traffic datasets often contain missing gaps of varying temporal duration, with both short and long gaps occurring within a single dataset. Second, spatio-temporal traffic datasets contain both recurring and non-recurring traffic conditions. Because each imputation principle reported in the literature tends to suit one type of missing data (short or long gaps) and one type of traffic condition (recurring or non-recurring) better than the other, existing algorithms often produce sub-optimal estimates for network-wide datasets that combine multiple types of missing gaps and traffic conditions.
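To make the "mix of short and long gaps" concrete, here is a small sketch that generates a missing-data mask for a (sensor × day × time-of-day) tensor: isolated entries are dropped at random to mimic short gaps, and contiguous runs of intervals are dropped to mimic long outages. The function name `mixed_gap_mask`, its parameters, and the assumption of 288 intervals per day (5-minute data) are illustrative only and are not the paper's experimental protocol.

```python
import numpy as np

def mixed_gap_mask(shape, short_rate=0.1, long_rate=0.1, long_len=36, seed=0):
    """Hypothetical helper: boolean mask (True = observed) for a
    (sensor, day, interval) tensor containing both short and long gaps.

    Short gaps: each entry is dropped independently with probability `short_rate`.
    Long gaps: each (sensor, day) pair, with probability `long_rate`, loses one
    contiguous run of `long_len` intervals (requires long_len <= number of intervals).
    """
    rng = np.random.default_rng(seed)
    n_sensor, n_day, n_int = shape
    observed = rng.random(shape) >= short_rate           # scattered short gaps
    for s in range(n_sensor):
        for d in range(n_day):
            if rng.random() < long_rate:                  # this sensor-day gets an outage
                start = rng.integers(0, n_int - long_len + 1)
                observed[s, d, start:start + long_len] = False
    return observed

# Example: 3 sensors, 7 days, 288 five-minute intervals per day (assumed resolution)
mask = mixed_gap_mask((3, 7, 288), short_rate=0.05, long_rate=0.2, long_len=36)
```

Masking a complete tensor this way and scoring estimates only on the masked-out entries is a common way to evaluate imputation under mixed gap types; the rates and gap lengths above are placeholders.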

To address this issue, this paper proposes a tensor decomposition algorithm named SINTD (Stochastic Informed Non-Negative Tensor Decomposition) and integrates it with a spline regression model in a novel data imputation framework called SPRINT (Spline-powered Informed Non-negative Tensor Decomposition). SINTD mines the dominant patterns in traffic datasets and is effective at estimating missing gaps under recurring traffic conditions. Integrating a spline with the tensor decomposition helps a) capture time-localized trends that the tensor decomposition does not account for, giving a better approximation of the non-recurring component of traffic states, and b) complement SINTD so that recurring patterns are mined more accurately in subsequent iterations of SPRINT. Although each algorithm has distinct limitations when used separately, combining them lets SPRINT exploit their respective strengths and overcome their individual limitations. Through extensive experiments on six traffic datasets and benchmarking against nine baseline algorithms, this paper demonstrates that SPRINT consistently produces high-accuracy estimates of missing data across five diverse missing-data scenarios. These include a) experiments on datasets with a mix of short- and long-duration missing gaps, mimicking the intricate missing-data structure of real-world traffic datasets, and b) a Logan City (Australia) case study on imputing missing data under potential non-recurring traffic conditions caused by road incidents.
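The division of labour described above (a low-rank decomposition for recurring patterns, a spline for time-localized residual trends, alternated over several iterations) can be illustrated with a minimal sketch. This is NOT SINTD or SPRINT. As assumptions, the low-rank step is a plain non-negative CP decomposition fitted to observed entries with multiplicative updates, and the spline step is a ridge-regularized cubic regression spline (truncated-power basis) fitted to each sensor-day residual series. All function names and parameters (`masked_ntf`, `fit_spline_residual`, `hybrid_impute`, `rank`, `n_knots`, `lam`) are hypothetical.

```python
import numpy as np

def cp_reconstruct(A, B, C):
    # X[i, j, k] = sum_r A[i, r] * B[j, r] * C[k, r]
    return np.einsum("ir,jr,kr->ijk", A, B, C)

def masked_ntf(X, M, rank=3, n_iter=200, seed=0, eps=1e-9):
    """Non-negative CP decomposition fitted only to observed entries (M True).
    Generic multiplicative updates; a stand-in for SINTD, not SINTD itself."""
    rng = np.random.default_rng(seed)
    A, B, C = (rng.random((n, rank)) for n in X.shape)
    Xm = np.where(M, X, 0.0)                      # missing entries contribute nothing
    for _ in range(n_iter):
        R = cp_reconstruct(A, B, C) * M
        A *= np.einsum("ijk,jr,kr->ir", Xm, B, C) / (np.einsum("ijk,jr,kr->ir", R, B, C) + eps)
        R = cp_reconstruct(A, B, C) * M
        B *= np.einsum("ijk,ir,kr->jr", Xm, A, C) / (np.einsum("ijk,ir,kr->jr", R, A, C) + eps)
        R = cp_reconstruct(A, B, C) * M
        C *= np.einsum("ijk,ir,jr->kr", Xm, A, B) / (np.einsum("ijk,ir,jr->kr", R, A, B) + eps)
    return cp_reconstruct(A, B, C)

def spline_basis(t, knots):
    # Cubic truncated-power basis: 1, t, t^2, t^3 and (t - k)_+^3 per interior knot
    cols = [np.ones_like(t), t, t**2, t**3] + [np.clip(t - k, 0.0, None) ** 3 for k in knots]
    return np.stack(cols, axis=1)

def fit_spline_residual(res, observed, n_knots=8, lam=1e-3):
    # Ridge-regularized regression spline fitted to observed points, evaluated everywhere
    t = np.linspace(0.0, 1.0, res.size)
    Phi = spline_basis(t, np.linspace(0.0, 1.0, n_knots + 2)[1:-1])
    P, y = Phi[observed], res[observed]
    coef = np.linalg.solve(P.T @ P + lam * np.eye(Phi.shape[1]), P.T @ y)
    return Phi @ coef

def hybrid_impute(X, M, rank=3, outer=5):
    """Alternate low-rank fitting and spline trend fitting (outer >= 1)."""
    M = np.asarray(M, dtype=bool)
    Z = np.where(M, X, 0.0)
    trend = np.zeros_like(Z)
    for _ in range(outer):
        # Recurring part: low-rank fit to observed data with the current trend removed
        L = masked_ntf(np.clip(Z - trend, 0.0, None), M, rank=rank)
        # Non-recurring part: spline on each (sensor, day) residual over time of day
        R = Z - L
        for i in range(Z.shape[0]):
            for j in range(Z.shape[1]):
                trend[i, j] = fit_spline_residual(R[i, j], M[i, j])
    # Keep observed values; fill gaps with the low-rank + trend estimate
    return np.where(M, X, L + trend)
```

In this sketch the spline only sees observed residuals, so it bridges a gap using the local trend on either side; removing that trend before the next decomposition step is what lets the two components refine each other, echoing point b) above only in spirit.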


Source journal

Transportation Research Part C-Emerging Technologies (Engineering & Technology: Transportation Science & Technology)

CiteScore: 15.80
Self-citation rate: 12.00%
Articles published: 332
Review time: 64 days

About the journal: Transportation Research: Part C (TR_C) is dedicated to showcasing high-quality, scholarly research that delves into the development, applications, and implications of transportation systems and emerging technologies. Our focus lies not solely on individual technologies, but rather on their broader implications for the planning, design, operation, control, maintenance, and rehabilitation of transportation systems, services, and components. In essence, the intellectual core of the journal revolves around the transportation aspect rather than the technology itself. We actively encourage the integration of quantitative methods from diverse fields such as operations research, control systems, complex networks, computer science, and artificial intelligence. Join us in exploring the intersection of transportation systems and emerging technologies to drive innovation and progress in the field.