A supervised machine learning model for imputing missing boarding stops in smart card data
Authors: Nadav Shalit, Michael Fire, Eran Ben-Elia
Journal: Public Transport (JCR Q2, Transportation Science & Technology; impact factor 2.3)
Published: 2023-01-01 (Epub 2022-12-07)
DOI: https://doi.org/10.1007/s12469-022-00309-0
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9734418/pdf/
Citations: 0
Abstract
Public transport has become an essential part of urban life as population densities and environmental awareness have increased. Large quantities of data are now generated, and smart card usage records in particular allow for more robust methods of understanding travel behavior. However, public transport datasets suffer from data integrity problems: boarding stop information may be missing due to imperfect data acquisition processes or inadequate reporting. This study introduces a supervised machine learning method that imputes missing boarding stops through ordinal classification, using GTFS timetable, smart card, and geospatial datasets. A new metric, Pareto Accuracy, is proposed to evaluate algorithms whose classes are ordinal in nature. The results are based on a case study in the city of Beer Sheva, Israel, consisting of one month of smart card data. We show that our proposed method is robust to irregular travelers and significantly outperforms well-known imputation methods without the need to mine any additional datasets. Validation on data from another Israeli city using transfer learning shows that the presented model is general and context-free. The implications for transportation planning and travel behavior research are also discussed.
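The abstract names Pareto Accuracy but does not define it. As a rough illustration of why ordinal classes call for a different metric than plain accuracy, the sketch below assumes a tolerance-based reading: a prediction counts as correct if it lies within a given number of ordinal steps of the true label. This definition, the function name `tolerance_accuracy`, and the sample labels are assumptions made for illustration, not the authors' formulation.

```python
from typing import Sequence


def tolerance_accuracy(
    y_true: Sequence[int], y_pred: Sequence[int], tolerance: int = 0
) -> float:
    """Share of predictions within `tolerance` ordinal steps of the true label.

    With tolerance=0 this reduces to ordinary accuracy.
    NOTE: an assumed, illustrative definition, not taken from the paper.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one sample is required")
    hits = sum(1 for t, p in zip(y_true, y_pred) if abs(t - p) <= tolerance)
    return hits / len(y_true)


# Hypothetical ordinal labels, e.g. a stop's position along a route.
true_positions = [0, 1, 2, 0, 3]
pred_positions = [0, 2, 2, 1, 0]

print(tolerance_accuracy(true_positions, pred_positions, tolerance=0))  # 0.4
print(tolerance_accuracy(true_positions, pred_positions, tolerance=1))  # 0.8
```

Under this reading, a prediction that is one stop off is treated as a near miss rather than a full error, which plain accuracy cannot express.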
About the journal:
The scope and purpose of the journal include, but are not limited to, any type of research in the area of Public Transport: Planning and Operations. At its core, it serves the primary mission of advancing the state of the art and the state of the practice in computer-aided systems and scheduling in public transport. The journal considers any subject in this area, especially with a focus on planning and scheduling. The common ground is the use of computer-aided methods and operations research techniques to improve information management, network and route planning, vehicle and crew scheduling and rostering, vehicle monitoring and management, and practical experience with scheduling and public transport planning methods. Besides theoretical papers, the journal also publishes case studies and applications. Public Transport addresses transport operators, consulting firms, and academic institutions involved in the development, utilization, or research of computer-aided planning and scheduling in public transport. Officially cited as: Public Transp