Sajjad Sowlati, Rahim Ali Abbaspour, Alireza Chehreghan
Journal: Transportation (JCR Q1, Engineering, Civil; impact factor 3.5). Published: 2024-05-18. DOI: https://doi.org/10.1007/s11116-024-10492-7
An approach to assess the role of features in detection of transportation modes
One of the fundamental prerequisites for interpreting passively collected travel data to develop intelligent transportation systems is the detection of transportation modes. The literature divides transportation mode detection into two parts: feature extraction and implementation of classification models. Selecting and employing influential features helps maximize the power of the classification model. However, the interpretation and identification of influential features, which is the focus of this study, has received less attention. Importantly, the influence of features varies with the nature of the input data and the choice of classification model. In many cases, the extracted features are interdependent, and their combination significantly affects specific outcomes. Consequently, evaluating individual features in isolation may not produce accurate results, and alternative methodologies are needed. This study seeks to bridge these gaps through a comprehensive investigation. Three open-source datasets, Geolife, MTL Trajet 2017, and MTL Trajet 2016, were used to enhance reliability, validate the approach, and investigate how influential features vary under different data collection conditions. First, various features were extracted and grouped as kinematic, spatial, or contextual. Then, three powerful classification models (Random Forest, LightGBM, and XGBoost) were applied. A hybrid feature selection algorithm was employed to select a subset of features and to analyze the variability of influential features across the classification models. The algorithm removed over half of the features, those with minimal or negative impact, thereby simplifying the classification process.
Because features combined as a subset yield more powerful identification, the influence of the features was analyzed within a set of features rather than individually. Two approaches, “number of feature repetitions” and “Shapley Additive Explanations (SHAP) value,” were adopted to interpret the results. The “average velocity” feature, which was selected in all datasets and classification models (nine repetitions), had the highest SHAP value, making it the most influential feature across all datasets and classification models. The “public stations indicator” was the most influential spatial feature, with the highest SHAP value and nine repetitions, while “holiday” had the most repetitions among the contextual features.
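The two interpretation approaches named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not use the SHAP library. The first function counts in how many (dataset, model) subsets a feature was selected, and the second computes exact Shapley values for a small set function. The selected subsets, the `toy_score` function, and all of its numbers are invented for illustration; only the dataset, model, and feature names come from the abstract.

```python
from collections import Counter
from itertools import combinations, product
from math import factorial

DATASETS = ["Geolife", "MTL Trajet 2016", "MTL Trajet 2017"]
MODELS = ["Random Forest", "LightGBM", "XGBoost"]


def count_repetitions(selected):
    """Count in how many (dataset, model) subsets each feature appears."""
    counts = Counter()
    for features in selected.values():
        counts.update(set(features))  # a feature counts once per subset
    return counts


def shapley_values(features, value):
    """Exact Shapley values for a set function value(frozenset) -> float."""
    n = len(features)
    result = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):  # k = size of the coalition that f joins
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {f}) - value(s))
        result[f] = total
    return result


# Made-up selections (assumption): every (dataset, model) pair keeps the two
# features the abstract reports with nine repetitions; "holiday" is kept
# only for the two MTL Trajet datasets, an arbitrary choice for the demo.
selected = {}
for dataset, model in product(DATASETS, MODELS):
    subset = ["average_velocity", "public_stations_indicator"]
    if dataset != "Geolife":
        subset.append("holiday")
    selected[(dataset, model)] = subset

repetitions = count_repetitions(selected)


def toy_score(subset):
    """Made-up score with an interaction between two features (assumption)."""
    score = 0.0
    if "average_velocity" in subset:
        score += 0.4
    if "holiday" in subset:
        score += 0.1
    if {"average_velocity", "public_stations_indicator"} <= subset:
        score += 0.2  # the spatial feature only helps alongside velocity
    return score


attributions = shapley_values(
    ["average_velocity", "public_stations_indicator", "holiday"], toy_score
)
print(repetitions.most_common())
print(attributions)
```

In this toy setup the spatial feature scores zero when evaluated alone, yet it receives a non-zero Shapley attribution because it contributes in combination with velocity. This is the kind of interdependence that motivates analyzing features within a subset rather than in isolation.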
About the journal:
In our first issue, published in 1972, we explained that this Journal is intended to promote the free and vigorous exchange of ideas and experience among the worldwide community actively concerned with transportation policy, planning and practice. That continues to be our mission, with a clear focus on topics concerned with research and practice in transportation policy and planning, around the world.
These four words, policy and planning, research and practice, are our key words. While we have a particular focus on transportation policy analysis and travel behaviour in the context of ground transportation, we willingly consider all good quality papers that are highly relevant to transportation policy, planning and practice with a clear focus on innovation, on extending the international pool of knowledge and understanding. Our interest is not only with transportation policies, systems and services, but also with their social, economic and environmental impacts. However, papers about the application of established procedures to, or the development of plans or policies for, specific locations are unlikely to prove acceptable unless they report experience which will be of real benefit to those working elsewhere. Papers concerned with the engineering, safety and operational management of transportation systems are outside our scope.