A fair evaluation of the potential of machine learning in maritime transportation

Electronic Research Archive (IF 1.1; CAS ranking Zone 4, Mathematics; JCR Q1, MATHEMATICS). Published: 2023-01-01. DOI: 10.3934/era.2023243

Xi Luo, Ran Yan, Shuaian Wang, Lu Zhen


Citations: 0


Abstract

Machine learning (ML) techniques are extensively applied to practical maritime transportation problems. Because collecting large volumes of data in the maritime industry is difficult and costly, many maritime studies train ML models on small training datasets. The predictive performances of the trained ML models are then compared with one another and with a conventional model on the same test set, and the ML model that performs best among the ML models, and better than the conventional model, on that test set is regarded as the most effective for the prediction task. However, when datasets are small, this common process may lead to an unfair comparison between the ML models and the conventional model. We therefore propose a novel process to fairly compare multiple ML models with the conventional model. We first select the best ML model according to its predictive performance on the validation set. We then retrain this model on the combined training and validation sets and compare it with the conventional model on the same test set. Using historical port state control (PSC) inspection data, we examine how fairly the common process and the novel process compare ML models with the conventional model. The results show that the novel process compares the ML models with the conventional model more fairly across different test sets. The novel process therefore enables a fair assessment of how well ML models predict key performance indicators in the maritime industry when data are limited, for example ship fuel consumption and port traffic volume, thereby enhancing the reliability of ML models in real-world applications.


Source journal: Electronic Research Archive (MATHEMATICS)

CiteScore: 1.30

Self-citation rate: 12.50%

Articles published: 170