{"title":"机器学习海洋学数据,预测海洋资源潜力","authors":"Denny Arbahri, O. Nurhayati, Imam Mudita","doi":"10.3844/jcssp.2024.129.139","DOIUrl":null,"url":null,"abstract":": Marine data and information are very important for human survival, therefore this data and information is attractive to investors because of the potential economic value. This data and information has been difficult to obtain, the solution to overcome this is by analyzing oceanographic data for 2009-2019 collected from the marine database belonging to the Agency for the Study and Application of Technology (BPPT). The data is the result of a collaborative marine survey between Indonesian and foreign researchers from various countries who sailed in various Indonesian waters. Raw oceanographic data is converted and classified into Conductivity, Temperature, and Depth (CTD) data as oceanographic data parameters identified as predictor variables (X) that are correlated with each other. CTD data is processed into numeric data attributes that have been labeled for input and training. The data was modeled using the Machine Learning (ML) type Supervised Learning (SL) method with the Decision Tree (DT), Linear Regression (LR) and Random Forest (RF) algorithms which were interpreted according to the characteristics of the CTD data. ML will learn data models to understand and store. Next, the model is evaluated using accuracy metrics by measuring the difference between the predicted value and the actual value to obtain a good prediction model. The prediction results show a salinity level of 34.0 parts per thousand (ppt), meaning that in this area of marine waters salinity will affect the solubility of Oxygen (O 2 ) and play a major role in the sustainability and growth of the fertility level of biological resources which is supported by sea surface temperature conditions 29.2°C. So the salinity values obtained using ML techniques and marine resource potential can be assumed to have a strong correlation. The research results show that the RF model has the lowest level of prediction error based on the values: Mean Square Error (MSE) = 0.007; Root Mean Squared Error (RMSE) = 0.082; Mean Absolute Error (MAE) = 0.007 compared to DT model: MSE = 0.008; RMSE = 0.088; MAE = 0.012 and LR model: MSE = 1.008; RMSE = 1.004; MAE = 0.281. The equivalent RF and DT models have a Determination Coefficient (R 2 ) = 0.999, meaning that a model is created that is good at predicting, compared to the LR model with a value of R 2 = 0.914. The correlation between variables shows that the LR model is very linear with a Correlation Coefficient (r) = 1.000 compared to the DT model (r) = 0.621 and the RF model (r) = 0.379. Therefore the algorithm that has a value of (r) +1 has the best level of accuracy. The use of ML to predict marine resource potential is a relatively new research field, so this research has the potential to contribute data and information as a reference for innovative studies and investment decision material for investors.","PeriodicalId":40005,"journal":{"name":"Journal of Computer Science","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Machine Learning Oceanographic Data for Prediction of the Potential of Marine Resources\",\"authors\":\"Denny Arbahri, O. Nurhayati, Imam Mudita\",\"doi\":\"10.3844/jcssp.2024.129.139\",\"DOIUrl\":null,\"url\":null,\"abstract\":\": Marine data and information are very important for human survival, therefore this data and information is attractive to investors because of the potential economic value. This data and information has been difficult to obtain, the solution to overcome this is by analyzing oceanographic data for 2009-2019 collected from the marine database belonging to the Agency for the Study and Application of Technology (BPPT). The data is the result of a collaborative marine survey between Indonesian and foreign researchers from various countries who sailed in various Indonesian waters. Raw oceanographic data is converted and classified into Conductivity, Temperature, and Depth (CTD) data as oceanographic data parameters identified as predictor variables (X) that are correlated with each other. CTD data is processed into numeric data attributes that have been labeled for input and training. The data was modeled using the Machine Learning (ML) type Supervised Learning (SL) method with the Decision Tree (DT), Linear Regression (LR) and Random Forest (RF) algorithms which were interpreted according to the characteristics of the CTD data. ML will learn data models to understand and store. Next, the model is evaluated using accuracy metrics by measuring the difference between the predicted value and the actual value to obtain a good prediction model. The prediction results show a salinity level of 34.0 parts per thousand (ppt), meaning that in this area of marine waters salinity will affect the solubility of Oxygen (O 2 ) and play a major role in the sustainability and growth of the fertility level of biological resources which is supported by sea surface temperature conditions 29.2°C. So the salinity values obtained using ML techniques and marine resource potential can be assumed to have a strong correlation. The research results show that the RF model has the lowest level of prediction error based on the values: Mean Square Error (MSE) = 0.007; Root Mean Squared Error (RMSE) = 0.082; Mean Absolute Error (MAE) = 0.007 compared to DT model: MSE = 0.008; RMSE = 0.088; MAE = 0.012 and LR model: MSE = 1.008; RMSE = 1.004; MAE = 0.281. The equivalent RF and DT models have a Determination Coefficient (R 2 ) = 0.999, meaning that a model is created that is good at predicting, compared to the LR model with a value of R 2 = 0.914. The correlation between variables shows that the LR model is very linear with a Correlation Coefficient (r) = 1.000 compared to the DT model (r) = 0.621 and the RF model (r) = 0.379. Therefore the algorithm that has a value of (r) +1 has the best level of accuracy. The use of ML to predict marine resource potential is a relatively new research field, so this research has the potential to contribute data and information as a reference for innovative studies and investment decision material for investors.\",\"PeriodicalId\":40005,\"journal\":{\"name\":\"Journal of Computer Science\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-02-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Computer Science\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.3844/jcssp.2024.129.139\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Computer Science","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3844/jcssp.2024.129.139","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
摘要
:海洋数据和信息对人类生存非常重要,因此这些数据和信息因其潜在的经济价值而对投资者具有吸引力。这些数据和信息一直难以获得,解决这一问题的办法是从技术研究和应用局(BPPT)海洋数据库中收集的 2009-2019 年海洋学数据进行分析。这些数据是印尼和来自不同国家的外国研究人员在印尼不同水域合作开展海洋调查的结果。原始海洋学数据被转换并分类为电导率、温度和深度(CTD)数据,这些海洋学数据参数被确定为相互关联的预测变量(X)。CTD 数据被处理成数字数据属性,这些属性已被标记,用于输入和训练。数据建模采用机器学习(ML)类型的监督学习(SL)方法,包括决策树(DT)、线性回归(LR)和随机森林(RF)算法,这些算法根据 CTD 数据的特征进行解释。ML 将学习数据模型,以便理解和存储。接下来,通过测量预测值与实际值之间的差异,使用准确度指标对模型进行评估,以获得良好的预测模型。预测结果显示,盐度水平为千分之 34.0(ppt),这意味着在这一区域的海水中,盐度将影响氧气(O 2 )的溶解度,并对生物资源肥力水平的可持续性和增长起到重要作用,而生物资源的肥力水平是由 29.2°C 的海面温度条件支持的。因此,可以认为利用 ML 技术获得的盐度值与海洋资源潜力具有很强的相关性。研究结果表明,RF 模型的预测误差值最小:平均平方误差(MSE)= 0.007;均方根误差(RMSE)= 0.082;平均绝对误差(MAE)= 0.007:MSE = 0.008;RMSE = 0.088;MAE = 0.012 和 LR 模型:MSE = 1.008; RMSE = 1.004; MAE = 0.281。等效的 RF 和 DT 模型的判定系数(R 2 )= 0.999,这意味着所创建的模型具有良好的预测能力,而 LR 模型的 R 2 = 0.914。变量之间的相关性显示,LR 模型的相关系数(r)= 1.000,而 DT 模型的相关系数(r)= 0.621,RF 模型的相关系数(r)= 0.379。因此,(r)+1 值的算法准确度最高。使用 ML 预测海洋资源潜力是一个相对较新的研究领域,因此本研究有可能提供数据和信息,作为创新研究和投资者投资决策材料的参考。
Machine Learning Oceanographic Data for Prediction of the Potential of Marine Resources
: Marine data and information are very important for human survival, therefore this data and information is attractive to investors because of the potential economic value. This data and information has been difficult to obtain, the solution to overcome this is by analyzing oceanographic data for 2009-2019 collected from the marine database belonging to the Agency for the Study and Application of Technology (BPPT). The data is the result of a collaborative marine survey between Indonesian and foreign researchers from various countries who sailed in various Indonesian waters. Raw oceanographic data is converted and classified into Conductivity, Temperature, and Depth (CTD) data as oceanographic data parameters identified as predictor variables (X) that are correlated with each other. CTD data is processed into numeric data attributes that have been labeled for input and training. The data was modeled using the Machine Learning (ML) type Supervised Learning (SL) method with the Decision Tree (DT), Linear Regression (LR) and Random Forest (RF) algorithms which were interpreted according to the characteristics of the CTD data. ML will learn data models to understand and store. Next, the model is evaluated using accuracy metrics by measuring the difference between the predicted value and the actual value to obtain a good prediction model. The prediction results show a salinity level of 34.0 parts per thousand (ppt), meaning that in this area of marine waters salinity will affect the solubility of Oxygen (O 2 ) and play a major role in the sustainability and growth of the fertility level of biological resources which is supported by sea surface temperature conditions 29.2°C. So the salinity values obtained using ML techniques and marine resource potential can be assumed to have a strong correlation. The research results show that the RF model has the lowest level of prediction error based on the values: Mean Square Error (MSE) = 0.007; Root Mean Squared Error (RMSE) = 0.082; Mean Absolute Error (MAE) = 0.007 compared to DT model: MSE = 0.008; RMSE = 0.088; MAE = 0.012 and LR model: MSE = 1.008; RMSE = 1.004; MAE = 0.281. The equivalent RF and DT models have a Determination Coefficient (R 2 ) = 0.999, meaning that a model is created that is good at predicting, compared to the LR model with a value of R 2 = 0.914. The correlation between variables shows that the LR model is very linear with a Correlation Coefficient (r) = 1.000 compared to the DT model (r) = 0.621 and the RF model (r) = 0.379. Therefore the algorithm that has a value of (r) +1 has the best level of accuracy. The use of ML to predict marine resource potential is a relatively new research field, so this research has the potential to contribute data and information as a reference for innovative studies and investment decision material for investors.
期刊介绍:
Journal of Computer Science is aimed to publish research articles on theoretical foundations of information and computation, and of practical techniques for their implementation and application in computer systems. JCS updated twelve times a year and is a peer reviewed journal covers the latest and most compelling research of the time.