Md Gezani Bin Md Ghazi, Loong Chuen Lee, A S Samsudin, H Sino
{"title":"基于GC-MS数据的决策树与Naïve贝叶斯算法在汽油痕量残留检测中的比较","authors":"Md Gezani Bin Md Ghazi, Loong Chuen Lee, A S Samsudin, H Sino","doi":"10.1093/fsr/owad031","DOIUrl":null,"url":null,"abstract":"Abstract Fire debris analysis aims to detect and identify any ignitable liquid residues in burnt residues collected at a fire scene. Typically, the burnt residues are analysed using gas chromatography–mass spectrometry (GC–MS) and are manually interpreted. The interpretation process can be laborious due to the complexity and high dimensionality of the GC–MS data. Therefore, this study aims to compare the potential of classification and regression tree (CART) and naïve Bayes (NB) algorithms in analysing the pixel-level GC–MS data of fire debris. The data comprise 14 positive (i.e. fire debris with traces of gasoline) and 24 negative (i.e. fire debris without traces of gasoline) samples. The differences between the positive and negative samples were first inspected based on the mean chromatograms and scores plots of the principal component analysis technique. Then, CART and NB algorithms were independently applied to the GC–MS data. Stratified random resampling was applied to prepare three sets of 200 pairs of training and testing samples (i.e. split ratio of 7:3, 8:2, and 9:1) for estimating the prediction accuracies. Although both the positive and negative samples were hardly differentiated based on the mean chromatograms and scores plots of principal component analysis, the respective NB and CART predictive models produced satisfactory performances with the normalized GC–MS data, i.e. majority achieved prediction accuracy >70%. NB consistently outperformed CART based on the prediction accuracies of testing samples and the corresponding risk of overfitting except when evaluated using only 10% of samples. The accuracy of CART was found to be inversely proportional to the number of testing samples; meanwhile, NB demonstrated rather consistent performances across the three split ratios. In conclusion, NB seems to be much better than CART based on the robustness against the number of testing samples and the consistent lower risk of overfitting.","PeriodicalId":45852,"journal":{"name":"Forensic Sciences Research","volume":"31 1","pages":"0"},"PeriodicalIF":1.4000,"publicationDate":"2023-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Comparison of decision tree and Naïve Bayes algorithms in detecting trace residue of gasoline based on GC–MS data\",\"authors\":\"Md Gezani Bin Md Ghazi, Loong Chuen Lee, A S Samsudin, H Sino\",\"doi\":\"10.1093/fsr/owad031\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Abstract Fire debris analysis aims to detect and identify any ignitable liquid residues in burnt residues collected at a fire scene. Typically, the burnt residues are analysed using gas chromatography–mass spectrometry (GC–MS) and are manually interpreted. The interpretation process can be laborious due to the complexity and high dimensionality of the GC–MS data. Therefore, this study aims to compare the potential of classification and regression tree (CART) and naïve Bayes (NB) algorithms in analysing the pixel-level GC–MS data of fire debris. The data comprise 14 positive (i.e. fire debris with traces of gasoline) and 24 negative (i.e. fire debris without traces of gasoline) samples. The differences between the positive and negative samples were first inspected based on the mean chromatograms and scores plots of the principal component analysis technique. Then, CART and NB algorithms were independently applied to the GC–MS data. Stratified random resampling was applied to prepare three sets of 200 pairs of training and testing samples (i.e. split ratio of 7:3, 8:2, and 9:1) for estimating the prediction accuracies. Although both the positive and negative samples were hardly differentiated based on the mean chromatograms and scores plots of principal component analysis, the respective NB and CART predictive models produced satisfactory performances with the normalized GC–MS data, i.e. majority achieved prediction accuracy >70%. NB consistently outperformed CART based on the prediction accuracies of testing samples and the corresponding risk of overfitting except when evaluated using only 10% of samples. The accuracy of CART was found to be inversely proportional to the number of testing samples; meanwhile, NB demonstrated rather consistent performances across the three split ratios. In conclusion, NB seems to be much better than CART based on the robustness against the number of testing samples and the consistent lower risk of overfitting.\",\"PeriodicalId\":45852,\"journal\":{\"name\":\"Forensic Sciences Research\",\"volume\":\"31 1\",\"pages\":\"0\"},\"PeriodicalIF\":1.4000,\"publicationDate\":\"2023-09-19\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Forensic Sciences Research\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1093/fsr/owad031\",\"RegionNum\":4,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"MEDICINE, LEGAL\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Forensic Sciences Research","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1093/fsr/owad031","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"MEDICINE, LEGAL","Score":null,"Total":0}
Comparison of decision tree and Naïve Bayes algorithms in detecting trace residue of gasoline based on GC–MS data
Abstract Fire debris analysis aims to detect and identify any ignitable liquid residues in burnt residues collected at a fire scene. Typically, the burnt residues are analysed using gas chromatography–mass spectrometry (GC–MS) and are manually interpreted. The interpretation process can be laborious due to the complexity and high dimensionality of the GC–MS data. Therefore, this study aims to compare the potential of classification and regression tree (CART) and naïve Bayes (NB) algorithms in analysing the pixel-level GC–MS data of fire debris. The data comprise 14 positive (i.e. fire debris with traces of gasoline) and 24 negative (i.e. fire debris without traces of gasoline) samples. The differences between the positive and negative samples were first inspected based on the mean chromatograms and scores plots of the principal component analysis technique. Then, CART and NB algorithms were independently applied to the GC–MS data. Stratified random resampling was applied to prepare three sets of 200 pairs of training and testing samples (i.e. split ratio of 7:3, 8:2, and 9:1) for estimating the prediction accuracies. Although both the positive and negative samples were hardly differentiated based on the mean chromatograms and scores plots of principal component analysis, the respective NB and CART predictive models produced satisfactory performances with the normalized GC–MS data, i.e. majority achieved prediction accuracy >70%. NB consistently outperformed CART based on the prediction accuracies of testing samples and the corresponding risk of overfitting except when evaluated using only 10% of samples. The accuracy of CART was found to be inversely proportional to the number of testing samples; meanwhile, NB demonstrated rather consistent performances across the three split ratios. In conclusion, NB seems to be much better than CART based on the robustness against the number of testing samples and the consistent lower risk of overfitting.