Anomaly Detection-Oriented Positive-Unlabeled Metric Learning for Extracting High-Dimensional Geochemical Anomalies Linked to Mineralization
Zhaorui Yang, Yongliang Chen
Natural Resources Research (JCR Q1, Geosciences, Multidisciplinary; impact factor 4.8), journal article, published 2025-02-05
DOI: 10.1007/s11053-025-10464-3 (https://doi.org/10.1007/s11053-025-10464-3)
Citation count: 0
Abstract
In geochemical exploration, a small number of positive samples and a large number of unlabeled samples can be defined from the geochemical exploration data and the mineral deposits (occurrences) found in the exploration area. The positive samples usually comprise multiple types of mineral deposits (occurrences), while the unlabeled samples usually comprise a large number of background samples and some unknown positive samples. Accurately recognizing the unknown positive samples among the many unlabeled samples is a challenge in exploration geochemistry. To address this challenge, positive-unlabeled (PU) metric learning for anomaly detection (PUMAD) was developed to model positive-unlabeled geochemical exploration data and detect mineralization-related anomalies. PUMAD is a novel PU learning algorithm that combines artificial neural networks with distance hashing-based filtering (DHF) and deep metric learning (DML) to establish an anomaly detection model for datasets with positive and unlabeled samples. To test the effectiveness and robustness of PUMAD in identifying mineralization-related geochemical anomalies, the Baishan area of Jilin Province (China) was chosen as the case study area, and a dataset with positive and unlabeled samples was constructed from the stream sediment geochemical survey data of four 1:200,000 scale geological maps and the spatial locations of more than 30 discovered polymetallic deposits. The PUMAD model, PU learning model, and DML model were established on the constructed dataset and used to identify the geochemical anomalies linked to known polymetallic mineralization. A comparative analysis of the three models showed that the PUMAD model performed much better than the other two models in identifying mineralization-related geochemical anomalies.
The receiver operating characteristic (ROC) curve of the PUMAD model was closer to the upper left corner of the ROC space than those of the PU learning model and DML model. The calculated area under the ROC curve (AUC) of the PUMAD model was 0.9626, which substantially exceeded those of the PU learning model (0.8493) and the DML model (0.7542). The geochemical anomalies linked to polymetallic mineralization recognized by the PUMAD model comprised 10.89% of the Baishan exploration area and encompassed all the discovered polymetallic deposits within the area, while those recognized by the PU learning model and DML model comprised 16.87% and 25.29%, respectively, of the study area and encompassed 90% and 87%, respectively, of the discovered polymetallic deposits. The recognized mineralization-related geochemical anomalies are spatially linked to regional geological factors that controlled polymetallic mineralization in the Baishan exploration area. Therefore, it can be concluded that PUMAD is an effective technique for detecting mineralization-related anomalies within an exploration area. It is worthwhile to further test its validity for mapping mineralization-related geochemical anomalies in different exploration areas.
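The two kinds of figures quoted in the abstract, an AUC and an anomalous-area percentage paired with the share of known deposits captured, can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes each grid cell has an anomaly score, that known deposit cells are labeled 1 and all other (unlabeled) cells are treated as 0 for evaluation, and that anomalies are the cells scoring at or above a threshold. The function names `roc_auc` and `anomaly_summary`, the threshold, and the toy data are all hypothetical.

```python
from typing import Sequence, Tuple


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a labeled-positive cell outscores
    an unlabeled cell (ties count as half a win)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def anomaly_summary(
    scores: Sequence[float], labels: Sequence[int], threshold: float
) -> Tuple[float, float]:
    """Return (fraction of cells flagged as anomalous,
    fraction of known deposit cells that fall inside the flagged cells)."""
    flagged = [s >= threshold for s in scores]
    area_fraction = sum(flagged) / len(scores)
    n_pos = sum(1 for y in labels if y == 1)
    if n_pos == 0:
        raise ValueError("need at least one known deposit cell")
    captured = sum(1 for f, y in zip(flagged, labels) if f and y == 1)
    return area_fraction, captured / n_pos


# Toy scores and labels, invented for illustration (not data from the paper).
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.6]
labels = [1, 1, 0, 0, 0, 0, 0, 1]

auc = roc_auc(scores, labels)
area, hit = anomaly_summary(scores, labels, threshold=0.6)
print(f"AUC={auc:.4f}, anomalous area={area:.2%}, deposits captured={hit:.0%}")
```

Under these assumptions, a better model is one that reaches a high capture rate with a small anomalous area, which is the sense in which the 10.89% figure with all deposits captured compares favorably with 16.87% and 25.29%.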
About the journal:
This journal publishes quantitative studies of natural (mainly but not limited to mineral) resources exploration, evaluation and exploitation, including environmental and risk-related aspects. Typical articles use geoscientific data or analyses to assess, test, or compare resource-related aspects. NRR covers a wide variety of resources including minerals, coal, hydrocarbon, geothermal, water, and vegetation. Case studies are welcome.