CLUSTERDC: A New Density-Based Clustering Algorithm and its Application in a Geological Material Characterization Workflow
Maximilien Meyrieux, Samer Hmoud, Pim van Geffen, David Kaeter
Natural Resources Research, published 2024-07-15
DOI: https://doi.org/10.1007/s11053-024-10379-5
Abstract
The ore and waste materials extracted from a mineral deposit during the mining process can have significant variations in their physical and chemical characteristics. The current approaches to geological material characterization are often subjective and usually involve a significant human workload, as there is no optimized, well-defined, and robust methodology to perform this task. This paper proposes a robust, data-driven workflow for geological material characterization. The methodology involves selecting relevant features as a starting point to discriminate between material types. The workflow then employs a robust, state-of-the-art nonlinear dimension reduction (DR) algorithm when the dataset is multidimensional to obtain a two-dimensional embedding. From this two-dimensional embedding, a kernel density estimation (KDE) function is derived. Subsequently, a new clustering algorithm, named ClusterDC, is employed to generate clusters from the KDE function, accurately reflecting geological material types while achieving scalable clustering performance on large drillhole datasets. ClusterDC is a density-based clustering algorithm capable of delineating and ranking high-density zones corresponding to clusters of data samples from a two-dimensional KDE function. The algorithm reduces subjectivity by automatically determining optimal cluster numbers and minimizing reliance on hyperparameters. It also offers hierarchical and flexible clustering, allowing users to group or split clusters, optimally reassign data samples, and identify cluster core points as well as potential outliers. Two case studies were carried out to test the algorithm and demonstrate its application to geochemical drill-core assay data. The results of these case studies demonstrate that the application of ClusterDC in the presented workflow supports the characterization of geological material types based on multi-element geochemistry and thus has the potential to help mining companies optimize downstream processes and mitigate technical risks by improving their understanding of their orebodies.
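To make the KDE-then-cluster idea in the abstract concrete, the following is a minimal sketch and not the published ClusterDC procedure, whose details the abstract does not give. It assumes a two-dimensional embedding is already available, evaluates a Gaussian KDE on a regular grid, and assigns each sample to the density peak reached by steepest ascent from its nearest grid node, with peaks ranked by density. The function names (`kde_on_grid`, `climb`, `cluster_from_density`), the fixed `bandwidth`, the grid size, and the hill-climbing rule are all illustrative assumptions.

```python
import numpy as np


def kde_on_grid(points, bandwidth, grid_size=50):
    """Evaluate an isotropic Gaussian KDE of 2-D points on a regular grid."""
    pad = 3.0 * bandwidth
    xs = np.linspace(points[:, 0].min() - pad, points[:, 0].max() + pad, grid_size)
    ys = np.linspace(points[:, 1].min() - pad, points[:, 1].max() + pad, grid_size)
    gx, gy = np.meshgrid(xs, ys, indexing="ij")
    # Squared distance from every grid node to every sample: shape (G, G, N).
    d2 = (gx[..., None] - points[:, 0]) ** 2 + (gy[..., None] - points[:, 1]) ** 2
    density = np.exp(-0.5 * d2 / bandwidth**2).sum(axis=-1)
    density /= 2.0 * np.pi * bandwidth**2 * len(points)
    return xs, ys, density


def climb(density, start):
    """Follow steepest ascent over the 8-neighbourhood until a local maximum."""
    nx, ny = density.shape
    i, j = int(start[0]), int(start[1])
    while True:
        i0, j0 = max(i - 1, 0), max(j - 1, 0)
        window = density[i0:min(i + 2, nx), j0:min(j + 2, ny)]
        di, dj = np.unravel_index(np.argmax(window), window.shape)
        ni, nj = i0 + int(di), j0 + int(dj)
        if (ni, nj) == (i, j):
            return i, j
        i, j = ni, nj


def cluster_from_density(points, bandwidth, grid_size=50):
    """Assign each sample to the density peak its nearest grid node climbs to."""
    xs, ys, density = kde_on_grid(points, bandwidth, grid_size)
    # Index of the nearest grid node for each sample.
    ix = np.rint((points[:, 0] - xs[0]) / (xs[1] - xs[0])).astype(int)
    iy = np.rint((points[:, 1] - ys[0]) / (ys[1] - ys[0])).astype(int)
    peaks = [climb(density, (i, j)) for i, j in zip(ix, iy)]
    # Rank distinct peaks by density, highest first, so label 0 is the densest zone.
    ranked = sorted(set(peaks), key=lambda p: -density[p])
    label_of = {p: k for k, p in enumerate(ranked)}
    labels = np.array([label_of[p] for p in peaks])
    return labels, ranked


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in for a 2-D embedding of assay data (hypothetical values).
    embedding = np.vstack([
        rng.normal([0.0, 0.0], 0.5, size=(100, 2)),
        rng.normal([5.0, 5.0], 0.5, size=(100, 2)),
    ])
    labels, peaks = cluster_from_density(embedding, bandwidth=0.5)
    print("clusters found:", len(peaks))
```

In this sketch the number of clusters is not supplied by the user; it falls out of the number of density peaks that samples climb to. The bandwidth remains a tuning choice here, and the sketch makes no claim about how ClusterDC handles it.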
About the journal:
This journal publishes quantitative studies of natural (mainly but not limited to mineral) resources exploration, evaluation and exploitation, including environmental and risk-related aspects. Typical articles use geoscientific data or analyses to assess, test, or compare resource-related aspects. NRR covers a wide variety of resources including minerals, coal, hydrocarbon, geothermal, water, and vegetation. Case studies are welcome.