Metadata, or data about data, is essential in Knowledge Discovery from Data (KDD) and Artificial Intelligence (AI) processes, since it describes the meaning and technical characteristics of a dataset’s variables. Historically, research has focused on software for storing and managing metadata, mainly to describe data structures and formats so that analysts can understand them. However, little work has analyzed the metadata needed to automate advanced KDD processes. Moreover, metadata creation has traditionally been a manual process in which analysts gather information from stakeholders. This paper introduces the GeMeDaFi methodology, which enables stakeholders to automatically generate machine-readable metadata files and thereby supports the automatic management of metadata. GeMeDaFi is a key component of the AM4IDA methodology, which guides intelligent data analysis using automatically generated metadata, from data preprocessing through modeling to result interpretation. The metadata file, based on the MdM formal model, combines semantic information provided by stakeholders with the dataset’s structure, supporting automated intelligent preprocessing and analysis. The proposal also enhances the INSESS methodology for intelligent data analysis and has been applied in four real-world scenarios. Its primary contributions are a significant reduction in the time needed to create metadata files and in the errors they contain, a faster preprocessing phase, and the automation of the analytical step of KDD processes.
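The abstract does not specify the MdM schema, so the following is only a minimal sketch of the general idea it describes: structural metadata inferred automatically from a dataset is combined with semantic information supplied by stakeholders and written out as a machine-readable file. The function names (`infer_type`, `build_metadata`), the JSON layout, the type categories (`integer`, `real`, `categorical`), and the toy data are all hypothetical and are not the GeMeDaFi or MdM format.

```python
import csv
import io
import json


def infer_type(values):
    """Classify a column as 'integer', 'real', or 'categorical' (hypothetical categories)."""
    non_empty = [v for v in values if v.strip()]

    def all_parse(cast):
        # True only if the column has at least one value and every
        # non-empty value can be converted with `cast`.
        try:
            for v in non_empty:
                cast(v)
        except ValueError:
            return False
        return bool(non_empty)

    if all_parse(int):
        return "integer"
    if all_parse(float):
        return "real"
    return "categorical"


def build_metadata(csv_text, annotations):
    """Merge structure inferred from CSV text with stakeholder-supplied semantics.

    `annotations` maps a variable name to extra fields (e.g. description, unit)
    gathered from stakeholders; the output layout is illustrative only.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    variables = []
    for name in reader.fieldnames or []:
        values = [row.get(name) or "" for row in rows]
        entry = {
            "name": name,
            "type": infer_type(values),
            "missing": sum(1 for v in values if not v.strip()),
        }
        if entry["type"] == "categorical":
            entry["distinct_values"] = sorted({v for v in values if v.strip()})
        # Stakeholder semantics are layered on top of the inferred structure.
        entry.update(annotations.get(name, {}))
        variables.append(entry)
    return {"variables": variables}


if __name__ == "__main__":
    # Toy dataset and annotations, invented purely for illustration.
    data = "age,income,region\n34,52000.5,north\n41,,south\n29,38000,north\n"
    semantics = {
        "age": {"description": "Age of the respondent", "unit": "years"},
        "region": {"description": "Sales region", "role": "grouping"},
    }
    print(json.dumps(build_metadata(data, semantics), indent=2))
```

Separating the inferred fields from the stakeholder fields keeps the automatic part repeatable while the semantic part is supplied once by domain experts; a real implementation would follow the MdM model rather than this ad hoc layout.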
