Explainable Mixed Data Representation and Lossless Visualization Toolkit for Knowledge Discovery

2022 26th International Conference Information Visualisation (IV) Pub Date : 2022-06-13 DOI:10.1109/IV56949.2022.00060

B. Kovalerchuk, Elijah McCoy


Citations: 1

Abstract

Developing Machine Learning (ML) algorithms for heterogeneous/mixed data is a longstanding problem. Many ML algorithms cannot be applied to mixed data, which include numeric and non-numeric data, text, graphs, and so on, to generate interpretable models. Another longstanding problem is developing algorithms for lossless visualization of multidimensional mixed data. Further progress in ML depends heavily on the success of interpretable ML algorithms for mixed data and of lossless interpretable visualization of multidimensional data. The latter allows end-users to develop interpretable ML models through visual knowledge discovery, bringing in valuable domain knowledge that is absent from the training data. The challenges for mixed data include: (1) generating numeric coding schemes for non-numeric attributes so that numeric ML algorithms can provide accurate and interpretable ML models, and (2) generating methods for lossless visualization of n-D non-numeric data and for visual rule discovery in these visualizations. This paper presents a classification of mixed data types, analyzes their importance for ML, and presents the experimental toolkit developed to deal with mixed data. It combines the Data Types Editor and the VisCanvas data visualization and rule discovery system, which is available on GitHub.
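Challenge (1) above, numeric coding of non-numeric attributes, can be illustrated with a small sketch. This is not the coding scheme of the paper or of the Data Types Editor, whose details the abstract does not give; it is a minimal, generic example of an invertible integer coding. The function names `build_coding`, `encode`, and `decode` and the sample attribute values are hypothetical.

```python
from typing import Dict, Hashable, List, Optional, Sequence


def build_coding(values: Sequence[Hashable],
                 order: Optional[Sequence[Hashable]] = None) -> Dict[Hashable, int]:
    """Assign an integer code to each distinct non-numeric value.

    If `order` is given, the attribute is treated as ordinal and codes follow
    that order. Otherwise it is treated as nominal and codes follow first
    appearance, so the numeric order of the codes carries no meaning.
    """
    if order is None:
        # dict.fromkeys keeps distinct values in first-appearance order.
        order = list(dict.fromkeys(values))
    missing = set(values) - set(order)
    if missing:
        raise ValueError(f"values not covered by order: {missing!r}")
    return {value: code for code, value in enumerate(order)}


def encode(values: Sequence[Hashable], coding: Dict[Hashable, int]) -> List[int]:
    """Replace each value with its integer code."""
    return [coding[v] for v in values]


def decode(codes: Sequence[int], coding: Dict[Hashable, int]) -> List[Hashable]:
    """Invert the coding; possible because each value has a unique code."""
    inverse = {code: value for value, code in coding.items()}
    return [inverse[c] for c in codes]


# Hypothetical ordinal attribute: the domain expert supplies the order.
sizes = ["small", "large", "medium", "small"]
size_coding = build_coding(sizes, order=["small", "medium", "large"])
size_codes = encode(sizes, size_coding)
print(size_codes)  # [0, 2, 1, 0]

# The coding loses no information: decoding restores the original values.
assert decode(size_codes, size_coding) == sizes
```

Because the mapping is one-to-one, the original values can always be recovered from the codes. Whether the resulting numeric order is meaningful depends on the attribute type, which is why the sketch separates the expert-supplied ordinal case from the arbitrary nominal case.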
