
Latest Publications in Data Mining and Knowledge Discovery

MSGNN: Multi-scale Spatio-temporal Graph Neural Network for epidemic forecasting
IF 4.8 · CAS Tier 3 (Computer Science) · Q2 (Computer Science, Artificial Intelligence) · Pub Date: 2024-05-21 · DOI: 10.1007/s10618-024-01035-w
Mingjie Qiu, Zhiyi Tan, Bing-Kun Bao

Infectious disease forecasting has been a key focus and has proved crucial in controlling epidemics. A recent trend is to develop forecasting models based on graph neural networks (GNNs). However, existing GNN-based methods suffer from two key limitations: (1) current models broaden receptive fields by scaling the depth of GNNs, which is insufficient to preserve the semantics of long-range connectivity between distant but epidemically related areas; (2) previous approaches model epidemics within a single spatial scale, ignoring the multi-scale epidemic patterns that emerge at different scales. To address these deficiencies, we devise the Multi-scale Spatio-temporal Graph Neural Network (MSGNN), built on an innovative multi-scale view. Specifically, in the proposed MSGNN model, we first devise a novel graph learning module that directly captures long-range connectivity from trans-regional epidemic signals and integrates it into a multi-scale graph. Based on the learned multi-scale graph, we use a newly designed graph convolution module to exploit multi-scale epidemic patterns. This module facilitates multi-scale epidemic modeling by mining both scale-shared and scale-specific patterns. Experimental results on forecasting new COVID-19 cases in the United States demonstrate the superiority of our method over the state of the art. Further analyses and visualizations also show that MSGNN offers not only accurate but also robust and interpretable forecasting results. Code is available at https://github.com/JashinKorone/MSGNN.
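The abstract's two ideas — learning long-range links directly from trans-regional signals and operating on more than one spatial scale — can be illustrated with a toy sketch. Everything below (the correlation-based graph learning, the 0.3 threshold, the region-to-group assignment) is an illustrative assumption, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
cases = rng.poisson(20, size=(50, 30)).astype(float)  # 50 regions x 30 days

# Fine-scale graph: edge weight = correlation between regional case curves,
# which links distant but epidemically related regions directly.
corr = np.corrcoef(cases)
fine_adj = np.where(corr > 0.3, corr, 0.0)      # 0.3 is an arbitrary threshold

# Coarse scale: group regions (e.g., counties -> states; grouping assumed here)
groups = np.repeat(np.arange(10), 5)            # 10 coarse units of 5 regions
M = np.eye(10)[groups]                          # 50x10 assignment matrix
coarse_adj = M.T @ fine_adj @ M                 # aggregated coarse-scale graph

# One propagation step per scale (a stand-in for the graph convolution module)
fine_feat = fine_adj @ cases
coarse_feat = coarse_adj @ (M.T @ cases)
print(fine_feat.shape, coarse_feat.shape)       # (50, 30) (10, 30)
```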

Citations: 0
Unsupervised feature based algorithms for time series extrinsic regression
IF 4.8 · CAS Tier 3 (Computer Science) · Q2 (Computer Science, Artificial Intelligence) · Pub Date: 2024-05-19 · DOI: 10.1007/s10618-024-01027-w
David Guijo-Rubio, Matthew Middlehurst, Guilherme Arcencio, Diego Furtado Silva, Anthony Bagnall

Time Series Extrinsic Regression (TSER) involves using a set of training time series to form a predictive model of a continuous response variable that is not directly related to the regressor series. The TSER archive for comparing algorithms was released in 2022 with 19 problems. We increase the size of this archive to 63 problems and reproduce the previous comparison of baseline algorithms. We then extend the comparison to include a wider range of standard regressors and the latest versions of TSER models used in the previous study. We show that none of the previously evaluated regressors can outperform a regression adaptation of a standard classifier, rotation forest. We introduce two new TSER algorithms developed from related work in time series classification. FreshPRINCE is a pipeline estimator consisting of a transform into a wide range of summary features followed by a rotation forest regressor. DrCIF is a tree ensemble that creates features from summary statistics over random intervals. Our study demonstrates that both algorithms, along with InceptionTime, exhibit significantly better performance compared to the other 18 regressors tested. More importantly, DrCIF is the only one that significantly outperforms a standard rotation forest regressor.
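A minimal sketch of a FreshPRINCE-style pipeline as described above: a summary-feature transform followed by a forest regressor. Rotation forest is not available in scikit-learn, so RandomForestRegressor stands in for it here, and the handful of summary statistics is a stand-in for the much richer TSFresh feature set the paper uses:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer

def summary_features(X):
    # X: (n_series, series_length) -> a few per-series summary statistics
    return np.column_stack([
        X.mean(axis=1), X.std(axis=1), X.min(axis=1), X.max(axis=1),
        np.abs(np.diff(X, axis=1)).mean(axis=1),   # mean absolute change
    ])

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 100))                    # toy TSER data
y = X.mean(axis=1) * 3 + rng.normal(scale=0.1, size=200)

model = make_pipeline(FunctionTransformer(summary_features),
                      RandomForestRegressor(n_estimators=200, random_state=0))
model.fit(X[:150], y[:150])
print(model.score(X[150:], y[150:]))               # R^2 on held-out series
```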

Citations: 0
Towards more sustainable and trustworthy reporting in machine learning
IF 4.8 · CAS Tier 3 (Computer Science) · Q2 (Computer Science, Artificial Intelligence) · Pub Date: 2024-04-30 · DOI: 10.1007/s10618-024-01020-3
Raphael Fischer, Thomas Liebig, Katharina Morik

With machine learning (ML) becoming a popular tool across all domains, practitioners are in dire need of comprehensive reporting on the state of the art. Benchmarks and open databases provide helpful insights for many tasks, but they suffer from several shortcomings: firstly, they focus overly on prediction quality, which is problematic given the demand for more sustainability in ML. Depending on the use case at hand, interested users might also face tight resource constraints and thus should be able to interact with reporting frameworks in order to prioritize certain reported characteristics. Furthermore, as some practitioners might not yet be well-versed in ML, it is important to convey information at a more abstract, comprehensible level. Usability and extendability are key to keeping pace with the state of the art, and to be trustworthy, frameworks should explicitly address reproducibility. In this work, we analyze established reporting systems in light of the aforementioned issues. We then propose STREP, a novel framework that aims to overcome these shortcomings and paves the way towards more sustainable and trustworthy reporting. We use STREP's (publicly available) implementation to investigate various existing report databases. Our experimental results reveal the need for making reporting more resource-aware and demonstrate our framework's ability to overcome current reporting limitations. With this work, we want to initiate a paradigm shift in reporting and help make ML advances more considerate of sustainability and trustworthiness.
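As a rough illustration of the interaction the abstract calls for — letting users re-prioritize reported characteristics under resource constraints — the sketch below combines normalized metrics with user-chosen weights. The metric names, weights and scoring scheme are hypothetical and are not STREP's actual interface:

```python
import pandas as pd

# Hypothetical report table: two characteristics per model
reports = pd.DataFrame({
    "model": ["A", "B", "C"],
    "accuracy": [0.91, 0.89, 0.86],
    "energy_kwh": [12.0, 3.5, 0.8],        # training energy, lower is better
})

def minmax(s):
    return (s - s.min()) / (s.max() - s.min())

weights = {"accuracy": 0.4, "energy_kwh": 0.6}     # a resource-conscious user
utility = {
    "accuracy": minmax(reports["accuracy"]),           # higher is better
    "energy_kwh": 1 - minmax(reports["energy_kwh"]),   # lower is better
}
reports["score"] = sum(w * utility[k] for k, w in weights.items())
print(reports.sort_values("score", ascending=False))
```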

Citations: 0
Interpretable representations in explainable AI: from theory to practice
IF 4.8 · CAS Tier 3 (Computer Science) · Q2 (Computer Science, Artificial Intelligence) · Pub Date: 2024-04-25 · DOI: 10.1007/s10618-024-01010-5
Kacper Sokol, Peter Flach

Interpretable representations are the backbone of many explainers that target black-box predictive systems based on artificial intelligence and machine learning algorithms. They translate the low-level data representation necessary for good predictive performance into high-level, human-intelligible concepts used to convey the explanatory insights. Notably, the explanation type and its cognitive complexity are directly controlled by the interpretable representation, and tweaking it allows an explanation to target a particular audience and use case. However, many explainers built upon interpretable representations overlook their merits and fall back on default solutions that often carry implicit assumptions, thereby degrading the explanatory power and reliability of such techniques. To address this problem, we study properties of interpretable representations that encode the presence and absence of human-comprehensible concepts. We demonstrate how they are operationalised for tabular, image and text data; discuss their assumptions, strengths and weaknesses; identify their core building blocks; and scrutinise their configuration and parameterisation. In particular, this in-depth analysis allows us to pinpoint their explanatory properties, desiderata and scope for (malicious) manipulation in the context of tabular data, where a linear model is used to quantify the influence of interpretable concepts on a black-box prediction. Our findings lead to a range of recommendations for designing trustworthy interpretable representations; specifically, the benefits of class-aware (supervised) discretisation of tabular data, e.g., with decision trees, and the sensitivity of image interpretable representations to segmentation granularity and occlusion colour.
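A minimal sketch of the tabular setting analysed above: each feature is discretised class-aware with a depth-1 decision tree, the resulting binary "concept present/absent" vector forms the interpretable representation, and a linear surrogate quantifies each concept's influence on the black-box prediction. The perturbation scheme and the Ridge surrogate are simplifying assumptions, not the paper's exact protocol:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import Ridge
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
black_box = GradientBoostingClassifier(random_state=0).fit(X, y)

# Class-aware (supervised) threshold per feature: the root split of a stump
thresholds = np.array([
    DecisionTreeClassifier(max_depth=1, random_state=0)
    .fit(X[:, [j]], y).tree_.threshold[0]
    for j in range(X.shape[1])
])

x0 = X[0]                                # the instance being explained
rng = np.random.default_rng(0)
X_pert = X[rng.integers(0, len(X), size=500)]        # simple perturbations
side0 = x0 > thresholds
Z = ((X_pert > thresholds) == side0).astype(float)   # concept present/absent
pred = black_box.predict_proba(X_pert)[:, 1]

surrogate = Ridge(alpha=1.0).fit(Z, pred)            # linear influence model
print(surrogate.coef_.round(3))                      # per-concept influence
```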

Citations: 0
Bake off redux: a review and experimental evaluation of recent time series classification algorithms
IF 4.8 · CAS Tier 3 (Computer Science) · Q2 (Computer Science, Artificial Intelligence) · Pub Date: 2024-04-19 · DOI: 10.1007/s10618-024-01022-1
Matthew Middlehurst, Patrick Schäfer, Anthony Bagnall

In 2017, a research paper (Bagnall et al., Data Mining and Knowledge Discovery 31(3):606-660, 2017) compared 18 Time Series Classification (TSC) algorithms on 85 datasets from the University of California, Riverside (UCR) archive. This study, commonly referred to as a 'bake off', identified that only nine algorithms performed significantly better than the Dynamic Time Warping (DTW) and Rotation Forest benchmarks that were used. The study categorised each algorithm by the type of feature it extracts from time series data, forming a taxonomy of five main algorithm types. This categorisation of algorithms, alongside the provision of code and accessible results for reproducibility, has helped fuel an increase in the popularity of the TSC field. Over six years have passed since this bake off; the UCR archive has expanded to 112 datasets and a large number of new algorithms have been proposed. We revisit the bake off, seeing how each of the proposed categories has advanced since the original publication, and evaluate the performance of newer algorithms against the previous best of each category using an expanded UCR archive. We extend the taxonomy to include three new categories to reflect recent developments. Alongside the originally proposed distance-, interval-, shapelet-, dictionary- and hybrid-based algorithms, we compare newer convolution- and feature-based algorithms as well as deep learning approaches. We introduce 30 classification datasets either recently donated to the archive or reformatted to the TSC format, and use these to further evaluate the best-performing algorithm from each category. Overall, we find that two recently proposed algorithms, MultiROCKET+Hydra (Dempster et al. 2022) and HIVE-COTEv2 (Middlehurst et al., Mach Learn 110:3211-3243, 2021), perform significantly better than other approaches on both the current and new TSC problems.
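The comparison protocol behind such bake offs can be sketched in a few lines: rank every classifier per dataset by accuracy, then compare mean ranks (the basis of critical-difference diagrams). The accuracies below are mocked, and ties are handled more carefully (with average ranks) in the real studies:

```python
import numpy as np

rng = np.random.default_rng(0)
classifiers = ["HC2", "MR-Hydra", "1NN-DTW", "RotF"]
acc = rng.uniform(0.6, 0.95, size=(112, len(classifiers)))  # datasets x clfs

# Rank 1 = best on that dataset (double argsort gives per-row rank positions)
ranks = acc.shape[1] - acc.argsort(axis=1).argsort(axis=1)
for name, r in zip(classifiers, ranks.mean(axis=0)):
    print(f"{name}: mean rank {r:.2f}")
```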

Citations: 0
Lost in the Forest: Encoding categorical variables and the absent levels problem
IF 4.8 · CAS Tier 3 (Computer Science) · Q2 (Computer Science, Artificial Intelligence) · Pub Date: 2024-04-10 · DOI: 10.1007/s10618-024-01019-w
Helen L. Smith, Patrick J. Biggs, Nigel P. French, Adam N. H. Smith, Jonathan C. Marshall

Levels of a predictor variable that are absent when a classification tree is grown cannot be subject to an explicit splitting rule. This is an issue if these absent levels are present in a new observation for prediction. To date, there remains no satisfactory solution for absent levels in random forest models. Unlike missing data, absent levels are fully observed and known. Ordinal encoding of predictors allows absent levels to be integrated and used for prediction. Using a case study on source attribution of Campylobacter species with whole genome sequencing (WGS) data as predictors, we examine how target-agnostic versus target-based encoding of predictor variables with absent levels affects the accuracy of random forest models. We show that a target-based encoding approach using class probabilities, with absent levels designated the highest rank, is systematically biased, and that this bias is resolved by encoding absent levels according to the a priori hypothesis of equal class probability. We present a novel method of ordinal encoding of predictors via principal coordinates analysis (PCO) which capitalizes on the similarity between pairs of predictor levels. Absent levels are encoded according to their similarity to each of the other levels in the training data. We show that the PCO-encoding method performs at least as well as the target-based approach and is not biased.
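A minimal sketch of the target-based encoding discussed above, for a binary class: seen levels are scored by their class-1 probability, and an absent level falls back to the a priori equal-class-probability score of 0.5 rather than being pushed to the highest rank. The toy data and the binary-class simplification are assumptions; the PCO variant would instead place the absent level by its similarity to the seen levels:

```python
import pandas as pd

train = pd.DataFrame({"level": list("AABBCC"), "y": [1, 1, 1, 0, 0, 1]})
p1 = train.groupby("level")["y"].mean()      # class-1 probability per level
encoding = p1.to_dict()                      # level -> ordinal score

def encode(level):
    # Absent level: fall back to the equal-class-probability prior
    return encoding.get(level, 0.5)

print([encode(lv) for lv in ["A", "B", "C", "D"]])  # "D" was never seen
```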

Citations: 0
Time series clustering with random convolutional kernels
IF 4.8 · CAS Tier 3 (Computer Science) · Q2 (Computer Science, Artificial Intelligence) · Pub Date: 2024-04-01 · DOI: 10.1007/s10618-024-01018-x

Time series data, spanning applications ranging from climatology to finance to healthcare, presents significant challenges in data mining due to its size and complexity. One open issue lies in time series clustering, which is crucial for processing large volumes of unlabeled time series data and unlocking valuable insights. Traditional and modern analysis methods, however, often struggle with these complexities. To address these limitations, we introduce R-Clustering, a novel method that utilizes convolutional architectures with randomly selected parameters. Through extensive evaluations, R-Clustering demonstrates superior performance over existing methods in terms of clustering accuracy, computational efficiency and scalability. Empirical results obtained using the UCR archive demonstrate the effectiveness of our approach across diverse time series datasets. The findings highlight the significance of R-Clustering in various domains and applications, contributing to the advancement of time series data mining.
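A ROCKET-flavoured sketch of the idea: convolve each unlabeled series with fixed random kernels, pool each convolution with the proportion of positive values (PPV), and cluster the resulting features. The kernel count, kernel length and k-means configuration are illustrative choices, not the paper's:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 150))                  # 100 unlabeled series

n_kernels, klen = 200, 9
kernels = rng.normal(size=(n_kernels, klen))
biases = rng.normal(size=n_kernels)

feats = np.empty((len(X), n_kernels))
for i, series in enumerate(X):
    for k in range(n_kernels):
        conv = np.convolve(series, kernels[k], mode="valid") + biases[k]
        feats[i, k] = (conv > 0).mean()          # PPV pooling

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(
    StandardScaler().fit_transform(feats))
print(np.bincount(labels))                       # cluster sizes
```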

Citations: 0
Interpretable linear dimensionality reduction based on bias-variance analysis
IF 4.8 · CAS Tier 3 (Computer Science) · Q2 (Computer Science, Artificial Intelligence) · Pub Date: 2024-03-25 · DOI: 10.1007/s10618-024-01015-0

A central issue in many machine learning applications on real data is the choice of input features. Ideally, the designer should select a small number of relevant, non-redundant features that preserve the complete information contained in the original dataset, with little collinearity among features. This procedure helps mitigate problems like overfitting and the curse of dimensionality, which arise when dealing with high-dimensional problems. On the other hand, it is not desirable to simply discard some features, since they may still contain information that can be exploited to improve results. Instead, dimensionality reduction techniques are designed to limit the number of features in a dataset by projecting them into a lower-dimensional space, possibly considering all the original features. However, the projected features resulting from the application of dimensionality reduction techniques are usually difficult to interpret. In this paper, we seek to design a principled dimensionality reduction approach that maintains the interpretability of the resulting features. Specifically, we propose a bias-variance analysis for linear models and leverage these theoretical results to design an algorithm, Linear Correlated Features Aggregation (LinCFA), which aggregates groups of continuous features with their average if their correlation is "sufficiently large". In this way, all features are considered, the dimensionality is reduced and the interpretability is preserved. Finally, we provide numerical validations of the proposed algorithm both on synthetic datasets, to confirm the theoretical results, and on real datasets, to show some promising applications.
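A minimal sketch of the aggregation step, under assumptions: groups of features whose pairwise correlation exceeds a fixed threshold are merged greedily and replaced by their average. In the paper the merging criterion comes from the bias-variance analysis rather than a hand-picked threshold:

```python
import numpy as np

def lincfa_aggregate(X, threshold=0.9):
    # Greedily group features whose absolute correlation exceeds threshold,
    # then replace each group by its mean (one interpretable new feature).
    corr = np.abs(np.corrcoef(X, rowvar=False))
    groups, assigned = [], set()
    for j in range(X.shape[1]):
        if j in assigned:
            continue
        group = [j] + [k for k in range(j + 1, X.shape[1])
                       if k not in assigned and corr[j, k] > threshold]
        assigned.update(group)
        groups.append(group)
    return np.column_stack([X[:, g].mean(axis=1) for g in groups]), groups

rng = np.random.default_rng(0)
base = rng.normal(size=(500, 3))
X = np.column_stack([base, base + rng.normal(scale=0.05, size=(500, 3))])
X_red, groups = lincfa_aggregate(X)
print(groups)        # e.g. [[0, 3], [1, 4], [2, 5]]: 6 features become 3
```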

Citations: 0
MCCE: Monte Carlo sampling of valid and realistic counterfactual explanations for tabular data
IF 4.8 · CAS Tier 3 (Computer Science) · Q2 (Computer Science, Artificial Intelligence) · Pub Date: 2024-03-22 · DOI: 10.1007/s10618-024-01017-y
Annabelle Redelmeier, Martin Jullum, Kjersti Aas, Anders Løland

We introduce MCCE: Monte Carlo sampling of valid and realistic Counterfactual Explanations for tabular data, a novel counterfactual explanation method that generates on-manifold, actionable and valid counterfactuals by modeling the joint distribution of the mutable features given the immutable features and the decision. Unlike other on-manifold methods that tend to rely on variational autoencoders and have strict prediction model and data requirements, MCCE handles any type of prediction model and categorical features with more than two levels. MCCE first models the joint distribution of the features and the decision with an autoregressive generative model where the conditionals are estimated using decision trees. Then, it samples a large set of observations from this model, and finally, it removes the samples that do not obey certain criteria. We compare MCCE with a range of state-of-the-art on-manifold counterfactual methods using four well-known data sets and show that MCCE outperforms these methods on all common performance metrics and speed. In particular, including the decision in the modeling process improves the efficiency of the method substantially.
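A heavily simplified sketch of the sampling scheme described above, for discrete mutable features: one decision tree per mutable feature estimates its conditional given the immutable features and the features generated so far, candidates are sampled autoregressively, and only valid counterfactuals (those achieving the desired decision) are kept. The toy data, toy black box and validity-only filtering are assumptions; MCCE's post-processing also scores realism and sparsity:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 2000
immutable = rng.integers(0, 2, size=(n, 1))          # e.g. a fixed attribute
m1 = (immutable[:, 0] + rng.integers(0, 2, n)) % 3   # mutable feature 1
m2 = (m1 + rng.integers(0, 2, n)) % 3                # mutable feature 2

def black_box(X):
    return (X[:, 1] + X[:, 2] >= 3).astype(int)      # toy decision model

# Autoregressive conditionals: p(m1 | immutable), p(m2 | immutable, m1)
t1 = DecisionTreeClassifier(max_depth=3).fit(immutable, m1)
t2 = DecisionTreeClassifier(max_depth=3).fit(
    np.column_stack([immutable[:, 0], m1]), m2)

def sample_counterfactuals(x, n_samples=1000):
    imm = np.full((n_samples, 1), x[0])
    s1 = rng.choice(t1.classes_, size=n_samples, p=t1.predict_proba(imm)[0])
    probs2 = t2.predict_proba(np.column_stack([imm[:, 0], s1]))
    s2 = np.array([rng.choice(t2.classes_, p=p) for p in probs2])
    cand = np.column_stack([imm[:, 0], s1, s2])
    return cand[black_box(cand) == 1]                # keep valid ones only

x = np.array([0, 0, 1])                              # factual with decision 0
cfs = sample_counterfactuals(x)
print(len(cfs), cfs[:3])
```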

Citations: 0
Binary quantification and dataset shift: an experimental investigation
IF 4.8 · CAS Tier 3 (Computer Science) · Q2 (Computer Science, Artificial Intelligence) · Pub Date: 2024-03-18 · DOI: 10.1007/s10618-024-01014-1
Pablo González, Alejandro Moreo, Fabrizio Sebastiani

Quantification is the supervised learning task that consists of training predictors of the class prevalence values of sets of unlabelled data, and is of special interest when the labelled data on which the predictor has been trained and the unlabelled data are not IID, i.e., suffer from dataset shift. To date, quantification methods have mostly been tested only on a special case of dataset shift, i.e., prior probability shift; the relationship between quantification and other types of dataset shift remains, by and large, unexplored. In this work we carry out an experimental analysis of how current quantification algorithms behave under different types of dataset shift, in order to identify limitations of current approaches and hopefully pave the way for the development of more broadly applicable methods. We do this by proposing a fine-grained taxonomy of types of dataset shift, by establishing protocols for the generation of datasets affected by these types of shift, and by testing existing quantification methods on the datasets thus generated. One finding that results from this investigation is that many existing quantification methods that had been found robust to prior probability shift are not necessarily robust to other types of dataset shift. A second finding is that no existing quantification method seems to be robust enough to dealing with all the types of dataset shift we simulate in our experiments. The code needed to reproduce all our experiments is publicly available at https://github.com/pglez82/quant_datasetshift.
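A small sketch of the kind of protocol the paper systematises, restricted to prior probability shift: train a classifier at one prevalence, resample test sets at other prevalences, and compare two simple quantifiers — classify-and-count (CC) and its adjusted variant (ACC). The synthetic data generator below is an assumption made for the illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_data(n, prevalence):
    y = (rng.random(n) < prevalence).astype(int)
    X = rng.normal(loc=y[:, None] * 1.5, size=(n, 2))
    return X, y

Xtr, ytr = make_data(5000, 0.5)
clf = LogisticRegression().fit(Xtr, ytr)
pred_tr = clf.predict(Xtr)
tpr = pred_tr[ytr == 1].mean()                 # estimated true positive rate
fpr = pred_tr[ytr == 0].mean()                 # estimated false positive rate

for prev in [0.1, 0.3, 0.5, 0.7, 0.9]:
    Xte, yte = make_data(2000, prev)
    cc = clf.predict(Xte).mean()                   # classify & count
    acc = np.clip((cc - fpr) / (tpr - fpr), 0, 1)  # adjusted count (ACC)
    print(f"true prevalence {prev:.1f}: CC {cc:.2f}, ACC {acc:.2f}")
```

Under prior probability shift, CC tracks the training prevalence and is biased, while the ACC correction recovers the true prevalence — the kind of behaviour the paper probes across many more types of shift.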

Citations: 0