
2020 International Conference on Data Mining Workshops (ICDMW): Latest Papers

Multi-Task Time Series Forecasting With Shared Attention
Pub Date : 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00132
Zekai Chen, Jiaze E, Xiao Zhang, Hao Sheng, Xiuzhen Cheng
Time series forecasting is a key component of many industrial and business decision processes, and recurrent neural network (RNN) based models have achieved impressive progress on a variety of forecasting tasks. However, most existing methods address single-task forecasting, learning each task separately from limited supervised objectives, and therefore often suffer from insufficient training instances. Because the Transformer architecture and other attention-based models have demonstrated a strong ability to capture long-term dependencies, we propose two self-attention based sharing schemes for multi-task time series forecasting that train jointly across multiple tasks. We augment a sequence of parallel Transformer encoders with an external public multi-head attention function, which is updated by the data of all tasks. Experiments on a number of real-world multi-task time series forecasting tasks show that the proposed architectures outperform not only state-of-the-art single-task forecasting baselines but also the RNN-based multi-task forecasting method.
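The sharing idea in this abstract can be illustrated with a small sketch. It assumes a single attention head, one layer, random (untrained) projection matrices, and concatenation of the private and shared outputs; none of these details come from the abstract, and the names `make_weights` and `encode` are hypothetical. It is not the authors' architecture, only a picture of "one attention function shared by all tasks next to task-private ones".

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a (steps x d) input."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    scale = math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]  # transpose of k
    weights = [softmax([s / scale for s in row]) for row in matmul(q, k_t)]
    return matmul(weights, v)


def make_weights(d_model, rng):
    """Hypothetical random (W_q, W_k, W_v) projections; a real model learns these."""
    return tuple(
        [[rng.gauss(0.0, 0.5) for _ in range(d_model)] for _ in range(d_model)]
        for _ in range(3)
    )


def encode(x, private, shared):
    """Concatenate, per time step, the task-private and the shared attention outputs."""
    own = self_attention(x, *private)
    common = self_attention(x, *shared)
    return [a + b for a, b in zip(own, common)]


rng = random.Random(0)
d_model, steps = 4, 5
shared = make_weights(d_model, rng)  # one set of weights, reused by every task
private = {task: make_weights(d_model, rng) for task in ("task_a", "task_b")}
series = {task: [[rng.random() for _ in range(d_model)] for _ in range(steps)]
          for task in private}
encoded = {task: encode(series[task], private[task], shared) for task in private}
print(len(encoded["task_a"]), len(encoded["task_a"][0]))  # prints: 5 8
```

In a trained model the shared weights would receive gradient updates from every task's data, while each private set would be updated only by its own task; the sketch shows only the forward pass.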
Citations: 9
The IEEE ICDM 2020 Workshops
Pub Date : 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00009
G. D. Fatta, V. Sheng, A. Cuzzocrea
The 20th IEEE International Conference on Data Mining (ICDM) hosts many co-located workshops, whose papers are traditionally published in a dedicated IEEE CS press proceedings. The purpose of these workshops is to give exposure to the current trends in data mining research, which do not find space or sufficient attention in the main conference tracks either because of the specialised application domain or because of the emerging nature of the field. The aim is to disseminate advances in these emerging fields and to attract more attention and research effort from the community. The quality of the papers and of the final program is guaranteed by an initial selection process of the workshop proposals and by a rigorous peer-review process within each workshop. This volume contains all the papers accepted for publication in the ICDM 2020 workshops and represents an interesting snapshot of data mining methods and applications of emerging and innovative areas of interest. This editorial provides an overview of the workshops included in the final program of ICDM 2020.
Citations: 1
Textual Lyrics Based Emotion Analysis of Bengali Songs
Pub Date : 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00015
Devjyoti Nath, Anirban Roy, Sumitra Kumari Shaw, Amlan Ghorai, Shanta Phani
The type of song or music a person listens to usually reflects their state of mind. Many online music libraries and repositories, as well as music streaming and media service providers, play or store songs based on human emotions. This makes emotion or mood detection of songs an interesting problem with applications in these areas and many others. In this paper, we propose methods to identify the connotation of a song based on its textual lyrics only. The methods rely on very basic, language-independent features and show competent performance, with an F1-score of more than 81%.
Citations: 1
Copyright
Pub Date : 2020-11-01 DOI: 10.1109/icdmw51313.2020.00003
Citations: 0
CAMTA: Causal Attention Model for Multi-touch Attribution
Pub Date : 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00020
Sachin Kumar, Garima Gupta, Ranjitha Prasad, Arnab Chatterjee, L. Vig, Gautam M. Shroff
Advertising channels have evolved from conventional print media, billboards and radio advertising to online digital advertising (ad), where users are exposed to a sequence of ad campaigns via social networks, display ads, search, etc. While advertisers revisit the design of ad campaigns to serve the requirements emerging from new ad channels, it is also critical for them to estimate the contribution of touch-points (views, clicks, conversions) on different channels, based on the sequence of customer actions. This process of contribution measurement is often referred to as multi-touch attribution (MTA). In this work, we propose CAMTA, a novel deep recurrent neural network architecture that acts as a causal attribution mechanism for user-personalised MTA in the context of observational data. CAMTA minimizes the selection bias in channel assignment across time-steps and touch-points. Furthermore, it uses the users' pre-conversion actions in a principled way to predict per-channel attribution. To quantitatively benchmark the proposed MTA model, we employ the real-world Criteo dataset and demonstrate the superior prediction accuracy of CAMTA compared to several baselines. In addition, we provide results for budget allocation and user-behaviour modeling based on the predicted channel attribution.
Citations: 6
Deep Cooperative Reconstruction with Security Constraints in multi-view environments
Pub Date : 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00083
D. Maurel, Sylvain Lefebvre, Jérémie Sublime
Multi-view data is proliferating in domains such as marketing, bank administration, survey analysis, and social networks: large databases share a fair amount of data representing the same individuals, with different features depending on the database. In this context, one can use machine learning methods to analyze this fragmented data across several heterogeneous sources (called views). Such analysis faces several difficulties. First, not all individuals are present and represented in all data sites and views. Second, this type of cross-site analysis raises ethical questions about privacy, as no local site should have direct access to data from the other sources. To solve these problems, we present a method called the Cooperative Reconstruction System, which aims to reconstruct information missing in some views of a multi-view context using information available in the other views. Furthermore, our method takes privacy into account and achieves this reconstruction without direct data transfer from one view to another.
Citations: 0
Analysis of multivariate time series predictability based on their features
Pub Date : 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00055
A. Kovantsev, P. Gladilin
In this study we explore features of time series that can be used to evaluate their predictability. We suggest using features based on Kolmogorov-Sinai entropy, correlation dimension, and the Hurst exponent to test multivariate predictability. In addition, we use two new features, ‘noise measure’ and ‘random walk detection’. We then experimentally test the accuracy of multivariate time series forecasting models, including the vector autoregressive (VAR) model, the multivariate singular spectrum analysis (MSSA) model, the local approximation (LA) model, and a recurrent neural network with long short-term memory (LSTM) cells. Finally, we test different causality methods for choosing additional time series as predictors and claim that the relevance of including additional predictors depends strongly on the characteristics of the target time series and can be estimated using the developed method. The results can serve as a theoretical and experimental basis for developing forecasting applications for short time series that use a combination of corporate and open-source data as additional predictors.
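One of the features named in this abstract, the Hurst exponent, can be estimated with the classical rescaled-range (R/S) method. The sketch below is a textbook R/S estimator, not necessarily the variant the authors used; the window sizes and the synthetic test series are assumptions made for illustration. Values near 0.5 suggest uncorrelated noise, while values near 1 indicate strong persistence, as in a random walk.

```python
import math
import random


def rescaled_range(window):
    """R/S statistic: range of cumulative mean-adjusted sums over the standard deviation."""
    mean = sum(window) / len(window)
    cumulative, lowest, highest = 0.0, 0.0, 0.0
    for value in window:
        cumulative += value - mean
        lowest = min(lowest, cumulative)
        highest = max(highest, cumulative)
    std = math.sqrt(sum((v - mean) ** 2 for v in window) / len(window))
    return (highest - lowest) / std if std > 0 else None


def hurst_rs(series, window_sizes=(8, 16, 32, 64, 128)):
    """Estimate the Hurst exponent as the slope of log(R/S) against log(window size)."""
    xs, ys = [], []
    for n in window_sizes:
        stats = [rescaled_range(series[i:i + n])
                 for i in range(0, len(series) - n + 1, n)]
        stats = [s for s in stats if s]  # drop constant windows
        if stats:
            xs.append(math.log(n))
            ys.append(math.log(sum(stats) / len(stats)))
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return slope / sum((x - mean_x) ** 2 for x in xs)


rng = random.Random(1)
noise = [rng.gauss(0.0, 1.0) for _ in range(1024)]  # synthetic white noise
walk, level = [], 0.0
for step in noise:  # cumulative sum of the noise gives a random walk
    level += step
    walk.append(level)

h_noise, h_walk = hurst_rs(noise), hurst_rs(walk)
print(f"white noise: {h_noise:.2f}, random walk: {h_walk:.2f}")
```

The R/S estimate is known to be biased upward on short series, so the white-noise value usually lands somewhat above 0.5; only the ordering of the two estimates should be relied on here.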
Citations: 6
Efficient Distance-based Global Sensitivity Analysis for Terrestrial Ecosystem Modeling
Pub Date : 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00052
D. Lu, D. Ricciuto
Sensitivity analysis in terrestrial ecosystem modeling is important for understanding controlling processes, guiding model development, and targeting new observations to reduce parameter and prediction uncertainty. Complex and computationally expensive terrestrial ecosystem models (TEMs) limit the number of ensemble simulations, requiring sophisticated and efficient methods to analyze the sensitivities of multiple model responses to different types of parameter uncertainties. In this study, we propose a distance-based global sensitivity analysis (DGSA) method. DGSA first classifies model response samples into a small set of discrete classes and then calculates the distance between parameter frequency distributions in different classes to measure the parameter sensitivity. The principle is that if the parameter distribution is the same in each class, the model response is insensitive to the parameter, while a large difference in the distributions indicates that the parameter influences the response. Built on this idea, DGSA can be applied to analyze the sensitivity of a single response or a group of responses to different kinds of parameter uncertainties, including continuous, discrete and even stochastic ones. Besides the main-effect sensitivity of a single parameter, DGSA can also quantify the sensitivity arising from parameter interactions. Additionally, DGSA is computationally efficient: it can obtain an accurate and statistically significant result from a small number of model evaluations. We applied DGSA to two TEMs, one having eight parameters and three kinds of model responses, and the other having 47 parameters and a long-period response. We demonstrated that DGSA can efficiently handle sensitivity problems with multiple responses and high-dimensional parameters.
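The two steps described in this abstract (classify the responses, then compare class-conditional parameter distributions) can be sketched as follows. The sketch makes several assumptions that are not in the abstract: classes are formed by splitting samples into equal-sized groups by response rank rather than by clustering, the distance is the area between empirical CDFs, each class is compared with the pooled sample, and the toy model and the names `cdf_gap` and `dgsa_scores` are hypothetical. It covers main effects only, not interactions.

```python
import random


def cdf_gap(sample, reference, grid):
    """Approximate area between the empirical CDFs of two samples on a uniform grid."""
    def cdf(values, x):
        return sum(v <= x for v in values) / len(values)
    step = grid[1] - grid[0]
    return step * sum(abs(cdf(sample, x) - cdf(reference, x)) for x in grid)


def dgsa_scores(params, responses, n_classes=3, n_grid=50):
    """Return one distance-based sensitivity score per parameter column."""
    # Step 1: group samples into classes by response rank (a stand-in for clustering).
    order = sorted(range(len(responses)), key=lambda i: responses[i])
    size = len(order) // n_classes
    classes = [order[k * size:(k + 1) * size] for k in range(n_classes - 1)]
    classes.append(order[(n_classes - 1) * size:])
    scores = []
    for j in range(len(params[0])):
        column = [row[j] for row in params]
        low, high = min(column), max(column)
        grid = [low + (high - low) * g / (n_grid - 1) for g in range(n_grid)]
        # Step 2: compare each class-conditional distribution with the overall one.
        gaps = [cdf_gap([column[i] for i in cls], column, grid) for cls in classes]
        # Normalise by the parameter range so scores are comparable across parameters.
        scores.append(sum(gaps) / len(gaps) / (high - low))
    return scores


rng = random.Random(0)
params = [[rng.random(), rng.random()] for _ in range(600)]
# Toy model (assumption): the response depends on parameter 0 only, plus small noise.
responses = [4.0 * p[0] + rng.gauss(0.0, 0.1) for p in params]
scores = dgsa_scores(params, responses)
print([round(s, 3) for s in scores])
```

In this toy setup the score of parameter 0 should come out clearly larger than that of parameter 1, because only parameter 0 changes its distribution from one response class to the next.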
Citations: 2
Rebuilding Trust in Active Learning with Actionable Metrics
Pub Date : 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00120
A. Abraham, L. Dreyfus-Schmidt
Active Learning (AL) is an active domain of research but is seldom used in industry despite pressing needs. This is partly due to a misalignment of objectives: while research strives to get the best results on selected datasets, industry wants guarantees that Active Learning will perform consistently and at least better than random labeling. The one-off nature of Active Learning makes it crucial to understand how strategy selection can be carried out and what drives poor performance (lack of exploration, selection of samples that are too hard to classify, …). To help rebuild the trust of industrial practitioners in Active Learning, we present various actionable metrics. Through extensive experiments on reference datasets such as CIFAR100, Fashion-MNIST, and 20Newsgroups, we show that these metrics bring interpretability to AL strategies that practitioners can leverage.
Citations: 4
You see a set of wagons - I see one train: Towards a unified view of local and global arbitrarily oriented subspace clusters
Pub Date : 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00050
Daniyal Kazempour, Long Matthias Yan, Peer Kröger, T. Seidl
Data with a high number of features raises the need to detect clusters that exhibit high similarity within subspaces of features. These subspaces can be arbitrarily oriented, which gave rise to arbitrarily oriented subspace clustering (AOSC) algorithms. Among the diversity of such algorithms, some specialize in detecting global clusters, which span the entire dataset regardless of any distances, while others are tailored to detecting local clusters. The two views (local and global) are obtained separately by each of the algorithms. While from an algebraic point of view neither representation can claim to be the true one, it is vital that domain scientists are presented with both views, enabling them to inspect and decide which representation is closest to the domain-specific reality. In this work we propose a framework capable of detecting locally dense, arbitrarily oriented subspace clusters that are embedded within a global one. We also first introduce definitions of locally and globally arbitrarily oriented subspace clusters. Our experiments illustrate that this approach has no significant impact on either cluster quality or runtime performance, and frees scientists from being limited exclusively to either the local or the global view.
Citations: 0
Journal
2020 International Conference on Data Mining Workshops (ICDMW)