
2022 IEEE International Conference on Data Mining Workshops (ICDMW): Latest Articles

Diagonally Colorized iVAT Images for Labeled Data
Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00043
Elizabeth D. Hathaway, R. Hathaway
The iVAT (improved Visual Assessment of cluster Tendency) image is a useful tool for assessing possible cluster structure in an unlabeled, numerical data set. If labeled data are available then it is sometimes helpful to determine how closely the (unlabeled) data clusters agree with the data partitioning based on the labels. In this note the DCiVAT (Diagonally Colorized iVAT) image is introduced for the case of labeled data. It incorporates all available data and label information into a single colorized iVAT image so that it is possible to visually assess the degree to which data clusters are aligned with label categories. The new approach is illustrated with several examples.
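The abstract does not spell out the iVAT computation. The Python sketch below shows the standard recipe under our own assumptions. VAT orders the objects with a Prim-style traversal of the minimum spanning tree. iVAT then replaces each dissimilarity with its minimax path distance, meaning the smallest achievable largest edge over all paths between two objects. How DCiVAT actually colors the image is not described in the abstract. The sketch therefore only returns the labels in VAT order, as a hypothetical input for such a coloring. All function names are illustrative.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def vat_order(d: Matrix) -> List[int]:
    """VAT ordering: a Prim-style traversal of the minimum spanning tree of d."""
    n = len(d)
    # Start at the row that holds the largest dissimilarity in the matrix.
    start = max(range(n), key=lambda i: max(d[i]))
    order = [start]
    remaining = set(range(n)) - {start}
    while remaining:
        # Append the unvisited object that is closest to any visited object.
        _, nearest = min((d[i][j], j) for i in order for j in remaining)
        order.append(nearest)
        remaining.remove(nearest)
    return order


def minimax_distances(d: Matrix) -> Matrix:
    """iVAT transform: a path costs its largest edge; keep the cheapest path."""
    n = len(d)
    m = [row[:] for row in d]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                m[i][j] = min(m[i][j], max(m[i][k], m[k][j]))
    return m


def ivat_with_labels(d: Matrix, labels: Sequence[str]) -> Tuple[Matrix, List[str]]:
    """Return the iVAT image in VAT order plus the labels along its diagonal."""
    order = vat_order(d)
    m = minimax_distances(d)
    image = [[m[i][j] for j in order] for i in order]
    # Hypothetical input for a label-based coloring of the diagonal.
    diagonal_labels = [labels[i] for i in order]
    return image, diagonal_labels


# Toy example: four points on a line, forming two well-separated groups.
points = [0, 10, 1, 11]
d = [[abs(a - b) for b in points] for a in points]
image, diag = ivat_with_labels(d, ["a", "b", "a", "b"])
print(diag)  # ['a', 'a', 'b', 'b']
```

The cubic Floyd-Warshall-style loop keeps the sketch short. Efficient iVAT formulations compute the same matrix recursively from the VAT order in O(n²) time.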
Citations: 0
A Comparison of Ambulance Redeployment Systems on Real-World Data
Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00010
Niklas Strauß, Max Berrendorf, Tom Haider, M. Schubert
Modern Emergency Medical Services (EMS) benefit from real-time sensor information in various ways, as it provides up-to-date location information and helps assess current local emergency risks. A critical part of EMS is dynamic ambulance redeployment, i.e., the task of assigning idle ambulances to base stations throughout a community. Although considerable effort has gone into methods for optimizing emergency response systems, comparing the proposed methods is generally difficult because reported results are mostly based on artificial and proprietary test beds. In this paper, we present a benchmark simulation environment for dynamic ambulance redeployment based on real emergency data from the city of San Francisco. Our proposed simulation environment is highly scalable and compatible with modern reinforcement learning frameworks. We provide a comparative study of several state-of-the-art methods across various metrics. Results indicate that even simple baseline algorithms can perform quite well in close-to-realistic settings. The code of our simulator is openly available at https://github.com/niklasdbs/ambusim.
Citations: 1
Making Sense of Sentiments for Aesthetic Plastic Surgery
Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00061
A. Choudhary, E. Cambria
With social media pervading all aspects of our lives, the opinions expressed by netizens are a gold mine that can be exploited in meaningful ways to influence all major public domains. Sentiment analysis is a way to interpret this unstructured data using AI tools. It is well known that the COVID-19 pandemic has triggered a 'Zoom Boom' in the field of aesthetic plastic surgery, sharply focusing attention on our appearance. Polarity detection of tweets about popular aesthetic plastic surgery procedures, published before and after the onset of COVID, can provide valuable insights for aesthetic plastic surgeons and the health industry at large. In this work, we develop an end-to-end system for the sentiment analysis of such tweets that incorporates a state-of-the-art fine-tuned deep learning model, an ingenious 'keyword search and filter approach', and SenticNet. Our system was tested on a large database of 196,900 tweets; the results were visualized using affectively correct word clouds and subjected to rigorous statistical hypothesis testing to draw meaningful inferences. The results showed a high level of statistical significance.
Citations: 0
Large-Scale Sequential Utility Pattern Mining in Uncertain Environments
Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00077
J. Wu, Shuo Liu, Jerry Chun‐wei Lin
High utility sequential pattern mining (HUSPM) considers timestamps, internal quantities, and external utility factors to mine high utility sequential patterns (HUSPs), and it has taken an essential place in data mining. In real life, the collected data may be uncertain due to environmental factors, equipment limitations, privacy issues, etc. As the volume of uncertain data grows rapidly, the efficiency of traditional mining algorithms drops sharply. When the data volume is large, conventional stand-alone algorithms generate more candidate sequences, occupy a lot of memory, and run significantly slower. This paper designs a MapReduce-based algorithm for mining high utility-probability sequential patterns. The algorithm uses the MapReduce framework to overcome the bottleneck of single-machine processing when the data volume is too large. It adopts an effective pruning strategy that reduces the number of candidate itemsets generated, which substantially improves performance. Experiments verify the performance of the proposed algorithm, and its correctness and completeness are demonstrated and discussed.
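The abstract does not detail the pruning strategy. As a hedged illustration only, the Python sketch below shows a common first pass in high-utility sequential pattern mining: sequence-weighted utility (SWU) pruning. The code is arranged in a map/shuffle/reduce shape to mirror how MapReduce would distribute it. The following are all assumptions, not the authors' algorithm: the input layout, the use of summed existence probabilities as expected support, and every function name.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

# Assumed input: (sequence id, existence probability, itemsets), where each
# itemset is a list of (item, purchase quantity) pairs.
UncertainSequence = Tuple[str, float, List[List[Tuple[str, int]]]]


def map_sequence(seq: UncertainSequence,
                 external_utility: Dict[str, int]) -> Iterator[Tuple[str, Tuple[int, float]]]:
    """Map step: emit (item, (sequence utility, probability)) once per distinct item."""
    _, prob, itemsets = seq
    seq_utility = sum(q * external_utility[item]
                      for itemset in itemsets for item, q in itemset)
    for item in {item for itemset in itemsets for item, _ in itemset}:
        yield item, (seq_utility, prob)


def reduce_item(values: List[Tuple[int, float]]) -> Tuple[int, float]:
    """Reduce step: SWU and expected support of one item."""
    swu = sum(u for u, _ in values)
    expected_support = sum(p for _, p in values)
    return swu, expected_support


def promising_items(db: List[UncertainSequence], external_utility: Dict[str, int],
                    min_util: int, min_prob: float) -> Dict[str, Tuple[int, float]]:
    """Run map, shuffle and reduce locally; keep items that pass both bounds."""
    groups: Dict[str, List[Tuple[int, float]]] = defaultdict(list)
    for seq in db:  # map + shuffle (group values by item key)
        for item, value in map_sequence(seq, external_utility):
            groups[item].append(value)
    kept = {}
    for item, values in groups.items():  # reduce
        swu, support = reduce_item(values)
        if swu >= min_util and support >= min_prob:
            kept[item] = (swu, support)
    return kept


eu = {"a": 2, "b": 5, "c": 1}
db = [("s1", 0.9, [[("a", 1), ("b", 2)], [("c", 3)]]),
      ("s2", 0.5, [[("a", 2)], [("c", 1)]]),
      ("s3", 0.7, [[("b", 1), ("c", 2)]])]
print(sorted(promising_items(db, eu, min_util=21, min_prob=1.0)))  # ['b', 'c']
```

SWU over-estimates the utility of every pattern that contains an item. Any item pruned here therefore cannot appear in a high-utility pattern, which makes this kind of pruning safe to apply before the expensive mining phase.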
Citations: 0
Distributed LSTM-Learning from Differentially Private Label Proportions
Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00139
Timon Sachweh, Daniel Boiar, T. Liebig
Data privacy and decentralised data collection have become more and more popular in recent years. To address issues of privacy, communication bandwidth, and learning from spatio-temporal data, we propose two efficient models that use Differential Privacy and decentralized LSTM learning. In the first (LabelProportionToLocal), a Long Short-Term Memory (LSTM) model is learned to extract local temporal node constraints, which are fed into a dense layer. The second (LabelProportionToDense) extends the first by fetching histogram data from neighboring nodes and joining this information with the LSTM output. For evaluation, two popular datasets are used: PEMS-BAY and METR-LA. Additionally, we provide our own dataset based on LuST. The evaluation shows the tradeoff between performance and data privacy.
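The abstract does not say which differential-privacy mechanism perturbs the label proportions. One common choice, assumed here, is the Laplace mechanism applied to per-node label counts. Under add/remove-one-record neighbours, a count histogram has L1 sensitivity 1, so Laplace noise with scale 1/ε gives ε-differential privacy. Clipping and renormalising afterwards are post-processing steps, so they do not weaken the guarantee. The class names and functions below are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, Iterable, List, Optional


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) sample as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_label_proportions(labels: Iterable[str], classes: List[str], epsilon: float,
                              rng: Optional[random.Random] = None) -> Dict[str, float]:
    """Epsilon-DP label proportions for one node (add/remove-one neighbours, sensitivity 1)."""
    rng = rng or random.Random()
    counts = Counter(labels)
    scale = 1.0 / epsilon
    # Perturb every class count, including classes absent from this node.
    noisy = {c: max(0.0, counts[c] + laplace_noise(scale, rng)) for c in classes}
    total = sum(noisy.values())
    if total == 0.0:
        # Everything was clipped to zero: fall back to a uniform distribution.
        return {c: 1.0 / len(classes) for c in classes}
    # Clipping and normalising are post-processing and keep the DP guarantee.
    return {c: v / total for c, v in noisy.items()}


node_labels = ["congested"] * 30 + ["free_flow"] * 70
print(private_label_proportions(node_labels, ["congested", "free_flow"], epsilon=0.5,
                                rng=random.Random(42)))
```

Smaller ε adds more noise to each node's proportions. This is the performance/privacy tradeoff the evaluation refers to.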
Citations: 0
A study of automatic speech recognition in Portuguese by the Brazilian General Attorney of the Union
Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00038
Rodrigo Fay Verqara, Paulo Henrique dos Santos, Guilherme Fay Verqara, Fábio L. L. Mendonça, C. E. L. Veiga, B. Praciano, Daniel Alves da Silva, Rafael Timóteo de Sousa Júnior
This article presents a study of an automatic speech recognition system in Portuguese applied to videos from the General Attorney of the Union of Brazil. Because the videos are confidential, security rules do not allow the use of proprietary software from large companies. Thus, building an artificial intelligence model capable of performing automatic speech recognition in Portuguese in the judicial context, and making this model available for large-scale inference, is critical to maintaining data security. For this purpose, a Brazilian Portuguese dataset was built by combining three existing datasets. The system used the TDNN Jasper and QuartzNet architectures for network training and obtained promising preliminary results, with a word error rate (WER) of 56% without a language model.
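Word error rate is the word-level edit distance (substitutions + deletions + insertions) between the reference transcript and the recognizer output, divided by the number of reference words. A WER of 56% therefore means roughly 56 word edits per 100 reference words. Below is a minimal Python sketch. Splitting on whitespace without any text normalisation is a simplifying assumption; real evaluations usually normalise case and punctuation first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev holds the previous row of the edit-distance table (one entry per hypothesis prefix).
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,              # deletion of a reference word
                         cur[j - 1] + 1,           # insertion of an extra word
                         prev[j - 1] + (r != h))   # substitution, or free match
        prev = cur
    return prev[-1] / len(ref)


print(word_error_rate("o juiz leu a sentença", "o juiz leu sentença"))  # 0.2
```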
Citations: 1
Improving net ecosystem CO2 flux prediction using memory-based interpretable machine learning
Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00145
Siyan Liu, Dawei Lu, D. Ricciuto, A. Walker
Terrestrial ecosystems play a central role in the global carbon cycle and affect climate change. However, our predictive understanding of these systems is still limited because of their complexity and the uncertainty about how key drivers and their legacy effects influence carbon fluxes. Here, we propose an interpretable Long Short-Term Memory (iLSTM) network for predicting net ecosystem CO2 exchange (NEE) and for interpreting how environmental drivers and their memory effects influence the NEE prediction. We consider five drivers and apply the method to three forest sites in the United States. Besides performing the prediction at each site, we also conduct transfer learning by using the iLSTM model trained at one site to predict at the other sites. Results show that the iLSTM model produces good NEE predictions for all three sites and, more importantly, provides reasonable interpretations of both the importance of each input driver and its temporal importance for the NEE prediction. Additionally, the iLSTM model demonstrates good cross-site transferability in terms of both prediction accuracy and interpretability. The transferability can improve NEE prediction at unobserved forest sites, and the interpretability advances our predictive understanding and guides process-based model development.
Citations: 0
Deep Transfer Tensor Factorization for Multi-View Learning
Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00067
Penghao Jiang, Ke Xin, Chunxi Li
This paper studies the data sparsity problem in multi-view learning. To address data sparsity in multi-view ratings, we propose a generic architecture of deep transfer tensor factorization (DTTF) that integrates deep learning and cross-domain tensor factorization, where side information is embedded to effectively compensate for the tensor sparsity. We then instantiate the architecture by combining a stacked denoising autoencoder (SDAE) with CANDECOMP/PARAFAC (CP) tensor factorization in both the source and target domains, where the side information of both users and items is tightly coupled with the sparse multi-view ratings and the latent factors are learned through joint optimization. This tight coupling of multi-view ratings and side information is intended to improve cross-domain, tensor-factorization-based recommendations. Experimental results on real-world datasets demonstrate that our DTTF schemes outperform state-of-the-art methods on multi-view rating prediction.
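CP factorization approximates a third-order ratings tensor (user × item × view) as a sum of R rank-one tensors. Each predicted rating is therefore x̂[i][j][k] = Σ_r A[i][r]·B[j][r]·C[k][r]. The Python sketch below shows only this reconstruction step, using made-up toy factor matrices. In DTTF the factors are learned jointly with the SDAE side-information encoders, which this sketch does not reproduce. The function names are illustrative.

```python
from typing import List

Matrix = List[List[float]]


def cp_entry(A: Matrix, B: Matrix, C: Matrix, i: int, j: int, k: int) -> float:
    """Predicted entry x[i][j][k] = sum over r of A[i][r] * B[j][r] * C[k][r]."""
    return sum(a * b * c for a, b, c in zip(A[i], B[j], C[k]))


def cp_reconstruct(A: Matrix, B: Matrix, C: Matrix) -> List[List[List[float]]]:
    """Full user x item x view tensor from rank-R factor matrices."""
    rank = len(A[0])
    if any(len(row) != rank for row in A + B + C):
        raise ValueError("all factor matrices must have the same number of columns")
    return [[[cp_entry(A, B, C, i, j, k) for k in range(len(C))]
             for j in range(len(B))]
            for i in range(len(A))]


# Toy rank-2 factors: 2 users, 1 item, 2 hypothetical views.
A = [[1.0, 0.0], [0.0, 1.0]]   # user factors
B = [[4.0, 2.0]]               # item factors
C = [[1.0, 1.0], [0.5, 1.5]]   # view factors
X = cp_reconstruct(A, B, C)
print(X[1][0][1])  # 0*4*0.5 + 1*2*1.5 = 3.0
```

Filling missing ratings this way is what lets a low-rank model make predictions for sparse entries. The quality of those predictions depends entirely on how well the factors are learned.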
Citations: 0
AARS: A novel adaptive archive-based efficient counting method for machine learning applications
Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00085
Sajib K. Biswas, Pranab K. Muhuri, Uttam K. Roy
For many machine learning methods, counting the occurrences of given queries plays a crucial role when dealing with problems such as classification, clustering, prediction, and association rule mining. However, these methods, which usually operate in two steps, i.e., learning and sampling, become impractical for large datasets because of computational costs or excessive memory consumption. Therefore, this paper proposes a novel approach to handling counting queries. The proposed method is an adaptive archive-based method that offers efficient archiving with reduced computational time and moderate memory requirements. We conduct numerous experiments to show the performance and scalability of the proposed approach on random queries, learning probabilistic networks, and association rule mining. The experimental results show that our proposed method outperforms the previously proposed ADtree, Bitmap and Radix strategies on datasets with higher dimensions and large numbers of observations.
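A counting query asks how many records match a conjunction of attribute values. The abstract does not describe the adaptive archive in enough detail to reproduce it. The sketch below instead shows the general idea behind a bitmap strategy, one of the baselines AARS is compared against. Each (attribute, value) pair gets one bit vector, and a query is answered by AND-ing the relevant vectors and counting the set bits. Python integers serve as arbitrary-length bitsets. The class name and the record layout are assumptions.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence, Tuple


class BitmapCounter:
    """Answers conjunctive counting queries with one bitset per (column, value)."""

    def __init__(self, records: Sequence[Sequence[Hashable]]):
        self.n = len(records)
        self.bitmaps: Dict[Tuple[int, Hashable], int] = defaultdict(int)
        for row, record in enumerate(records):
            for col, value in enumerate(record):
                # Set bit `row` in the bitmap of this (column, value) pair.
                self.bitmaps[(col, value)] |= 1 << row

    def count(self, query: Dict[int, Hashable]) -> int:
        """Number of records whose column `col` equals `value` for every query item."""
        mask = (1 << self.n) - 1  # start with every record selected
        for col, value in query.items():
            mask &= self.bitmaps.get((col, value), 0)
        return bin(mask).count("1")


weather = [("rain", "high", "yes"), ("rain", "low", "no"),
           ("sun", "high", "yes"), ("rain", "high", "no")]
counter = BitmapCounter(weather)
print(counter.count({0: "rain", 1: "high"}))  # 2
```

Each query then needs only a few bitwise operations. The cost is memory: it grows with the number of distinct attribute values times the number of records. That cost is what motivates more adaptive structures on high-dimensional data.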
Citations: 0
Using spatial data and cluster analysis to automatically detect non-trivial relationships between environmental transgressors
Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00022
José Alberto Sousa Torres, Paulo Henrique dos Santos, Daniel Alves da Silva, C. E. L. Veiga, Márcio Bastos Medeiros, Guilherme Fay Verqara, Fábio L. L. Mendonça, Rafael Timóteo de Sousa Júnior
The Amazon Rainforest is the most significant biodiversity reserve on the planet. It plays a central role in combating global warming and climate change. Despite its importance, illegal deforestation in the Brazilian Amazon rainforest had its worst year in a decade in 2021. The data show that more than 10,000 square kilometers of native forest were destroyed that year, an increase of 29% compared to 2020. To fight deforestation, Brazilian environmental inspection agencies have imposed more than 14 billion dollars in environmental fines in recent decades. However, these fines have not effectively reduced deforestation: only 4% of this amount was actually collected, which did little to deter lawbreakers. This is due to the difficulty of identifying the real transgressors, who use scapegoats to hide their crimes. The main objective of this paper is to propose an approach for finding the real environmental transgressors by analyzing data on the fines imposed by Brazilian governmental agencies over the last three decades. We propose a method that applies clustering techniques to geographic and temporal data extracted from the fines to identify non-trivial correlations between scapegoats and large landowners. The automatically identified links were loaded into a graph analysis database for accuracy assessment. The observed results were positive and indicate that this strategy can effectively identify the real culprits.
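The abstract does not name the clustering algorithm, so the following is only a minimal, hypothetical sketch of the idea. The Python code links two fines when they are close in both space (haversine distance) and time. It groups linked fines with union-find, which amounts to single-linkage clustering. Clusters involving more than one offender are reported as candidate relationships worth checking in a graph database. The field names and thresholds are assumptions for illustration.

```python
import math
from datetime import date
from typing import Dict, List, NamedTuple, Set


class Fine(NamedTuple):
    offender: str   # hypothetical field names
    lat: float
    lon: float
    issued: date


def haversine_km(a: Fine, b: Fine) -> float:
    """Great-circle distance between two fines, in kilometres."""
    p1, p2 = math.radians(a.lat), math.radians(b.lat)
    dphi = p2 - p1
    dlmb = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def linked_offenders(fines: List[Fine], eps_km: float, window_days: int) -> List[Set[str]]:
    """Single-linkage clusters in space and time; return multi-offender clusters."""
    parent = list(range(len(fines)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(fines)):
        for j in range(i + 1, len(fines)):
            near = haversine_km(fines[i], fines[j]) <= eps_km
            close_in_time = abs((fines[i].issued - fines[j].issued).days) <= window_days
            if near and close_in_time:
                parent[find(i)] = find(j)

    groups: Dict[int, Set[str]] = {}
    for i, fine in enumerate(fines):
        groups.setdefault(find(i), set()).add(fine.offender)
    # A cluster with a single offender reveals no relationship between people.
    return [g for g in groups.values() if len(g) > 1]


fines = [Fine("A", -3.00, -60.00, date(2020, 1, 10)),
         Fine("B", -3.01, -60.00, date(2020, 2, 1)),
         Fine("C", -10.00, -50.00, date(2020, 1, 15)),
         Fine("D", -3.00, -60.01, date(2023, 1, 1))]
print([sorted(g) for g in linked_offenders(fines, eps_km=5.0, window_days=90)])  # [['A', 'B']]
```

The pairwise loop is quadratic. A real pipeline over decades of fines would first need a spatial index or a grid-based pre-bucketing step.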
Citations: 0
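The abstract of Torres et al. does not name the clustering algorithm used, so the sketch below is only a hedged illustration of the general idea. It assumes each fine record carries an offender identifier, coordinates and an issue date; these field names are hypothetical.

It then works in two steps:
1. Group fines with simple threshold-based single-linkage clustering in space and time.
2. Emit every pair of offenders that share a cluster as a candidate link, of the kind that could be loaded into a graph database.

The thresholds `max_km` and `max_days` are hypothetical parameters, not values from the paper.

```python
import math
from dataclasses import dataclass
from datetime import date
from itertools import combinations
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Fine:
    offender_id: str  # hypothetical field names, for illustration only
    lat: float
    lon: float
    issued: date


def haversine_km(a: Fine, b: Fine) -> float:
    """Great-circle distance between two fines' locations, in kilometres."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlmb = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(h))


def candidate_links(fines: List[Fine], max_km: float, max_days: int) -> Set[Tuple[str, str]]:
    """Single-linkage clustering in space and time, then offender pairs per cluster."""
    parent = list(range(len(fines)))  # union-find forest over fine indices

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Merge any two fines that are close both in space and in time.
    for i, j in combinations(range(len(fines)), 2):
        a, b = fines[i], fines[j]
        if abs((a.issued - b.issued).days) <= max_days and haversine_km(a, b) <= max_km:
            parent[find(i)] = find(j)

    # Collect the distinct offenders that appear in each cluster.
    clusters: Dict[int, Set[str]] = {}
    for idx, fine in enumerate(fines):
        clusters.setdefault(find(idx), set()).add(fine.offender_id)

    # Every pair of offenders sharing a cluster becomes a candidate link (graph edge).
    links: Set[Tuple[str, str]] = set()
    for members in clusters.values():
        for x, y in combinations(sorted(members), 2):
            links.add((x, y))
    return links
```

The pairwise loop is quadratic in the number of fines. For decades of records, one would more likely use a spatial index or a library clusterer such as scikit-learn's `DBSCAN`. A shared cluster only yields a candidate link for further analysis.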
Journal: 2022 IEEE International Conference on Data Mining Workshops (ICDMW)