2022 IEEE International Conference on Data Mining Workshops (ICDMW): Latest Papers

Modeling Non-deterministic Human Behaviors in Discrete Food Choices

2022 IEEE International Conference on Data Mining Workshops (ICDMW)

Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00131

Andrew Starnes, Anton Dereventsov, E. S. Blazek, Folasade Phillips

We establish a non-deterministic model that predicts a user's food preferences from their demographic information. Our simulator is based on the NHANES dataset and on domain expert knowledge in the form of established behavioral studies. Our model can be used to generate an arbitrary number of synthetic data points that are similar in distribution to the original dataset and align with behavioral science expectations. Such a simulator can be used in a variety of machine learning tasks, especially in applications requiring human behavior prediction.

Citations: 0

Efficient Distributed Algorithms for Minimum Spanning Tree in Dense Graphs

2022 IEEE International Conference on Data Mining Workshops (ICDMW)

Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00106

M. Bateni, Morteza Monemizadeh, Kees Voorintholt

In recent years, the Massively Parallel Computation (MPC) model capturing the MapReduce framework has become the de facto standard model for large-scale data analysis, given the ubiquity of efficient and affordable cloud implementations. In this model, an input of size $m$ is initially distributed among $t$ machines, each with a local space of size $s$. Computation proceeds in synchronous rounds in which each machine performs arbitrary local computation on its data and then sends messages to other machines. In this paper, we study the Minimum Spanning Tree (MST) problem for dense graphs in the MPC model. We say a graph $G(V, E)$ is relatively dense if $m=\Theta(n^{1+c})$ where $n=\vert V\vert$ is the number of vertices, $m=\vert E\vert$ is the number of edges in this graph, and $0 < c\leq 1$. We develop the first work- and space-efficient MPC algorithm that with high probability computes an MST of $G$ using $\lceil\log\frac{c}{\epsilon}\rceil+1$ rounds of communication. As an MPC algorithm, our algorithm uses $t=O(n^{c-\epsilon})$ machines each one having local storage of size $s=O(n^{1+\epsilon})$ for any $0 < \epsilon\leq c$. Indeed, not only is this algorithm very simple and easy to implement, it also simultaneously achieves optimal total work, per-machine space, and number of rounds.
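To make the bounds in this abstract concrete, the following sketch plugs numbers into the stated formulas for rounds, machines, and per-machine space. It is only arithmetic on the quoted bounds, not the algorithm itself. Two assumptions are not in the abstract: the logarithm is taken base 2, and every hidden constant in the O(.) and Theta(.) bounds is set to 1. The function name `mpc_mst_parameters` is hypothetical.

```python
import math


def mpc_mst_parameters(n: int, c: float, eps: float) -> dict:
    """Evaluate the bounds quoted in the abstract for concrete n, c, eps.

    Assumptions (not from the abstract): log is base 2, and all hidden
    constants in the asymptotic bounds are taken to be 1.
    """
    if not (0 < c <= 1):
        raise ValueError("c must satisfy 0 < c <= 1")
    if not (0 < eps <= c):
        raise ValueError("eps must satisfy 0 < eps <= c")
    return {
        "edges": n ** (1 + c),                # m = Theta(n^(1+c))
        "machines": n ** (c - eps),           # t = O(n^(c-eps))
        "space_per_machine": n ** (1 + eps),  # s = O(n^(1+eps))
        "rounds": math.ceil(math.log2(c / eps)) + 1,
    }


params = mpc_mst_parameters(n=1024, c=1.0, eps=0.25)
print(params["rounds"])  # ceil(log2(4)) + 1 = 3
```

Under these assumptions, machines times per-machine space equals $n^{1+c}$, which matches the input size $m$ up to constants. Choosing $\epsilon = c$ gives a single machine and one round, while a smaller $\epsilon$ trades less memory per machine for more rounds.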

Citations: 0

DeepDive: Deep Latent Factor Model for Enhancing Diversity in Recommender Systems

2022 IEEE International Conference on Data Mining Workshops (ICDMW)

Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00031

Kriti Kumar, A. Majumdar, M. Chandra

Most collaborative filtering techniques concentrate on increasing the accuracy of business-to-customer recommender systems. Emphasis on accuracy alone leads to repetitive recommendations based on users' past preferences; such predictions pose a problem from both the business and the user's perspective, as they fail to recommend niche items and to maintain the user's interest. Incorporating diversity in recommendations can overcome these issues. Most prior studies introduce diversity by randomizing the item set predicted by the collaborative filtering technique. These techniques have no control over the accuracy vs. diversity trade-off; one needs to be mindful that a drastic loss in accuracy is not acceptable in a recommender system. Our work proposes a deep latent factor model with a diversity cost/penalty that allows us to control the trade-off between diversity and accuracy. Experimental results obtained with the MovieLens dataset demonstrate the superior performance of our proposed method in providing relevant, novel, and diverse recommendations compared to state-of-the-art techniques; with a slight drop in accuracy, our proposed method provides an improvement in different established measures of diversity.

Citations: 0

TPE-AutoClust: A Tree-based Pipline Ensemble Framework for Automated Clustering

2022 IEEE International Conference on Data Mining Workshops (ICDMW)

Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00149

Radwa El Shawi, S. Sakr

Novel technologies in automated machine learning ease the complexity of building well-performing machine learning pipelines. However, these are usually restricted to supervised learning tasks such as classification and regression, while unsupervised learning, particularly clustering, remains a largely unexplored problem due to the ambiguity involved when evaluating clustering solutions. Motivated by this shortcoming, in this paper, we introduce TPE-AutoClust, a genetic programming-based automated machine learning framework for clustering. TPE-AutoClust optimizes a series of feature preprocessors and machine learning models to maximize performance on an unsupervised clustering task. TPE-AutoClust consists of three main phases: a meta-learning phase, an optimization phase, and a clustering ensemble construction phase. The meta-learning phase suggests some instantiations of pipelines that are likely to perform well on a new dataset. These pipelines are used to warm-start the optimization phase, which adopts a multi-objective optimization technique to select pipelines based on the Pareto front of the trade-off between pipeline length and performance. The ensemble construction phase develops a collaborative mechanism based on a clustering ensemble to combine optimized pipelines based on different internal cluster validity indices and construct a well-performing solution for a new dataset. The proposed framework is based on scikit-learn with 4 preprocessors and 6 clustering algorithms. Extensive experiments are conducted on 27 real and synthetic benchmark datasets to validate the superiority of TPE-AutoClust. The results show that TPE-AutoClust outperforms the state-of-the-art techniques for building automated clustering solutions.
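The selection step described above, choosing pipelines on the Pareto front of pipeline length versus performance, can be illustrated with a small sketch. This is a generic Pareto-front filter, not TPE-AutoClust's actual implementation. The candidate names, lengths, and scores below are made-up illustrative values, and the `Candidate` type and function names are hypothetical.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    name: str     # hypothetical pipeline label
    length: int   # number of pipeline steps (lower is better)
    score: float  # internal validity score (higher is better)


def dominates(a: Candidate, b: Candidate) -> bool:
    """a dominates b if it is no worse on both objectives and better on one."""
    no_worse = a.length <= b.length and a.score >= b.score
    strictly_better = a.length < b.length or a.score > b.score
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep every candidate that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Illustrative values only; these are not results from the paper.
pipelines = [
    Candidate("kmeans", 1, 0.42),
    Candidate("scale+kmeans", 2, 0.55),
    Candidate("pca+kmeans", 2, 0.50),
    Candidate("scale+pca+dbscan", 3, 0.54),
    Candidate("scale+pca+agglomerative", 3, 0.61),
]
front = pareto_front(pipelines)
print([c.name for c in front])
```

Here the two dominated candidates drop out because a pipeline of equal or shorter length achieves a higher score. The remaining three represent different length/score trade-offs that a later ensemble step could combine.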

Citations: 0

MAISON - Multimodal AI-based Sensor platform for Older Individuals

2022 IEEE International Conference on Data Mining Workshops (ICDMW)

Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00040

A. Abedi, Faranak Dayyani, Charlene H. Chu, Shehroz S. Khan

The global population is aging, creating a need for tools that give older adults greater independence and the ability to age at home, and that assist healthcare workers. It is feasible to achieve this objective by building predictive models that assist healthcare workers in monitoring and analyzing older adults' behavioral, functional, and psychological data. To develop such models, a large amount of multimodal sensor data is typically required. In this paper, we propose MAISON, a scalable cloud-based platform of commercially available smart devices capable of collecting desired multimodal sensor data from older adults and patients living in their own homes. The MAISON platform is novel due to its ability to collect a greater variety of data modalities than existing platforms, as well as its new features that result in seamless data collection and ease of use for older adults who may not be digitally literate. We demonstrated the feasibility of the MAISON platform with two older adults discharged home from a large rehabilitation center. The results indicate that the MAISON platform was able to collect and store sensor data in a cloud without functional glitches or performance degradation. This paper also discusses the challenges faced during the development of the platform and data collection in the homes of the older adults. MAISON is a novel platform designed to collect multimodal data and facilitate the development of predictive models for detecting key health indicators, including social isolation, depression, and functional decline, and is feasible to use with older adults in the community.

Citations: 1

Unsupervised DeepView: Global Uncertainty Visualization for High Dimensional Data

2022 IEEE International Conference on Data Mining Workshops (ICDMW)

Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00086

Carina Newen, Emmanuel Müller

In recent years, more and more visualization methods for explanations of artificial intelligence have been proposed that focus on untangling black box models for single instances of the data set. While the focus often lies on supervised learning algorithms, the study of uncertainty estimations in the unsupervised domain for high-dimensional data sets in the explainability domain has been neglected so far. As a result, existing visualization methods struggle to visualize global uncertainty patterns over whole datasets. We propose Unsupervised DeepView, the first global uncertainty visualization method for high dimensional data based on a novel unsupervised proxy for local uncertainties. In this paper, we exploit the mathematical notion of local intrinsic dimensionality as a measure of local data complexity. As a label-agnostic measure of model uncertainty in unsupervised machine learning, it shows two highly desirable features: It can be used for global structure visualization as well as for the detection of local adversarials. In our empirical evaluation, we demonstrate its ability both in visualizations and quantitative analysis for unsupervised models on multiple datasets.
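The abstract relies on local intrinsic dimensionality (LID) as its measure of local data complexity but does not say how it is estimated. As a hedged illustration, the sketch below uses the widely used maximum-likelihood estimator computed from the distances to a point's k nearest neighbors. Whether the paper uses this exact estimator is an assumption, and the function name `lid_mle` is hypothetical.

```python
import math
from typing import Iterable


def lid_mle(distances: Iterable[float]) -> float:
    """Maximum-likelihood LID estimate from k nearest-neighbor distances.

    Computes -1 / mean(log(r_i / r_k)), where r_k is the largest distance.
    Assumption: this standard estimator stands in for whatever estimator
    the paper actually uses.
    """
    r = sorted(d for d in distances if d > 0)
    if len(r) < 2:
        raise ValueError("need at least two positive distances")
    r_max = r[-1]
    mean_log_ratio = sum(math.log(d / r_max) for d in r) / len(r)
    if mean_log_ratio == 0:
        raise ValueError("all distances are equal; LID is undefined")
    return -1.0 / mean_log_ratio


# Idealized neighbor distances: in a d-dimensional uniform neighborhood the
# i-th of k neighbors lies at a radius proportional to (i / k) ** (1 / d).
k = 1000
line_like = [i / k for i in range(1, k + 1)]              # d = 1
plane_like = [math.sqrt(i / k) for i in range(1, k + 1)]  # d = 2
print(lid_mle(line_like), lid_mle(plane_like))
```

On these idealized inputs the estimates come out close to 1 and 2, matching the dimension of the neighborhood they were built from. In practice the neighbor distances would come from a k-nearest-neighbor query around each data point.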

Citations: 0

Multi-purpose Recommender Platform using Perceiver IO

2022 IEEE International Conference on Data Mining Workshops (ICDMW)

Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00126

Ali Cevahir, Kentaro Kanada

Web services usually require many different types of recommender systems using large amounts of user log and content data in order to provide personalized content to their customers. Different recommenders may share the same customer base or cross-use models and data. It is challenging to design different models for each recommendation task. In this work, we propose a general-purpose framework for various recommendation tasks based on the Perceiver IO model. Perceiver IO is a general machine learning architecture based on transformer-style attention modules, which helps eliminate feature engineering for various tasks. Different types of recommenders can be developed with minimal modifications, and models can be transferred among different tasks. Our experiments with a variety of recommendation scenarios confirm that our framework is able to handle those tasks while achieving state-of-the-art accuracy.

Citations: 0

A Recommendation System Framework to Generalize AutoRec and Neural Collaborative Filtering

2022 IEEE International Conference on Data Mining Workshops (ICDMW)

Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00151

Ramin Raziperchikolaei, Young-joo Chung

AutoRec and neural collaborative filtering (NCF) are two widely used neural network-based frameworks in the recommendation system literature. In this paper, we show that these two apparently very different frameworks have a lot in common. We propose a general neural network-based framework, which gives us flexibility in choosing elements such as the input sources and prediction functions. Then, we show that AutoRec and NCF are special forms of our generalized framework. In our experimental results, first, we compare different variants of NCF and AutoRec. Then, we show that our general framework is necessary, since no specific structure performs well on all datasets. Finally, we show that by choosing the right elements, our framework outperforms the state-of-the-art methods with complicated structures.

Citations: 0

Scalable Joins over Big Data Streams: Actual and Future Research Trends

2022 IEEE International Conference on Data Mining Workshops (ICDMW)

Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00132

A. Cuzzocrea

Joins are at the basis of a plethora of big data analytics tools over massive big data streams. Developed in the context of static data sets, joins have attracted tremendous interest in the context of streaming data sets, due to their versatility in a wide range of application settings, ranging from environmental networks to logistics systems, from smart city applications to healthcare systems, from energy management systems to prognostic tools, and so forth. Joins over big data streams have traditionally attracted the attention of a growing part of the database and data mining community, and subsequently of the wider big data community. Following these considerations, this paper proposes a critical review of actual and future trends in the context of scalable joins over big data streams.

Citations: 0

Human Mobility Driven Modeling of an Infectious Disease 人类活动驱动的传染病模型

2022 IEEE International Conference on Data Mining Workshops (ICDMW)

Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00155

Ismael Villanueva-Miranda, M. Hossain, Monika Akbar

In conventional disease models, disease properties are the dominant parameters (e.g., infection rate, incubation period). As seen in the recent literature on infectious diseases, human behavior, particularly mobility, plays a crucial role in spreading diseases. This paper proposes an epidemiological model named SEIRD+m that considers human mobility instead of modeling disease properties alone. SEIRD+m relies on the core deterministic epidemic model SEIR (Susceptible, Exposed, Infected, and Recovered), adds a new compartment D (Dead), and enhances each SEIRD component with human mobility information (such as time, location, and movements) retrieved from cell-phone data collected by SafeGraph. We demonstrate a way to reduce the number of infections and deaths due to COVID-19 by restricting mobility in specific Census Block Groups (CBGs) detected as COVID-19 hotspots. A case study in this paper shows that a 50% reduction in mobility could help reduce the number of infections and deaths by significant percentages in different population groups based on race, income, and age.
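To illustrate how a mobility term can enter a compartmental model of this kind, here is a minimal SEIRD sketch integrated with forward Euler steps. It is not the SEIRD+m model from the paper. The single `mobility` multiplier on the transmission rate, every parameter value, and the function name `simulate_seird` are assumptions chosen for illustration. The paper instead uses SafeGraph mobility data per Census Block Group, which is not reproduced here.

```python
def simulate_seird(days: float, mobility: float = 1.0, beta: float = 0.4,
                   sigma: float = 0.2, gamma: float = 0.1, mu: float = 0.001,
                   population: float = 100_000.0,
                   initial_exposed: float = 10.0, dt: float = 0.1) -> dict:
    """Forward-Euler SEIRD simulation with a scalar mobility multiplier.

    Assumption: mobility scales the transmission rate beta linearly; all
    parameter values are illustrative and not taken from the paper.
    """
    s = population - initial_exposed
    e, i, r, d = initial_exposed, 0.0, 0.0, 0.0
    for _ in range(int(round(days / dt))):
        living = s + e + i + r
        new_exposed = mobility * beta * s * i / living * dt  # S -> E
        new_infectious = sigma * e * dt                      # E -> I
        new_recovered = gamma * i * dt                       # I -> R
        new_dead = mu * i * dt                               # I -> D
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - new_recovered - new_dead
        r += new_recovered
        d += new_dead
    return {"S": s, "E": e, "I": i, "R": r, "D": d}


full = simulate_seird(days=200, mobility=1.0)
half = simulate_seird(days=200, mobility=0.5)
print(round(full["D"]), round(half["D"]))
```

In this toy setting, halving the mobility multiplier lowers the cumulative deaths over the simulated horizon. That is the qualitative effect the abstract describes, although the magnitudes here carry no empirical meaning.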

Citations: 0
