
2018 IEEE International Conference on Data Mining (ICDM): Latest Papers

Deep Semantic Correlation Learning Based Hashing for Multimedia Cross-Modal Retrieval
Pub Date : 2018-11-01 DOI: 10.1109/ICDM.2018.00027
Xiaolong Gong, Linpeng Huang, Fuwei Wang
For many large-scale multimedia datasets and web content, nearest neighbor search methods based on hashing for cross-modal retrieval have attracted considerable attention due to their fast query speed and low storage cost. Most existing hashing methods try to map different modalities to a Hamming embedding in a supervised way, where the semantic information comes from a large manual label matrix and each sample in each modality is usually encoded by a sparse label vector. However, previous studies did not address the challenges of semantic correlation learning and could not make the best use of the prior semantic information. As a result, they cannot preserve accurate semantic similarities, which often degrades the performance of hashing function learning. To fill this gap, we propose a novel Deep Semantic Correlation learning based Hashing framework (DSCH) that generates unified hash codes in an end-to-end deep learning architecture for the cross-modal retrieval task. The major contribution of this work is to automatically construct the semantic correlation between data representations and to demonstrate how to use this correlation information to generate hash codes for new samples. In particular, DSCH integrates latent semantic embedding with a unified hash embedding to strengthen the similarity information among multiple modalities. Furthermore, additional graph regularization is employed in our framework to capture both inter-modal and intra-modal correspondences. Our model simultaneously learns the semantic correlation and the unified hash codes, which enhances the effectiveness of the cross-modal retrieval task. Experimental results on two large datasets show that our approach is more accurate than several state-of-the-art cross-modal methods.
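The retrieval step that makes hashing attractive can be sketched in a few lines. The example below is not DSCH itself: it only illustrates, under assumed data, how a query from one modality is matched against items from another once both have been mapped to binary codes in a shared Hamming space. The item ids, the 8-bit codes, and the function names are all hypothetical.

```python
from typing import Dict, List


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two hash codes differ."""
    return bin(a ^ b).count("1")


def nearest_by_hamming(query: int, database: Dict[str, int], k: int = 1) -> List[str]:
    """Return the ids of the k database codes closest to the query code."""
    ranked = sorted(
        database,
        key=lambda item_id: (hamming_distance(query, database[item_id]), item_id),
    )
    return ranked[:k]


# Hypothetical 8-bit unified codes; images and text share one Hamming space.
image_codes = {"img_cat": 0b10110010, "img_car": 0b01001101, "img_dog": 0b10110110}
text_query_code = 0b10110011  # assumed code produced for a text query

print(nearest_by_hamming(text_query_code, image_codes, k=2))
```

Because each comparison is an XOR followed by a bit count, and each item is stored as a short bit string, this kind of search stays cheap even for very large collections.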
Citations: 8
Coherent Graphical Lasso for Brain Network Discovery
Pub Date : 2018-11-01 DOI: 10.1109/ICDM.2018.00191
Hang Yin, Xiangnan Kong, Xinyue Liu
In brain network discovery, researchers are interested in discovering brain regions (nodes) and functional connections (edges) between these regions from fMRI scans of the human brain. Some recent works propose coherent models to address both of these sub-tasks. However, these approaches either suffer from mathematical inconsistency or fail to distinguish between direct and indirect connections between the nodes. In this paper, we study the problem of collectively discovering coherent brain regions and the direct connections between them. Each node of the brain network represents a brain region, i.e., a set of voxels in fMRI with coherent activities. Each edge denotes a direct dependency between two nodes. The discovered brain network represents a Gaussian graphical model that encodes conditional independence between the activities of different brain regions. We propose a novel model, called CGLasso, which combines Graphical Lasso (GLasso) and orthogonal non-negative matrix tri-factorization (ONMtF) to perform node discovery and edge detection simultaneously. We perform experiments on synthetic datasets with ground truth. The results show that the proposed method outperforms the compared baselines on four quantitative metrics. We also apply the proposed method and the baselines to the real ADHD-200 fMRI dataset. The results demonstrate that our method produces more meaningful networks than the baseline methods.
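The distinction between direct and indirect connections rests on a standard property of Gaussian graphical models: a zero entry in the precision (inverse covariance) matrix means two variables are conditionally independent given the rest, even when their covariance is non-zero. The sketch below illustrates only that property on a hypothetical three-region chain; it is not CGLasso or GLasso, and the matrix values are made up for the example.

```python
from fractions import Fraction


def invert(matrix):
    """Invert a small square matrix exactly with Gauss-Jordan elimination."""
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    aug = [
        [Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        # Swap a row with a non-zero pivot into place, then normalize it.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [x / pivot_value for x in aug[col]]
        # Eliminate the pivot column from every other row.
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


# Hypothetical chain A - B - C: regions A and C interact only through B.
precision = [[2, -1, 0],
             [-1, 2, -1],
             [0, -1, 2]]
covariance = invert(precision)

print("cov(A, C)  =", covariance[0][2])   # non-zero: A and C are correlated
print("prec(A, C) =", precision[0][2])    # zero: no direct A-C edge
```

A method that thresholds the covariance would draw an A-C edge here, whereas the sparsity pattern of the precision matrix recovers only the two direct edges.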
Citations: 5
EPAB: Early Pattern Aware Bayesian Model for Social Content Popularity Prediction
Pub Date : 2018-11-01 DOI: 10.1109/ICDM.2018.00175
Qitian Wu, Chaoqi Yang, Xiaofeng Gao, Peng He, Guihai Chen
The boom in information technology enables social platforms (like Twitter) to disseminate social content (like news) at an unprecedented rate, which makes early-stage prediction of social content popularity of great practical significance. However, most existing studies assume a long observation period before prediction, and their precision in early-stage prediction is limited by insufficient observation. In this paper, we take a fresh perspective and propose a novel early pattern aware Bayesian model. The early pattern representation, which stands for the early time series normalized by future popularity, can address what we call the early-stage indistinctiveness challenge. We then use an expressive evolving function to fit the time series and estimate three interpretable coefficients characterizing the temporal effect of the observed series on future evolution. Furthermore, a Bayesian network is leveraged to model the probabilistic relations among features, early indicators and early patterns. Experiments on three real-world social platforms (Twitter, Weibo and WeChat) show that, under different evaluation metrics, our model outperforms other methods in early-stage prediction and has low sensitivity to observation time.
Citations: 5
Summarizing Graphs at Multiple Scales: New Trends
Pub Date : 2018-11-01 DOI: 10.1109/ICDM.2018.00141
Danai Koutra, Jilles Vreeken, F. Bonchi
Recent advances in computing resources have made it possible to collect enormous amounts of interconnected data, such as social media interactions, web activity, knowledge bases, product and service purchases, autonomous vehicle routing, smart home sensor data, and more. The massive scale and complexity of this data, however, not only vastly surpass human processing power but also go beyond the limits of computation and storage. There is therefore an urgent need for methods and tools that summarize large interconnected data to enable faster computations, storage reduction, interactive large-scale visualization and understanding, and pattern discovery. Network summarization, which aims to find a small representation of an original, larger graph, features a variety of methods with different goals and for different input data representations (e.g., attributed graphs, time-evolving or streaming graphs, heterogeneous graphs). The objective of this tutorial is to give a systematic overview of methods for summarizing and explaining graphs at different scales: the node-group level, the network level, and the multi-network level. We emphasize the current challenges, present real-world applications, and highlight the open research problems in this vibrant research area.
Citations: 1
Blockchain Data Analytics
Pub Date : 2018-11-01 DOI: 10.1109/ICDM.2018.00013
C. Akcora, Murat Kantarcioglu, Y. Gel
Over the last couple of years, the Bitcoin cryptocurrency and the Blockchain technology that forms the basis of Bitcoin have received unprecedented attention. Designed to facilitate a secure distributed platform without central regulation, Blockchain is heralded as a novel paradigm that will be as powerful as Big Data, Cloud Computing, and Machine Learning. Blockchain technology is attracting ever-increasing interest from researchers in various domains that benefit from scalable cooperation among trustless parties. As Blockchain data analytics proliferates further, gleaning successful approaches and disseminating them among a diverse body of data scientists has become a critical task. As an interdisciplinary team of researchers, we aim to fill this vital role. In this tutorial, we offer a holistic view of Blockchain Data Analytics. Starting with the core components of Blockchain, we will discuss the state of the art in Blockchain data analytics for the privacy, security, finance, and management domains. We will share tutorial notes and further reading pointers on the tutorial website blockchaintutorial.github.io.
Citations: 25
Time-Discounting Convolution for Event Sequences with Ambiguous Timestamps
Pub Date : 2018-11-01 DOI: 10.1109/ICDM.2018.00139
Takayuki Katsuki, T. Osogami, Akira Koseki, Masaki Ono, M. Kudo, M. Makino, Atsushi Suzuki
This paper proposes a time-discounting convolution, a method for modeling event sequences with ambiguous timestamps. Unlike in ordinary time series, the time intervals are not constant, small time-shifts have no significant effect, and inputting timestamps or time durations into a model is not effective. The criteria we require of the model are robustness against time-shifts or timestamp uncertainty, together with the essential capabilities of time-series models, i.e., forgetting meaningless past information and handling infinite sequences. The proposed method meets these criteria with two mechanisms: a convolutional mechanism across time with specific parameterizations, which efficiently represents event dependencies in a time-shift invariant manner while discounting the effect of past events, and a dynamic pooling mechanism, which provides robustness against timestamp uncertainty and enhances the time-discounting capability by dynamically changing the pooling window size. In our learning algorithm, the decaying and dynamic pooling mechanisms play critical roles in handling infinite and variable-length sequences. Numerical experiments on real-world event sequences with ambiguous timestamps and on ordinary time series demonstrate the advantages of our method.
Citations: 1
Unsupervised User Identity Linkage via Factoid Embedding
Pub Date : 2018-11-01 DOI: 10.1109/ICDM.2018.00182
Wei Xie, Xin Mu, R. Lee, Feida Zhu, Ee-Peng Lim
User identity linkage (UIL), the problem of matching user accounts across multiple online social networks (OSNs), is widely studied and important to many real-world applications. Most existing UIL solutions adopt a supervised or semi-supervised approach, which generally suffers from the scarcity of labeled data. In this paper, we propose Factoid Embedding, a novel framework that adopts an unsupervised approach. It is designed to cope with the different profile attributes, content types and network links of different OSNs. The key idea is that each piece of information about a user identity describes the real identity owner, and thus distinguishes the owner from other users. We represent such a piece of information by a factoid and model it as a triplet consisting of a user identity, a predicate, and an object or another user identity. By embedding these factoids, we learn latent representations of the user identities and link two user identities from different OSNs if they are close to each other in the user embedding space. Our Factoid Embedding algorithm is designed such that, as we learn the embedding space, each embedded factoid is "translated" into a motion in the user embedding space that brings similar user identities closer together and pushes different user identities further apart. Extensive experiments are conducted to evaluate Factoid Embedding on two real-world OSN data sets. The experimental results show that Factoid Embedding outperforms the state-of-the-art methods even without training data.
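A small sketch may help make the two objects in this abstract concrete: the factoid triplet and the final linking step in the embedding space. It does not implement the embedding algorithm. The account names, predicates, and two-dimensional vectors below are hypothetical stand-ins for what a trained model would produce, and nearest-neighbor matching by Euclidean distance is an assumed linking rule.

```python
import math
from typing import Dict, NamedTuple, Tuple


class Factoid(NamedTuple):
    """One piece of information about a user identity, stored as a triplet."""
    user: str
    predicate: str
    obj: str  # an attribute value or another user identity


# Hypothetical factoids drawn from two social networks.
factoids = [
    Factoid("fb:alice", "has_name", "Alice Tan"),
    Factoid("tw:alice_t", "has_name", "Alice Tan"),
    Factoid("fb:alice", "follows", "fb:bob"),
]

Vector = Tuple[float, float]


def distance(u: Vector, v: Vector) -> float:
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def link(source: Dict[str, Vector], target: Dict[str, Vector]) -> Dict[str, str]:
    """Match each source identity to its nearest target identity."""
    return {
        s_id: min(target, key=lambda t_id: distance(s_vec, target[t_id]))
        for s_id, s_vec in source.items()
    }


# Assumed embeddings, standing in for the output of a trained model.
fb_embeddings = {"fb:alice": (0.9, 0.1), "fb:bob": (0.1, 0.8)}
tw_embeddings = {"tw:alice_t": (0.8, 0.2), "tw:bobby": (0.2, 0.9)}

print(link(fb_embeddings, tw_embeddings))
```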
Citations: 29
Transfer Hawkes Processes with Content Information
Pub Date : 2018-11-01 DOI: 10.1109/ICDM.2018.00145
Tianbo Li, Pengfei Wei, Yiping Ke
Hawkes processes are widely used for modeling event cascades. However, content and cross-domain information, which are also instrumental in modeling, are usually neglected. In this paper, we propose a novel model called transfer Hybrid Least Square for Hawkes (trHLSH) that combines Hawkes processes with content and cross-domain information. We also present an effective learning algorithm for the model. Evaluation on both synthetic and real-world datasets demonstrates that the proposed model can jointly learn knowledge from temporal, content and cross-domain information, and performs better in terms of network recovery and prediction.
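For readers unfamiliar with Hawkes processes, the sketch below shows the self-exciting behavior that makes them suitable for cascades. It uses the common univariate form with an exponential kernel, where the intensity is a base rate plus a decaying contribution from every past event. This is a textbook form chosen for illustration, not trHLSH, and the parameter values `mu`, `alpha`, and `beta` are arbitrary assumptions.

```python
import math
from typing import Sequence


def hawkes_intensity(
    t: float,
    history: Sequence[float],
    mu: float = 0.2,     # assumed base rate
    alpha: float = 0.8,  # assumed jump added by each past event
    beta: float = 1.0,   # assumed decay rate of that jump
) -> float:
    """Conditional intensity at time t given strictly earlier event times."""
    excitation = sum(
        alpha * math.exp(-beta * (t - t_i)) for t_i in history if t_i < t
    )
    return mu + excitation


events = [1.0, 1.5]  # hypothetical event times
for t in (0.5, 1.6, 5.0):
    print(t, round(hawkes_intensity(t, events), 4))
```

The intensity sits at the base rate before any event, rises sharply just after the two events, and decays back toward the base rate as time passes.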
Citations: 6
SuperPart: Supervised Graph Partitioning for Record Linkage
Pub Date : 2018-11-01 DOI: 10.1109/ICDM.2018.00054
Russell Reas, Stephen M. Ash, Robert A. Barton, Andrew Borthwick
Identifying sets of items that are equivalent to one another is a problem common to many fields. Systems addressing this generally have at their core a function s(d_i, d_j) for computing the similarity between pairs of records d_i, d_j. The output of s() can be interpreted as a weighted graph where edges indicate the likelihood of two records matching. Partitioning this graph into equivalence classes is non-trivial due to the presence of inconsistencies and imperfections in s(). Numerous algorithmic approaches to the problem have been proposed, but (1) it is unclear which approach should be used on a given dataset; (2) the algorithms do not generally output a confidence in their decisions; and (3) they require error-prone tuning to a particular notion of ground truth. We present SuperPart, a scalable, supervised learning approach to graph partitioning. We demonstrate that SuperPart yields competitive results on the problem of detecting equivalent records without manual selection of algorithms or an exhaustive search over hyperparameters. We also show the quality of SuperPart's confidence measures by reporting Area Under the Precision-Recall Curve metrics that exceed a baseline measure by 11%. Finally, to bolster additional research in this domain, we release three new datasets derived from real-world Amazon product data along with ground-truth partitionings.
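To make the setup in this abstract concrete, here is a minimal sketch of the naive baseline it argues against: treat s(d_i, d_j) as edge weights, keep edges above a hand-picked threshold, and take connected components as equivalence classes. This is not SuperPart itself. The names `partition_by_threshold` and `jaccard`, the token-overlap similarity, and the threshold of 0.5 are all assumptions chosen for illustration.

```python
from collections import defaultdict


def jaccard(a, b):
    """Hypothetical stand-in for s(d_i, d_j): token-set Jaccard similarity."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def partition_by_threshold(records, s, threshold=0.5):
    """Connected components of the graph whose edges are pairs with s >= threshold.

    Assumes records are unique and hashable. Compares all pairs, so it is
    quadratic in the number of records.
    """
    parent = {r: r for r in records}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, a in enumerate(records):
        for b in records[i + 1:]:
            if s(a, b) >= threshold:
                parent[find(a)] = find(b)

    clusters = defaultdict(list)
    for r in records:
        clusters[find(r)].append(r)
    return sorted(sorted(c) for c in clusters.values())


records = [
    "acme usb cable",
    "acme usb cable 2m",
    "blue widget",
    "blue widget large",
    "red lamp",
]
clusters = partition_by_threshold(records, jaccard, threshold=0.5)
```

The threshold here is exactly the kind of hand-tuned parameter the abstract calls error-prone: a single spurious high-similarity edge merges two otherwise separate clusters, and the output carries no confidence score.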
Citations: 7
Hierarchical Hybrid Feature Model for Top-N Context-Aware Recommendation
Pub Date : 2018-11-01 DOI: 10.1109/ICDM.2018.00026
Yingpeng Du, Hongzhi Liu, Zhonghai Wu, Xing Zhang
Precise prediction of users' behavior is critical for users' satisfaction and platforms' benefit. A user's behavior heavily depends on the user's general preference and contextual information (current location, weather, etc.). In this paper, we propose a succinct hierarchical framework named Hierarchical Hybrid Feature Model (HHFM). It combines users' general taste and diverse contextual information into a hybrid feature representation to profile users' dynamic preference w.r.t. context. Meanwhile, we propose an n-way concatenation pooling strategy to capture the non-linear and complex inherent structures of real-world data, which most existing methods, such as Factorization Machines, ignore. Conceptually, our model subsumes several existing methods when proper concatenation and pooling strategies are chosen. Extensive experiments show our model consistently outperforms state-of-the-art methods on three real-world data sets.
Citations: 2