Data Mining and Knowledge Discovery: Latest Publications

Adaptive Bernstein change detector for high-dimensional data streams
IF 4.8; Zone 3 (Computer Science); Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2024-01-09; DOI: 10.1007/s10618-023-00999-5
Marco Heyden, Edouard Fouché, Vadim Arzamasov, Tanja Fenn, Florian Kalinke, Klemens Böhm

Change detection is of fundamental importance when analyzing data streams. Detecting changes both quickly and accurately enables monitoring and prediction systems to react, e.g., by issuing an alarm or by updating a learning algorithm. However, detecting changes is challenging when observations are high-dimensional. In high-dimensional data, change detectors should be able to identify not only when changes happen, but also in which subspace they occur. Ideally, they should also quantify how severe the changes are. Our approach, ABCD, has these properties. ABCD learns an encoder-decoder model and monitors its accuracy over a window of adaptive size. ABCD derives a change score based on Bernstein’s inequality to detect deviations in accuracy that indicate changes. Our experiments demonstrate that ABCD outperforms its best competitor by up to 20% in F1-score on average. It can also accurately estimate the subspace in which a change occurs, together with a severity measure that correlates with the ground truth.
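
To make the idea of a Bernstein-based change score concrete, here is a minimal sketch under stated assumptions: the encoder-decoder reconstruction errors are already scaled to [0, 1], the window is fixed rather than adaptive, the variance is a plug-in estimate (a heuristic, since Bernstein's inequality assumes the true variance), and the two halves of each candidate split are combined with a union bound. The function names and this exact scoring rule are hypothetical illustrations, not ABCD's published formulation.

```python
import math
from typing import List, Tuple


def bernstein_tail(n: int, eps: float, var: float, bound: float) -> float:
    """Two-sided Bernstein bound on P(|sample mean - mu| >= eps) for n i.i.d.
    values with variance `var` and |X - mu| <= `bound`."""
    if eps <= 0.0:
        return 1.0
    exponent = -n * eps ** 2 / (2.0 * var + 2.0 * bound * eps / 3.0)
    return min(1.0, 2.0 * math.exp(exponent))


def change_score(errors: List[float], min_size: int = 5) -> Tuple[float, int]:
    """Hypothetical change score over a window of reconstruction errors in [0, 1].

    Returns (score, split): the split k maximizing 1 - p, where p bounds the
    probability of the observed gap between mean(errors[:k]) and
    mean(errors[k:]) if both halves shared the same mean.
    """
    n = len(errors)
    mean = sum(errors) / n
    var = sum((e - mean) ** 2 for e in errors) / n  # plug-in variance (heuristic)
    bound = 1.0  # |X - mu| <= 1 because errors and their mean lie in [0, 1]
    best_score, best_split = 0.0, -1
    for k in range(min_size, n - min_size + 1):
        left, right = errors[:k], errors[k:]
        gap = abs(sum(left) / len(left) - sum(right) / len(right))
        # If the means differ by `gap`, at least one half deviates from the
        # common mean by gap / 2, so a union bound over both halves applies.
        p = (bernstein_tail(len(left), gap / 2, var, bound)
             + bernstein_tail(len(right), gap / 2, var, bound))
        score = 1.0 - min(1.0, p)
        if score > best_score:
            best_score, best_split = score, k
    return best_score, best_split
```

For example, `change_score([0.1] * 50 + [0.9] * 50)` yields a score close to 1 at split 50, while a window of identical errors yields a score of 0.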

Citations: 0
When graph convolution meets double attention: online privacy disclosure detection with multi-label text classification
IF 4.8; Zone 3 (Computer Science); Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2024-01-05; DOI: 10.1007/s10618-023-00992-y
Zhanbo Liang, Jie Guo, Weidong Qiu, Zheng Huang, Shujun Li

With the rise of Web 2.0 platforms such as online social media, people’s private information, such as their location, occupation and even family information, is often inadvertently disclosed through online discussions. It is therefore important to detect such unwanted privacy disclosures so that the affected people and the online platform can be alerted. In this paper, privacy disclosure detection is modeled as a multi-label text classification (MLTC) problem, and a new privacy disclosure detection model is proposed to construct an MLTC classifier for detecting online privacy disclosures. This classifier takes an online post as input and outputs multiple labels, each reflecting a possible privacy disclosure. The proposed representation method combines three sources of information: the input text itself, the label-to-text correlation and the label-to-label correlation. A double-attention mechanism combines the first two sources, and a graph convolutional network extracts the third, which is then used to help fuse the features extracted from the first two. Our extensive experimental results, obtained on a public dataset of privacy-disclosing posts on Twitter, demonstrated that the proposed privacy disclosure detection method significantly and consistently outperformed other state-of-the-art methods on all key performance indicators.
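
The label-to-label branch can be pictured with the widely used GCN propagation rule H' = ReLU(D^-1/2 (A + I) D^-1/2 H W) applied to a label graph. This sketch assumes the graph links two labels whenever they co-occur in at least one training post; the paper's actual graph construction, layer sizes and fusion with the attention branches are not reproduced, and all function names are hypothetical.

```python
import numpy as np


def label_cooccurrence(y: np.ndarray) -> np.ndarray:
    """Binary label graph: labels i and j are linked if some post carries both."""
    a = (y.T @ y > 0).astype(float)
    np.fill_diagonal(a, 0.0)  # self-loops are added during normalization instead
    return a


def normalized_adjacency(a: np.ndarray) -> np.ndarray:
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 used by standard GCNs."""
    a_hat = a + np.eye(a.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :]


def gcn_layer(a_norm: np.ndarray, h: np.ndarray, w: np.ndarray) -> np.ndarray:
    """One propagation step: mix each label's features with its neighbours', then ReLU."""
    return np.maximum(a_norm @ h @ w, 0.0)


# Toy example: 3 posts, 3 privacy labels (e.g. location, occupation, family).
y = np.array([[1, 1, 0], [0, 1, 1], [1, 0, 0]])
rng = np.random.default_rng(0)
label_emb = rng.normal(size=(3, 8))  # initial label embeddings (random here)
weights = rng.normal(size=(8, 4))    # layer weights (would be learned)
label_features = gcn_layer(normalized_adjacency(label_cooccurrence(y)), label_emb, weights)
print(label_features.shape)  # (3, 4)
```

The resulting per-label features are what a model of this kind could use to guide the fusion of text and label-attention features.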

Citations: 0
CompTrails: comparing hypotheses across behavioral networks
IF 4.8; Zone 3 (Computer Science); Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2024-01-03; DOI: 10.1007/s10618-023-00996-8
Tobias Koopmann, Martin Becker, Florian Lemmerich, Andreas Hotho

The term Behavioral Networks describes networks that contain relational information on human behavior. These range from social networks that capture friendships or cooperation between individuals, to navigational networks that capture geographical or web navigation, and many more. Understanding the forces that drive behavior within these networks can help improve the underlying network, for example, by generating new hyperlinks on websites or by suggesting new connections and friends on social networks. Previous approaches considered different hypotheses on a single network and evaluated which hypothesis fits best. These hypotheses can represent human intuition and expert opinions or be based on previous insights. In this work, we extend these approaches to enable the comparison of a single hypothesis across multiple networks. We uncover several issues with naive approaches that can distort such comparisons and lead to undesired results. Based on these findings, we propose a framework with five flexible components that can be tailored to the specific analysis goals of an application scenario. We show the benefits and limits of our approach by applying it to synthetic data and several real-world datasets, including web navigation, bibliometric navigation, and geographic navigation. Our work supports practitioners and researchers who aim to understand similarities and differences in human behavior between environments.

Citations: 0
Effective signal reconstruction from multiple ranked lists via convex optimization
IF 4.8; Zone 3 (Computer Science); Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2024-01-02; DOI: 10.1007/s10618-023-00991-z

The ranking of objects is widely used to rate their relative quality or relevance across multiple assessments. Beyond classical rank aggregation, it is of interest to estimate the usually unobservable latent signals that inform a consensus ranking. Assuming only that the assessments are independent (they may be incomplete), we introduce indirect inference via convex optimization combined with a computationally efficient Poisson bootstrap. Two objective functions are suggested, one linear and the other quadratic. The signal estimation problem is formulated through pairwise comparisons of all objects with respect to their rank positions, with sets of constraints representing the order relations. The transitivity of rank scales allows us to substantially reduce the number of constraints associated with the full set of object comparisons. The key idea is to globally reduce the errors induced by the rankers until optimal latent signals are obtained. The main advantage of the approach is its low computational cost, even for n ≪ p data problems. Exploratory tools can be developed from the bootstrap signal estimates and their standard errors. Simulation evidence, a comparison with the state-of-the-art rank centrality method, and two applications, one in higher education evaluation and the other in molecular cancer research, are presented.
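
Two ingredients of this abstract can be illustrated in isolation. By transitivity, one complete ranking of p objects is fully described by its p - 1 adjacent order constraints instead of all p(p - 1)/2 pairs. The Poisson bootstrap replaces multinomial resampling with an independent Poisson(1) weight per assessment. In this sketch a simple mean-rank score stands in for the paper's convex-optimization signal estimator; the estimator and the function names are assumptions for illustration only.

```python
import numpy as np


def adjacent_constraints(ranking):
    """(better, worse) pairs from one ranking, best first. By transitivity these
    p - 1 constraints imply all p * (p - 1) / 2 pairwise order relations."""
    return [(ranking[i], ranking[i + 1]) for i in range(len(ranking) - 1)]


def poisson_bootstrap_se(ranks: np.ndarray, n_boot: int = 200, seed: int = 0) -> np.ndarray:
    """Bootstrap standard error of each object's mean rank.

    ranks: (m assessments, p objects) array of rank positions, NaN where an
    assessment did not rank an object (incomplete lists are allowed).
    Each replicate weights every assessment by an independent Poisson(1) draw.
    """
    rng = np.random.default_rng(seed)
    m, p = ranks.shape
    observed = ~np.isnan(ranks)
    filled = np.where(observed, ranks, 0.0)
    estimates = np.empty((n_boot, p))
    for b in range(n_boot):
        weights = rng.poisson(1.0, size=m)[:, None] * observed
        total = weights.sum(axis=0)
        with np.errstate(invalid="ignore", divide="ignore"):
            estimates[b] = (weights * filled).sum(axis=0) / total  # NaN if total == 0
    return np.nanstd(estimates, axis=0, ddof=1)
```

Because each replicate only needs one Poisson draw per assessment, the replicates can be generated without materializing resampled data sets, which is one reason this bootstrap variant is considered computationally cheap.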

Citations: 0
Correction: A semi‑supervised interactive algorithm for change point detection
IF 4.8; Zone 3 (Computer Science); Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2024-01-02; DOI: 10.1007/s10618-023-01000-z
Zhenxiang Cao, N. Seeuws, Maarten Vos, Alexander Bertrand
Citations: 0
Predicting consumer choice from raw eye-movement data using the RETINA deep learning architecture
IF 4.8; Zone 3 (Computer Science); Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2023-12-29; DOI: 10.1007/s10618-023-00989-7
Moshe Unger, Michel Wedel, Alexander Tuzhilin

We propose the use of a deep learning architecture, called RETINA, to predict multi-alternative, multi-attribute consumer choice from eye movement data. RETINA directly uses the complete time series of raw eye-tracking data from both eyes as input to state-of-the-art Transformer and Metric Learning deep learning methods. Using the raw data as input eliminates the information loss that may result from first calculating fixations, then deriving metrics from the fixation data and analysing those metrics, as is often done in eye movement research. It also allows us to apply deep learning to eye-tracking data sets of the size commonly encountered in academic and applied research. Using a data set of 112 respondents who chose among four laptops, we show that the proposed architecture outperforms other state-of-the-art machine learning methods (standard BERT, LSTM, AutoML, logistic regression) calibrated on raw data or fixation data. An analysis of partial time and partial data segments reveals that RETINA can predict choice outcomes well before participants reach a decision. Specifically, using a mere 5 s of data, the RETINA architecture achieves a predictive validation accuracy of over 0.7. We also assess which features of the eye movement data contribute to RETINA’s prediction accuracy. Finally, we make recommendations on how the proposed architecture can serve as a basis for future academic research, in particular its application to eye movements collected from front-facing video cameras.
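
The conventional pipeline that RETINA is designed to bypass usually starts by turning raw gaze samples into fixations. As an illustration only, here is the common velocity-threshold (I-VT) filter; the 30 deg/s threshold, the minimum run length and the function name are assumed defaults, not values taken from the paper.

```python
import numpy as np


def ivt_fixations(x, y, hz, threshold=30.0, min_samples=3):
    """Velocity-threshold (I-VT) fixation detection.

    x, y: gaze positions in degrees of visual angle, sampled at `hz` Hz.
    Samples whose point-to-point velocity is below `threshold` (deg/s) are
    labelled fixation samples; runs of at least `min_samples` become fixations.
    Returns a list of (start, end_exclusive, mean_x, mean_y).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    if n < 2:
        return []
    velocity = np.hypot(np.diff(x), np.diff(y)) * hz  # deg/s between consecutive samples
    is_fix = np.empty(n, dtype=bool)
    is_fix[1:] = velocity < threshold
    is_fix[0] = is_fix[1]  # first sample inherits the label of the second
    fixations, start = [], None
    for i in range(n + 1):  # the extra step closes a run that reaches the end
        inside = i < n and is_fix[i]
        if inside and start is None:
            start = i
        elif not inside and start is not None:
            if i - start >= min_samples:
                fixations.append((start, i, float(x[start:i].mean()), float(y[start:i].mean())))
            start = None
    return fixations
```

Everything discarded here (the samples inside saccades, the within-fixation jitter, the exact timing) is part of the information that feeding raw samples to a Transformer is meant to retain.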

Citations: 0
Session-based recommendation by exploiting substitutable and complementary relationships from multi-behavior data
IF 4.8; Zone 3 (Computer Science); Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2023-12-26; DOI: 10.1007/s10618-023-00994-w
Huizi Wu, Cong Geng, Hui Fang

Session-based recommendation (SR) aims to dynamically recommend items to a user based on a sequence of the most recent user-item interactions. Most existing studies on SR adopt advanced deep learning methods. However, the majority consider only a single behavior type (e.g., click), while the few that consider multiple behavior types fail to take full advantage of the relationships between products (items). To address this, the paper proposes a novel approach, called Substitutable and Complementary Relationships from Multi-behavior Data (SCRM), to better exploit the relationships between products for effective recommendation. Specifically, we first construct substitutable and complementary graphs from a user’s sequential behaviors in every session by jointly considering ‘click’ and ‘purchase’ behaviors. We then design a denoising network to remove false relationships, and further impose constraints on the two relationships via a specially designed loss function. Extensive experiments on two e-commerce datasets demonstrate the superiority of our model over state-of-the-art methods, as well as the effectiveness of every component in SCRM.
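
The graph-construction step can be sketched with a simple co-occurrence heuristic. The rule used here is an assumption for illustration, not the paper's definition: items purchased together in a session get a complementary edge, and items interacted with in the same session but not both purchased get a substitutable edge. The function name and event format are hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_relation_graphs(sessions):
    """Build weighted item-item graphs from multi-behavior sessions.

    sessions: list of sessions; each session is a list of (item, behavior)
    events with behavior "click" or "purchase".
    Hypothetical rule: items purchased together -> complementary edge;
    items interacted with in the same session but not both purchased ->
    substitutable edge (the user compared them and bought at most one).
    Returns (substitutable, complementary) Counters keyed by sorted item pairs.
    """
    substitutable, complementary = Counter(), Counter()
    for session in sessions:
        clicked = {item for item, behavior in session if behavior == "click"}
        bought = {item for item, behavior in session if behavior == "purchase"}
        for a, b in combinations(sorted(bought), 2):
            complementary[(a, b)] += 1
        for a, b in combinations(sorted(clicked | bought), 2):
            if not (a in bought and b in bought):
                substitutable[(a, b)] += 1
    return substitutable, complementary
```

Edges built this way are inevitably noisy (two clicked items need not be substitutes), which is the kind of false relationship a denoising step is meant to filter out.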

Citations: 0
Random walk with restart on hypergraphs: fast computation and an application to anomaly detection
IF 4.8; Zone 3 (Computer Science); Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2023-12-21; DOI: 10.1007/s10618-023-00995-9
Jaewan Chun, Geon Lee, Kijung Shin, Jinhong Jung

Random walk with restart (RWR) is a widely used measure of node similarity in graphs, and it has proved useful for ranking, community detection, link prediction, anomaly detection, etc. Since RWR typically has to be computed separately for a large number of query nodes, or even for all nodes, fast computation is indispensable. However, fast computation of RWR on hypergraphs has remained unexplored, despite its great potential. In this paper, we propose ARCHER, a fast computation framework for RWR on hypergraphs. Specifically, we first formally define RWR on hypergraphs, and then we propose two computation methods that compose ARCHER. Since the two methods are complementary (i.e., each offers relative advantages on different hypergraphs), we also develop a method for automatically selecting between them, which takes very little time compared to the total running time. Through extensive experiments on 18 real-world hypergraphs, we demonstrate (a) the speed and space efficiency of ARCHER, (b) the complementary nature of its two computation methods, (c) the accuracy of its automatic selection method, and (d) its successful application to anomaly detection on hypergraphs.
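
For readers new to RWR on hypergraphs, here is a plain baseline computed by power iteration. It assumes the common node-to-hyperedge-to-node walk (choose an incident hyperedge in proportion to its weight, then one of its nodes uniformly at random); the paper's formal definition may differ, and this is neither of ARCHER's two computation methods. The function name and defaults are hypothetical.

```python
import numpy as np


def hypergraph_rwr(incidence, seed, restart=0.15, edge_weights=None, tol=1e-10, max_iter=1000):
    """RWR scores of all nodes with respect to node `seed` on a hypergraph.

    incidence: (n_nodes, n_edges) 0/1 matrix H; every node must lie in at
    least one hyperedge. One step of the walk: pick an incident hyperedge with
    probability proportional to its weight, then a node of it uniformly.
    """
    h = np.asarray(incidence, dtype=float)
    n, m = h.shape
    w = np.ones(m) if edge_weights is None else np.asarray(edge_weights, dtype=float)
    node_deg = h @ w           # d(v): total weight of hyperedges containing v
    edge_size = h.sum(axis=0)  # |e|: number of nodes in hyperedge e
    # Row-stochastic node-to-node transition matrix D_v^-1 H W D_e^-1 H^T.
    p = (h * w / node_deg[:, None]) @ (h.T / edge_size[:, None])
    q = np.zeros(n)
    q[seed] = 1.0
    r = q.copy()
    for _ in range(max_iter):
        # With probability `restart` jump back to the seed, otherwise take a step.
        r_next = (1.0 - restart) * (p.T @ r) + restart * q
        if np.abs(r_next - r).sum() < tol:
            return r_next
        r = r_next
    return r
```

Running this once per query node costs one power iteration each, which illustrates why repeated queries over many or all nodes call for the faster methods the paper proposes.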

Citations: 0
Improving hyper-parameter self-tuning for data streams by adapting an evolutionary approach
IF 4.8; Zone 3 (Computer Science); Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2023-12-21; DOI: 10.1007/s10618-023-00997-7
Antonio R. Moya, Bruno Veloso, João Gama, Sebastián Ventura
Citations: 0
OEC: an online ensemble classifier for mining data streams with noisy labels OEC:用于挖掘带噪声标签数据流的在线集合分类器
IF 4.8 Zone 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2023-12-12 DOI: 10.1007/s10618-023-00990-0
Ling Jian, Kai Shao, Ying Liu, Jundong Li, Xijun Liang

Distilling actionable patterns from large-scale streaming data in the presence of concept drift is a challenging problem, especially when the data are polluted with noisy labels. To date, various data stream mining algorithms have been proposed and used extensively in many real-world applications. Because classical online learning algorithms complement one another functionally, we aim to combine their advantages and propose an Online Ensemble Classification (OEC) algorithm that integrates the predictions of different base online classifiers. OEC learns the weights of the base classifiers dynamically within the classical Normalized Exponentiated Gradient (NEG) algorithm framework. As a result, OEC inherits the adaptability and flexibility of online classifiers that track concept drift, while retaining the robustness of noise-resistant online classifiers. Theoretically, we show that OEC is a low-regret algorithm, which makes it a good candidate for learning from noisy streaming data. Extensive experiments on both synthetic and real-world datasets demonstrate the effectiveness of the proposed OEC method.
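The abstract does not give OEC's loss function or its base learners. The following is therefore only a minimal sketch of the general mechanism it names: a weight vector over base online classifiers, kept on the probability simplex and updated with an exponentiated-gradient step followed by renormalization.

The sketch makes several assumptions:
- Each base classifier emits a score in [0, 1] for the positive class.
- The ensemble prediction is the weighted average of these scores.
- The loss is the squared error of that average.

The class name `EGEnsemble` and the learning rate `eta` are hypothetical. This is the textbook exponentiated-gradient update, not the authors' implementation.

```python
import math
from typing import List, Sequence


class EGEnsemble:
    """Hypothetical sketch: combine base-classifier scores with simplex weights.

    Weights are updated by an exponentiated-gradient (EG) step on the squared
    loss of the weighted-average prediction, then renormalized to sum to 1.
    This illustrates the general technique only; it is not the OEC code.
    """

    def __init__(self, n_experts: int, eta: float = 0.5) -> None:
        if n_experts < 1:
            raise ValueError("need at least one base classifier")
        self.eta = eta  # learning rate (assumed value, not from the paper)
        # Start from the uniform distribution over base classifiers.
        self.weights: List[float] = [1.0 / n_experts] * n_experts

    def predict(self, scores: Sequence[float]) -> float:
        """Weighted average of the base classifiers' P(y = 1) scores."""
        return sum(w * s for w, s in zip(self.weights, scores))

    def update(self, scores: Sequence[float], y: int) -> None:
        """One online round: observe true label y in {0, 1}, adjust weights."""
        y_hat = self.predict(scores)
        # d/dw_i of (y_hat - y)^2, where y_hat = sum_i w_i * s_i.
        grads = [2.0 * (y_hat - y) * s for s in scores]
        # Multiplicative (exponentiated) step ...
        unnorm = [w * math.exp(-self.eta * g) for w, g in zip(self.weights, grads)]
        # ... followed by normalization back onto the simplex.
        total = sum(unnorm)
        self.weights = [u / total for u in unnorm]


if __name__ == "__main__":
    # Toy stream: classifier 0 is reliable, 1 is inverted, 2 is uninformative.
    ens = EGEnsemble(n_experts=3)
    for t in range(200):
        y = t % 2
        scores = [0.9, 0.1, 0.5] if y == 1 else [0.1, 0.9, 0.5]
        ens.update(scores, y)
    print([round(w, 3) for w in ens.weights])
```

The update is multiplicative, so a base classifier whose scores keep pulling the combined prediction toward the true label gains weight geometrically. Normalization keeps the ensemble a convex combination of its members. Roughly speaking, a larger `eta` shifts weight faster after a concept drift but also reacts more strongly to individual noisy labels.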

Citations: 0
Journal: Data Mining and Knowledge Discovery