Data Mining and Knowledge Discovery: latest articles

Central node identification via weighted kernel density estimation
IF 4.8 | Zone 3, Computer Science | Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-01-31 | DOI: 10.1007/s10618-024-01003-4

Abstract

The detection of central nodes in a network is a fundamental task in network science and graph data analysis. During the past decades, numerous centrality measures have been presented to characterize what is a central node. However, few studies address this issue from a statistical inference perspective. In this paper, we formulate the central node identification issue as a weighted kernel density estimation problem on graphs. Such a formulation provides a generic framework for recognizing central nodes. On one hand, some existing centrality evaluation metrics can be unified under this framework through the manipulation of kernel functions. On the other hand, more effective methods for node centrality assessment can be developed based on proper weighting coefficient specification. Experimental results on 20 simulated networks and 53 real networks show that our method outperforms both six prior state-of-the-art centrality measures and two recently proposed centrality evaluation methods. To the best of our knowledge, this is the first piece of work that addresses the central node identification issue via weighted kernel density estimation.
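One way to read the framework described in the abstract is sketched below. This is a minimal illustration under stated assumptions, not the authors' method: the Gaussian kernel over hop distance, the `bandwidth` parameter, the uniform default `weights`, and the function names are all hypothetical choices made for the example.

```python
from collections import deque
import math


def bfs_distances(adj, source):
    """Hop distances from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def weighted_kde_centrality(adj, weights=None, bandwidth=1.0):
    """Score each node by a weighted sum of kernel values over the other nodes.

    Assumption: a Gaussian kernel applied to hop distance; the abstract does
    not specify the kernel or the weighting scheme.
    """
    if weights is None:
        weights = {u: 1.0 for u in adj}
    scores = {}
    for u in adj:
        dist = bfs_distances(adj, u)
        scores[u] = sum(
            weights[v] * math.exp(-(d * d) / (2.0 * bandwidth ** 2))
            for v, d in dist.items()
            if v != u
        )
    return scores


# Star graph: node 0 is the hub, nodes 1..3 are leaves.
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
scores = weighted_kde_centrality(star)
most_central = max(scores, key=scores.get)
```

Under these assumptions, swapping the Gaussian kernel for an indicator kernel (1 when the hop distance is exactly 1, else 0) with unit weights reduces the score to degree centrality, which is one concrete sense in which a kernel choice can recover an existing measure.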

Citations: 0
Adaptive Bernstein change detector for high-dimensional data streams
IF 4.8 | Zone 3, Computer Science | Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-01-09 | DOI: 10.1007/s10618-023-00999-5
Marco Heyden, Edouard Fouché, Vadim Arzamasov, Tanja Fenn, Florian Kalinke, Klemens Böhm

Change detection is of fundamental importance when analyzing data streams. Detecting changes both quickly and accurately enables monitoring and prediction systems to react, e.g., by issuing an alarm or by updating a learning algorithm. However, detecting changes is challenging when observations are high-dimensional. In high-dimensional data, change detectors should not only be able to identify when changes happen, but also in which subspace they occur. Ideally, one should also quantify how severe they are. Our approach, ABCD, has these properties. ABCD learns an encoder-decoder model and monitors its accuracy over a window of adaptive size. ABCD derives a change score based on Bernstein’s inequality to detect deviations in terms of accuracy, which indicate changes. Our experiments demonstrate that ABCD outperforms its best competitor by up to 20% in F1-score on average. It can also accurately estimate changes’ subspace, together with a severity measure that correlates with the ground truth.
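To make the role of Bernstein's inequality concrete, here is a hedged sketch of a change score computed from two windows of bounded reconstruction errors. It is not ABCD itself: the even split of the gap between the two windows, the use of the sample variance in place of the true variance, and the names `bernstein_tail` and `change_score` are assumptions made for this example.

```python
import math
from statistics import fmean, pvariance


def bernstein_tail(eps, n, variance, max_abs):
    """Two-sided Bernstein bound on P(|sample mean - true mean| >= eps)."""
    exponent = -n * eps ** 2 / (2.0 * variance + 2.0 * max_abs * eps / 3.0)
    return min(1.0, 2.0 * math.exp(exponent))


def change_score(before, after, max_abs=1.0):
    """Upper bound on the chance of seeing this gap in means with no change.

    If both windows shared one true mean, a gap of eps between the sample
    means would force at least one of them to be eps/2 away from it, so a
    union bound over the two windows applies. Sample variances stand in for
    the true variances (an assumption of this sketch).
    """
    eps = abs(fmean(after) - fmean(before))
    if eps == 0.0:
        return 1.0
    half = eps / 2.0
    bound = (
        bernstein_tail(half, len(before), pvariance(before), max_abs)
        + bernstein_tail(half, len(after), pvariance(after), max_abs)
    )
    return min(1.0, bound)


# Hypothetical reconstruction errors in [0, 1] before and after a shift.
before = [0.10, 0.12, 0.11, 0.09] * 25
after = [0.50, 0.52, 0.51, 0.49] * 25
score_changed = change_score(before, after)
score_same = change_score(before, before)
```

A small score means the observed gap would be very unlikely without a change; a detector built on this idea would raise an alarm when the score drops below a chosen threshold.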

Citations: 0
When graph convolution meets double attention: online privacy disclosure detection with multi-label text classification
IF 4.8 | Zone 3, Computer Science | Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-01-05 | DOI: 10.1007/s10618-023-00992-y
Zhanbo Liang, Jie Guo, Weidong Qiu, Zheng Huang, Shujun Li

With the rise of Web 2.0 platforms such as online social media, people’s private information, such as their location, occupation and even family information, is often inadvertently disclosed through online discussions. Therefore, it is important to detect such unwanted privacy disclosures to help alert people affected and the online platform. In this paper, privacy disclosure detection is modeled as a multi-label text classification (MLTC) problem, and a new privacy disclosure detection model is proposed to construct an MLTC classifier for detecting online privacy disclosures. This classifier takes an online post as the input and outputs multiple labels, each reflecting a possible privacy disclosure. The proposed presentation method combines three different sources of information, the input text itself, the label-to-text correlation and the label-to-label correlation. A double-attention mechanism is used to combine the first two sources of information, and a graph convolutional network is employed to extract the third source of information that is then used to help fuse features extracted from the first two sources of information. Our extensive experimental results, obtained on a public dataset of privacy-disclosing posts on Twitter, demonstrated that our proposed privacy disclosure detection method significantly and consistently outperformed other state-of-the-art methods in terms of all key performance indicators.
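The multi-label framing can be illustrated with the final decoding step alone. The sketch below assumes the model emits one logit per label and that each label is decided independently by a sigmoid and a fixed threshold; the label set, the threshold, and the function name are hypothetical, and the attention and graph convolution components from the abstract are not modeled here.

```python
import math

# Hypothetical label set, borrowed from the examples named in the abstract.
LABELS = ["location", "occupation", "family"]


def predict_labels(logits, threshold=0.5):
    """Return every label whose sigmoid probability reaches the threshold."""
    probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    return [label for label, p in zip(LABELS, probs) if p >= threshold]


# One post can carry several disclosures at once, or none.
predicted = predict_labels([2.0, -1.5, 0.3])
nothing = predict_labels([-3.0, -2.0, -1.0])
```

Unlike a softmax over classes, this decoding lets a single post receive zero, one, or several labels, which is what distinguishes multi-label from multi-class classification.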

Citations: 0
Correction: A semi-supervised interactive algorithm for change point detection
IF 4.8 | Zone 3, Computer Science | Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-01-02 | DOI: 10.1007/s10618-023-01000-z
Zhenxiang Cao, N. Seeuws, Maarten Vos, Alexander Bertrand
Citations: 0
Predicting consumer choice from raw eye-movement data using the RETINA deep learning architecture
IF 4.8 | Zone 3, Computer Science | Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2023-12-29 | DOI: 10.1007/s10618-023-00989-7
Moshe Unger, Michel Wedel, Alexander Tuzhilin

We propose the use of a deep learning architecture, called RETINA, to predict multi-alternative, multi-attribute consumer choice from eye movement data. RETINA directly uses the complete time series of raw eye-tracking data from both eyes as input to state-of-the-art Transformer and Metric Learning Deep Learning methods. Using the raw data input eliminates the information loss that may result from first calculating fixations, deriving metrics from the fixations data and analysing those metrics, as has often been done in eye movement research, and allows us to apply Deep Learning to eye tracking data sets of the size commonly encountered in academic and applied research. Using a data set with 112 respondents who made choices among four laptops, we show that the proposed architecture outperforms other state-of-the-art machine learning methods (standard BERT, LSTM, AutoML, logistic regression) calibrated on raw data or fixation data. The analysis of partial time and partial data segments reveals the ability of RETINA to predict choice outcomes well before participants reach a decision. Specifically, we find that using a mere 5 s of data, the RETINA architecture achieves a predictive validation accuracy of over 0.7. We provide an assessment of which features of the eye movement data contribute to RETINA’s prediction accuracy. We make recommendations on how the proposed deep learning architecture can be used as a basis for future academic research, in particular its application to eye movements collected from front-facing video cameras.

Citations: 0
Session-based recommendation by exploiting substitutable and complementary relationships from multi-behavior data
IF 4.8 | Zone 3, Computer Science | Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2023-12-26 | DOI: 10.1007/s10618-023-00994-w
Huizi Wu, Cong Geng, Hui Fang

Session-based recommendation (SR) aims to dynamically recommend items to a user based on a sequence of the most recent user-item interactions. Most existing studies on SR adopt advanced deep learning methods. However, the majority consider only a single behavior type (e.g., click), while the few that consider multi-typed behaviors fail to take full advantage of the relationships between products (items). To address this, the paper proposes a novel approach, called Substitutable and Complementary Relationships from Multi-behavior Data (denoted as SCRM), to better explore the relationships between products for effective recommendation. Specifically, we first construct substitutable and complementary graphs based on a user’s sequential behaviors in every session by jointly considering ‘click’ and ‘purchase’ behaviors. We then design a denoising network to remove false relationships, and further consider constraints on the two relationships via a specially designed loss function. Extensive experiments on two e-commerce datasets demonstrate the superiority of our model over state-of-the-art methods, and the effectiveness of every component in SCRM.
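The graph construction step can be pictured with a deliberately simple heuristic. The rules below are assumptions chosen for illustration, not the construction used in SCRM: items purchased in the same session are treated as complementary, and an item that was clicked but not purchased is treated as a candidate substitute for each purchased item. The item names and the function name are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_relation_graphs(sessions):
    """Count candidate substitutable and complementary item pairs.

    `sessions` is a list of sessions, each a list of (item, behavior) pairs
    with behavior in {"click", "purchase"}. Edges are undirected and keyed
    by the alphabetically sorted item pair.
    """
    substitutable = defaultdict(int)
    complementary = defaultdict(int)
    for session in sessions:
        purchased = sorted({item for item, b in session if b == "purchase"})
        clicked = {item for item, b in session if b == "click"}
        clicked_only = sorted(clicked - set(purchased))
        # Heuristic 1: items bought together complement each other.
        for a, b in combinations(purchased, 2):
            complementary[(a, b)] += 1
        # Heuristic 2: an item viewed but not bought may substitute a bought one.
        for c in clicked_only:
            for p in purchased:
                substitutable[tuple(sorted((c, p)))] += 1
    return dict(substitutable), dict(complementary)


sessions = [[
    ("phone_a", "click"),
    ("phone_b", "click"),
    ("phone_b", "purchase"),
    ("case", "purchase"),
]]
sub_graph, comp_graph = build_relation_graphs(sessions)
```

Note that this heuristic also links `phone_a` to `case` as substitutes, which is clearly a false relationship. Spurious edges of this kind are a plausible motivation for the denoising network mentioned in the abstract.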

Citations: 0
OEC: an online ensemble classifier for mining data streams with noisy labels
IF 4.8 | Zone 3, Computer Science | Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2023-12-12 | DOI: 10.1007/s10618-023-00990-0
Ling Jian, Kai Shao, Ying Liu, Jundong Li, Xijun Liang

Distilling actionable patterns from large-scale streaming data in the presence of concept drift is a challenging problem, especially when data is polluted with noisy labels. To date, various data stream mining algorithms have been proposed and extensively used in many real-world applications. Considering the functional complementarity of classical online learning algorithms and with the goal of combining their advantages, we propose an Online Ensemble Classification (OEC) algorithm to integrate the predictions obtained by different base online classification algorithms. The proposed OEC method works by learning the weights of different base classifiers dynamically through the classical Normalized Exponentiated Gradient (NEG) algorithm framework. As a result, the proposed OEC inherits the adaptability and flexibility of concept drift-tracking online classifiers, while maintaining the robustness of noise-resistant online classifiers. Theoretically, we show that the OEC algorithm is a low-regret algorithm, which makes it a good candidate for learning from noisy streaming data. Extensive experiments on both synthetic and real-world datasets demonstrate the effectiveness of the proposed OEC method.
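The weight-learning step can be sketched with a normalized exponentiated gradient update in its simplest form. This is an illustration under assumptions, not the OEC algorithm as published: each base classifier's per-round 0/1 loss is used directly as the gradient signal, and the learning rate `eta` and the toy loss stream are hypothetical.

```python
import math


def neg_update(weights, losses, eta=0.5):
    """One normalized exponentiated gradient step over classifier weights.

    Each weight is scaled by exp(-eta * loss) and the result is renormalized
    so the weights remain a probability distribution.
    """
    scaled = [w * math.exp(-eta * loss) for w, loss in zip(weights, losses)]
    total = sum(scaled)
    return [s / total for s in scaled]


# Three base classifiers, starting from uniform weights.
weights = [1.0 / 3.0] * 3

# Hypothetical 0/1 losses per round: classifier 0 is always right,
# classifiers 1 and 2 each err twice.
stream_losses = [
    [0, 1, 1],
    [0, 1, 0],
    [0, 0, 1],
]
for losses in stream_losses:
    weights = neg_update(weights, losses)
```

Because the update is multiplicative, a classifier that keeps erring loses weight geometrically, while the normalization keeps the ensemble prediction a convex combination of the base predictions.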

Citations: 0
Structure-aware decoupled imputation network for multivariate time series
IF 4.8 | Zone 3, Computer Science | Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2023-12-08 | DOI: 10.1007/s10618-023-00987-9
Nourhan Ahmed, Lars Schmidt-Thieme

Handling incomplete multivariate time series is an important and fundamental concern for a variety of domains. Existing time-series imputation approaches rely on basic assumptions regarding relationship information between sensors, posing significant challenges since inter-sensor interactions in the real world are often complex and unknown beforehand. Specifically, there is a lack of in-depth investigation into (1) the coexistence of relationships between sensors and (2) the incorporation of reciprocal impact between sensor properties and inter-sensor relationships for the time-series imputation problem. To fill this gap, we present the Structure-aware Decoupled imputation network (SaD), which is designed to model sensor characteristics and relationships between sensors in distinct latent spaces. Our approach is equipped with a two-step knowledge integration scheme that incorporates the influence between the sensor attribute information as well as sensor relationship information. The experimental results indicate that when compared to state-of-the-art models for time-series imputation tasks, our proposed method can reduce error by around 15%.

Citations: 0
The Outcomes and Surgical Nuances of Minimally Invasive Parotid Surgery for Pleomorphic Adenoma.
IF 2.8 | Zone 3, Computer Science | Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2023-12-01 | Epub Date: 2023-06-19 | DOI: 10.1007/s12070-023-03947-3
Kalyana Sundaram Chidambaram, Manjul Muraleedharan, Amit Keshri, Sabaratnam Mayilvaganan, Nazrin Hameed, Mohd Aqib, Arushi Kumar, Ravi Sankar Manogaran, Raj Kumar

Benign parotid tumors follow an indolent course and present as slow-growing painless swelling in the pre- and infra-auricular areas. The treatment of choice is surgery. Though the gold standard technique is Superficial Parotidectomy, Extracapsular Dissection (ECD) is an alternative option with the same outcome and fewer complications. This study discusses our experience with extracapsular dissection and the surgical nuances for better results. We conducted a retrospective study of histologically confirmed cases of pleomorphic adenoma of the parotid gland that underwent extracapsular dissection between September 2019 and March 2023. The demographic details, clinical characteristics, and outcomes were evaluated. There were 33 patients, including 16 females and 17 males, with a mean age of 32.75 years. All cases presented as slow-growing painless swelling for a mean duration of 5 years. Most of the tumors (94%) measured between 2 and 4 cm, with a few larger than 4 cm. All underwent extracapsular dissection with complete excision. There was only one complication (seroma) and no incidence of facial palsy in our experience with ECD. The goal of benign parotid surgery is the complete removal of the tumor with minimum complications, which can be achieved with ECD, as it offers good tumor clearance, lower complication rates, and good cosmesis. Thus, this minimally invasive parotid surgery could be a worthwhile option in properly selected cases.

Citations: 0
Navigating the metric maze: a taxonomy of evaluation metrics for anomaly detection in time series
IF 4.8 Zone 3 Computer Science Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2023-11-18 DOI: 10.1007/s10618-023-00988-8
Sondre Sørbø, Massimiliano Ruocco

The field of time series anomaly detection is constantly advancing, with several methods available, making it a challenge to determine the most appropriate method for a specific domain. The evaluation of these methods is facilitated by the use of metrics, which vary widely in their properties. Despite the existence of new evaluation metrics, there is limited agreement on which metrics are best suited for specific scenarios and domains, and the most commonly used metrics have faced criticism in the literature. This paper provides a comprehensive overview of the metrics used for the evaluation of time series anomaly detection methods, and also defines a taxonomy of these based on how they are calculated. By defining a set of properties for evaluation metrics and a set of specific case studies and experiments, twenty metrics are analyzed and discussed in detail, highlighting the unique suitability of each for specific tasks. Through extensive experimentation and analysis, this paper argues that the choice of evaluation metric must be made with care, taking into account the specific requirements of the task at hand.

Citations: 0