Data Mining and Knowledge Discovery: latest articles, page 8

Random walk with restart on hypergraphs: fast computation and an application to anomaly detection

IF 4.8 · Zone 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Data Mining and Knowledge Discovery

Pub Date : 2023-12-21 DOI: 10.1007/s10618-023-00995-9

Jaewan Chun, Geon Lee, Kijung Shin, Jinhong Jung

Random walk with restart (RWR) is a widely-used measure of node similarity in graphs, and it has proved useful for ranking, community detection, link prediction, anomaly detection, etc. Since RWR is typically required to be computed separately for a larger number of query nodes or even for all nodes, fast computation of it is indispensable. However, for hypergraphs, the fast computation of RWR has been unexplored, despite its great potential. In this paper, we propose ARCHER, a fast computation framework for RWR on hypergraphs. Specifically, we first formally define RWR on hypergraphs, and then we propose two computation methods that compose ARCHER. Since the two methods are complementary (i.e., offering relative advantages on different hypergraphs), we also develop a method for automatic selection between them, which takes a very short time compared to the total running time. Through our extensive experiments on 18 real-world hypergraphs, we demonstrate (a) the speed and space efficiency of ARCHER, (b) the complementary nature of the two computation methods composing ARCHER, (c) the accuracy of its automatic selection method, and (d) its successful application to anomaly detection on hypergraphs.
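To make the quantity in this abstract concrete, here is a minimal sketch of RWR scores computed by plain power iteration on a small hypergraph. It is not ARCHER and not the paper's formal definition. It assumes a simple walk (from a node, pick one of its hyperedges uniformly, then a node of that hyperedge uniformly), that every node belongs to at least one hyperedge, and hypothetical names (`hypergraph_transition`, `rwr`, `restart`).

```python
import numpy as np

def hypergraph_transition(hyperedges, n_nodes):
    """Row-stochastic node-to-node transition matrix for an assumed simple walk:
    node -> uniformly chosen incident hyperedge -> uniformly chosen member node.
    Assumes every node appears in at least one hyperedge."""
    H = np.zeros((n_nodes, len(hyperedges)))  # node-by-hyperedge incidence matrix
    for e, members in enumerate(hyperedges):
        H[list(members), e] = 1.0
    node_to_edge = H / H.sum(axis=1, keepdims=True)      # rows sum to 1
    edge_to_node = H.T / H.T.sum(axis=1, keepdims=True)  # rows sum to 1
    return node_to_edge @ edge_to_node

def rwr(P, seed, restart=0.15, tol=1e-10, max_iter=1000):
    """Power iteration for r = (1 - c) * P^T r + c * q, with q the seed indicator."""
    q = np.zeros(P.shape[0])
    q[seed] = 1.0
    r = q.copy()
    for _ in range(max_iter):
        r_next = (1.0 - restart) * (P.T @ r) + restart * q
        if np.abs(r_next - r).sum() < tol:
            return r_next
        r = r_next
    return r

# Toy hypergraph (illustrative data): nodes 0..4, three hyperedges.
P = hypergraph_transition([{0, 1, 2}, {2, 3}, {3, 4}], n_nodes=5)
scores = rwr(P, seed=0)
```

Each query node needs its own run of this loop, which is the per-query cost that motivates faster methods when scores are required for many or all nodes.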


Citations: 0

Improving hyper-parameter self-tuning for data streams by adapting an evolutionary approach

IF 4.8 · Zone 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Data Mining and Knowledge Discovery

Pub Date : 2023-12-21 DOI: 10.1007/s10618-023-00997-7

Antonio R. Moya, Bruno Veloso, João Gama, Sebastián Ventura

Citations: 0

OEC: an online ensemble classifier for mining data streams with noisy labels

IF 4.8 · Zone 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Data Mining and Knowledge Discovery

Pub Date : 2023-12-12 DOI: 10.1007/s10618-023-00990-0

Ling Jian, Kai Shao, Ying Liu, Jundong Li, Xijun Liang

Distilling actionable patterns from large-scale streaming data in the presence of concept drift is a challenging problem, especially when data is polluted with noisy labels. To date, various data stream mining algorithms have been proposed and extensively used in many real-world applications. Considering the functional complementation of classical online learning algorithms and with the goal of combining their advantages, we propose an Online Ensemble Classification (OEC) algorithm to integrate the predictions obtained by different base online classification algorithms. The proposed OEC method works by learning the weights of different base classifiers dynamically through the classical Normalized Exponentiated Gradient (NEG) algorithm framework. As a result, the proposed OEC inherits the adaptability and flexibility of concept drift-tracking online classifiers, while maintaining the robustness of noise-resistant online classifiers. Theoretically, we show that the OEC algorithm is a low-regret algorithm, which makes it a good candidate for learning from noisy streaming data. Extensive experiments on both synthetic and real-world datasets demonstrate the effectiveness of the proposed OEC method.
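The weight-learning step named in this abstract can be sketched as a single normalized exponentiated-gradient update. This is an illustration under assumptions, not the OEC algorithm itself: it assumes a squared loss on the weighted combination of base predictions, labels in {-1, +1}, and hypothetical names (`neg_update`, `lr`).

```python
import math

def neg_update(weights, base_preds, label, lr=0.5):
    """One normalized exponentiated-gradient step on ensemble weights.

    Assumption: squared loss (combined - label)^2 on the weighted prediction.
    """
    combined = sum(w * p for w, p in zip(weights, base_preds))
    # Gradient of the squared loss with respect to each weight.
    grads = [2.0 * (combined - label) * p for p in base_preds]
    # Multiplicative update, then renormalize so the weights stay on the simplex.
    unnorm = [w * math.exp(-lr * g) for w, g in zip(weights, grads)]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Toy stream (illustrative): base learner 0 is always right,
# learner 1 is always wrong, learner 2 always abstains with 0.
weights = [1.0 / 3.0] * 3
for t in range(20):
    label = 1 if t % 2 == 0 else -1
    weights = neg_update(weights, [label, -label, 0], label)
```

Because the update is multiplicative, a base learner that keeps pushing the combined prediction toward the label gains weight quickly, while the normalization keeps the weights a valid convex combination.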


Citations: 0

Structure-aware decoupled imputation network for multivariate time series

IF 4.8 · Zone 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Data Mining and Knowledge Discovery

Pub Date : 2023-12-08 DOI: 10.1007/s10618-023-00987-9

Nourhan Ahmed, Lars Schmidt-Thieme

Handling incomplete multivariate time series is an important and fundamental concern for a variety of domains. Existing time-series imputation approaches rely on basic assumptions regarding relationship information between sensors, posing significant challenges since inter-sensor interactions in the real world are often complex and unknown beforehand. Specifically, there is a lack of in-depth investigation into (1) the coexistence of relationships between sensors and (2) the incorporation of reciprocal impact between sensor properties and inter-sensor relationships for the time-series imputation problem. To fill this gap, we present the Structure-aware Decoupled imputation network (SaD), which is designed to model sensor characteristics and relationships between sensors in distinct latent spaces. Our approach is equipped with a two-step knowledge integration scheme that incorporates the influence between the sensor attribute information as well as sensor relationship information. The experimental results indicate that when compared to state-of-the-art models for time-series imputation tasks, our proposed method can reduce error by around 15%.


Citations: 0

The Outcomes and Surgical Nuances of Minimally Invasive Parotid Surgery for Pleomorphic Adenoma.

IF 2.8 · Zone 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Data Mining and Knowledge Discovery

Pub Date : 2023-12-01 Epub Date: 2023-06-19 DOI: 10.1007/s12070-023-03947-3

Kalyana Sundaram Chidambaram, Manjul Muraleedharan, Amit Keshri, Sabaratnam Mayilvaganan, Nazrin Hameed, Mohd Aqib, Arushi Kumar, Ravi Sankar Manogaran, Raj Kumar

Benign parotid tumors follow an indolent course and present as slow-growing painless swelling in the pre- and infra-auricular areas. The treatment of choice is surgery. Though the gold standard technique is Superficial Parotidectomy, Extracapsular Dissection (ECD) is an alternative option with the same outcome and decreased complications. This study discusses our experience with extracapsular dissection and the surgical nuances for better results. We conducted a retrospective study of patients with histologically confirmed pleomorphic adenoma of the parotid gland who underwent extracapsular dissection between September 2019 and March 2023. The demographic details, clinical characteristics, and outcomes were evaluated. There were 33 patients, including 16 females and 17 males, with a mean age of 32.75 years. All cases presented as slow-growing painless swelling for a mean duration of 5 years. Most of the tumors (94%) were between 2 and 4 cm in size, with few tumors larger than 4 cm. All underwent extracapsular dissection with complete excision. There was only one complication (seroma) and no incidence of facial palsy in our experience with ECD. The goal of benign parotid surgery is the complete removal of the tumor with minimum complications, which could be achieved with ECD, which has good tumor clearance and lower rates of complications with good cosmesis. Thus, this minimally invasive parotid surgery could be a worthwhile option in properly selected cases.


Citations: 0

Navigating the metric maze: a taxonomy of evaluation metrics for anomaly detection in time series

IF 4.8 · Zone 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Data Mining and Knowledge Discovery

Pub Date : 2023-11-18 DOI: 10.1007/s10618-023-00988-8

Sondre Sørbø, Massimiliano Ruocco

The field of time series anomaly detection is constantly advancing, with several methods available, making it a challenge to determine the most appropriate method for a specific domain. The evaluation of these methods is facilitated by the use of metrics, which vary widely in their properties. Despite the existence of new evaluation metrics, there is limited agreement on which metrics are best suited for specific scenarios and domains, and the most commonly used metrics have faced criticism in the literature. This paper provides a comprehensive overview of the metrics used for the evaluation of time series anomaly detection methods, and also defines a taxonomy of these based on how they are calculated. By defining a set of properties for evaluation metrics and a set of specific case studies and experiments, twenty metrics are analyzed and discussed in detail, highlighting the unique suitability of each for specific tasks. Through extensive experimentation and analysis, this paper argues that the choice of evaluation metric must be made with care, taking into account the specific requirements of the task at hand.
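A small worked example shows why the choice of metric matters. The sketch below compares a point-wise F1 score with a point-adjusted F1 score, two metrics commonly seen in this literature. The function names and toy labels are illustrative assumptions, not the paper's code or data.

```python
def f1(y_true, y_pred):
    """Point-wise F1 over binary labels (1 = anomalous time step)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def point_adjust(y_true, y_pred):
    """If any step of a true anomaly segment is flagged, mark the whole segment as flagged."""
    adjusted = list(y_pred)
    i, n = 0, len(y_true)
    while i < n:
        if y_true[i]:
            j = i
            while j < n and y_true[j]:
                j += 1  # j ends one past the last step of this segment
            if any(y_pred[i:j]):
                adjusted[i:j] = [1] * (j - i)
            i = j
        else:
            i += 1
    return adjusted

# One true anomaly segment (steps 2..5); the detector hits one step and raises one false alarm.
y_true = [0, 0, 1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [0, 0, 0, 1, 0, 0, 0, 1, 0, 0]
pointwise_f1 = f1(y_true, y_pred)                       # 1/3
adjusted_f1 = f1(y_true, point_adjust(y_true, y_pred))  # 8/9
```

The same predictions score 1/3 point-wise and 8/9 after point adjustment, so two papers reporting "F1" on this detector could disagree substantially depending on which variant they used.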


Citations: 0

Anomaly detection in sleep: detecting mouth breathing in children

Zone 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Data Mining and Knowledge Discovery

Pub Date : 2023-11-13 DOI: 10.1007/s10618-023-00985-x

Luka Biedebach, María Óskarsdóttir, Erna Sif Arnardóttir, Sigridur Sigurdardóttir, Michael Valur Clausen, Sigurveig Þ. Sigurdardóttir, Marta Serwatko, Anna Sigridur Islind

Identifying mouth breathing during sleep in a reliable, non-invasive way is challenging and currently not included in sleep studies. However, it has a high clinical relevance in pediatrics, as it can negatively impact the physical and mental health of children. Since mouth breathing is an anomalous condition in the general population with only 2% prevalence in our data set, we are facing an anomaly detection problem. This type of human medical data is commonly approached with deep learning methods. However, applying multiple supervised and unsupervised machine learning methods to this anomaly detection problem showed that classic machine learning methods should also be taken into account. This paper compared deep learning and classic machine learning methods on respiratory data during sleep using a leave-one-out cross validation. This way we observed the uncertainty of the models and their performance across participants with varying signal quality and prevalence of mouth breathing. The main contribution is identifying the model with the highest clinical relevance to facilitate the diagnosis of chronic mouth breathing, which may allow more affected children to receive appropriate treatment.


Citations: 0

Multiple-input neural networks for time series forecasting incorporating historical and prospective context

Zone 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Data Mining and Knowledge Discovery

Pub Date : 2023-10-27 DOI: 10.1007/s10618-023-00984-y

João Palet, Vasco Manquinho, Rui Henriques

Individual and societal systems are open systems continuously affected by their situational context. In recent years, context sources have been increasingly considered in different domains to aid short and long-term forecasts of systems’ behavior. Nevertheless, available research generally disregards the role of prospective context, such as calendrical planning or weather forecasts. This work proposes a multiple-input neural architecture consisting of a sequential composition of long short-term memory units or temporal convolutional networks able to incorporate both historical and prospective sources of situational context to aid time series forecasting tasks. Considering urban case studies, we further assess the impact that different sources of external context have on medical emergency and mobility forecasts. Results show that the incorporation of external context variables, including calendrical and weather variables, can significantly reduce forecasting errors against state-of-the-art forecasters. In particular, the incorporation of prospective context, generally neglected in related work, mitigates error increases along the forecasting horizon.


Citations: 0

Federated singular value decomposition for high-dimensional data

Zone 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Data Mining and Knowledge Discovery

Pub Date : 2023-10-26 DOI: 10.1007/s10618-023-00983-z

Anne Hartebrodt, Richard Röttger, David B. Blumenthal

Federated learning (FL) is emerging as a privacy-aware alternative to classical cloud-based machine learning. In FL, the sensitive data remains in data silos and only aggregated parameters are exchanged. Hospitals and research institutions which are not willing to share their data can join a federated study without breaching confidentiality. In addition to the extreme sensitivity of biomedical data, the high dimensionality poses a challenge in the context of federated genome-wide association studies (GWAS). In this article, we present a federated singular value decomposition algorithm, suitable for the privacy-related and computational requirements of GWAS. Notably, the algorithm has a transmission cost independent of the number of samples and is only weakly dependent on the number of features, because the singular vectors corresponding to the samples are never exchanged and the vectors associated with the features are only transmitted to an aggregator for a fixed number of iterations. Although motivated by GWAS, the algorithm is generically applicable for both horizontally and vertically partitioned data.
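The idea that only aggregated, feature-sized vectors leave each site can be illustrated with a plain federated power iteration for the leading singular vector. This is a generic textbook-style sketch, not the article's algorithm: it assumes sample-wise (row) partitioning, a trusted aggregator, no added privacy protection, and hypothetical names (`local_update`, `federated_top_singular_vector`).

```python
import numpy as np

def local_update(A_site, v):
    """Runs at a data holder; only a vector of length d (number of features) is returned."""
    return A_site.T @ (A_site @ v)

def federated_top_singular_vector(sites, n_iter=200, seed=0):
    """Power iteration on sum_i A_i^T A_i without pooling the rows of any A_i."""
    d = sites[0].shape[1]
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(d)
    v /= np.linalg.norm(v)
    for _ in range(n_iter):
        agg = sum(local_update(A, v) for A in sites)  # aggregator sums site contributions
        v = agg / np.linalg.norm(agg)
    # Each site reports one scalar, ||A_i v||^2, to recover the singular value.
    sigma = np.sqrt(sum(np.linalg.norm(A @ v) ** 2 for A in sites))
    return v, sigma

# Synthetic data (illustrative): a strong rank-1 signal plus small noise, split across two sites.
rng = np.random.default_rng(1)
A = 10.0 * np.outer(rng.standard_normal(30), rng.standard_normal(5))
A += 0.1 * rng.standard_normal((30, 5))
v, sigma = federated_top_singular_vector([A[:10], A[10:]])
```

In this sketch the per-round message size depends on the number of features but not on how many samples each site holds, and the sample-side singular vectors are never formed or shared.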


Citations: 0

Somtimes: self organizing maps for time series clustering and its application to serious illness conversations

Zone 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Data Mining and Knowledge Discovery

Pub Date : 2023-10-20 DOI: 10.1007/s10618-023-00979-9

Ali Javed, Donna M. Rizzo, Byung Suk Lee, Robert Gramling

There is demand for scalable algorithms capable of clustering and analyzing large time series data. The Kohonen self-organizing map (SOM) is an unsupervised artificial neural network for clustering, visualizing, and reducing the dimensionality of complex data. Like all clustering methods, it requires a measure of similarity between input data (in this work, time series). Dynamic time warping (DTW) is one such measure, and a top performer that accommodates distortions when aligning time series. Despite its popularity in clustering, DTW is limited in practice because its runtime complexity is quadratic in the length of the time series. To address this, we present a new self-organizing map for clustering TIME Series, called SOMTimeS, which uses DTW as the distance measure. The method has similar accuracy compared with other DTW-based clustering algorithms, yet scales better and runs faster. The computational performance stems from the pruning of unnecessary DTW computations during the SOM’s training phase. For comparison, we implement a similar pruning strategy for K-means, and call the latter K-TimeS. SOMTimeS and K-TimeS pruned 43% and 50% of the total DTW computations, respectively. Pruning effectiveness, accuracy, execution time and scalability are evaluated using 112 benchmark time series datasets from the UC Riverside classification archive, and show that, for similar accuracy, SOMTimeS and K-TimeS achieve a 1.8× speed-up on average, with rates varying between 1× and 18× depending on the dataset. We also apply SOMTimeS to a healthcare study of patient-clinician serious illness conversations to demonstrate the algorithm’s utility with complex, temporally sequenced natural language.
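The pruning idea in this abstract can be illustrated with a generic lower-bound test: compute a cheap bound on the DTW distance first, and skip the full dynamic program whenever the bound already rules a candidate out. The sketch below uses a band-constrained DTW and the LB_Keogh bound in a nearest-neighbour search. It assumes equal-length series and hypothetical function names, and it is not the SOMTimeS training procedure.

```python
import math

def dtw(a, b, window):
    """DTW distance restricted to a Sakoe-Chiba band of half-width `window`."""
    n, m = len(a), len(b)
    prev = [math.inf] * (m + 1)
    prev[0] = 0.0
    for i in range(1, n + 1):
        curr = [math.inf] * (m + 1)
        for j in range(max(1, i - window), min(m, i + window) + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            curr[j] = cost + min(prev[j], prev[j - 1], curr[j - 1])
        prev = curr
    return math.sqrt(prev[m])

def lb_keogh(query, candidate, window):
    """Lower bound on dtw(query, candidate, window) for equal-length series."""
    total = 0.0
    n = len(candidate)
    for i, q in enumerate(query):
        segment = candidate[max(0, i - window):min(n, i + window + 1)]
        upper, lower = max(segment), min(segment)
        if q > upper:
            total += (q - upper) ** 2
        elif q < lower:
            total += (q - lower) ** 2
    return math.sqrt(total)

def nearest(query, candidates, window):
    """Nearest neighbour under DTW, skipping candidates the lower bound excludes."""
    best_dist, best_idx, n_pruned = math.inf, -1, 0
    for idx, cand in enumerate(candidates):
        if lb_keogh(query, cand, window) >= best_dist:
            n_pruned += 1  # cannot beat the current best, so skip the full DTW
            continue
        d = dtw(query, cand, window)
        if d < best_dist:
            best_dist, best_idx = d, idx
    return best_idx, best_dist, n_pruned

query = [0, 1, 2, 1, 0]
candidates = [[0, 0, 1, 2, 1], [5, 5, 5, 5, 5], [0, 1, 2, 1, 0]]
best_idx, best_dist, n_pruned = nearest(query, candidates, window=1)
```

The bound costs time linear in the series length (times the band width here), while the full DTW is quadratic when unconstrained, so every pruned candidate saves the expensive part of the comparison without changing the result.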



Citations: 0