
Journal of Computational Mathematics and Data Science: Latest Publications

A DEIM-CUR factorization with iterative SVDs
Pub Date: 2024-06-17 DOI: 10.1016/j.jcmds.2024.100095
Perfect Y. Gidisu, Michiel E. Hochstenbach

A CUR factorization is often utilized as a substitute for the singular value decomposition (SVD), especially when a concrete interpretation of the singular vectors is challenging. Moreover, if the original data matrix possesses properties like nonnegativity and sparsity, a CUR decomposition can better preserve them compared to the SVD. An essential aspect of this approach is the methodology used for selecting a subset of columns and rows from the original matrix. This study investigates the effectiveness of one-round sampling and iterative subselection techniques and introduces new iterative subselection strategies based on iterative SVDs. One provably appropriate technique for index selection in constructing a CUR factorization is the discrete empirical interpolation method (DEIM). Our contribution aims to improve the approximation quality of the DEIM scheme by iteratively invoking it in several rounds, in the sense that we select subsequent columns and rows based on the previously selected ones. Thus, we modify A after each iteration by removing the information that has been captured by the previously selected columns and rows. We also discuss how iterative procedures for computing a few singular vectors of large data matrices can be integrated with the new iterative subselection strategies. We present the results of numerical experiments, providing a comparison of one-round sampling and iterative subselection techniques, and demonstrating the improved approximation quality associated with using the latter.
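
As an editorial illustration of the multi-round scheme described above, here is a minimal Python sketch of DEIM index selection with deflation between rounds. The rank-per-round and round-count parameters, and the pseudoinverse-based deflation step, are our assumptions for illustration, not the authors' exact algorithm.

```python
import numpy as np

def deim_indices(V):
    """DEIM index selection: for each column of the orthonormal basis V,
    pick the row where the interpolation residual is largest."""
    idx = [int(np.argmax(np.abs(V[:, 0])))]
    for j in range(1, V.shape[1]):
        c = np.linalg.solve(V[np.ix_(idx, range(j))], V[idx, j])
        r = V[:, j] - V[:, :j] @ c
        idx.append(int(np.argmax(np.abs(r))))
    return idx

def iterative_deim_cur(A, rank_per_round=5, rounds=3):
    """Multi-round DEIM-CUR sketch: select indices on the current residual,
    then deflate by removing what the selected columns/rows already capture."""
    E = A.astype(float)
    rows, cols = set(), set()
    for _ in range(rounds):
        U, _, Vt = np.linalg.svd(E, full_matrices=False)
        rows.update(deim_indices(U[:, :rank_per_round]))
        cols.update(deim_indices(Vt[:rank_per_round, :].T))
        C, R = A[:, sorted(cols)], A[sorted(rows), :]
        E = A - C @ np.linalg.pinv(C) @ A @ np.linalg.pinv(R) @ R
    C, R = A[:, sorted(cols)], A[sorted(rows), :]
    U_mid = np.linalg.pinv(C) @ A @ np.linalg.pinv(R)
    return C, U_mid, R   # A is approximated by C @ U_mid @ R
```

Each round removes from A what the current selection already captures, so the next round targets the remaining error.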

Citations: 0
Bayesian sparsity and class sparsity priors for dictionary learning and coding
Pub Date: 2024-03-21 DOI: 10.1016/j.jcmds.2024.100094
A. Bocchinfuso, D. Calvetti, E. Somersalo

Dictionary learning methods continue to gain popularity for the solution of challenging inverse problems. In the dictionary learning approach, the computational forward model is replaced by a large dictionary of possible outcomes, and the problem is to identify the dictionary entries that best match the data, akin to traditional query matching in search engines. Sparse coding techniques are used to guarantee that the dictionary matching identifies only a few of the dictionary entries, and dictionary compression methods are used to reduce the complexity of the matching problem. In this article, we propose a workflow to facilitate the dictionary matching process. First, the full dictionary is divided into subdictionaries that are separately compressed. The error introduced by the dictionary compression is handled in the Bayesian framework as a modeling error. Furthermore, we propose a new Bayesian data-driven group sparsity coding method to help identify subdictionaries that are not relevant for the dictionary matching. After discarding irrelevant subdictionaries, the dictionary matching is addressed as a deflated problem using sparse coding. The compression and deflation steps can lead to substantial decreases in computational complexity. The effectiveness of compensating for the dictionary compression error and of using the novel group sparsity promotion to deflate the original dictionary is illustrated by applying the methodology to real-world problems: glitch detection in the LIGO experiment and hyperspectral remote sensing.
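
A schematic of the compression-and-pruning part of the workflow might look as follows. This is a sketch under our own assumptions: truncated-SVD compression of each subdictionary and a simple least-squares energy score standing in for the paper's Bayesian group sparsity model.

```python
import numpy as np

def compress_subdictionary(D, tol=0.1):
    """Compress one subdictionary with a truncated SVD; the neglected
    directions become the modeling error treated in the Bayesian framework."""
    U, s, _ = np.linalg.svd(D, full_matrices=False)
    k = max(1, int(np.sum(s > tol * s[0])))
    return U[:, :k] * s[:k]

def prune_subdictionaries(subdicts, y, keep=0.5):
    """Score each compressed subdictionary by the fraction of the data energy
    it explains in a least-squares sense, and keep only the strongest groups."""
    scores = []
    for B in subdicts:
        coef, *_ = np.linalg.lstsq(B, y, rcond=None)
        scores.append(np.linalg.norm(B @ coef) / np.linalg.norm(y))
    cutoff = np.quantile(scores, 1.0 - keep)
    return [B for B, sc in zip(subdicts, scores) if sc >= cutoff]
```

Sparse coding on the retained, compressed groups then proceeds as the deflated problem described in the abstract.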

Citations: 0
Simulation of Erlang and negative binomial distributions using the generalized Lambert W function
Pub Date: 2024-02-08 DOI: 10.1016/j.jcmds.2024.100092
C.Y. Chew, G. Teng, Y.S. Lai

We present a simulation method for generating random variables from Erlang and negative binomial distributions using the generalized Lambert W function. The generalized Lambert W function is utilized to solve the quantile functions of these distributions, allowing for efficient and accurate generation of random variables. The simulation procedure is based on Halley’s method and is demonstrated through the generation of 100,000 random variables for each distribution. The results show close agreement with the theoretical mean and variance values, indicating the effectiveness of the proposed method. This approach offers a valuable tool for generating random variables from Erlang and negative binomial distributions in various applications.
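
For intuition, here is a minimal inverse-CDF sampler for the Erlang case that applies Halley's method directly to the gamma CDF. The paper instead solves the quantile function through the generalized Lambert W function; the starting guess, tolerance, and iteration cap below are our assumptions, and the discrete negative binomial case, which needs an extra rounding step, is omitted.

```python
import numpy as np
from scipy.stats import gamma

def erlang_samples_halley(k, lam, size=100_000, tol=1e-10, seed=0):
    """Inverse-CDF sampling of Erlang(k, lam): solve F(x) = u for each
    uniform draw u with Halley's method, using the closed-form pdf f = F'
    and its derivative f' = f * ((k - 1)/x - lam)."""
    rng = np.random.default_rng(seed)
    u = rng.uniform(size=size)
    x = np.full(size, k / lam)                     # start at the mean
    for _ in range(100):                           # iteration cap (assumption)
        g = gamma.cdf(x, a=k, scale=1.0 / lam) - u
        f = gamma.pdf(x, a=k, scale=1.0 / lam)
        fp = f * ((k - 1) / x - lam)
        dx = 2.0 * g * f / (2.0 * f**2 - g * fp)   # Halley update
        x = np.maximum(x - dx, 1e-12)              # stay inside the support
        if np.max(np.abs(dx)) < tol:
            break
    return x
```

The sample mean and variance should then come out close to k/lam and k/lam**2, matching the check reported in the abstract.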

Citations: 0
Approximate Bayesian computational methods to estimate the strength of divergent selection in population genomics models
Pub Date: 2024-02-07 DOI: 10.1016/j.jcmds.2024.100091
Martyna Lukaszewicz, Ousseini Issaka Salia, Paul A. Hohenlohe, Erkan O. Buzbas

Statistical estimation of parameters in large models of evolutionary processes is often too computationally inefficient to pursue using exact model likelihoods, even with single-nucleotide polymorphism (SNP) data, which offers a way to reduce the size of genetic data while retaining relevant information. Approximate Bayesian Computation (ABC) for statistical inference about parameters of large models takes advantage of simulations to bypass direct evaluation of model likelihoods. We develop a mechanistic model to simulate forward-in-time divergent selection with variable migration rates, modes of reproduction (sexual, asexual), and length and number of migration-selection cycles. We investigate the computational feasibility of ABC for statistical inference and study the quality of estimates of the position of loci under selection and the strength of selection. To expand the parameter space of positions under selection, we enhance the model by implementing an outlier scan on summarized observed data. We evaluate the usefulness of summary statistics well known to capture the strength of selection, and assess their informativeness under divergent selection. We also evaluate the effect of genetic drift with respect to an idealized deterministic model with single-locus selection. We discuss the role of the recombination rate as a confounding factor in estimating the strength of divergent selection, and emphasize its importance in the breakdown of linkage disequilibrium (LD). We determine for which part of the parameter space of the model we recover a strong signal for estimating selection, and whether population differentiation-based or LD-based summary statistics perform well in estimating selection.
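
The ABC idea in its simplest rejection form can be sketched as follows; here `simulate` stands for the paper's forward-in-time simulator and `summarize` for the chosen summary statistics (for example, differentiation-based or LD-based), both assumed rather than taken from the paper.

```python
import numpy as np

def abc_rejection(obs_stats, prior_sample, simulate, summarize,
                  n_draws=50_000, accept_frac=0.01, seed=0):
    """Plain ABC rejection: draw parameters from the prior, simulate data,
    and keep the draws whose summary statistics land closest to the observed
    ones; the retained draws approximate the posterior."""
    rng = np.random.default_rng(seed)
    thetas, dists = [], []
    for _ in range(n_draws):
        theta = prior_sample(rng)
        dists.append(np.linalg.norm(summarize(simulate(theta, rng)) - obs_stats))
        thetas.append(theta)
    thetas, dists = np.asarray(thetas), np.asarray(dists)
    eps = np.quantile(dists, accept_frac)   # data-driven tolerance
    return thetas[dists <= eps]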

Citations: 0
Escaping saddle points efficiently with occupation-time-adapted perturbations
Pub Date: 2024-01-14 DOI: 10.1016/j.jcmds.2024.100090
Xin Guo, Jiequn Han, Mahan Tajrobehkar, Wenpin Tang

Motivated by the super-diffusivity of self-repelling random walk, which has roots in statistical physics, this paper develops a new perturbation mechanism for optimization algorithms. In this mechanism, perturbations are adapted to the history of states via the notion of occupation time. After integrating this mechanism into the framework of perturbed gradient descent (PGD) and perturbed accelerated gradient descent (PAGD), two new algorithms are proposed: perturbed gradient descent adapted to occupation time (PGDOT) and its accelerated version (PAGDOT). PGDOT and PAGDOT are guaranteed to avoid getting stuck at non-degenerate saddle points, and are shown to converge to second-order stationary points at least as fast as PGD and PAGD, respectively. The theoretical analysis is corroborated by empirical studies in which the new algorithms consistently escape saddle points and outperform not only their counterparts, PGD and PAGD, but also other popular alternatives including stochastic gradient descent, Adam, and several state-of-the-art adaptive gradient methods.
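
Roughly, the mechanism can be sketched as follows. This is a loose illustration only: the square-root scaling of the perturbation with occupation time is our invented stand-in for the paper's precise adaptation rule.

```python
import numpy as np

def perturbed_gd_occupation(grad, x0, lr=1e-2, g_small=1e-3,
                            radius=1e-2, n_steps=10_000, seed=0):
    """Gradient descent with occupation-time-adapted perturbations: noise is
    injected only in near-flat regions, and its size grows with the number of
    consecutive steps spent there (the sqrt growth rule is an assumption)."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    occupation = 0                      # consecutive near-flat steps
    for _ in range(n_steps):
        g = grad(x)
        if np.linalg.norm(g) < g_small:
            occupation += 1
            x = x + radius * np.sqrt(occupation) * rng.standard_normal(x.shape)
        else:
            occupation = 0
            x = x - lr * g
    return x
```

On a saddle such as f(x, y) = x^2 - y^2 started near the origin, the growing perturbation eventually pushes the iterate off the unstable manifold, after which plain descent resumes.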

Citations: 0
London street crime analysis and prediction using crowdsourced dataset
Pub Date: 2023-12-18 DOI: 10.1016/j.jcmds.2023.100089
Ahmed Yunus, Jonathan Loo

To effectively prevent crimes, it is vital to anticipate their patterns and likely occurrences. Our efforts focused on analyzing diverse open-source datasets related to London, such as Met Police records, public social media posts, and data from transportation hubs such as bus and rail stations. These datasets provided rich insights into human behaviors, activities, and demographics across different parts of London, paving the way for a machine learning-driven prediction system. We developed this system using unique crime-related features extracted from these datasets. Furthermore, our study outlined methods to gather detailed street-level information from local communities using various applications. This innovative approach significantly enhances our ability to deeply understand and predict crime patterns. The proposed predictive system has the potential to forecast potential crimes in advance, enabling government bodies to proactively deploy targeted interventions, ultimately aiming to prevent and address criminal incidents more effectively.

Citations: 0
Transfer learning across datasets with different input dimensions: An algorithm and analysis for the linear regression case
Pub Date: 2023-11-02 DOI: 10.1016/j.jcmds.2023.100086
Luis Pedro Silvestrin, Harry van Zanten, Mark Hoogendoorn, Ger Koole

With the development of new sensors and monitoring devices, more sources of data become available for use as inputs for machine learning models. On the one hand, these can help to improve the accuracy of a model. On the other hand, combining these new inputs with historical data remains a challenge that has not yet been studied in enough detail. In this work, we propose a transfer learning algorithm that combines new and historical data with different input dimensions. This approach is easy to implement, efficient, with computational complexity equivalent to the ordinary least-squares method, and requires no hyperparameter tuning, making it straightforward to apply when the new data is limited. Different from other approaches, we provide a rigorous theoretical study of its robustness, showing that it cannot be outperformed by a baseline that utilizes only the new data. Our approach achieves state-of-the-art performance on 9 real-life datasets, outperforming the linear DSFT, another linear transfer learning algorithm, and performing comparably to non-linear DSFT.
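
One natural construction for the linear regression case is a two-stage least-squares fit; this is a sketch under our own assumptions and not necessarily the authors' estimator. Here `d_shared` is the number of input columns the historical and new datasets have in common.

```python
import numpy as np

def transfer_ols(X_hist, y_hist, X_new, y_new, d_shared):
    """Combine historical data that has only d_shared input columns with new
    data that has extra columns: fit the shared part on the pooled data, then
    fit the extra columns on the residuals of the new data alone."""
    Xh = X_hist[:, :d_shared]
    Xs, Xe = X_new[:, :d_shared], X_new[:, d_shared:]
    # stage 1: shared-feature coefficients from both datasets
    X_pool = np.vstack([Xh, Xs])
    y_pool = np.concatenate([y_hist, y_new])
    beta_shared, *_ = np.linalg.lstsq(X_pool, y_pool, rcond=None)
    # stage 2: extra-feature coefficients from the new data's residuals
    resid = y_new - Xs @ beta_shared
    beta_extra, *_ = np.linalg.lstsq(Xe, resid, rcond=None)
    return beta_shared, beta_extra
```

Both stages are ordinary least-squares solves, consistent with the abstract's claim that the overall cost matches OLS and that no hyperparameter tuning is needed.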

Citations: 0
Quarter match non-local means algorithm for noise removal
Pub Date: 2023-09-30 DOI: 10.1016/j.jcmds.2023.100085
Chartese Jones

The notion of improvement plays out in many forms in our lives. We look for better quality, faster speed, and more leisurely connections. To achieve our desired goals, we must ask questions. How do we make a process stronger? How do we make it more efficient? How do we make it more effective? Image denoising plays a vital role in many professions, and understanding how noise can be present in images has led to multiple denoising techniques, including total variation regularization, non-local regularization, sparse representation, and low-rank minimization, to name a few. Many of these techniques exist because of the concept of improvement. First, we have a change (a problem). This change invokes thoughts and questions. How these changes occur and how they are handled play an essential role in the realization or malfunction of that process. With this understanding, we first look to fully understand the process in order to achieve success. As it relates to image denoising, the non-local means algorithm is incredibly effective in image reconstruction. In particular, the non-local means filter removes noise and sharpens edges without losing too many fine structures and details, and it is remarkably accurate. The disadvantage that plagues the non-local means filtering algorithm is its computational burden, which is due to the non-local averaging. In this paper, we investigate innovative ways to reduce the computational burden and enhance the effectiveness of this filtering process. Research on image analysis shows there is a battle between noise reduction and the preservation of actual features, which makes the reduction of noise a difficult task. We therefore propose a quarter-match non-local means denoising filtering algorithm. The filters help to classify a more concentrated region in the image, thereby enhancing the computational efficiency of existing non-local means denoising methods and producing an enriched comparison for overlaying in the restoration process. To survey the constructs of this new algorithm, the authors test its effectiveness and efficiency against the original ("state-of-the-art") non-local means filtering algorithm and other selective processes. When comparing the original non-local means with the new quarter-match filtering algorithm, on average we can reduce the computational cost by half while improving the quality of the image. To further test the new algorithm, magnetic resonance (MR) and synthetic aperture radar (SAR) images are used as specimens for real-world applications.
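
As we read it, the quarter-match idea is a cheap prescreen: rank candidate patches by the distance on a quarter of the patch and let only the best candidates enter the weighted average. The sketch below follows that reading; the quarter choice, the `keep` fraction, and the other parameters are our assumptions, not the paper's specification.

```python
import numpy as np

def nlm_quarter_match(img, patch=7, search=5, h=0.1, keep=0.25):
    """Non-local means with a quarter-patch prescreen: candidates in the
    search window are ranked by the distance on the top-left quarter of the
    patch, and only the closest fraction `keep` enters the weighted average."""
    pad = patch // 2
    q = max(1, patch // 2)                       # quarter-patch side
    P = np.pad(img.astype(float), pad + search, mode='reflect')
    out = np.empty_like(img, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            ci, cj = i + pad + search, j + pad + search
            ref = P[ci - pad:ci + pad + 1, cj - pad:cj + pad + 1]
            cands = []
            for di in range(-search, search + 1):
                for dj in range(-search, search + 1):
                    c = P[ci + di - pad:ci + di + pad + 1,
                          cj + dj - pad:cj + dj + pad + 1]
                    dq = np.mean((ref[:q, :q] - c[:q, :q]) ** 2)  # prescreen
                    cands.append((dq, c))
            cands.sort(key=lambda t: t[0])
            cands = cands[:max(1, int(keep * len(cands)))]
            w = np.array([np.exp(-np.mean((ref - c) ** 2) / h ** 2)
                          for _, c in cands])
            centers = np.array([c[pad, pad] for _, c in cands])
            out[i, j] = float(np.sum(w * centers) / np.sum(w))
    return out
```

Since the full patch distance is computed only for the surviving quarter of candidates, most of the non-local averaging cost is avoided, in the spirit of the halved cost reported in the abstract.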

Citations: 0
The parameter inversion in coupled geomechanics and flow simulations using Bayesian inference
Pub Date: 2023-08-24 DOI: 10.1016/j.jcmds.2023.100083
Juarez S. Azevedo, Jarbas A. Fernandes

In many situations, uncertainty about the mechanical properties of surrounding soils, due to the lack of data and to spatial variations, requires tools that study parameters by means of random variables or random functions. Usually only a few measurements of parameters such as permeability or porosity are available to build a model, and some measurements of the geomechanical behavior, such as displacements, stresses, and strains, are needed to check and calibrate the model. In order to introduce this type of modeling into geomechanical analysis, taking into account the random nature of soil parameters, Bayesian inference techniques are implemented in highly heterogeneous porous media. Within the framework of a coupling algorithm, these are incorporated into the inverse poroelasticity problem, with porosity, permeability, and Young's modulus treated as stationary random fields obtained by the moving average (MA) method. To this end, the Metropolis–Hastings (MH) algorithm was chosen to seek the geomechanical parameters that yield the lowest misfit. Numerical simulations related to injection problems and fluid withdrawal in a 3D domain are performed to compare the performance of this methodology. We conclude with some remarks about the numerical experiments.
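
A minimal random-walk Metropolis–Hastings loop over the parameter vector might look as follows; the Gaussian proposal, flat prior, and noise level `sigma2` are our assumptions, and `misfit` stands for the data mismatch returned by the coupled flow-geomechanics forward solve.

```python
import numpy as np

def metropolis_hastings(misfit, theta0, step=0.05, sigma2=1.0,
                        n_iter=20_000, seed=0):
    """Random-walk Metropolis-Hastings with a flat prior: the likelihood is
    taken proportional to exp(-misfit/(2*sigma2)), so parameter sets
    (porosity, permeability, Young's modulus, ...) with low misfit are
    accepted more often."""
    rng = np.random.default_rng(seed)
    theta = np.array(theta0, dtype=float)
    m = misfit(theta)
    chain = [theta.copy()]
    for _ in range(n_iter):
        prop = theta + step * rng.standard_normal(theta.shape)
        m_prop = misfit(prop)
        # symmetric proposal: accept with probability min(1, exp(dm/(2*sigma2)))
        if np.log(rng.uniform()) < (m - m_prop) / (2.0 * sigma2):
            theta, m = prop, m_prop
        chain.append(theta.copy())
    return np.array(chain)
```

The chain concentrates around low-misfit parameters, which is the sense in which MH "seeks the geomechanical parameters that yield the lowest misfit" in the abstract.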

Citations: 0
Fast discrete Laplace transforms
Pub Date: 2023-08-01 DOI: 10.1016/j.jcmds.2023.100082
Yen Lee Loh

The discrete Laplace transform (DLT) with M inputs and N outputs has a nominal computational cost of O(MN). There are approximate DLT algorithms with O(M+N) cost such that the output errors divided by the sum of the inputs are less than a fixed tolerance η. However, certain important applications of DLTs require a more stringent accuracy criterion, where the output errors divided by the true output values are less than η. We present a fast DLT algorithm combining two strategies. The bottom-up strategy exploits the Taylor expansion of the Laplace transform kernel. The top-down strategy chooses groups of terms in the DLT to include or neglect, based on the whole summand, and not just on the Laplace transform kernel. The overall effort is O(M+N) when the source and target points are very dense or very sparse, and appears to be O((M+N)^1.5) in the intermediate regime. Our algorithm achieves the same accuracy as brute-force evaluation, and is typically 10–100 times faster.
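
Writing the DLT as f(s_j) = sum_i w_i exp(-s_j t_i), a brute-force evaluation and a toy version of the top-down neglect strategy can be sketched as follows. The sketch assumes nonnegative weights and s >= 0; the Taylor-expansion bottom-up strategy and the paper's actual grouping are not shown.

```python
import numpy as np

def dlt_brute(w, t, s):
    """Brute-force DLT: O(MN) evaluation of f(s_j) = sum_i w_i * exp(-s_j * t_i)."""
    return np.exp(-np.outer(s, t)) @ w

def dlt_pruned(w, t, s, eta=1e-8):
    """Toy top-down pruning for nonnegative weights and s >= 0: with sources
    sorted by increasing t, the kernel exp(-s*t) only decreases, so the whole
    remaining tail is bounded by (terms left) * (current kernel) * max weight
    and can be skipped once that bound drops below eta times the running sum,
    keeping the relative output error below eta."""
    order = np.argsort(t)
    t, w = np.asarray(t, dtype=float)[order], np.asarray(w, dtype=float)[order]
    wmax = float(np.max(w))
    out = np.zeros(len(s))
    for j, sj in enumerate(s):
        acc = 0.0
        for i, ti in enumerate(t):
            kern = np.exp(-sj * ti)
            if acc > 0.0 and (len(t) - i) * kern * wmax < eta * acc:
                break                    # the whole tail is provably negligible
            acc += w[i] * kern
        out[j] = acc
    return out
```

Skipping whole tails of summands, rather than thresholding the kernel alone, is what makes the criterion relative to the true output value rather than to the sum of the inputs.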

Citations: 1