
Foundations of data science (Springfield, Mo.): Latest Articles

HERMES: PERSISTENT SPECTRAL GRAPH SOFTWARE.
Q2 MATHEMATICS, APPLIED Pub Date: 2021-03-01 DOI: 10.3934/fods.2021006 Volume 3, Issue 1, pp. 67-97
Rui Wang, Rundong Zhao, Emily Ribando-Gros, Jiahui Chen, Yiying Tong, Guo-Wei Wei

Persistent homology (PH) is one of the most popular tools in topological data analysis (TDA), while graph theory has had a significant impact on data science. Our earlier work introduced the persistent spectral graph (PSG) theory as a unified multiscale paradigm encompassing TDA and geometric analysis. In PSG theory, families of persistent Laplacian matrices (PLMs) corresponding to various topological dimensions are constructed via a filtration that samples a given dataset at multiple scales. The harmonic spectra of the PLMs (their null spaces) provide, in each dimension, the same topological invariants as PH, namely the persistent Betti numbers, while the non-harmonic spectra of the PLMs support additional geometric analysis of the shape of the data. In this work, we develop an open-source software package called highly efficient robust multidimensional evolutionary spectra (HERMES) to enable broad applications of PSGs in science, engineering, and technology. To ensure the reliability and robustness of HERMES, we have validated the software with simple geometric shapes and with complex datasets from three-dimensional (3D) protein structures. We found that the smallest non-zero eigenvalues are very sensitive to data abnormalities.
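The HERMES interface itself is not described in this abstract, so the following is only a hypothetical, self-contained sketch of the simplest case: the 0-dimensional Laplacian at a single filtration radius `r`, which is the ordinary graph Laplacian L = D - A of the graph joining points within distance `r`. The number of zero eigenvalues of L equals β₀ (the number of connected components), and the smallest non-zero eigenvalue is an example of the non-harmonic quantity the abstract reports as sensitive to abnormal data. The function names, the pure-Python Jacobi eigensolver, and the zero tolerance are assumptions; true persistent Laplacians between two filtration values and higher dimensions are not covered.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def rips_graph_laplacian(points: Sequence[Point], r: float) -> List[List[float]]:
    """Graph Laplacian L = D - A of the graph joining points within distance r."""
    n = len(points)
    lap = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) <= r:
                lap[i][j] = lap[j][i] = -1.0
                lap[i][i] += 1.0
                lap[j][j] += 1.0
    return lap


def jacobi_eigenvalues(a: List[List[float]], tol: float = 1e-20,
                       max_sweeps: int = 100) -> List[float]:
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations, ascending."""
    n = len(a)
    a = [row[:] for row in a]  # work on a copy
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that entry (p, q) becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # columns: A <- A P
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # rows: A <- P^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


def spectral_summary(points: Sequence[Point], r: float,
                     zero_tol: float = 1e-8) -> Tuple[int, Optional[float]]:
    """Return (beta_0, smallest non-zero eigenvalue or None) at radius r."""
    eigs = jacobi_eigenvalues(rips_graph_laplacian(points, r))
    beta0 = sum(1 for lam in eigs if abs(lam) < zero_tol)
    nonzero = [lam for lam in eigs if abs(lam) >= zero_tol]
    return beta0, (min(nonzero) if nonzero else None)
```

For two well-separated clusters (a triangle and an edge) at `r = 1.5`, the Laplacian spectrum is {0, 0, 2, 3, 3}, so β₀ = 2 and the smallest non-zero eigenvalue is 2; growing `r` until everything is connected gives β₀ = 1.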

Citations: 0
A study of disproportionately affected populations by race/ethnicity during the SARS-CoV-2 pandemic using multi-population SEIR modeling and ensemble data assimilation
Q2 MATHEMATICS, APPLIED Pub Date : 2021-01-01 DOI: 10.3934/fods.2021022
Emmanuel Fleurantin, C. Sampson, Daniel P. Maes, Justin P. Bennett, Tayler Fernandes-Nunez, S. Marx, G. Evensen

The disparate impact of COVID-19 on minority populations in the United States is well established in the available data on deaths, case counts, and adverse outcomes. However, critical metrics used by public health officials and epidemiologists, such as the time-dependent viral reproduction number ($R_t$), can be hard to calculate from these data, especially for individual populations. Furthermore, disparities in the availability of testing, record-keeping infrastructure, or government funding in disadvantaged populations can produce incomplete data sets. In this work, we apply ensemble data assimilation techniques, which optimally combine model and data, to produce a more complete data set and better estimates of the critical metrics used by public health officials and epidemiologists. We employ a multi-population SEIR (Susceptible, Exposed, Infected, and Recovered) model with a time-dependent reproduction number and an age-stratified contact rate matrix for each population. We assimilate the daily death data for populations separated by ethnic/racial grouping using Ensemble Smoothing with Multiple Data Assimilation (ESMDA) to estimate model parameters and produce an $R_t(n)$ for the $n$th population. We do this with three distinct approaches: (1) using the same contact matrices and prior $R_t(n)$ for each population; (2) assigning contact matrices with increased contact rates for working-age and older adults to populations experiencing disparity; and (3) as in (2), but with a time-continuous update to $R_t(n)$. We study nine U.S. states and the District of Columbia, providing a complete time series of the pandemic in each and, in some cases, identifying disparities not otherwise evident in the aggregate statistics.
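To make the forward model concrete, here is a minimal, hypothetical multi-population SEIR sketch integrated with forward Euler. It assumes one compartment set per population (no age stratification), a contact matrix `contact[k][m]` between populations, and the illustrative parameterisation β_k(t) = R_t(k)·γ, so that `r_t(k, t)` scales the transmission rate of population k. The default incubation and infectious periods are placeholders, not the paper's values. In an ESMDA setting, a model like this would be the forward operator evaluated for every ensemble member.

```python
from typing import Callable, List, Sequence


def simulate_multi_seir(
    populations: Sequence[float],
    contact: Sequence[Sequence[float]],
    r_t: Callable[[int, float], float],
    initial_infected: Sequence[float],
    incubation_days: float = 5.0,
    infectious_days: float = 7.0,
    days: float = 120.0,
    dt: float = 0.1,
) -> List[List[List[float]]]:
    """Forward-Euler SEIR; history[step][k] is [S, E, I, R] of population k."""
    sigma = 1.0 / incubation_days  # E -> I transition rate
    gamma = 1.0 / infectious_days  # I -> R transition rate
    n = len(populations)
    state = [[populations[k] - initial_infected[k], 0.0, initial_infected[k], 0.0]
             for k in range(n)]
    history = [[row[:] for row in state]]
    for step in range(int(round(days / dt))):
        t = step * dt
        new_state = []
        for k in range(n):
            s, e, i, r = state[k]
            # Force of infection on population k: contact-weighted prevalence of all populations.
            beta_k = r_t(k, t) * gamma
            pressure = sum(contact[k][m] * state[m][2] / populations[m] for m in range(n))
            infection = beta_k * s * pressure
            ds = -infection
            de = infection - sigma * e
            di = sigma * e - gamma * i
            dr = gamma * i
            # The four derivatives sum to zero, so each population size is conserved.
            new_state.append([s + dt * ds, e + dt * de, i + dt * di, r + dt * dr])
        state = new_state
        history.append([row[:] for row in state])
    return history
```

Because R_t is passed as a function of population and time, the three approaches in the abstract correspond to different choices of `contact` and of how `r_t` is parameterised and updated.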

Citations: 1
Intrinsic disease maps using persistent cohomology
Q2 MATHEMATICS, APPLIED Pub Date : 2021-01-01 DOI: 10.3934/FODS.2021008
Daniel Amin, Mikael Vejdemo-Johansson
We use persistent cohomology and circular coordinates to investigate three datasets related to infectious diseases. We show that all three datasets exhibit circular coordinates that carry information about the data itself. For one of the datasets, we are able to recover time post-infection from the circular coordinate itself. For the other datasets this information was not available, but in one of them we were able to relate the circular coordinate to red blood cell counts and weight changes in the subjects.
Citations: 1
ToFU: Topology functional units for deep learning
Q2 MATHEMATICS, APPLIED Pub Date : 2021-01-01 DOI: 10.3934/fods.2021021
Christopher Oballe, D. Boothe, P. Franaszczuk, V. Maroulas
We propose ToFU, a new trainable neural network unit whose activation is a persistence diagram dissimilarity function. Since persistence diagrams are topological summaries of structures, this activation measures and learns the topology of data so that it can be leveraged in machine learning tasks. We showcase the utility of ToFU in two experiments: one involving the classification of discrete-time autoregressive signals, and another involving a variational autoencoder. In the former, ToFU yields results competitive with networks that use spectral features while outperforming CNN architectures. In the latter, ToFU produces topologically interpretable latent space representations of inputs without sacrificing reconstruction fidelity.
Citations: 3
A density-based approach to feature detection in persistence diagrams for firn data
Q2 MATHEMATICS, APPLIED Pub Date : 2021-01-01 DOI: 10.3934/FODS.2021012
A. Lawson, Tyler Hoffman, Yu-Min Chung, K. Keegan, S. Day
Topological data analysis, and in particular persistence diagrams, are gaining popularity as tools for extracting topological information from noisy point cloud and digital data. Persistence diagrams track topological features in the form of $k$-dimensional holes in the data. Here, we construct a new, automated approach for identifying persistence diagram points that represent robust long-lived features. These features may be used to provide a more accurate estimate of the Betti numbers of the underlying space. This approach extends the established practice of applying a lifespan cutoff to the features, taking advantage of the observation that noisy features typically appear in clusters in the persistence diagram. We show that this approach offers more flexibility in partitioning the features of the persistence diagram, resulting in greater accuracy in the computed Betti numbers, especially at high noise levels and under varying image illumination. This work is motivated by three-dimensional micro-CT imaging of ice core samples and is applicable to separating noise from robust signals in persistence diagrams computed from noisy data.
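The abstract contrasts the established lifespan cutoff with a density-based refinement. The sketch below shows the baseline cutoff and, purely as a hypothetical illustration of the clustering idea, an optional neighbour-count filter that rejects long-lived points lying inside a dense cluster of diagram points. It is not the authors' algorithm; `eps` and `min_neighbors` are assumed parameters, and the input is a list of (birth, death) pairs for a single homological dimension.

```python
import math
from typing import Optional, Sequence, Tuple

Pair = Tuple[float, float]  # (birth, death) of one feature


def robust_feature_count(
    diagram: Sequence[Pair],
    lifespan_cutoff: float,
    eps: Optional[float] = None,
    min_neighbors: Optional[int] = None,
) -> int:
    """Estimate a Betti number by counting robust points of one persistence diagram."""
    use_density = eps is not None and min_neighbors is not None
    count = 0
    for idx, point in enumerate(diagram):
        birth, death = point
        if death - birth <= lifespan_cutoff:
            continue  # baseline rule: short-lived features are treated as noise
        if use_density:
            # Hypothetical density rule: a point with many close diagram neighbours
            # is assumed to belong to a noise cluster, even if it lives long.
            neighbors = sum(
                1 for j, other in enumerate(diagram)
                if j != idx and math.dist(point, other) <= eps
            )
            if neighbors >= min_neighbors:
                continue
        count += 1
    return count
```

With a cluster of three noisy points that all survive a cutoff of 1.0 plus one isolated long-lived point, the cutoff alone counts four features, while adding the density filter (`eps=0.2`, `min_neighbors=2`) keeps only the isolated one.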
Citations: 2
The rankability of weighted data from pairwise comparisons
Q2 MATHEMATICS, APPLIED Pub Date : 2021-01-01 DOI: 10.3934/FODS.2021002
Paul E. Anderson, T. Chartier, A. Langville, Kathryn E. Pedings-Behling
In prior work [4], Anderson et al. introduced a new problem, the rankability problem, which concerns a dataset's inherent ability to produce a meaningful ranking of its items. Ranking is a fundamental data science task with numerous applications, including web search, data mining, cybersecurity, machine learning, and statistical learning theory. Yet little attention has been paid to whether a dataset is suitable for ranking at all. As a result, when a ranking method is applied to a dataset with low rankability, the resulting ranking may not be reliable. The rankability paper [4] and its methods studied unweighted data for which the dominance relations are binary, i.e., an item either dominates or is dominated by another item. In this paper, we extend rankability methods to weighted data, in which an item may dominate another by any finite amount. We present combinatorial approaches to a weighted rankability measure and apply our new measure to several weighted datasets.
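The paper's weighted rankability measure is not reproduced here. As a hedged illustration of the combinatorial ingredient such measures typically rely on, the sketch below searches all orderings of the items for the minimum total weight of pairwise comparisons an ordering contradicts, and reports every ordering that attains it. Intuitively, a large minimum or many tied optimal orderings suggests the data are harder to rank. The function names and the `{(winner, loser): weight}` input format are assumptions, and the brute-force search costs n!, so it is only usable for small n.

```python
from itertools import permutations
from typing import Dict, Hashable, List, Sequence, Tuple

Weights = Dict[Tuple[Hashable, Hashable], float]  # (winner, loser) -> margin


def weighted_violation(order: Sequence[Hashable], weights: Weights) -> float:
    """Total weight of comparisons whose winner is ranked below its loser."""
    position = {item: idx for idx, item in enumerate(order)}
    return sum(w for (winner, loser), w in weights.items()
               if position[winner] > position[loser])


def best_orderings(items: Sequence[Hashable],
                   weights: Weights) -> Tuple[float, List[Tuple[Hashable, ...]]]:
    """Brute force over all orderings: (minimum violation, all optimal orderings)."""
    best = None
    optimal: List[Tuple[Hashable, ...]] = []
    for order in permutations(items):
        v = weighted_violation(order, weights)
        if best is None or v < best:
            best, optimal = v, [order]
        elif v == best:
            optimal.append(order)
    return best, optimal
```

Transitive data such as A beats B, B beats C, and A beats C admit a single ordering with zero violation, whereas a perfectly cyclic set of equal-weight comparisons (A beats B, B beats C, C beats A) has minimum violation 1 attained by three different orderings.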
Citations: 10
Learning landmark geodesics using the ensemble Kalman filter
Q2 MATHEMATICS, APPLIED Pub Date : 2021-01-01 DOI: 10.3934/fods.2021020
Andreas Bock, C. Cotter
We study the problem of diffeomorphometric geodesic landmark matching, where the objective is to find a diffeomorphism that, via its group action, maps between two sets of landmarks. It is well known that the motion of the landmarks, and thereby the diffeomorphism, can be encoded by an initial momentum, which leads to a formulation in which the landmark matching problem can be solved as an optimisation problem over such momenta. The novelty of our work lies in the application of a derivative-free Bayesian inverse method for learning the optimal momentum encoding the diffeomorphic mapping between the template and the target. The method we apply is the ensemble Kalman filter, an extension of the Kalman filter to nonlinear operators. We describe an efficient implementation of the algorithm and show several numerical results for various target shapes.
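The statement that the landmark motion is encoded by an initial momentum can be made concrete with geodesic shooting: integrate the Hamiltonian equations for H(q, p) = ½ Σᵢⱼ pᵢ·pⱼ K(qᵢ, qⱼ) from (q₀, p₀) and read off the final landmarks. A derivative-free method such as the EnKF only needs to evaluate this momentum-to-landmarks map as a black-box forward operator. The sketch assumes 2-D landmarks, a Gaussian kernel with width `sigma`, and forward-Euler time stepping; these are illustrative choices, not necessarily the paper's.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float]


def gaussian_kernel(x: Sequence[float], y: Sequence[float], sigma: float) -> float:
    d2 = (x[0] - y[0]) ** 2 + (x[1] - y[1]) ** 2
    return math.exp(-d2 / (2.0 * sigma ** 2))


def shoot_landmarks(q0: Sequence[Vec], p0: Sequence[Vec], sigma: float = 1.0,
                    t_final: float = 1.0, n_steps: int = 100) -> List[Vec]:
    """Integrate the landmark Hamiltonian equations; return final landmark positions."""
    q = [list(x) for x in q0]
    p = [list(x) for x in p0]
    dt = t_final / n_steps
    n = len(q)
    for _ in range(n_steps):
        dq = [[0.0, 0.0] for _ in range(n)]
        dp = [[0.0, 0.0] for _ in range(n)]
        for i in range(n):
            for j in range(n):
                k = gaussian_kernel(q[i], q[j], sigma)
                # dq_i/dt = sum_j K(q_i, q_j) p_j
                dq[i][0] += k * p[j][0]
                dq[i][1] += k * p[j][1]
                # dp_i/dt = -dH/dq_i = sum_j (p_i . p_j) K(q_i, q_j) (q_i - q_j) / sigma^2
                coeff = (p[i][0] * p[j][0] + p[i][1] * p[j][1]) * k / sigma ** 2
                dp[i][0] += coeff * (q[i][0] - q[j][0])
                dp[i][1] += coeff * (q[i][1] - q[j][1])
        for i in range(n):
            for c in range(2):
                q[i][c] += dt * dq[i][c]
                p[i][c] += dt * dp[i][c]
    return [(x[0], x[1]) for x in q]
```

A single landmark moves in a straight line with its momentum (the kernel gradient vanishes at zero distance), zero momentum leaves all landmarks fixed, and interacting landmarks bend each other's paths while total momentum is conserved.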
Citations: 2
Reconstructing linearly embedded graphs: A first step to stratified space learning
Q2 MATHEMATICS, APPLIED Pub Date : 2021-01-01 DOI: 10.3934/fods.2021026
Yossi Bokor Bleile, Katharine Turner, Christopher Williams
In this paper, we consider the simplest class of stratified spaces – linearly embedded graphs. We present an algorithm that learns the abstract structure of an embedded graph and models the specific embedding from a point cloud sampled from it. We use tools and inspiration from computational geometry, algebraic topology, and topological data analysis and prove the correctness of the identified abstract structure under assumptions on the embedding. The algorithm is implemented in the Julia package Skyler, which we used for the numerical simulations in this paper.
Citations: 1
Score matching filters for Gaussian Markov random fields with a linear model of the precision matrix
Q2 MATHEMATICS, APPLIED Pub Date : 2021-01-01 DOI: 10.3934/fods.2021030
Marie Turčičová, J. Mandel, K. Eben
We present an ensemble filtering method based on a linear model for the precision matrix (the inverse of the covariance), with the parameters determined by score matching estimation. The method provides a rigorous covariance regularization when the underlying random field is Gaussian Markov. The parameters are found by solving a system of linear equations. The analysis step uses the inverse formulation of the Kalman update. We propose several filter versions, differing in how the analysis ensemble is constructed, as well as a score matching version of the Extended Kalman Filter.
Citations: 1
An international initiative of predicting the SARS-CoV-2 pandemic using ensemble data assimilation
Q2 MATHEMATICS, APPLIED Pub Date : 2020-12-11 DOI: 10.3934/fods.2021001
G. Evensen, Javier Amezcua, M. Bocquet, A. Carrassi, A. Farchi, A. Fowler, P. Houtekamer, C. Jones, R. Moraes, M. Pulido, C. Sampson, F. Vossepoel
This work demonstrates the efficiency of using iterative ensemble smoothers to estimate the parameters of an SEIR model. We have extended a standard SEIR model with age classes and with compartments for the sick, the hospitalized, and the dead. The model is conditioned on the daily numbers of accumulated deaths and of hospitalized patients, and it can also be conditioned on the number of cases obtained from testing. We start from a wide prior distribution for the model parameters. The ensemble conditioning then yields a posterior ensemble of estimated parameters whose model predictions agree closely with the observations. The updated ensemble of model simulations has predictive capability and includes uncertainty estimates. In particular, we estimate the effective reproductive number as a function of time, and we can assess the impact of different intervention measures. Starting from the updated set of model parameters, we can make accurate short-term predictions of the epidemic's development, provided the future effective reproductive number is known. The model system also allows long-term scenarios of the epidemic to be computed under different assumptions. We have applied the model system to data sets from several countries and regions: the four European countries Norway, England, the Netherlands, and France; the province of Quebec in Canada; the South American countries Argentina and Brazil; and the four US states Alabama, North Carolina, California, and New York. The epidemic developed very differently in each of these countries and states, and we could accurately model the SARS-CoV-2 outbreak in all of them. We realize that more complex models, e.g., with regional compartments, may be desirable, and we suggest that the approach used here should also be applicable to these models.
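The parameter-estimation loop can be illustrated with ES-MDA (ensemble smoother with multiple data assimilation), which is one common iterative ensemble smoother. The abstract does not say which variant is used, so this choice is an assumption. The sketch makes several further simplifications:
- It uses a plain SEIR model without the age classes or the sick, hospitalized and dead compartments described above.
- It estimates a single parameter, log R0.
- It conditions on synthetic log-infectious counts on days 10, 20 and 30.
- It uses illustrative incubation and infectious periods.

None of these choices, and none of the function names, come from the paper.

```python
import math
import random
from typing import Callable, List, Sequence


def seir_log_infectious(r0: float, days: Sequence[int] = (10, 20, 30), population: float = 1e6,
                        i0: float = 10.0, incubation_days: float = 5.0,
                        infectious_days: float = 7.0, steps_per_day: int = 10) -> List[float]:
    """Forward-Euler SEIR run (illustrative parameter values); returns log(I) on the given days.

    R is not tracked because, in this basic model, it does not feed back into S, E or I.
    """
    sigma, gamma = 1.0 / incubation_days, 1.0 / infectious_days  # E->I and I->R rates
    beta = r0 * gamma  # transmission rate implied by the basic reproduction number
    dt = 1.0 / steps_per_day
    s, e, i = population - i0, 0.0, i0
    out = []
    for day in range(1, max(days) + 1):
        for _ in range(steps_per_day):
            new_exposed = beta * s * i / population * dt
            new_infectious = sigma * e * dt
            new_removed = gamma * i * dt
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_removed
        if day in days:
            out.append(math.log(i))
    return out


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small dense system a @ x = b."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def es_mda(prior: List[float], observe: Callable[[float], List[float]], d_obs: List[float],
           obs_std: float, n_iter: int = 4, seed: int = 0) -> List[float]:
    """ES-MDA for a scalar parameter, using a constant inflation factor alpha = n_iter."""
    rng = random.Random(seed)
    theta = list(prior)
    n, m = len(theta), len(d_obs)
    alpha = float(n_iter)  # the inverse inflation factors sum to one over the iterations
    for _ in range(n_iter):
        preds = [observe(t) for t in theta]  # forward model run for every member
        t_mean = sum(theta) / n
        t_anom = [t - t_mean for t in theta]
        d_mean = [sum(p[k] for p in preds) / n for k in range(m)]
        d_anom = [[p[k] - d_mean[k] for k in range(m)] for p in preds]
        # Cross-covariance C_theta,d and inflated data covariance C_dd + alpha * R (R diagonal).
        c_td = [sum(t_anom[j] * d_anom[j][k] for j in range(n)) / (n - 1) for k in range(m)]
        c_dd = [[sum(d_anom[j][k] * d_anom[j][l] for j in range(n)) / (n - 1)
                 for l in range(m)] for k in range(m)]
        for k in range(m):
            c_dd[k][k] += alpha * obs_std ** 2
        gain = solve(c_dd, c_td)  # (C_dd + alpha R)^-1 C_d,theta; the matrix is symmetric
        for j in range(n):
            # Perturb the observations with inflated noise, then shift the member.
            innovation = [d_obs[k] + math.sqrt(alpha) * obs_std * rng.gauss(0.0, 1.0)
                          - preds[j][k] for k in range(m)]
            theta[j] += sum(gain[k] * innovation[k] for k in range(m))
    return theta


if __name__ == "__main__":
    rng = random.Random(42)
    prior = [rng.gauss(math.log(2.0), 0.3) for _ in range(50)]  # ensemble of log(R0)
    d_obs = seir_log_infectious(2.5)  # synthetic observations from a "true" R0 of 2.5
    posterior = es_mda(prior, lambda t: seir_log_infectious(math.exp(t)), d_obs, obs_std=0.05)
    print("posterior mean R0:", sum(math.exp(t) for t in posterior) / len(posterior))
```

Estimating log R0 rather than R0 keeps every ensemble member's reproduction number positive. Setting the inflation factor α equal to the number of iterations satisfies the ES-MDA requirement that the inverse factors sum to one.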
Citations: 15
Journal
Foundations of data science (Springfield, Mo.)