
Latest articles in Foundations of data science (Springfield, Mo.)

EmT: Locating empty territories of homology group generators in a dataset
Q2 MATHEMATICS, APPLIED Pub Date: 2019-06-03 DOI: 10.3934/FODS.2019010
Xin Xu, J. Cisewski-Kehe
Persistent homology is a tool within topological data analysis to detect different dimensional holes in a dataset. The boundaries of the empty territories (i.e., holes) are not well-defined and each has multiple representations. The proposed method, Empty Territory (EmT), provides representations of different dimensional holes with a specified level of complexity of the territory boundary. EmT is designed for the setting where persistent homology uses a Vietoris-Rips complex filtration, and works as a post-analysis to refine the hole representation of the persistent homology algorithm. In particular, EmT uses alpha shapes to obtain a special class of representations that captures the empty territories with a complexity determined by the size of the alpha balls. With a fixed complexity, EmT returns the representation that contains the most points within the special class of representations. This method is limited to finding 1D holes in 2D data and 2D holes in 3D data, and is illustrated on simulation datasets of a homogeneous Poisson point process in 2D and a uniform sampling in 3D. Furthermore, the method is applied to a 2D cell tower location geography dataset and 3D Sloan Digital Sky Survey (SDSS) galaxy dataset, where it works well in capturing the empty territories.
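The Vietoris-Rips filtration that EmT post-processes can be illustrated at a single scale. The toy sketch below (not the authors' implementation) builds the 1-skeleton of a Vietoris-Rips complex by joining all point pairs within distance `eps`; on the four corners of a unit square, the sides appear at `eps = 1.0` while the diagonals do not, leaving the kind of 1-dimensional hole that persistent homology detects.

```python
from itertools import combinations
from math import dist

def rips_edges(points, eps):
    """Return the 1-skeleton of a Vietoris-Rips complex at scale eps:
    all pairs of points whose Euclidean distance is at most eps."""
    return [(i, j) for i, j in combinations(range(len(points)), 2)
            if dist(points[i], points[j]) <= eps]

# Four corners of a unit square: at eps = 1.0 the four sides appear but
# the diagonals (length ~1.414) do not, so a 1-dimensional hole remains.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
edges = rips_edges(square, 1.0)
```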
Citations: 0
Levels and trends in the sex ratio at birth and missing female births for 29 states and union territories in India 1990–2016: A Bayesian modeling study
Q2 MATHEMATICS, APPLIED Pub Date: 2019-06-03 DOI: 10.3934/FODS.2019008
Fengqing Chao, A. Yadav
The sex ratio at birth (SRB) in India has risen since the 1970s and now lies well beyond the level observed under normal circumstances. The lasting SRB imbalance has resulted in many more males than females in India. A population with a severely distorted sex ratio is more likely to face a prolonged struggle for stability and sustainability. It is therefore crucial to estimate the SRB and its imbalance for India at the state level, and to assess the uncertainty around those estimates. We develop a Bayesian model to estimate the SRB in India from 1990 to 2016 for 29 states and union territories. Our analyses are based on a comprehensive database of state-level SRB data from the sample registration system, the census, and the Demographic and Health Surveys. The SRB varies greatly across Indian states and union territories in 2016: ranging from 1.026 (95% uncertainty interval [0.971; 1.087]) in Mizoram to 1.181 [1.143; 1.128] in Haryana. We identify 18 states and union territories with an imbalanced SRB during 1990–2016, resulting in 14.9 [13.2; 16.5] million missing female births in India. Uttar Pradesh has the largest share of the missing female births among all states and union territories, accounting for 32.8% [29.5%; 36.3%] of the total.
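The "missing female births" quantity can be sketched with simple arithmetic: compare the female births implied by the observed SRB with those implied by a reference SRB. The reference value 1.05 and the helper function below are illustrative assumptions, not the paper's Bayesian model (whose exact reference level is not stated in the abstract).

```python
def missing_female_births(total_births, observed_srb, expected_srb=1.05):
    """Estimate missing female births as the gap between the female
    births expected under a reference SRB and those observed.
    expected_srb = 1.05 is a commonly cited biological norm (an
    assumption here; the paper's reference level may differ)."""
    observed_f = total_births / (1.0 + observed_srb)   # females observed
    expected_f = total_births / (1.0 + expected_srb)   # females expected
    return expected_f - observed_f

# 10 million births at an observed SRB of 1.18 versus the 1.05 reference:
gap = missing_female_births(10_000_000, 1.18)
```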
Citations: 9
Power weighted shortest paths for clustering Euclidean data
Q2 MATHEMATICS, APPLIED Pub Date: 2019-05-30 DOI: 10.3934/fods.2019014
Daniel Mckenzie, S. Damelin
We study the use of power weighted shortest path distance functions for clustering high dimensional Euclidean data, under the assumption that the data is drawn from a collection of disjoint low dimensional manifolds. We argue, theoretically and experimentally, that this leads to higher clustering accuracy. We also present a fast algorithm for computing these distances.
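A power weighted shortest path distance can be sketched directly with Dijkstra's algorithm, raising each Euclidean edge length to a power p. The helper below is a toy illustration over the complete graph, not the authors' fast algorithm.

```python
import heapq
from math import dist

def power_weighted_path_length(points, src, dst, p=2.0):
    """Dijkstra over the complete graph on `points`, where the edge
    (x, y) costs ||x - y||^p.  Larger p penalises long hops, so shortest
    paths prefer chains of nearby points -- the density-adaptive
    behaviour exploited for clustering (toy sketch, not the paper's
    fast algorithm)."""
    n = len(points)
    best = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            return d
        if d > best.get(u, float("inf")):
            continue  # stale heap entry
        for v in range(n):
            if v == u:
                continue
            nd = d + dist(points[u], points[v]) ** p
            if nd < best.get(v, float("inf")):
                best[v] = nd
                heapq.heappush(heap, (nd, v))
    return float("inf")

# The direct hop 0 -> 2 costs 2.0^2 = 4, but hopping through the
# midpoint costs 1^2 + 1^2 = 2, so the path routes through point 1.
pts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
cost = power_weighted_path_length(pts, 0, 2, p=2.0)
```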
Citations: 16
General risk measures for robust machine learning
Q2 MATHEMATICS, APPLIED Pub Date: 2019-04-26 DOI: 10.3934/fods.2019011
É. Chouzenoux, Henri G'erard, J. Pesquet
A wide array of machine learning problems are formulated as the minimization of the expectation of a convex loss function over some parameter space. Since the probability distribution of the data of interest is usually unknown, it is often estimated from training sets, which may lead to poor out-of-sample performance. In this work, we bring new insights to this problem by using the framework developed in quantitative finance for risk measures. We show that the original min-max problem can be recast as a convex minimization problem under suitable assumptions. We discuss several important examples of robust formulations, in particular by defining ambiguity sets based on $\varphi$-divergences and the Wasserstein metric. We also propose an efficient algorithm for solving the corresponding convex optimization problems involving complex convex constraints. Through simulation examples, we demonstrate that this algorithm scales well on real data sets.
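A classic risk measure from quantitative finance is Conditional Value-at-Risk (CVaR), the mean of the worst tail of the loss distribution. The abstract does not single out CVaR, so the sketch below is only an illustration of the kind of risk functional being studied, not the paper's formulation.

```python
def empirical_cvar(losses, alpha=0.95):
    """Empirical Conditional Value-at-Risk: the mean of the worst
    (1 - alpha) fraction of losses.  A classic coherent risk measure
    from quantitative finance, shown only to illustrate the kind of
    risk functional the paper studies."""
    s = sorted(losses)
    k = max(1, round(len(s) * (1.0 - alpha)))  # size of the tail
    return sum(s[-k:]) / k

# With alpha = 0.8 on five losses, CVaR averages the worst 1 of 5:
losses = [1.0, 2.0, 3.0, 4.0, 100.0]
tail_risk = empirical_cvar(losses, alpha=0.8)
```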
Citations: 6
Estimation and uncertainty quantification for the output from quantum simulators
Q2 MATHEMATICS, APPLIED Pub Date: 2019-03-07 DOI: 10.3934/FODS.2019007
R. Bennink, A. Jasra, K. Law, P. Lougovski
The problem of estimating certain distributions over {0, 1}^d is considered here. The distribution represents a quantum system of d qubits, where there are non-trivial dependencies between the qubits. A maximum entropy approach is adopted to reconstruct the distribution from exact moments or observed empirical moments. The Robbins-Monro algorithm is used to solve the intractable maximum entropy problem, by constructing an unbiased estimator of the un-normalized target with a sequential Monte Carlo sampler at each iteration. In the case of empirical moments, this coincides with a maximum likelihood estimator. A Bayesian formulation is also considered in order to quantify uncertainty a posteriori. Several approaches are proposed to tackle this challenging problem, based on recently developed methodologies. In particular, unbiased estimators of the gradient of the log posterior are constructed and used within a provably convergent Langevin-based Markov chain Monte Carlo method. The methods are illustrated on classically simulated output from quantum simulators.
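The Robbins-Monro scheme itself is easy to state in one dimension: with step sizes a_n = 1/n (which satisfy the classic conditions sum a_n = inf, sum a_n^2 < inf), it finds a root of a function observed only through noisy evaluations. The toy problem below is illustrative; the paper applies the scheme to the maximum entropy problem with SMC-based unbiased estimates.

```python
import random

def robbins_monro(noisy_g, theta0, n_iters=20000, seed=0):
    """Robbins-Monro stochastic approximation: find theta such that
    E[noisy_g(theta)] = 0, using decreasing steps a_n = 1/n."""
    rng = random.Random(seed)
    theta = theta0
    for n in range(1, n_iters + 1):
        theta -= (1.0 / n) * noisy_g(theta, rng)
    return theta

# Toy problem: E[g(theta)] = theta - 3, observed with additive Gaussian
# noise, so the iterates should settle near theta* = 3.
g = lambda theta, rng: (theta - 3.0) + rng.gauss(0.0, 1.0)
root = robbins_monro(g, theta0=0.0)
```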
Citations: 2
Approximate Bayesian inference for geostatistical generalised linear models
Q2 MATHEMATICS, APPLIED Pub Date: 2019-03-07 DOI: 10.3934/FODS.2019002
E. Evangelou
The aim of this paper is to bring together recent developments in Bayesian generalised linear mixed models and geostatistics. We focus on approximate methods in both areas. A technique known as full-scale approximation, proposed by Sang and Huang (2012) to alleviate the computational drawbacks of large geostatistical data, is incorporated into the INLA methodology for approximate Bayesian inference. We also discuss how INLA can be used to approximate the posterior distribution of transformations of parameters, which is useful in practical applications. Issues regarding the choice of the parameters of the approximation, such as the knots and the taper range, are also addressed. Emphasis is given to applications in disease mapping, illustrated by modelling loa loa prevalence in Cameroon and malaria in the Gambia.
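Approximating the posterior of a transformed parameter can be sketched, under a Gaussian-posterior assumption, by pushing draws through the transformation. INLA obtains such posteriors deterministically, so the Monte Carlo version below is only a generic stand-in, with an invented example parameter (a log-range with an assumed N(0, 0.5^2) posterior).

```python
import math
import random

def transformed_posterior_mean(mean, sd, fn, n=100_000, seed=1):
    """Approximate E[fn(theta)] when theta has an (approximately)
    Gaussian posterior, by pushing samples through fn.  A Monte Carlo
    stand-in for the deterministic machinery INLA uses for transformed
    parameters (illustration only)."""
    rng = random.Random(seed)
    return sum(fn(rng.gauss(mean, sd)) for _ in range(n)) / n

# Hypothetical posterior log(phi) ~ N(0, 0.5^2); the posterior mean of
# phi = exp(log phi) exceeds exp(0) = 1 by Jensen's inequality.
post_mean_range = transformed_posterior_mean(0.0, 0.5, math.exp)
```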
Citations: 0
Combinatorial Hodge theory for equitable kidney paired donation
Q2 MATHEMATICS, APPLIED Pub Date: 2019-03-07 DOI: 10.3934/FODS.2019004
Joshua L. Mike, V. Maroulas
Kidney Paired Donation (KPD) is a system whereby incompatible patient-donor pairs (PD pairs) are entered into a pool to find compatible cyclic kidney exchanges, in which each pair gives and receives a kidney. The donation allocation decision problem for a KPD pool has traditionally been viewed within an economic theory and integer-programming framework. While previous allocation schemes work well to donate the maximum number of kidneys at a specific time, certain subgroups of patients are rarely matched in such an exchange. Consequently, these methods lead to systematic inequity in the exchange, where many patients are repeatedly denied a kidney. Our goal is to investigate inequity within the distribution of kidney allocation among patients, and to present an algorithm which minimizes allocation disparities. The method presented is inspired by cohomology and describes the cyclic structure in a kidney exchange efficiently; this structure is then used to search for an equitable kidney allocation. Another key result of our approach is a score function defined on PD pairs which measures cycle disparity within a KPD pool; i.e., this function measures the relative chance for each PD pair to take part in the kidney exchange if cycles are chosen uniformly. Specifically, we show that PD pairs with underdemanded donors or highly sensitized patients have lower scores than typical PD pairs. Furthermore, our results demonstrate that PD pair score and the chance to obtain a kidney are positively correlated when allocation is done by utility-optimal integer programming methods. In contrast, the chance to obtain a kidney through our method is independent of score, and thus unbiased in this regard.
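The uniform-cycle participation idea can be sketched on a tiny compatibility graph: enumerate exchange cycles up to a length cap and record the fraction of cycles each PD pair appears in. This is a crude stand-in for the paper's Hodge-theoretic score; the graph encoding and the length cap of 3 (a common KPD restriction) are assumptions.

```python
from itertools import permutations

def cycle_share(compat, max_len=3):
    """For a directed compatibility graph (compat[i] = set of pairs that
    pair i's donor can give to), enumerate exchange cycles up to max_len
    and report, for each PD pair, the fraction of cycles it appears in,
    assuming cycles are chosen uniformly.  Brute force; toy scale only."""
    n = len(compat)
    cycles = set()
    for length in range(2, max_len + 1):
        for combo in permutations(range(n), length):
            if min(combo) != combo[0]:
                continue  # canonical rotation: count each cycle once
            if all(combo[(k + 1) % length] in compat[combo[k]]
                   for k in range(length)):
                cycles.add(combo)
    total = len(cycles)
    return [sum(1 for c in cycles if i in c) / total for i in range(n)]

# Pairs 0 and 1 form a 2-cycle; pair 2 only joins via the 3-cycle
# 0 -> 1 -> 2 -> 0, so pair 2 participates in fewer cycles.
compat = {0: {1}, 1: {0, 2}, 2: {0}}
shares = cycle_share(compat)
```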
Citations: 4
Particle filters for inference of high-dimensional multivariate stochastic volatility models with cross-leverage effects
Q2 MATHEMATICS, APPLIED Pub Date: 2019-02-25 DOI: 10.3934/fods.2019003
Yaxian Xu, A. Jasra
Multivariate stochastic volatility models are a popular and well-known class of models in the analysis of financial time series because of their ability to capture the important stylized facts of financial returns data. We consider the problems of estimating the filtering distribution and calculating the marginal likelihood for multivariate stochastic volatility models with cross-leverage effects in the high-dimensional case, that is, when the number of financial time series analyzed simultaneously (denoted by $d$) is large. The standard particle filter has been widely used in the literature to solve these intractable inference problems. It has excellent performance in low to moderate dimensions, but collapses in the high-dimensional case. In this article, two new and advanced particle filters proposed in [4], named the space-time particle filter and the marginal space-time particle filter, are explored for these estimation problems. Simulation and empirical studies show that the two advanced particle filters outperform the standard particle filter in both accuracy and stability. In addition, the Bayesian static model parameter estimation problem is considered in light of advances in particle Markov chain Monte Carlo methods. The particle marginal Metropolis-Hastings algorithm is applied together with the likelihood estimates from the space-time particle filter to successfully infer the static model parameters in cases where inference based on the likelihood estimates from the standard particle filter fails.
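The standard (bootstrap) particle filter, whose high-dimensional breakdown motivates the space-time filters, can be sketched on a basic univariate stochastic volatility model. The model parameters below are illustrative, not taken from the paper.

```python
import math
import random

def sv_particle_filter(returns, n_particles=500, phi=0.95, sigma=0.3, seed=2):
    """Bootstrap particle filter for a basic 1-D stochastic volatility
    model: h_t = phi*h_{t-1} + sigma*eta_t, y_t ~ N(0, exp(h_t)).
    Returns an estimate of the log marginal likelihood.  A standard-PF
    sketch only; the paper's filters address the multivariate case."""
    rng = random.Random(seed)
    stat_sd = sigma / math.sqrt(1.0 - phi * phi)   # stationary sd of h
    parts = [rng.gauss(0.0, stat_sd) for _ in range(n_particles)]
    loglik = 0.0
    for y in returns:
        # Propagate each particle through the AR(1) volatility dynamics.
        parts = [phi * h + sigma * rng.gauss(0.0, 1.0) for h in parts]
        # Weight by the observation density N(y; 0, exp(h)).
        w = [math.exp(-0.5 * (y * y * math.exp(-h) + h + math.log(2 * math.pi)))
             for h in parts]
        loglik += math.log(sum(w) / n_particles)
        # Multinomial resampling.
        parts = rng.choices(parts, weights=w, k=n_particles)
    return loglik

ll = sv_particle_filter([0.1, -0.2, 0.05, 0.3])
```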
Citations: 5
Spectral methods to study the robustness of residual neural networks with infinite layers
Q2 MATHEMATICS, APPLIED Pub Date: 2019-01-01 DOI: 10.3934/fods.2020012
T. Trimborn, Stephan Gerster, G. Visconti
Recently, neural networks (NN) with an infinite number of layers have been introduced. For these very large NNs in particular, the training procedure is very expensive. Hence, there is interest in studying their robustness with respect to the input data, to avoid unnecessarily retraining the network. Typically, model-based statistical inference methods, e.g. Bayesian neural networks, are used to quantify uncertainties. Here, we consider a special class of residual neural networks and study the case in which the number of layers can be arbitrarily large. Kinetic theory then allows the network to be interpreted as a dynamical system described by a partial differential equation. We study the robustness of the mean-field neural network with respect to perturbations in the initial data by applying UQ approaches to the loss functions.
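The dynamical-systems reading of a deep residual network can be sketched in one dimension: many layers of x <- x + h*tanh(w*x) are an explicit Euler discretisation of the ODE x'(t) = tanh(w x), and robustness to input perturbations can be probed by comparing two nearby inputs. The scalar weight w = -1 (a contractive flow) is an illustrative assumption, not the paper's model.

```python
import math

def resnet_flow(x, layers=1000):
    """A residual network with many layers, read as an explicit Euler
    discretisation of x'(t) = tanh(w*x) on t in [0, 1]: each layer
    applies x <- x + h*tanh(w*x) with step h = 1/layers.  Toy scalar
    version with w = -1, illustrating the dynamical-systems view."""
    h = 1.0 / layers       # depth becomes continuous time
    w = -1.0               # contractive choice (assumption)
    for _ in range(layers):
        x = x + h * math.tanh(w * x)
    return x

# Robustness probe: with w < 0 each layer map is a contraction, so a
# small input perturbation is not amplified by the flow.
out_a, out_b = resnet_flow(1.0), resnet_flow(1.001)
```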
Citations: 4
Issues using logistic regression with class imbalance, with a case study from credit risk modelling
Q2 MATHEMATICS, APPLIED Pub Date : 2019-01-01 DOI: 10.3934/fods.2019016
Yazhe Li, T. Bellotti, N. Adams
The class imbalance problem arises in two-class classification problems when the less frequent (minority) class is observed far less often than the majority class. This characteristic is endemic in many problems, such as default modeling or fraud detection. Recent work by Owen [ 19 ] has shown that, in a theoretical context related to infinite imbalance, logistic regression behaves in such a way that all data in the rare class can be replaced by their mean vector to achieve the same coefficient estimates. We build on Owen's results to show that the phenomenon remains true for both weighted and penalized likelihood methods. Such results suggest that problems may occur if there is structure within the rare class that is not captured by the mean vector. We demonstrate this problem and suggest a relabelling solution based on clustering the minority class. In a simulation and a real mortgage dataset, we show that logistic regression is not able to provide the best out-of-sample predictive performance, and that an approach able to model underlying structure in the minority class is often superior.
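Owen's observation can be illustrated numerically: under heavy imbalance, replacing every rare-class point by a copy of the class mean vector barely changes the fitted slope direction. The sketch below is a minimal, assumption-laden demo (synthetic Gaussian data, a plain gradient-descent fit), not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(1)
X_maj = rng.standard_normal((5000, 2))        # majority class, y = 0
X_min = rng.standard_normal((50, 2)) + 2.0    # rare (minority) class, y = 1

def fit_logreg(X, y, iters=2000, lr=0.1):
    """Plain gradient-descent logistic regression with an intercept column."""
    Xb = np.hstack([np.ones((len(X), 1)), X])
    w = np.zeros(Xb.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))     # sigmoid predictions
        w -= lr * Xb.T @ (p - y) / len(y)     # average log-loss gradient step
    return w

y = np.r_[np.zeros(len(X_maj)), np.ones(len(X_min))]
w_full = fit_logreg(np.vstack([X_maj, X_min]), y)

# Replace the rare class by 50 copies of its mean vector and refit:
# under strong imbalance the slope part of w changes very little.
X_mean = np.tile(X_min.mean(axis=0), (len(X_min), 1))
w_mean = fit_logreg(np.vstack([X_maj, X_mean]), y)

print(w_full, w_mean)
```

The flip side, which motivates the paper's clustering-based relabelling, is that any internal structure of the minority class (e.g. two well-separated sub-clusters) is invisible to the fit, since only the class mean effectively matters.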
{"title":"Issues using logistic regression with class imbalance, with a case study from credit risk modelling","authors":"Yazhe Li, T. Bellotti, N. Adams","doi":"10.3934/fods.2019016","DOIUrl":"https://doi.org/10.3934/fods.2019016","url":null,"abstract":"The class imbalance problem arises in two-class classification problems, when the less frequent (minority) class is observed much less than the majority class. This characteristic is endemic in many problems such as modeling default or fraud detection. Recent work by Owen [ 19 ] has shown that, in a theoretical context related to infinite imbalance, logistic regression behaves in such a way that all data in the rare class can be replaced by their mean vector to achieve the same coefficient estimates. We build on Owen's results to show the phenomenon remains true for both weighted and penalized likelihood methods. Such results suggest that problems may occur if there is structure within the rare class that is not captured by the mean vector. We demonstrate this problem and suggest a relabelling solution based on clustering the minority class. In a simulation and a real mortgage dataset, we show that logistic regression is not able to provide the best out-of-sample predictive performance and that an approach that is able to model underlying structure in the minority class is often superior.","PeriodicalId":73054,"journal":{"name":"Foundations of data science (Springfield, Mo.)","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"70247842","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 9