
Latest Publications: Journal of Machine Learning Research

Rethinking Nonlinear Instrumental Variable Models through Prediction Validity.
IF 4.3 · CAS Tier 3 (Computer Science) · Q1 AUTOMATION & CONTROL SYSTEMS · Pub Date: 2022-01-01
Chunxiao Li, Cynthia Rudin, Tyler H McCormick

Instrumental variables (IV) are widely used in the social and health sciences in situations where a researcher would like to measure a causal effect but cannot perform an experiment. For valid causal inference in an IV model, there must be external (exogenous) variation that (i) has a sufficiently large impact on the variable of interest (called the relevance assumption) and where (ii) the only pathway through which the external variation impacts the outcome is via the variable of interest (called the exclusion restriction). For statistical inference, researchers must also make assumptions about the functional form of the relationship between the three variables. Current practice assumes (i) and (ii) are met, then postulates a functional form with limited input from the data. In this paper, we describe a framework that leverages machine learning to validate these typically unchecked but consequential assumptions in the IV framework, providing the researcher empirical evidence about the quality of the instrument given the data at hand. Central to the proposed approach is the idea of prediction validity. Prediction validity checks that error terms, which should be independent of the instrument, cannot be modeled with machine learning any better than a model that is identically zero. We use prediction validity to develop both one-stage and two-stage approaches for IV, and demonstrate their performance on an example relevant to climate change policy.
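
To make the check concrete, here is a minimal sketch in Python of the prediction-validity idea on simulated data. The data-generating process, the two-stage least-squares fit, and the choice of gradient boosting as the machine-learning model are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 2000
z = rng.normal(size=(n, 1))           # instrument
u = rng.normal(size=n)                # unobserved confounder
x = z[:, 0] + u + rng.normal(size=n)  # endogenous variable of interest
y = 2.0 * x + u + rng.normal(size=n)  # outcome

# Classical two-stage least squares for reference.
x_hat = z @ np.linalg.lstsq(z, x, rcond=None)[0]      # stage 1
X1 = np.column_stack([np.ones(n), x_hat])
beta = np.linalg.lstsq(X1, y, rcond=None)[0]          # stage 2
resid = y - np.column_stack([np.ones(n), x]) @ beta   # structural residuals

# Prediction validity: can an ML model predict the residuals from z
# any better than the identically-zero model?
ml_mse = -cross_val_score(GradientBoostingRegressor(), z, resid,
                          scoring="neg_mean_squared_error", cv=5).mean()
zero_mse = np.mean(resid ** 2)  # MSE of the identically-zero model
print(f"ML MSE {ml_mse:.3f} vs zero-model MSE {zero_mse:.3f}")
# The exclusion restriction holds by construction here, so the two MSEs
# should be comparable; ml_mse clearly below zero_mse would flag a problem.
```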

Citations: 0
Bayesian Covariate-Dependent Gaussian Graphical Models with Varying Structure.
IF 6 · CAS Tier 3 (Computer Science) · Q1 AUTOMATION & CONTROL SYSTEMS · Pub Date: 2022-01-01
Yang Ni, Francesco C Stingo, Veerabhadran Baladandayuthapani

We introduce Bayesian Gaussian graphical models with covariates (GGMx), a class of multivariate Gaussian distributions with covariate-dependent sparse precision matrix. We propose a general construction of a functional mapping from the covariate space to the cone of sparse positive definite matrices, which encompasses many existing graphical models for heterogeneous settings. Our methodology is based on a novel mixture prior for precision matrices with a non-local component that admits attractive theoretical and empirical properties. The flexible formulation of GGMx allows both the strength and the sparsity pattern of the precision matrix (and hence the graph structure) to change with the covariates. Posterior inference is carried out with a carefully designed Markov chain Monte Carlo algorithm, which ensures the positive definiteness of sparse precision matrices at any given covariate values. Extensive simulations and a case study in cancer genomics demonstrate the utility of the proposed model.
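
As a toy illustration of a covariate-dependent sparse precision matrix, the sketch below maps a scalar covariate to a matrix whose edge strengths and sparsity pattern both vary with the covariate, using diagonal dominance to guarantee positive definiteness. The functional form is hypothetical and far simpler than the GGMx mixture-prior construction.

```python
import numpy as np

def precision_at(c, p=5, threshold=0.3):
    """Toy covariate-to-precision map; edge (j, k) is active only where a
    covariate-dependent strength exceeds `threshold` (hypothetical form)."""
    omega = np.zeros((p, p))
    for j in range(p):
        for k in range(j + 1, p):
            strength = np.sin(c + j - k)          # smooth in the covariate c
            if abs(strength) > threshold:         # covariate-dependent sparsity
                omega[j, k] = omega[k, j] = strength
    # Strict diagonal dominance guarantees positive definiteness at every c.
    omega[np.diag_indices(p)] = np.abs(omega).sum(axis=1) + 1.0
    return omega

omega = precision_at(0.7)
print(np.all(np.linalg.eigvalsh(omega) > 0))  # True: a valid precision matrix
```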

Citations: 0
Non-asymptotic Properties of Individualized Treatment Rules from Sequentially Rule-Adaptive Trials.
IF 6 · CAS Tier 3 (Computer Science) · Q1 AUTOMATION & CONTROL SYSTEMS · Pub Date: 2022-01-01
Daiqi Gao, Yufeng Liu, Donglin Zeng

Learning optimal individualized treatment rules (ITRs) has become increasingly important in the modern era of precision medicine. Many statistical and machine learning methods for learning optimal ITRs have been developed in the literature. However, most existing methods are based on data collected from traditional randomized controlled trials and thus cannot take advantage of the accumulative evidence when patients enter the trials sequentially. It is also ethically important that future patients have a high probability of being treated optimally based on the knowledge accumulated so far. In this work, we propose a new design called sequentially rule-adaptive trials to learn optimal ITRs based on the contextual bandit framework, in contrast to the response-adaptive design in traditional adaptive trials. In our design, each entering patient will be allocated with a high probability to the current best treatment for this patient, which is estimated using the past data based on some machine learning algorithm (for example, outcome weighted learning in our implementation). We explore the tradeoff between training and test values of the estimated ITR in single-stage problems by proving theoretically that for a higher probability of following the estimated ITR, the training value converges to the optimal value at a faster rate, while the test value converges at a slower rate. This problem is different from traditional decision problems in the sense that the training data are generated sequentially and are dependent. We also develop a tool that combines martingale theory with empirical process techniques to tackle problems that cannot be solved by previous techniques for i.i.d. data. We show by numerical examples that without much loss of the test value, our proposed algorithm can improve the training value significantly as compared to existing methods. Finally, we use a real data study to illustrate the performance of the proposed method.
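
A bare-bones sketch of the allocation scheme follows: each entering patient receives the currently estimated best treatment with high probability 1 - eps. The response model, the logistic-regression ITR estimate, and the burn-in rule are stand-ins chosen for brevity (the paper uses outcome weighted learning).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
eps = 0.1                 # patients deviate from the estimated ITR w.p. eps
X, A, Y = [], [], []      # accumulated covariates, treatments, outcomes

def outcome(x, a):
    # Hypothetical response model: treatment 1 works only when x[0] > 0.
    p_success = 0.7 if (x[0] > 0) == bool(a) else 0.3
    return int(rng.random() < p_success)

for t in range(500):
    x = rng.normal(size=2)
    if t > 50:
        # Estimate the current best treatment from all past (x, a, y) data.
        clf = LogisticRegression().fit(np.column_stack([X, A]), Y)
        probs = [clf.predict_proba([[x[0], x[1], a]])[0, 1] for a in (0, 1)]
        best = int(np.argmax(probs))
    else:
        best = int(rng.integers(2))      # burn-in: randomize
    a = best if rng.random() > eps else 1 - best   # follow the ITR w.h.p.
    X.append(x); A.append(a); Y.append(outcome(x, a))

print(f"mean outcome, last 100 patients: {np.mean(Y[-100:]):.2f}")
```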

Citations: 0
D-GCCA: Decomposition-based Generalized Canonical Correlation Analysis for Multi-view High-dimensional Data.
IF 4.3 · CAS Tier 3 (Computer Science) · Q1 AUTOMATION & CONTROL SYSTEMS · Pub Date: 2022-01-01
Hai Shu, Zhe Qu, Hongtu Zhu

Modern biomedical studies often collect multi-view data, that is, multiple types of data measured on the same set of objects. A popular model in high-dimensional multi-view data analysis is to decompose each view's data matrix into a low-rank common-source matrix generated by latent factors common across all data views, a low-rank distinctive-source matrix corresponding to each view, and an additive noise matrix. We propose a novel decomposition method for this model, called decomposition-based generalized canonical correlation analysis (D-GCCA). The D-GCCA rigorously defines the decomposition on the L^2 space of random variables, in contrast to the Euclidean dot product space used by most existing methods, thereby being able to provide the estimation consistency for the low-rank matrix recovery. Moreover, to well calibrate common latent factors, we impose a desirable orthogonality constraint on distinctive latent factors. Existing methods, however, inadequately consider such orthogonality and may thus suffer from substantial loss of undetected common-source variation. Our D-GCCA goes one step further than generalized canonical correlation analysis by separating common and distinctive components among canonical variables, while enjoying an appealing interpretation from the perspective of principal component analysis. Furthermore, we propose to use the variable-level proportion of signal variance explained by common or distinctive latent factors for selecting the variables most influenced. Consistent estimators of our D-GCCA method are established with good finite-sample numerical performance, and have closed-form expressions leading to efficient computation especially for large-scale data. The superiority of D-GCCA over state-of-the-art methods is also corroborated in simulations and real-world data examples.
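
The sketch below conveys the common/distinctive split on simulated two-view data using plain CCA plus a low-rank projection; it is only in the spirit of the decomposition and is not the D-GCCA estimator, which works on the L^2 space of random variables with explicit orthogonality constraints.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(2)
n, p1, p2, r = 500, 20, 30, 2
common = rng.normal(size=(n, r))                 # shared latent factors
Y1 = common @ rng.normal(size=(r, p1)) + 0.5 * rng.normal(size=(n, p1))
Y2 = common @ rng.normal(size=(r, p2)) + 0.5 * rng.normal(size=(n, p2))

cca = CCA(n_components=r).fit(Y1, Y2)
U1, U2 = cca.transform(Y1, Y2)                   # canonical variables

def project(Y, U):
    # Common part: projection of the view onto the canonical variables.
    coef, *_ = np.linalg.lstsq(U, Y, rcond=None)
    return U @ coef

C1 = project(Y1, U1)                             # common-source estimate
D1 = Y1 - C1                                     # distinctive part + noise
print(f"view 1 variance explained by common part: {C1.var() / Y1.var():.2f}")
```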

Citations: 0
Interpretable Classification of Categorical Time Series Using the Spectral Envelope and Optimal Scalings.
IF 6 · CAS Tier 3 (Computer Science) · Q1 AUTOMATION & CONTROL SYSTEMS · Pub Date: 2022-01-01
Zeda Li, Scott A Bruce, Tian Cai

This article introduces a novel approach to the classification of categorical time series under the supervised learning paradigm. To construct meaningful features for categorical time series classification, we consider two relevant quantities: the spectral envelope and its corresponding set of optimal scalings. These quantities characterize oscillatory patterns in a categorical time series as the largest possible power at each frequency, or spectral envelope, obtained by assigning numerical values, or scalings, to categories that optimally emphasize oscillations at each frequency. Our procedure combines these two quantities to produce an interpretable and parsimonious feature-based classifier that can be used to accurately determine group membership for categorical time series. Classification consistency of the proposed method is investigated, and simulation studies are used to demonstrate accuracy in classifying categorical time series with various underlying group structures. Finally, we use the proposed method to explore key differences in oscillatory patterns of sleep stage time series for patients with different sleep disorders and accurately classify patients accordingly. The code for implementing the proposed method is available at https://github.com/zedali16/envsca.
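
A simplified spectral-envelope computation is sketched below (raw periodogram, no smoothing; not the authors' envsca code linked above): one-hot encode the categories, then, at each frequency, maximize power over scalings by solving a generalized eigenproblem.

```python
import numpy as np
from scipy.linalg import eigh

def spectral_envelope(series, n_cats):
    """series: integer-coded categorical time series in {0, ..., n_cats-1}."""
    n = len(series)
    Z = np.eye(n_cats)[series][:, :-1]   # one-hot, last category dropped
    Z = Z - Z.mean(axis=0)
    V = np.cov(Z, rowvar=False)          # covariance in the scaling space
    F = np.fft.rfft(Z, axis=0)
    env = []
    for k in range(1, F.shape[0]):
        P = np.outer(F[k], F[k].conj()).real / n   # periodogram matrix at k
        # Envelope = max power over scalings: generalized eigenproblem.
        env.append(eigh(P, V, eigvals_only=True)[-1])
    return np.array(env)

# Example: a noisy period-3 pattern yields a peak near frequency 1/3.
rng = np.random.default_rng(3)
s = np.tile([0, 1, 2], 200)
s = np.where(rng.random(600) < 0.1, rng.integers(0, 3, 600), s)  # add noise
env = spectral_envelope(s, 3)
print((np.argmax(env) + 1) / len(s))   # ≈ 0.333
```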

Citations: 0
Bayesian subset selection and variable importance for interpretable prediction and classification.
IF 6 · CAS Tier 3 (Computer Science) · Q1 AUTOMATION & CONTROL SYSTEMS · Pub Date: 2022-01-01
Daniel R Kowal

Subset selection is a valuable tool for interpretable learning, scientific discovery, and data compression. However, classical subset selection is often avoided due to selection instability, lack of regularization, and difficulties with post-selection inference. We address these challenges from a Bayesian perspective. Given any Bayesian predictive model ℳ, we extract a family of near-optimal subsets of variables for linear prediction or classification. This strategy deemphasizes the role of a single "best" subset and instead advances the broader perspective that often many subsets are highly competitive. The acceptable family of subsets offers a new pathway for model interpretation and is neatly summarized by key members such as the smallest acceptable subset, along with new (co-) variable importance metrics based on whether variables (co-) appear in all, some, or no acceptable subsets. More broadly, we apply Bayesian decision analysis to derive the optimal linear coefficients for any subset of variables. These coefficients inherit both regularization and predictive uncertainty quantification via ℳ. For both simulated and real data, the proposed approach exhibits better prediction, interval estimation, and variable selection than competing Bayesian and frequentist selection methods. These tools are applied to a large education dataset with highly correlated covariates. Our analysis provides unique insights into the combination of environmental, socioeconomic, and demographic factors that predict educational outcomes, and identifies over 200 distinct subsets of variables that offer near-optimal out-of-sample predictive accuracy.
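
The following schematic captures the acceptable-family idea on simulated data: project the posterior predictive onto every candidate subset, compute each subset's predictive loss, and keep all subsets within a small margin of the best. The margin, the stand-in posterior draws, and the helper names are hypothetical; the paper's Bayesian decision analysis is richer.

```python
import itertools
import numpy as np

rng = np.random.default_rng(4)
n, p = 200, 4
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, 0.8, 0.0, 0.0])
y = X @ beta_true + rng.normal(size=n)

# Stand-in for draws from a Bayesian model's posterior predictive.
y_draws = y + rng.normal(scale=1.0, size=(100, n))

def subset_loss(S):
    Xs = X[:, S]
    # Optimal linear coefficients for subset S: least squares against the
    # posterior predictive mean (the decision-theoretic projection).
    coef, *_ = np.linalg.lstsq(Xs, y_draws.mean(axis=0), rcond=None)
    return np.mean((y_draws - Xs @ coef) ** 2)

subsets = [S for r in range(1, p + 1)
           for S in itertools.combinations(range(p), r)]
losses = {S: subset_loss(list(S)) for S in subsets}
best = min(losses.values())
acceptable = [S for S, l in losses.items() if l <= 1.05 * best]  # 5% margin
print(sorted(acceptable, key=len)[0])   # the smallest acceptable subset
```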

Citations: 0
A flexible model-free prediction-based framework for feature ranking.
IF 6 · CAS Tier 3 (Computer Science) · Q1 AUTOMATION & CONTROL SYSTEMS · Pub Date: 2021-05-01
Jingyi Jessica Li, Yiling Elaine Chen, Xin Tong

Despite the availability of numerous statistical and machine learning tools for joint feature modeling, many scientists investigate features marginally, i.e., one feature at a time. This is partly due to training and convention, but it is also rooted in scientists' strong interest in simple visualization and interpretability. As such, marginal feature ranking for some predictive tasks, e.g., prediction of cancer driver genes, is widely practiced in the process of scientific discoveries. In this work, we focus on marginal ranking for binary classification, one of the most common predictive tasks. We argue that the most widely used marginal ranking criteria, including the Pearson correlation, the two-sample t test, and two-sample Wilcoxon rank-sum test, do not fully take feature distributions and prediction objectives into account. To address this gap in practice, we propose two ranking criteria corresponding to two prediction objectives: the classical criterion (CC) and the Neyman-Pearson criterion (NPC), both of which use model-free nonparametric implementations to accommodate diverse feature distributions. Theoretically, we show that under regularity conditions, both criteria achieve sample-level ranking that is consistent with their population-level counterparts with high probability. Moreover, NPC is robust to sampling bias when the two class proportions in a sample deviate from those in the population. This property endows NPC with good potential in biomedical research where sampling biases are ubiquitous. We demonstrate the use and relative advantages of CC and NPC in simulation and real data studies. Our model-free objective-based ranking idea is extendable to ranking feature subsets and generalizable to other prediction tasks and learning objectives.
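
As a sketch of objective-based marginal ranking, the code below scores each feature by the cross-validated accuracy of a one-feature nonparametric classifier (a stand-in for the CC implementation; NPC would instead control the type I error). Note the variance-only feature, which a two-sample t test would miss but a distribution-aware criterion can detect.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(5)
n, p = 400, 6
y = rng.integers(0, 2, n)
X = rng.normal(size=(n, p))
X[:, 0] += 1.5 * y           # informative through a mean shift
X[:, 1] *= 1.0 + 2.0 * y     # informative through the variance only

scores = []
for j in range(p):
    # Classical-criterion analogue: CV accuracy of a marginal classifier.
    acc = cross_val_score(KNeighborsClassifier(15), X[:, [j]], y, cv=5).mean()
    scores.append(acc)

ranking = np.argsort(scores)[::-1]
print(ranking)  # features 0 and 1 should head the list; a t test would
                # rank the variance-only feature 1 near the bottom
```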

Citations: 0
Integrative High Dimensional Multiple Testing with Heterogeneity under Data Sharing Constraints.
IF 4.3 · CAS Tier 3 (Computer Science) · Q1 AUTOMATION & CONTROL SYSTEMS · Pub Date: 2021-04-01
Molei Liu, Yin Xia, Kelly Cho, Tianxi Cai

Identifying informative predictors in a high dimensional regression model is a critical step for association analysis and predictive modeling. Signal detection in the high dimensional setting often fails due to the limited sample size. One approach to improving power is through meta-analyzing multiple studies which address the same scientific question. However, integrative analysis of high dimensional data from multiple studies is challenging in the presence of between-study heterogeneity. The challenge is even more pronounced with additional data sharing constraints under which only summary data can be shared across different sites. In this paper, we propose a novel data shielding integrative large-scale testing (DSILT) approach to signal detection allowing between-study heterogeneity and not requiring the sharing of individual level data. Assuming the underlying high dimensional regression models of the data differ across studies yet share similar support, the proposed method incorporates proper integrative estimation and debiasing procedures to construct test statistics for the overall effects of specific covariates. We also develop a multiple testing procedure to identify significant effects while controlling the false discovery rate (FDR) and false discovery proportion (FDP). Theoretical comparisons of the new testing procedure with the ideal individual-level meta-analysis (ILMA) approach and other distributed inference methods are investigated. Simulation studies demonstrate that the proposed testing procedure performs well in both controlling false discovery and attaining power. The new method is applied to a real example detecting interaction effects of the genetic variants for statins and obesity on the risk for type II diabetes.
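
The snippet below illustrates only the data-sharing constraint and FDR control, not the DSILT test itself: each site ships per-coefficient estimates and standard errors, and the center inverse-variance meta-analyzes them and applies Benjamini-Hochberg. Homogeneous effects are assumed for simplicity, which is exactly what DSILT relaxes.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
p, sites = 50, 3
truth = np.zeros(p)
truth[:5] = 0.5                               # 5 true signals

# Summary statistics per site: estimates and standard errors only.
est = truth + rng.normal(scale=0.15, size=(sites, p))
se = np.full((sites, p), 0.15)

# Inverse-variance meta-analysis at the central site.
w = 1 / se**2
pooled = (w * est).sum(0) / w.sum(0)
pooled_se = 1 / np.sqrt(w.sum(0))
pvals = 2 * stats.norm.sf(np.abs(pooled / pooled_se))

# Benjamini-Hochberg step-up at FDR level 0.05.
order = np.argsort(pvals)
thresh = 0.05 * np.arange(1, p + 1) / p
passed = pvals[order] <= thresh
k = np.max(np.nonzero(passed)[0]) + 1 if passed.any() else 0
print(sorted(order[:k]))   # indices of rejected (significant) coefficients
```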

Citations: 0
Inference for Multiple Heterogeneous Networks with a Common Invariant Subspace.
IF 4.3 · CAS Tier 3 (Computer Science) · Q1 AUTOMATION & CONTROL SYSTEMS · Pub Date: 2021-03-01
Jesús Arroyo, Avanti Athreya, Joshua Cape, Guodong Chen, Carey E Priebe, Joshua T Vogelstein

The development of models and methodology for the analysis of data from multiple heterogeneous networks is of importance both in statistical network theory and across a wide spectrum of application domains. Although single-graph analysis is well-studied, multiple graph inference is largely unexplored, in part because of the challenges inherent in appropriately modeling graph differences and yet retaining sufficient model simplicity to render estimation feasible. This paper addresses exactly this gap, by introducing a new model, the common subspace independent-edge multiple random graph model, which describes a heterogeneous collection of networks with a shared latent structure on the vertices but potentially different connectivity patterns for each graph. The model encompasses many popular network representations, including the stochastic blockmodel. The model is both flexible enough to meaningfully account for important graph differences, and tractable enough to allow for accurate inference in multiple networks. In particular, a joint spectral embedding of adjacency matrices, the multiple adjacency spectral embedding, leads to simultaneous consistent estimation of underlying parameters for each graph. Under mild additional assumptions, the estimates satisfy asymptotic normality and yield improvements for graph eigenvalue estimation. In both simulated and real data, the model and the embedding can be deployed for a number of subsequent network inference tasks, including dimensionality reduction, classification, hypothesis testing, and community detection. Specifically, when the embedding is applied to a data set of connectomes constructed through diffusion magnetic resonance imaging, the result is an accurate classification of brain scans by human subject and a meaningful determination of heterogeneity across scans of different individuals.
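
A bare-bones joint spectral embedding on simulated graphs is sketched below to illustrate the common-subspace idea: embed each graph separately, estimate a shared vertex basis from the concatenated embeddings, then recover a graph-specific score matrix in that basis. Rank selection and scaling are simplified relative to the paper's multiple adjacency spectral embedding.

```python
import numpy as np

rng = np.random.default_rng(7)
n, d, m = 100, 2, 4
V = np.linalg.qr(rng.normal(size=(n, d)))[0]        # shared vertex subspace

adjs = []
for _ in range(m):
    R = np.diag(rng.uniform(1.0, 5.0, d))           # graph-specific scores
    P = np.clip(V @ R @ V.T * 0.1 + 0.3, 0.01, 0.99)
    A = (rng.random((n, n)) < P).astype(float)
    adjs.append(np.triu(A, 1) + np.triu(A, 1).T)    # symmetric, no self-loops

# Per-graph adjacency spectral embeddings, concatenated column-wise.
embeds = []
for A in adjs:
    vals, vecs = np.linalg.eigh(A)
    idx = np.argsort(np.abs(vals))[::-1][:d]        # top-d by |eigenvalue|
    embeds.append(vecs[:, idx] * np.sqrt(np.abs(vals[idx])))
V_hat = np.linalg.svd(np.hstack(embeds))[0][:, :d]  # common-subspace estimate

# Graph-specific score matrices expressed in the shared basis.
R_hats = [V_hat.T @ A @ V_hat for A in adjs]
print(R_hats[0].round(2))
```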

Citations: 0
Estimating Uncertainty Intervals from Collaborating Networks.
IF 4.3 · CAS Tier 3 (Computer Science) · Q1 AUTOMATION & CONTROL SYSTEMS · Pub Date: 2021-01-01
Tianhui Zhou, Yitong Li, Yuan Wu, David Carlson

Effective decision making requires understanding the uncertainty inherent in a prediction. In regression, this uncertainty can be estimated by a variety of methods; however, many of these methods are laborious to tune, generate overconfident uncertainty intervals, or lack sharpness (give imprecise intervals). We address these challenges by proposing a novel method to capture predictive distributions in regression by defining two neural networks with two distinct loss functions. Specifically, one network approximates the cumulative distribution function, and the second network approximates its inverse. We refer to this method as Collaborating Networks (CN). Theoretical analysis demonstrates that a fixed point of the optimization is at the idealized solution, and that the method is asymptotically consistent to the ground truth distribution. Empirically, learning is straightforward and robust. We benchmark CN against several common approaches on two synthetic and six real-world datasets, including forecasting A1c values in diabetic patients from electronic health records, where uncertainty is critical. In the synthetic data, the proposed approach essentially matches ground truth. In the real-world datasets, CN improves results on many performance metrics, including log-likelihood estimates, mean absolute errors, coverage estimates, and prediction interval widths.
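
A minimal PyTorch sketch of the two-network setup follows. The architectures, the toy data, and the summing of the two losses into a single update (the method trains the CDF network and its inverse with distinct, alternating losses) are simplifications for brevity, not the paper's exact objectives.

```python
import torch
import torch.nn as nn

def mlp(in_dim):
    return nn.Sequential(nn.Linear(in_dim, 32), nn.ReLU(), nn.Linear(32, 1))

g = mlp(2)   # g(y, x): approximates the conditional CDF F(y | x)
f = mlp(2)   # f(q, x): approximates the inverse CDF (conditional quantile)
opt = torch.optim.Adam(list(g.parameters()) + list(f.parameters()), lr=1e-3)

x = torch.randn(512, 1)
y = 2 * x + 0.5 * torch.randn(512, 1)    # toy regression data

for step in range(2000):
    q = torch.rand(512, 1)               # random quantile levels in (0, 1)
    y_q = f(torch.cat([q, x], 1))        # proposed q-th conditional quantile
    # g is trained so sigmoid(g(y, x)) matches the indicator 1{Y <= y};
    # f is trained so the CDF evaluated at f(q, x) hits the level q.
    loss_g = ((g(torch.cat([y_q.detach(), x], 1)).sigmoid()
               - (y <= y_q.detach()).float()) ** 2).mean()
    loss_f = ((g(torch.cat([y_q, x], 1)).sigmoid() - q) ** 2).mean()
    opt.zero_grad()
    (loss_g + loss_f).backward()
    opt.step()
```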

Citations: 0