Set Identification and Estimation of Factor and Topic Models
C. Adams
ERN: Nonparametric Methods (Topic), published 2015-11-02
DOI: 10.2139/ssrn.2685218 (https://doi.org/10.2139/ssrn.2685218)
Citations: 1
Abstract
The paper presents sharp bounds on the identified set for classical factor models and non-parametric topic models, based on results from the non-negative factorization literature. It compares the standard factor-model assumption that the factors are orthonormal (principal components analysis) with the "natural" topic-model assumption of additivity and non-negativity. Under orthonormality, the model is point identified when the number of factors is "small", but further restrictions such as those presented in Bai and Ng (2013) are needed to identify larger models. Under additivity and non-negativity, the paper characterizes the identified set and shows that the necessary condition for point identification presented in Huang et al. (2013) is also sufficient. In the two-factor case, this condition states that for each latent factor there must be some asset whose return gives it zero weight, and there must be some time periods in which each factor's normalized return is zero. These "sparsity" conditions are characteristics of the observed data, not assumptions on the data generating process. The paper presents a "least squares" estimator in which the number of parameters to be estimated does not increase with the size of the data set. The paper shows that this estimator is consistent both when the number of time periods increases in the factor model and when the number of documents increases in the topic model. Unlike the similar estimator presented in the classical factor model literature (Stock and Watson (2002), Bai (2003)), this estimator does not rely on orthonormality.
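The role of the two-factor sparsity condition can be illustrated with a toy numerical check. This is a minimal sketch, not the paper's construction or estimator: the matrices `W` (asset loadings) and `H` (factor returns by period), the function names, and the small grid of mixing matrices are all assumptions made for illustration, and the additivity normalization is not imposed. The idea is that any invertible `A` gives an observationally equivalent factorization `(W A)(A^{-1} H)`, and non-negativity rules out more of these alternatives when `W` and `H` contain zeros.

```python
import numpy as np

def is_admissible(W, H, A, tol=1e-12):
    """True if (W @ A, inv(A) @ H) is also an entrywise non-negative factorization."""
    W_new = W @ A
    H_new = np.linalg.inv(A) @ H
    return bool((W_new >= -tol).all() and (H_new >= -tol).all())

def count_admissible(W, H, grid=(-0.2, -0.1, 0.0, 0.1, 0.2)):
    """Count mixing matrices A = [[1, a], [b, 1]] on a small grid that keep
    both transformed factors non-negative."""
    count = 0
    for a in grid:
        for b in grid:
            A = np.array([[1.0, a], [b, 1.0]])
            if is_admissible(W, H, A):
                count += 1
    return count

# Hypothetical data: rows of W are assets, columns of H are time periods.
# Sparse case: each factor has an asset with zero weight and a period with zero return.
W_sparse = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
H_sparse = np.array([[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]])

# Dense case: every loading and every factor return is strictly positive.
W_dense = np.array([[1.0, 2.0], [2.0, 1.0], [1.0, 1.0]])
H_dense = np.array([[1.0, 2.0, 3.0], [2.0, 1.0, 2.0]])

print(count_admissible(W_sparse, H_sparse))
print(count_admissible(W_dense, H_dense))
```

In the sparse example, only the identity mixing (a = b = 0) keeps both matrices non-negative, whereas the dense example admits several alternatives on the same grid. The grid fixes a unit diagonal, so rescalings and relabelings of the factors are deliberately left out of the comparison.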