Author: Trond Petersen
Journal: Sociological Methodology, 47(1), pp. 113-164
Published: 2017-08-01
DOI: https://doi.org/10.1177/0081175017730108
Multiplicative Models For Continuous Dependent Variables: Estimation on Unlogged versus Logged Form
In regression analysis with a continuous and positive dependent variable, a multiplicative relationship between the unlogged dependent variable and the independent variables is often specified. It can then be estimated on its unlogged or logged form. The two procedures may yield major differences in estimates, even opposite signs. The reason is that estimation on the unlogged form yields coefficients for the relative arithmetic mean of the unlogged dependent variable, whereas estimation on the logged form gives coefficients for the relative geometric mean for the unlogged dependent variable (or for absolute differences in the arithmetic mean of the logged dependent variable). Estimated coefficients from the two forms may therefore vary widely, because of their different foci, relative arithmetic versus relative geometric means.
The first goal of this article is to explain why major divergencies in coefficients can occur. Although well understood in the statistical literature, this is not widely understood in sociological research, and it is hence of significant practical interest. The second goal is to derive conditions under which divergencies will not occur, where estimation on the logged form will give unbiased estimators for relative arithmetic means.
First, it derives the necessary and sufficient conditions for when estimation on the logged form will give unbiased estimators for the parameters for the relative arithmetic mean. This requires not only that there is arithmetic mean independence of the unlogged error term but that there is also geometric mean independence. Second, it shows that statistical independence of the error terms on regressors implies that there is both arithmetic and geometric mean independence for the error terms, and it is hence a sufficient condition for absence of bias. Third, it shows that although statistical independence is a sufficient condition, it is not a necessary one for lack of bias. Fourth, it demonstrates that homoskedasticity of error terms is neither a necessary nor a sufficient condition for absence of bias. Fifth, it shows that in the semi-logarithmic specification, for a logged error term with the same qualitative distributional shape at each value of independent variables (e.g., normal), arithmetic mean independence, but heteroskedasticity, estimation on the logged form will give biased estimators for the parameters for the arithmetic mean (whereas with homoskedasticity, and for this case thus statistical independence, estimators are unbiased, from the second result above).
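The divergence described in the abstract can be illustrated with a small simulation. This is a minimal sketch and is not taken from the article. It assumes a single binary regressor x, a multiplicative model y = exp(b0 + b1*x) * u, and a normal logged error whose variance differs between the two groups but whose mean is shifted so that E[u | x] = 1 (arithmetic mean independence with heteroskedasticity). All parameter values and function names are hypothetical. With one binary regressor, fitting the unlogged form reduces to the log ratio of the group arithmetic means, and OLS on log(y) reduces to the difference in group means of log(y), so no regression library is needed.

```python
import numpy as np


def simulate(n=200_000, b0=1.0, b1=0.2, s0=0.5, s1=1.2, seed=0):
    """Draw (x, y) from y = exp(b0 + b1*x) * u with E[u | x] = 1.

    All parameter values are illustrative assumptions, not from the article.
    """
    rng = np.random.default_rng(seed)
    x = rng.integers(0, 2, size=n)        # binary regressor: 0 or 1
    s = np.where(x == 1, s1, s0)          # sd of the logged error depends on x
    e = rng.normal(-0.5 * s**2, s)        # mean shift makes E[exp(e) | x] = 1
    y = np.exp(b0 + b1 * x + e)
    return x, y


def unlogged_slope(x, y):
    """Log ratio of group arithmetic means (relative arithmetic mean)."""
    return np.log(y[x == 1].mean() / y[x == 0].mean())


def logged_slope(x, y):
    """Difference in group means of log(y) (relative geometric mean)."""
    return np.log(y[x == 1]).mean() - np.log(y[x == 0]).mean()


x, y = simulate()
b_unlogged = unlogged_slope(x, y)
b_logged = logged_slope(x, y)

# Population targets under the assumed design:
#   unlogged form: b1 = 0.2
#   logged form:   b1 - (s1**2 - s0**2) / 2 = 0.2 - 0.595 = -0.395
print(f"unlogged-form slope: {b_unlogged:.3f}")
print(f"logged-form slope:   {b_logged:.3f}")
```

Under these assumed values the two estimates have opposite signs: the logged form picks up the group difference in the logged error's mean, which the variance difference forces to be nonzero. Setting s0 equal to s1 makes the error statistically independent of x, and the two slopes then agree up to sampling noise.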
About the journal:
Sociological Methodology is a compendium of new and sometimes controversial advances in social science methodology. Contributions come from diverse areas and have something useful -- and often surprising -- to say about a wide range of topics ranging from legal and ethical issues surrounding data collection to the methodology of theory construction. In short, Sociological Methodology holds something of value -- and an interesting mix of lively controversy, too -- for nearly everyone who participates in the enterprise of sociological research.