{"title":"为社会科学家分析数据流","authors":"Lianne Ippel, M. Kaptein, J. Vermunt","doi":"10.4324/9781003025245-6","DOIUrl":null,"url":null,"abstract":"The technological developments of the last decades have created opportunities to efficiently collect data of many individuals over time. While these technologies provide exciting research opportunities, they also provide challenges: datasets collected using these technologies grow increasingly large, or be continuously augmented with new observations. These data streams make the standard computation of well-known estimators inefficient, as computations are repeated each time new data enter. This chapter details online learning, an analysis method that updates parameter estimates instead of re-estimating them to analyze large and/or streaming data. The chapter presents several simple (and exact) examples of the online estimation for independent observations. Additionally, social scientists are often faced with nested data: pupils are nested within schools, or repeated measurements are nested within individuals. Nested data are typically analyzed using multilevel models. Estimating multilevel models, however, can be challenging in data streams: the standard algorithms used to fit these models repeatedly revisit all data points, which becomes infeasible in a data stream context. We present a solution to this problem by introducing the Streaming Expectation Maximization Approximation (SEMA) algorithm for fitting multilevel models online. We end this chapter with a discussion of the methodological challenges that remain.","PeriodicalId":422456,"journal":{"name":"Handbook of Computational Social Science, Volume 2","volume":"46 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Analyzing data streams for social scientists\",\"authors\":\"Lianne Ippel, M. Kaptein, J. Vermunt\",\"doi\":\"10.4324/9781003025245-6\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The technological developments of the last decades have created opportunities to efficiently collect data of many individuals over time. While these technologies provide exciting research opportunities, they also provide challenges: datasets collected using these technologies grow increasingly large, or be continuously augmented with new observations. These data streams make the standard computation of well-known estimators inefficient, as computations are repeated each time new data enter. This chapter details online learning, an analysis method that updates parameter estimates instead of re-estimating them to analyze large and/or streaming data. The chapter presents several simple (and exact) examples of the online estimation for independent observations. Additionally, social scientists are often faced with nested data: pupils are nested within schools, or repeated measurements are nested within individuals. Nested data are typically analyzed using multilevel models. Estimating multilevel models, however, can be challenging in data streams: the standard algorithms used to fit these models repeatedly revisit all data points, which becomes infeasible in a data stream context. We present a solution to this problem by introducing the Streaming Expectation Maximization Approximation (SEMA) algorithm for fitting multilevel models online. We end this chapter with a discussion of the methodological challenges that remain.\",\"PeriodicalId\":422456,\"journal\":{\"name\":\"Handbook of Computational Social Science, Volume 2\",\"volume\":\"46 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-11-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Handbook of Computational Social Science, Volume 2\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.4324/9781003025245-6\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Handbook of Computational Social Science, Volume 2","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.4324/9781003025245-6","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
The technological developments of the last decades have created opportunities to efficiently collect data of many individuals over time. While these technologies provide exciting research opportunities, they also provide challenges: datasets collected using these technologies grow increasingly large, or be continuously augmented with new observations. These data streams make the standard computation of well-known estimators inefficient, as computations are repeated each time new data enter. This chapter details online learning, an analysis method that updates parameter estimates instead of re-estimating them to analyze large and/or streaming data. The chapter presents several simple (and exact) examples of the online estimation for independent observations. Additionally, social scientists are often faced with nested data: pupils are nested within schools, or repeated measurements are nested within individuals. Nested data are typically analyzed using multilevel models. Estimating multilevel models, however, can be challenging in data streams: the standard algorithms used to fit these models repeatedly revisit all data points, which becomes infeasible in a data stream context. We present a solution to this problem by introducing the Streaming Expectation Maximization Approximation (SEMA) algorithm for fitting multilevel models online. We end this chapter with a discussion of the methodological challenges that remain.