Pub Date: 2024-06-24, DOI: 10.1007/s11943-024-00341-5
Joachim Wagner
Many attempts to inform evidence-based policymaking (or policymakers in general) require researchers to rely on existing rather than newly collected data. These data have to be reliable, findable, and accessible, ideally without high hurdles and at low or no cost. One way to find suitable data that are easily accessible (and hopefully reliable) is to consult the contributions published in the Data Observer series described in this paper.
Data Observer: a guide to data that can help to inform evidence-based policymaking. AStA Wirtschafts- und Sozialstatistisches Archiv 18(2), 279-287.
Pub Date: 2024-06-24, DOI: 10.1007/s11943-024-00340-6
Camilo Meyberg, Ulrich Rendtel, Holger Leerhoff
Internet data pose a challenge to the traditional system of official statistics, which relies on conventional sources such as surveys and registers that do not adapt readily to rapid change. Extending this system to include internet data is still at an experimental stage, in which the potential and benefits of these sources are being explored. This paper describes a project conducted within the ESSnet Trusted Smart Statistics – Web Intelligence Network framework. It investigates the use of online apartment listings to analyze the rental market. We used web scraping to extract information on flats in the city of Berlin from two online real estate portals. Using these data, we developed a model that predicts rental prices per square meter from a flat’s features and its location within the city. Offers appearing on both portals were detected by statistical matching, and the duplicates were removed. Missing values were handled by multiple imputation. The prediction model is semi-parametric, with postal districts describing the location effect. Comparisons with microcensus results and the local rent index reveal significant differences between the market of online flat offers and the stock of existing rental contracts. Interested readers will find the commented programming code in the online supplement.
Flat rent price prediction in Berlin with web scraping. AStA Wirtschafts- und Sozialstatistisches Archiv 18(2), 245-278. PDF: https://link.springer.com/content/pdf/10.1007/s11943-024-00340-6.pdf
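The deduplication and price-per-square-meter steps mentioned in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' code (that is in the paper's supplement): the field names (`plz`, `area_m2`, `rent_eur`), the sample listings, and the fixed tolerances are hypothetical, and a simple rule-based match stands in for the statistical matching the paper uses. Imputation and the semi-parametric model are left out.

```python
from statistics import median

# Hypothetical listings; field names and values are illustrative only.
portal_a = [
    {"plz": "10115", "area_m2": 60.0, "rent_eur": 900.0},
    {"plz": "10115", "area_m2": 45.0, "rent_eur": 720.0},
    {"plz": "12043", "area_m2": 80.0, "rent_eur": 960.0},
]
portal_b = [
    {"plz": "10115", "area_m2": 60.5, "rent_eur": 905.0},  # close to portal_a[0]
    {"plz": "12043", "area_m2": 55.0, "rent_eur": 715.0},
]


def is_duplicate(a, b, area_tol=1.0, rent_tol=10.0):
    """Rule-based stand-in for statistical matching (an assumption):
    same postal code, with area and rent within fixed tolerances."""
    return (
        a["plz"] == b["plz"]
        and abs(a["area_m2"] - b["area_m2"]) <= area_tol
        and abs(a["rent_eur"] - b["rent_eur"]) <= rent_tol
    )


def merge_portals(a_list, b_list):
    """Keep every portal A offer plus the portal B offers with no match in A."""
    merged = list(a_list)
    for b in b_list:
        if not any(is_duplicate(a, b) for a in a_list):
            merged.append(b)
    return merged


def median_rent_per_m2(listings):
    """Median rent per square meter, grouped by postal code."""
    by_plz = {}
    for item in listings:
        by_plz.setdefault(item["plz"], []).append(item["rent_eur"] / item["area_m2"])
    return {plz: median(values) for plz, values in by_plz.items()}


listings = merge_portals(portal_a, portal_b)
rent_by_district = median_rent_per_m2(listings)
print(len(listings), rent_by_district)
```

On these made-up listings, the first portal B offer is treated as a duplicate of a portal A offer, so four offers remain before the district medians are computed.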
Pub Date: 2024-04-17, DOI: 10.1007/s11943-024-00339-z
Markus Zwick, Jan Pablo Burgard
Vorwort der Herausgeber [Editors' foreword]. AStA Wirtschafts- und Sozialstatistisches Archiv 18(1), 1-4. PDF: https://link.springer.com/content/pdf/10.1007/s11943-024-00339-z.pdf
Pub Date: 2024-03-08, DOI: 10.1007/s11943-024-00336-2
Olaf Hübler
Citizen participation is found in almost all areas of public life. Dissatisfaction with public decisions and disenchantment with politics are often the decisive reasons why citizens become engaged outside their working lives. Little is known about the effects and structure of citizens' initiatives. Empirical studies are often limited to individual case analyses. A broader data basis, analyzed with statistical-econometric methods, is needed to arrive at generalizable statements. What type of person is active in, and takes part in, matters that go beyond the private sphere? To what extent does this involvement influence that person's attitudes and behavior? Citizens' assemblies (Bürgerräte) are a comparatively new instrument of citizen participation for which a good deal of information is still lacking from a statistical point of view. Random selection procedures are intended to help align policy recommendations and policy decisions more closely with the will of the population. Which personal characteristics are typical of citizens' assembly members? Does the distribution of these characteristics match that in the general population?
The empirical analysis shows that common demographic characteristics explain participation in citizens' initiatives only to a limited extent and that there is a reciprocal relationship with participation in citizens' initiatives. Big Five personality characteristics and judgments of what is perceived as fair are of additional importance. Life satisfaction and trust in politicians differ between people with and without experience of citizens' initiatives. Overall, the importance of citizen participation should be rated lower than that of other altruistically oriented activities.
Bürgerbeteiligung in Deutschland – Wer beteiligt sich wofür mit welchen Auswirkungen? [Citizen participation in Germany: who participates in what, and with what effects?] AStA Wirtschafts- und Sozialstatistisches Archiv 18(1), 99-116. PDF: https://link.springer.com/content/pdf/10.1007/s11943-024-00336-2.pdf
Pub Date: 2024-03-04, DOI: 10.1007/s11943-024-00335-3
Regina T. Riphahn
The 2023 Grohmann Lecture addresses the phenomenon of “small jobs” in Germany. It first explains the institutional and historical background of Minijobs and describes how intensively they are used. The text then summarizes three empirical studies. These examine whether (i) employers replace regular employment with Minijobs, (ii) Minijobs contribute to the “motherhood penalty” in Germany, and (iii) Midijobs have eased transitions from Minijobs into regular employment subject to social insurance contributions. The lecture closes with a look at possible alternative regulations for “small jobs” in Germany.
Subventionen für „kleine Jobs“: [Subsidies for “small jobs”:] AStA Wirtschafts- und Sozialstatistisches Archiv 18(1), 5-14. PDF: https://link.springer.com/content/pdf/10.1007/s11943-024-00335-3.pdf
Pub Date: 2024-03-04, DOI: 10.1007/s11943-024-00338-0
Hans Walter Steinhauer, Jean Philippe Décieux, Manuel Siegert, Andreas Ette, Sabine Zinn
Following Russia’s invasion of Ukraine in early 2022, more than one million refugees have arrived in Germany. These refugees differ in many respects from those of Germany’s earlier forced-migration experiences, and there is an urgent need for sound data and information for policymakers, practitioners, and academics. In response, the IAB-BiB/FReDA-BAMF-SOEP study was established to provide high-quality longitudinal data based on a register-based probability sample. We describe an approach for sampling refugees within a short time that makes use of two different registers (the German population register and the central register of foreigners), and we discuss the quality of the final sample with respect to potential selectivity of participation in the panel. Overall, we demonstrate the benefits and feasibility of establishing register-based samples even in the context of a geopolitical crisis, when sound data are needed within a short time horizon. We provide guidance that can be followed for similar events in the future.
Establishing a probability sample in a crisis context: the example of Ukrainian refugees in Germany in 2022. AStA Wirtschafts- und Sozialstatistisches Archiv 18(1), 77-97. PDF: https://link.springer.com/content/pdf/10.1007/s11943-024-00338-0.pdf
Pub Date: 2023-12-12, DOI: 10.1007/s11943-023-00333-x
Oliver Trinkaus, Göran Kauermann
In this paper we discuss the use of machine-learning-driven models in rental guides, along with their potential advantages and disadvantages. Rental guides are a formal legal instrument in Germany for surveying rents of flats in cities and municipalities; today they are based on regression models or simple contingency tables. We discuss whether and how modern machine learning methods outperform these established routines. We use data from the Munich rental guide and focus mainly on the predictive power of the models. We also discuss their “black-box” character, which makes some of them difficult to interpret and hence challenging to apply in the rental guide context. Still, it is of interest to see how “black-box” models perform with respect to prediction error. Moreover, we study adversarial effects, i.e. we investigate robustness in the sense of how corrupted data influence the performance of the prediction models. With the data at hand, we show that models with promising predictive performance are more vulnerable to corrupted data than classic linear models, including those with Ridge or Lasso regularization.
Can machine learning algorithms deliver superior models for rental guides? AStA Wirtschafts- und Sozialstatistisches Archiv 17(3-4), 305-330. PDF: https://link.springer.com/content/pdf/10.1007/s11943-023-00333-x.pdf
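As a toy illustration of the robustness question raised in the abstract above, the following Python sketch corrupts a single training label and compares how far the predictions move for a one-feature ridge regression and for a 1-nearest-neighbour regressor. Both models, the synthetic data, the penalty `lam`, and the `max_shift` measure are assumptions chosen for brevity; they are not the models, data, or corruption scheme studied in the paper.

```python
def fit_ridge_1d(xs, ys, lam=1.0):
    """One-feature ridge regression with an unpenalised intercept."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / (sxx + lam)
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x


def fit_1nn(xs, ys):
    """1-nearest-neighbour regressor, standing in for a flexible model."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def max_shift(fit, xs, ys_clean, ys_corrupt, grid):
    """Largest change in prediction on the grid caused by the corruption."""
    clean = fit(xs, ys_clean)
    corrupt = fit(xs, ys_corrupt)
    return max(abs(clean(x) - corrupt(x)) for x in grid)


# Synthetic training data (hypothetical): y = x^2 on x = 0, 1, ..., 10.
xs = [float(i) for i in range(11)]
ys_clean = [x ** 2 for x in xs]
ys_corrupt = list(ys_clean)
ys_corrupt[5] += 50.0  # one corrupted label in the middle of the range

grid = [i / 4 for i in range(41)]  # evaluation points from 0 to 10
shift_ridge = max_shift(fit_ridge_1d, xs, ys_clean, ys_corrupt, grid)
shift_1nn = max_shift(fit_1nn, xs, ys_clean, ys_corrupt, grid)
print(shift_ridge, shift_1nn)
```

In this setup the corrupted label sits at the mean of `xs`, so it only moves the ridge intercept (by 50/11), whereas the nearest-neighbour prediction next to the corrupted point moves by the full 50. A single toy case like this says nothing about predictive accuracy and does not reproduce the paper's findings.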
Pub Date: 2023-12-07, DOI: 10.1007/s11943-023-00330-0
Arnout van Delden, Joep Burger, Marco Puts
Machine learning (ML) is increasingly being used in official statistics with a range of different applications. The main focus of ML models is to accurately predict attributes of new, unlabeled cases whereas the focus of classical statistical models is to describe the relations between independent and dependent variables. There is already a lot of experience in the sound use of classical statistical models in official statistics, but for ML models this is still under development. Recent discussions concerning the quality aspects of using ML in official statistics have concentrated on its implications for existing quality frameworks. We are in favor of the use of ML in official statistics, but the main question remains as to what factors need to be considered when using ML models in official statistics. As a means of raising awareness regarding these factors, we pose ten propositions regarding the (sensible) use of ML in official statistics.
Ten propositions on machine learning in official statistics. AStA Wirtschafts- und Sozialstatistisches Archiv 17(3-4), 195-221.