V. Bystrov, V. Naboka, A. Staszewska-Bystrova, P. Winker
Jahrbücher für Nationalökonomie und Statistik, vol. 242, pp. 433–469. Published 2022-08-01. DOI: 10.1515/jbnst-2022-0024 (https://doi.org/10.1515/jbnst-2022-0024)
Cross-Corpora Comparisons of Topics and Topic Trends
Abstract: Textual data have gained relevance as a novel source of information for applied economic research. Analyses covering longer periods or international comparisons often have to use and combine different text corpora. A methods pipeline is presented for identifying topics in different corpora, matching these topics across corpora, and comparing the resulting time series of topic importance. The relative importance of topics over time in a text corpus is used as an additional indicator in econometric models and for forecasting, as well as for identifying changing foci of economic studies. The pipeline is illustrated using scientific publications from Poland and Germany, in English and German, for the period 1984–2020. As methodological contributions, a novel tool for data-based model selection, sBIC, is implemented, and approaches for mapping topics across different corpora (including different languages) are presented.
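The topic-matching step can be illustrated with a minimal sketch. It assumes each topic is represented as a word-to-weight mapping and that topics are compared by cosine similarity; the abstract does not state which similarity measure the authors use, so this choice, the function names `cosine_similarity` and `match_topics`, and the toy data are all hypothetical. For corpora in different languages, the words would first have to be mapped to a common vocabulary, which the sketch does not attempt.

```python
import math


def cosine_similarity(p, q):
    """Cosine similarity between two sparse word-weight dicts."""
    dot = sum(w * q[word] for word, w in p.items() if word in q)
    norm_p = math.sqrt(sum(w * w for w in p.values()))
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0
    return dot / (norm_p * norm_q)


def match_topics(topics_a, topics_b):
    """Map each topic of corpus A to its most similar topic of corpus B."""
    matches = {}
    for name_a, dist_a in topics_a.items():
        best = max(
            topics_b,
            key=lambda name_b: cosine_similarity(dist_a, topics_b[name_b]),
        )
        matches[name_a] = (best, cosine_similarity(dist_a, topics_b[best]))
    return matches


# Toy topic-word weights (illustrative only, not taken from the paper).
topics_a = {
    "A0": {"inflation": 0.5, "monetary": 0.3, "policy": 0.2},
    "A1": {"labour": 0.6, "wage": 0.3, "policy": 0.1},
}
topics_b = {
    "B0": {"wage": 0.5, "labour": 0.4, "union": 0.1},
    "B1": {"inflation": 0.4, "monetary": 0.4, "bank": 0.2},
}

matches = match_topics(topics_a, topics_b)
for name_a, (name_b, sim) in matches.items():
    print(f"{name_a} -> {name_b} (similarity {sim:.2f})")
```

Once topics are matched, the time series of topic importance for each matched pair can be compared directly. A one-to-one assignment (rather than the greedy best match used here) would be a natural refinement when several topics compete for the same counterpart.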
About the journal:
The Jahrbücher für Nationalökonomie und Statistik has been published since 1863. The editors are committed to the tradition of keeping the journal open to critical, innovative, and promising contributions. Publications are not to be confined, either thematically or methodologically, to whatever doctrines currently prevail.