Slovenščina 2.0: Language Technologies and Digital Humanities
Pub Date: 2021-07-06 | DOI: 10.4312/SLO2.0.2021.1.I-VI | Journal: Slovenščina 2.0: empirical, applied and interdisciplinary research
Darja Fišer, Tomaž Erjavec, Ajda Pretnar
Cross-lingual transfer of sentiment classifiers
Pub Date: 2021-07-06 | DOI: 10.4312/SLO2.0.2021.1.1-25 | Journal: Slovenščina 2.0: empirical, applied and interdisciplinary research
M. Robnik-Sikonja, Kristjan Reba, I. Mozetič
Word embeddings represent words in a numeric space so that semantic relations between words are represented as distances and directions in the vector space. Cross-lingual word embeddings transform the vector spaces of different languages so that similar words are aligned. This is done either by mapping one language’s vector space to that of another language or by constructing a joint vector space for multiple languages. Cross-lingual embeddings can be used to transfer machine learning models between languages, thereby compensating for insufficient data in less-resourced languages. We use cross-lingual word embeddings to transfer machine learning prediction models for Twitter sentiment between 13 languages. We focus on two transfer mechanisms that have recently shown superior transfer performance. The first mechanism uses trained models whose input is the joint numerical space for many languages, as implemented in the LASER library. The second mechanism uses large pretrained multilingual BERT language models. Our experiments show that the transfer of models between similar languages is sensible, even with no target language data. The performance of cross-lingual models obtained with multilingual BERT and with the LASER library is comparable, and the differences are language-dependent. The transfer with CroSloEngual BERT, pretrained on only three languages, is superior on these and some closely related languages.
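The idea of transferring a classifier through a joint vector space, as described in the abstract above, can be illustrated with a toy sketch. This is not the authors' method and uses neither LASER nor BERT: the two-dimensional vectors, the word list, and the nearest-centroid classifier are all hypothetical and chosen only to show that a model trained on source-language data can label target-language text when both languages share one embedding space.

```python
import math

# Hypothetical joint embedding space: every vector here is invented for illustration.
JOINT_EMBEDDINGS = {
    # English (source language, labeled data available)
    "great": (0.9, 0.1),
    "love": (0.8, 0.3),
    "awful": (-0.9, 0.2),
    "hate": (-0.8, -0.1),
    # Slovene (target language, no labeled data)
    "odlično": (0.85, 0.15),
    "grozno": (-0.88, 0.1),
}


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return tuple(sum(v[i] for v in vectors) / len(vectors) for i in range(dim))


def embed(tokens):
    """Represent a text as the average joint-space vector of its known tokens."""
    vectors = [JOINT_EMBEDDINGS[t] for t in tokens if t in JOINT_EMBEDDINGS]
    if not vectors:
        raise ValueError("no known tokens in input")
    return mean_vector(vectors)


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def train_centroids(labeled_texts):
    """Nearest-centroid 'model': one mean vector per sentiment label."""
    by_label = {}
    for tokens, label in labeled_texts:
        by_label.setdefault(label, []).append(embed(tokens))
    return {label: mean_vector(vecs) for label, vecs in by_label.items()}


def predict(centroids, tokens):
    """Return the label whose centroid is most similar to the text vector."""
    vec = embed(tokens)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Train on English only, then apply to Slovene with no target-language labels.
english_train = [(["great", "love"], "positive"), (["awful", "hate"], "negative")]
model = train_centroids(english_train)

print(predict(model, ["odlično"]))  # positive
print(predict(model, ["grozno"]))   # negative
```

The transfer works here only because the toy Slovene vectors were placed near their English counterparts; in practice that alignment is what the cross-lingual embedding method has to provide.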
Round table "(Close) Encounters of Language Policy Makers" [Okrogla miza »(Bližnja) srečanja oblikovalcev jezikovne politike«]
Pub Date: 2020-12-21 | DOI: 10.4312/slo2.0.2020.1.92-112 | Journal: Slovenščina 2.0: empirical, applied and interdisciplinary research
I. Ferbežar, Igor Cetina, Alojz Ihan, Marko Stabej, Lana Zdravković, Tina Zupančič
The 54th meeting and public conference of ALTE (Association of Language Testers in Europe) took place in Ljubljana from 6 to 8 November 2019. The meeting, on the theme Monolingual Testing in a Multilingual Reality: Language Ideologies and Their Impact on Language Testing, was organized by the University of Ljubljana, Faculty of Arts, and its Centre for Slovene as a Second and Foreign Language at the Department of Slovene Studies. As part of the event, the round table (Close) Encounters of Language Policy Makers was held on 8 November 2019. We publish a transcript of the recorded conversation among the participants.
The Language Technologies and Digital Humanities 2020 conference [Konferenca Jezikovne tehnologije in digitalna humanistika 2020]
Pub Date: 2020-12-21 | DOI: 10.4312/slo2.0.2020.1.113-119 | Journal: Slovenščina 2.0: empirical, applied and interdisciplinary research
Katja Meden
The Language Technologies and Digital Humanities 2020 conference, organized by the Slovenian Language Technologies Society (SDJT) together with the Institute of Contemporary History, the Centre for Language Resources and Technologies of the University of Ljubljana (CJVT), and the research infrastructures CLARIN.SI and DARIAH-SI, took place on 24 and 25 September 2020. This third multidisciplinary edition of the conference was supported by CLARIN ERIC. The conference, which has a tradition of more than 20 years, added digital humanities to its programme in 2016 and thereby became an important link between the two disciplines.
Threat perception in online anti-migrant speech: a Slovene case study
Pub Date: 2020-12-21 | DOI: 10.4312/slo2.0.2020.1.66-91 | Journal: Slovenščina 2.0: empirical, applied and interdisciplinary research
Rok Chitrakar
The aim of this article is to describe the perception of refugees as a threat in Slovene online discourse, based on a critical analysis of commenters’ responses to popular media posts at the height of the European migrant crisis. The proposition of the study is that the perception of migration as a threat is at the core of socially unacceptable discourse (SUD), portraying refugees and migrants as an undesirable and potentially dangerous presence. Within the framework of a comprehensive project examining public responses to media coverage of the arrival of migrants to Slovenia, online comments classified as SUD targeting refugees were extracted and annotated to reveal the recurring themes of threat perception. The analysis focused on describing the main categories of threat, as well as the various discursive features and strategies employed. Although the approach to observing this subject is essentially qualitative, a general case-specific overview of the frequency and distribution of identifiable categories is also given.
A comparison of collocations and word associations in Estonian from the perspective of parts of speech
Pub Date: 2020-08-10 | DOI: 10.4312/slo2.0.2020.2.139-167 | Journal: Slovenščina 2.0: empirical, applied and interdisciplinary research
Ene Vainik, Maria Tuulik, Kristina Koppel
The paper provides a comparative study of the collocational and associative structures in Estonian with respect to the role of parts of speech. The lists of collocations and associations of an equal set of nouns, verbs and adjectives, originating from the respective dictionaries, are analysed to find both the range of coincidences and differences. The results show a moderate overlap, with the biggest overlap occurring among the adjectival associates and collocates. There is an overall prevalence of nouns among the associated and collocated items. The coincidental sets of relations are tentatively explained by the influence of grammatical relations, i.e. the patterns of local grammar binding together the collocations and motivating the associations. The results are discussed with respect to the possible reasons for the mismatch between associations and collocations and in relation to the application of these findings in the fields of lexicography and second language acquisition.
State-of-the-art on monolingual lexicography for Brazil (Brazilian Portuguese)
Pub Date: 2019-11-13 | DOI: 10.4312/slo2.0.2019.1.98-112 | Journal: Slovenščina 2.0: empirical, applied and interdisciplinary research
T. Kuhn
This paper is a minireview of the current status of monolingual lexicography in Brazil. First, a brief contextualization of the origins of Brazilian Portuguese dictionary-making is provided. Then, an account of contemporary monolingual dictionaries is given, and a more detailed overview of print, digital, spelling, and school dictionaries is presented. Next, research into dictionary use is reviewed. Finally, the perception among Brazilians with regard to corpora and the use of crowdsourcing in lexicography is discussed.
State-of-the-art on monolingual lexicography for Greece (EL)
Pub Date: 2019-11-13 | DOI: 10.4312/slo2.0.2019.1.78-97 | Journal: Slovenščina 2.0: empirical, applied and interdisciplinary research
Stella Markantonatou, Voula Giouli
The authors report on a recent survey of monolingual dictionaries available on the Greek market. General dictionaries outnumber spelling and educational ones and enjoy a prestigious status. Only one general dictionary was born digital and only two are available on the web, but several are available on CD. Most, but not all, of the prestigious dictionaries have received public funding. Lexicography is well regarded in Greece, where printed dictionaries still seem to have the lead.
Editorial [Uvodnik]
Pub Date: 2019-04-18 | DOI: 10.4312/slo2.0.2019.1.i-ii | Journal: Slovenščina 2.0: empirical, applied and interdisciplinary research
Vojko Gorjanc, Špela Arhar Holdt
A special thematic section of this year's first issue contains eight short scientific papers that survey the current state of lexicography in Denmark, Sweden, Norway, Croatia, Greece, the Basque Country, Estonia and Brazil. The papers are the result of scientific collaboration within the European network ENeL (European Network of e-Lexicography) [ISCH COST Action IS1305].
State-of-the-art on monolingual lexicography for Sweden
Pub Date: 2019-04-18 | DOI: 10.4312/SLO2.0.2019.1.13-24 | Journal: Slovenščina 2.0: empirical, applied and interdisciplinary research
Emma Sköldberg, L. Holmer, Elena Volodina, I. Pilán
The minireview describes the state of the art of Swedish monolingual lexicography. The main actors in the field, both commercial and non-commercial, are mentioned alongside a description of the lexicographic products they offer to dictionary users. The minireview makes clear that Swedish dictionary users tend to abandon paper-based dictionaries and switch to online portals and apps, which influences the practices adopted by commercial publishing houses such as Norstedts, Bonniers, and Natur & Kultur. Among the leading non-commercial players, the Swedish Academy, the Swedish Language Bank, and the Institute for Language and Folklore are named. Swedish monolingual lexicography also offers dictionaries produced not only by experts but also by non-experts (i.e. using the efforts of the crowd).