Conditional Language Models for Community-Level Linguistic Variation
Bill Noble, Jean-Philippe Bernardy
Proceedings of the Fifth Workshop on Natural Language Processing and Computational Social Science (NLP+CSS)
DOI: 10.18653/v1/2022.nlpcss-1.9 (https://doi.org/10.18653/v1/2022.nlpcss-1.9)
Abstract
Community-level linguistic variation is a core concept in sociolinguistics. In this paper, we use conditioned neural language models to learn vector representations for 510 online communities. We use these representations to measure linguistic variation between communities and investigate the degree to which linguistic variation corresponds with social connections between communities. We find that our sociolinguistic embeddings are highly correlated with a social network-based representation that does not use any linguistic input.
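The abstract does not say how the correlation between the two representations is computed. One common way to compare two embedding spaces defined over the same items is to compute the pairwise distances between communities in each space and correlate those distance lists (a representational-similarity-style comparison). The sketch below assumes that approach, with cosine distance and Spearman rank correlation. It is an illustration only, not the authors' procedure. The function names, the community names (`r/a`, `r/b`, `r/c`), and the toy vectors are all hypothetical.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """1 - cosine similarity; assumes both vectors are non-zero and equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def pairwise_distances(embeddings):
    """Distances for every unordered pair of communities, in a fixed order
    given by the sorted community names (so two spaces line up pair by pair)."""
    names = sorted(embeddings)
    return [cosine_distance(embeddings[a], embeddings[b])
            for a, b in combinations(names, 2)]


def ranks(values):
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def representation_correlation(linguistic, social):
    """Correlate the pairwise-distance structure of two embedding spaces
    (hypothetical helper) defined over the same set of communities."""
    if set(linguistic) != set(social):
        raise ValueError("both embedding sets must cover the same communities")
    return spearman(pairwise_distances(linguistic), pairwise_distances(social))


if __name__ == "__main__":
    # Toy, made-up vectors: r/a and r/b are close in both spaces, r/c is far.
    linguistic = {"r/a": [1.0, 0.0], "r/b": [0.9, 0.1], "r/c": [0.0, 1.0]}
    social = {"r/a": [2.0, 0.1], "r/b": [1.8, 0.3], "r/c": [0.1, 2.0]}
    print(representation_correlation(linguistic, social))
```

Comparing distance structures rather than raw vectors means the two spaces need not share a dimensionality or basis, which matters when one representation comes from a language model and the other from a social network. In the toy example both spaces order the three community pairs the same way, so the rank correlation is 1 up to floating-point rounding.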