A Demographic Sampling Model and Database for Addressing Racial, Ethnic, and Gender Bias in Popular-music Empirical Research
Nicholas J. Shea
Empirical Musicology Review, published 2023-08-10
DOI: 10.18061/emr.v17i1.8531 (https://doi.org/10.18061/emr.v17i1.8531)
Abstract
This report summarizes the development and application of a demographic encoding model designed to help researchers align dataset diversity with real-world diversity in popular-music corpus studies. Drawing on sampling strategies from machine-learning research and encoding procedures from the health sciences and the humanities, the model and its associated open-access data give researchers a tool for building more inclusive databases along the parameters of race, ethnicity, and gender. The model attempts to reconcile the intersectional boundaries of personal identity with the binarity required by statistical encoding and analysis. Importantly, it facilitates a mindful approach through conditional parameters: for example, it minimizes the risk of tokenizing minoritized artists in multi-member ensembles by considering each such artist's agency and demographic proportion within the group. Applying the model to artist samples from various popular-music corpora affirms the underrepresentation of non-white and non-male artists in related research. In response, the report outlines how a researcher might use intentional demographic sampling when developing future corpus-based popular-music studies.
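
To make the idea of intentional demographic sampling concrete, the following is a minimal sketch of quota-based sampling from an artist pool. It is not the report's model or its encoding scheme: the function name `quota_sample`, the `(artist_id, group_label)` input format, and the placeholder group labels are all assumptions introduced here for illustration, and the target proportions would have to come from whatever real-world reference the researcher chooses.

```python
import random
from collections import defaultdict


def quota_sample(pool, targets, n, seed=0):
    """Draw roughly n artists so group shares approximate the targets.

    pool:    list of (artist_id, group_label) pairs (hypothetical format)
    targets: dict mapping group_label -> desired proportion (should sum to 1)
    n:       intended sample size; rounding may shift the total slightly
    """
    rng = random.Random(seed)

    # Index the pool by group label.
    by_group = defaultdict(list)
    for artist_id, group in pool:
        by_group[group].append(artist_id)

    sample = []
    for group, share in targets.items():
        quota = round(share * n)
        members = by_group.get(group, [])
        # A group cannot contribute more artists than the pool contains.
        k = min(quota, len(members))
        sample.extend((artist, group) for artist in rng.sample(members, k))
    return sample


# Placeholder pool: 80 artists labeled "group_a", 20 labeled "group_b".
pool = [(f"artist_{i}", "group_a") for i in range(80)]
pool += [(f"artist_{i}", "group_b") for i in range(80, 100)]

# Request an even split even though the pool itself is skewed 80/20.
sample = quota_sample(pool, {"group_a": 0.5, "group_b": 0.5}, n=20)
```

A simple random draw from this pool would tend to reproduce its 80/20 skew, whereas the quota approach fixes the composition in advance. When a group has fewer members than its quota, the sketch silently returns a smaller sample, which a real study would need to report.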