Measuring diversity in Hollywood through the large-scale computational analysis of film
David Bamman, Rachael Samberg, Richard Jean So, Naitian Zhou
Proceedings of the National Academy of Sciences of the United States of America, published 2024-11-12 (Epub 2024-11-04)
DOI: https://doi.org/10.1073/pnas.2409770121
Abstract
Movies are a massively popular and influential form of media, but their computational study at scale has largely been off-limits to researchers in the United States due to the Digital Millennium Copyright Act. In this work, we illustrate use of a new regulatory framework to enable computational research on film that permits circumvention of technological protection measures on digital video discs (DVDs). We use this exemption to legally digitize a collection of 2,307 films representing the top 50 movies by U.S. box office over the period 1980 to 2022, along with award nominees. We design a computational pipeline for measuring the representation of gender and race/ethnicity in film, drawing on computer vision models for recognizing actors and human perceptions of gender and race/ethnicity. Doing so allows us to learn substantive facts about representation and diversity in Hollywood over this period, confirming earlier studies that see an increase in diversity over the past decade, while allowing us to use computational methods to uncover a range of ad hoc analytical findings. Our work illustrates the affordances of the data-driven analysis of film at a large scale.
About the journal
The Proceedings of the National Academy of Sciences (PNAS), a peer-reviewed journal of the National Academy of Sciences (NAS), serves as an authoritative source for high-impact, original research across the biological, physical, and social sciences. With a global scope, the journal welcomes submissions from researchers worldwide, making it an inclusive platform for advancing scientific knowledge.