Pub Date: 2024-08-08 | DOI: 10.1177/08944393241269098
Daria Dementeva, Cecil Meeusen, Bart Meuleman
Neighborhoods are important contexts for shaping interethnic group relationships, and sites where these relationships may materialize through everyday routines in shared local spaces. In this paper, we approach neighborhoods as small-scale sets of spaces of encounter, defined as local public or semi-public spaces where residents of different ethnic backgrounds may meet. Drawing on classical contact and group threat theories, we assume that local spaces of encounter are facets of the intergroup neighborhood context and may shape intergroup relations, here defined as perceived ethnic threat and intergroup friendship. Using georeferenced survey data from the Belgian National Election Study 2020, enriched with spatial features from OpenStreetMap (an innovative big geospatial data source) and with census-based neighborhood characteristics, the study employs machine learning algorithms to investigate whether, which, and how neighborhood spaces of encounter can predict perceived ethnic threat and intergroup friendship, while also accounting for traditional local ethnic, socioeconomic, and individual indicators. By using OpenStreetMap to measure spaces of encounter as a novel neighborhood indicator, we develop a fine-grained typology of local spaces rooted in urban and intergroup relations research. The results show that the spaces that mattered for predicting intergroup friendship were educational, functional, public open, and user-selecting spaces, whereas functional, third, retail, and other spaces stood out for predicting perceived threat. The results also reveal that individual characteristics are important predictors of intergroup relations, whereas neighborhood characteristics matter little in both absolute and relative terms. We conclude by reflecting on which local spaces might matter, and we discuss combining OpenStreetMap data with intergroup relations research as a proof of concept and as a direction for future research.
Title: Using OpenStreetMap, Census, and Survey Data to Predict Interethnic Group Relations in Belgium: A Machine Learning Approach | Journal: Social Science Computer Review
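The abstract reports which predictors were important "in absolute and relative terms" but does not name the importance measure. A common model-agnostic choice is permutation importance: shuffle one feature column and record how much the prediction error rises. The sketch below assumes that measure, a fitted model passed in as a plain Python function, and mean squared error as the loss; the toy data and the `model` function are hypothetical and are not the authors' setup.

```python
import random
from typing import Callable, List, Sequence


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error between observed and predicted outcomes."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict: Callable[[List[float]], float],
                           X: List[List[float]], y: Sequence[float],
                           n_repeats: int = 20, seed: int = 0) -> List[float]:
    """Absolute importance: mean rise in MSE after shuffling one feature column."""
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        total_rise = 0.0
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Copy every row, replacing feature j with its shuffled value.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            total_rise += mse(y, [predict(row) for row in X_perm]) - baseline
        importances.append(total_rise / n_repeats)
    return importances


def relative_importance(importances: List[float]) -> List[float]:
    """Relative importance: each feature's share of the total (negatives clipped to 0)."""
    clipped = [max(v, 0.0) for v in importances]
    total = sum(clipped)
    return [v / total if total > 0 else 0.0 for v in clipped]


# Toy data: column 0 stands in for an individual-level predictor, column 1 for a
# hypothetical count of neighborhood spaces that the toy model ignores.
X = [[1.0, 5.0], [2.0, 3.0], [3.0, 8.0], [4.0, 1.0], [5.0, 7.0], [6.0, 2.0]]
y = [row[0] for row in X]


def model(row: List[float]) -> float:
    """Hypothetical fitted model that only uses the first feature."""
    return row[0]


abs_imp = permutation_importance(model, X, y)
rel_imp = relative_importance(abs_imp)
```

Because the toy model ignores column 1, shuffling it leaves the error unchanged, so its absolute and relative importance are both zero; this mirrors, in miniature, how a predictor can be "not so important" on both scales.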
Pub Date: 2024-08-08 | DOI: 10.1177/08944393241260242
Julia Marti-Ochoa, Eva Martin-Fuentes, Berta Ferrer-Rosell
TikTok has emerged as a key platform for shaping brand perception in social media marketing. This study examines Airbnb’s brand presence on TikTok by analyzing the text of posts and the human speech in videos, with the aim of deciphering the brand narrative and gauging user engagement. The research focuses on Airbnb-related content and distinguishes between the brand’s official narrative and user-generated content (UGC). “Travel” themes dominate official posts, whereas “Real Estate” and “Business” themes dominate UGC. Data were collected by web scraping the text of posts and by using artificial intelligence to transcribe spoken audio to text. UGC is both more numerous and more engaging than Airbnb’s own brand content, underscoring the growing significance of user involvement in brand storytelling. Sentiment was analyzed with natural language processing (NLP), and emotions with a vector space model. “Happiness” is the predominant emotion in the posts, with a notable presence of “surprise”; both emotions are critical for audience engagement. The study also finds a high approval rate for Airbnb-related content, reflecting a positive reception of the brand. Influencers, particularly nano-influencers, achieve higher engagement rates, suggesting that their authenticity and relatability appeal especially to Generation Z audiences. The study sheds light on the relationship between brand narrative, user engagement, and sentiment on TikTok, offers insights into how brand images are effectively constructed and propagated in the digital era, and highlights the importance of diverse emotions for audience engagement.
Title: Airbnb on TikTok: Brand Perception Through User Engagement and Sentiment Trends | Journal: Social Science Computer Review
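The abstract names a vector space model for emotion analysis but gives no details. One standard way to use a vector space model for this task is to represent each post and each emotion as term-frequency vectors and assign the emotion with the highest cosine similarity. The sketch below assumes that approach and uses a tiny, hypothetical emotion lexicon (`EMOTION_LEXICON`); it is not the authors' lexicon or pipeline.

```python
import math
import re
from collections import Counter
from typing import Dict, Tuple

# Hypothetical seed lexicon: each emotion is described by a handful of terms.
EMOTION_LEXICON: Dict[str, str] = {
    "happiness": "happy joy love amazing great fun",
    "surprise": "wow surprised unexpected unbelievable shocked",
    "sadness": "sad disappointed cry lonely",
}


def term_vector(text: str) -> Counter:
    """Bag-of-words term-frequency vector (lowercased, letters only)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())  # missing terms count as 0
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


EMOTION_VECTORS = {emotion: term_vector(words) for emotion, words in EMOTION_LEXICON.items()}


def dominant_emotion(post: str) -> Tuple[str, float]:
    """Return the emotion whose lexicon vector is most similar to the post.

    If no lexicon word occurs, every score is 0.0 and the first emotion is returned.
    """
    post_vec = term_vector(post)
    scores = {emotion: cosine(post_vec, vec) for emotion, vec in EMOTION_VECTORS.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]
```

In practice the lexicon would be far larger and typically weighted (for example with TF-IDF), but the assignment rule stays the same: nearest emotion vector by cosine similarity.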
Pub Date: 2024-08-08 | DOI: 10.1177/08944393241270633
Dariusz Jemielniak, Maciej Wilamowski
The gender gap in academic publishing has been surprisingly understudied across all disciplines and over long timeframes. Our study fills this gap by analyzing how the proportion of women authors changed in publications from 31,219 journals across all fields over the 20 years from 2001 to 2021. The ratio of female to male authors increases steadily across disciplines. The increases are field-neutral: they are not larger in science, technology, engineering, and mathematics (STEM), for example, despite multiple initiatives focused specifically on STEM. The increases are also slowing over time, which suggests that the female-to-male author ratio may be approaching a plateau. Finally, although the gender gap within fields is narrowing, the gap between fields has widened. These results have major implications for science policy addressing the gender gap.
Title: Gender Gap in All Academic Fields Over Time | Journal: Social Science Computer Review
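The last finding, a within-field gap that narrows while the between-field gap widens, can look contradictory. A small worked example shows how both can hold at once. The sketch measures the within-field gap as the average distance of each field's female share from parity (0.5) and the between-field gap as the population standard deviation of shares across fields; both definitions and all numbers are hypothetical illustrations, not the paper's measures or data.

```python
from statistics import mean, pstdev
from typing import Dict

# Hypothetical female-author shares for three fields in two years (not the paper's data).
shares: Dict[int, Dict[str, float]] = {
    2001: {"field_A": 0.30, "field_B": 0.32, "field_C": 0.34},
    2021: {"field_A": 0.36, "field_B": 0.44, "field_C": 0.48},
}


def within_field_gap(year_shares: Dict[str, float]) -> float:
    """Average distance of each field's female share from parity (0.5)."""
    return mean(abs(0.5 - s) for s in year_shares.values())


def between_field_gap(year_shares: Dict[str, float]) -> float:
    """Spread of female shares across fields (population standard deviation)."""
    return pstdev(list(year_shares.values()))
```

Every field moves toward parity, so the within-field gap shrinks (about 0.18 to 0.07), but the fields move at different speeds, so their shares drift apart and the between-field spread grows (about 0.016 to 0.050).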
Pub Date: 2024-08-06 | DOI: 10.1177/08944393241269415
Elia A. G. Arfini, Luigi Curini, Fabiana G. Giannuzzi
Because media communication is central to the study of linguistic sexism, we propose a new method for analyzing a corpus of texts with a machine learning approach built around an original training set. We seek to establish a framework for describing how newspapers currently talk about women that goes beyond explicit forms of discrimination by also measuring the degree to which coverage implicitly conveys sexist messages through combinations of words, expressions, and other lexical features of language. As an illustrative example, we apply this approach to around 15,000 Italian newspaper headlines to investigate how a newspaper’s political orientation affects the linguistic choices journalists make when writing headlines.
Title: Sexism and Media Communication. An Application to the Italian Case | Journal: Social Science Computer Review
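The abstract describes a supervised approach trained on an original, hand-labeled set of headlines but does not name the classifier. As a stand-in, the sketch below trains a multinomial naive Bayes classifier with add-one smoothing and then compares the share of headlines flagged as stereotyped across outlets, which is the kind of comparison the abstract makes across political orientations. The labels, headlines, and outlet names are hypothetical, and naive Bayes is not claimed to be the authors' model.

```python
import math
import re
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

Model = Tuple[Counter, Dict[str, Counter], Set[str]]


def tokenize(text: str) -> List[str]:
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def train_nb(examples: List[Tuple[str, str]]) -> Model:
    """Fit multinomial naive Bayes: documents per class and word counts per class."""
    class_counts: Counter = Counter()
    word_counts: Dict[str, Counter] = defaultdict(Counter)
    vocab: Set[str] = set()
    for text, label in examples:
        tokens = tokenize(text)
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_nb(model: Model, text: str) -> str:
    """Return the label with the highest log posterior (add-one smoothing)."""
    class_counts, word_counts, vocab = model
    n_docs = sum(class_counts.values())
    best_label, best_score = "", -math.inf
    for label, docs in class_counts.items():
        score = math.log(docs / n_docs)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            if tok in vocab:  # words never seen in training are skipped
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def flagged_share(model: Model, headlines: List[str], label: str = "stereotyped") -> float:
    """Share of headlines the classifier assigns to `label`."""
    return sum(predict_nb(model, h) == label for h in headlines) / len(headlines)


# Hypothetical hand-labeled training set (stand-in for an original training set).
training = [
    ("Minister presents budget reform", "neutral"),
    ("Mayor opens new hospital wing", "neutral"),
    ("Senator in elegant dress steals the show", "stereotyped"),
    ("Blonde minister charms parliament with her smile", "stereotyped"),
]
model = train_nb(training)

# Hypothetical headlines from two outlets; real outlets would be grouped by orientation.
by_outlet = {
    "outlet_A": ["Minister presents reform", "Mayor opens hospital"],
    "outlet_B": ["Her dress steals the show", "Minister presents budget"],
}
outlet_shares = {outlet: flagged_share(model, hs) for outlet, hs in by_outlet.items()}
```

A real application would need a much larger training set and held-out evaluation before outlet-level shares could be compared.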
Pub Date: 2024-08-05 | DOI: 10.1177/08944393241269417
Maria Iranzo-Cabrera, Maria Jose Castro-Bleda, Iris Simón-Astudillo, Lluís-F. Hurtado
Social media has led to a redefinition of the journalist’s role. On Twitter in particular, journalists hold an influential position, and their discourse is dominated by personal opinion. Because the platform has proven to be a breeding ground for polarization, digital harassment, and hate speech, notably against women politicians, this research analyzes journalists’ involvement in this complex environment. It asks whether journalists caught up in online, gendered defamation campaigns improve the quality of public debate or, on the contrary, reinforce the visibility of hostile content. To this end, we used Natural Language Processing tools and qualitative content analysis to examine a sample of 63,926 tweets, published from 23 to 25 November 2022, related to a campaign of political violence against the Spanish Minister of Equality. Results show that during those three days, at least half of the tweets contained hate speech and improper language. In this hostile climate, the journalists who took part in the debate not only attracted likes and retweets but also displayed polarization and used hate speech. Each ideological camp, for and against the Minister, also had its own uncivil strategies. Invoking free speech and setting aside reasoned argument, journalists who lean toward ideological progressivism tend to insult their opponents, whereas those on the political right use divisive constructions, stereotyping, and irony as attack techniques.
Title: Journalists’ Ethical Responsibility: Tackling Hate Speech Against Women Politicians in Social Media Through Natural Language Processing Techniques | Journal: Social Science Computer Review
Pub Date: 2024-08-02 | DOI: 10.1177/08944393241269394
Noel George, Azhar Sham, Thanvi Ajith, Marco Bastos
Successful disinformation campaigns depend on the availability of fake social media profiles for coordinated inauthentic behavior, carried out through networks of false accounts that include bots, trolls, and sockpuppets. This study presents a scalable, unsupervised framework for identifying the visual elements of user profiles strategically exploited in nearly 60 influence operations, including camera angle, photo composition, gender, and race, as well as more context-dependent categories such as sensuality and emotion. We leverage Google’s Teachable Machine and the DeepFace library to classify fake user accounts in the Twitter Moderation Research Consortium database, a large repository of social media accounts linked to foreign influence operations. We evaluate these classifiers against manually coded data and discuss their applicability to large-scale data analysis. The framework shows promising results for identifying fake online profiles used in influence operations and by the cottage industry that specializes in crafting desirable online personas.
Title: Forty Thousand Fake Twitter Profiles: A Computational Framework for the Visual Analysis of Social Media Propaganda | Journal: Social Science Computer Review
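The abstract says the classifiers were evaluated against manually coded data without naming the agreement metric. Raw accuracy and Cohen's kappa, which discounts the agreement expected by chance, are common choices for this kind of comparison. The sketch below assumes those two metrics and uses made-up labels for a single hypothetical visual category (camera angle).

```python
from collections import Counter
from typing import Sequence


def accuracy(manual: Sequence[str], predicted: Sequence[str]) -> float:
    """Share of items on which the classifier matches the manual code."""
    return sum(m == p for m, p in zip(manual, predicted)) / len(manual)


def cohens_kappa(manual: Sequence[str], predicted: Sequence[str]) -> float:
    """Cohen's kappa: (observed - chance agreement) / (1 - chance agreement)."""
    n = len(manual)
    p_o = accuracy(manual, predicted)
    m_counts, p_counts = Counter(manual), Counter(predicted)
    # Chance agreement: both sides pick the same label independently, per their marginals.
    p_e = sum(m_counts[c] * p_counts[c] for c in m_counts) / (n * n)
    # p_e == 1 only if both sides used one identical label throughout; treat as perfect.
    return (p_o - p_e) / (1 - p_e) if p_e < 1 else 1.0


# Hypothetical manual codes vs. classifier output for a "camera angle" category.
manual = ["frontal", "frontal", "profile", "frontal", "profile", "frontal"]
predicted = ["frontal", "profile", "profile", "frontal", "profile", "frontal"]
kappa = cohens_kappa(manual, predicted)
```

Here accuracy is 5/6, but half of that agreement would be expected by chance given the label frequencies, so kappa is 2/3; reporting kappa alongside accuracy guards against inflated performance on imbalanced categories.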
Pub Date: 2024-07-31 | DOI: 10.1177/08944393241269097
Emanuele Brugnoli, Rosaria Simone, Marco Delmastro
Media attention to the personal sphere of famous and important individuals has become a key element of the gender narrative. In this setting, we assess gender gaps in the mediated personalization of a wide range of Italian political office holders from 2017 to 2020 by combining NLP and statistical methods. The analysis hinges on a new score, defined for each word in the corpus, that adjusts the word’s incidence rate for the underrepresentation of women in politics. On this basis, we find evidence that political personalization in Italy is more detrimental to women than to men, with entrenched stereotypes persisting: a masculine connotation of leadership, a resulting portrayal of women as unsuited to political office, and a greater focus on their attractiveness and body parts. In addition, when personal details are reported, women politicians are covered in a more negative tone than their male counterparts. By distinguishing between types of media, we also show that the observed gender differences appear mainly in online rather than print news. This suggests that certain stereotypes are more readily expressed where clickbait and personal targeting have a major impact.
Title: Combining Natural Language Processing and Statistical Methods to Assess Gender Gaps in the Mediated Personalization of Politics | Journal: Social Science Computer Review
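The key methodological device here is a per-word score that corrects raw frequencies for the fact that women are a minority of office holders, so raw counts alone would understate how often a word is used about them. The paper's exact formula is not given in the abstract; the sketch below assumes one simple version, a smoothed log ratio of per-group rates, purely to illustrate the idea. The function name, the smoothing constant `pseudo`, and the choice of denominator (for example, the number of articles about each group) are assumptions.

```python
import math


def gender_log_ratio(count_women: int, n_women: int,
                     count_men: int, n_men: int, pseudo: float = 0.5) -> float:
    """Smoothed log ratio of a word's per-unit rate in coverage of women vs. men.

    n_women and n_men are the sizes of each group's coverage (e.g. number of
    articles about women or men politicians), so the score is not driven by
    women being a minority of office holders. A value > 0 means the word is
    relatively more frequent in coverage of women; the pseudo-count avoids
    log(0) for words never used about one group.
    """
    rate_women = (count_women + pseudo) / n_women
    rate_men = (count_men + pseudo) / n_men
    return math.log(rate_women / rate_men)


# Hypothetical counts: the word appears less often for women in absolute terms,
# but women politicians account for far fewer articles (100 vs. 400).
score = gender_log_ratio(count_women=30, n_women=100, count_men=60, n_men=400)
```

With these made-up numbers the raw count is lower for women (30 vs. 60), yet the size-adjusted score is positive (about 0.70), which is exactly the reversal an adjustment for underrepresentation is meant to reveal.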
Pub Date: 2024-07-30 | DOI: 10.1177/08944393231225547
Donghee Shin, Kulsawasd Jitkajornwanich
Algorithmic radicalization is the idea that the algorithms used by social media platforms push people down digital “rabbit holes” by shaping their personal online activity. Algorithms control what people see and when they see it, and they learn from people’s past activity. As a result, people gradually and subconsciously adopt the ideas presented in the rabbit hole down which they have been pushed. This study examines TikTok’s role in fostering radicalized ideology to offer a critical analysis of the state of radicalism and extremism on platforms. We conducted an algorithm audit examining how TikTok’s algorithms are used to radicalize and polarize users and to spread extremism and societal instability. The results reveal that users reach far-right content through many pathways and that a large portion of that content can be attributed to platform recommendations operating as radicalization pipelines. Algorithms are not simple tools that offer personalized services but rather contributors to radicalism, societal violence, and polarization. Such personalization processes have been instrumental in how artificial intelligence (AI) has been designed, deployed, and used, and in the detrimental outcomes it has generated. Thus, the generation and adoption of extreme content on TikTok reflect, by and large, not only users’ inputs and interactions with the platform but also the platform’s ability to slot users into specific categories and reinforce their ideas.
Title: How Algorithms Promote Self-Radicalization: Audit of TikTok’s Algorithm Using a Reverse Engineering Method | Journal: Social Science Computer Review
Pub Date: 2024-07-30 | DOI: 10.1177/08944393241268461
Mao Li, Frederick Conrad
From the start of data collection for the 2020 US Census, official and celebrity users tweeted about the importance of everyone being counted and urged followers to complete the questionnaire (a so-called social media campaign). At the same time, social media posts expressing skepticism about the Census became increasingly common. This study distinguishes among prototypical Twitter user groups and, using Census Bureau data, investigates their possible impact on the online self-completion rate for the 2020 Census. Using a network analysis method (community detection) and a topic model (Latent Dirichlet Allocation, LDA), we identified three prototypical user groups: “Official Government Agency,” “Census Advocate,” and “Census Skeptic.” The prototypical Census Skeptic was motivated by events about which an influential person had tweeted (e.g., “Republicans in Congress signal Census cannot take extra time to count”), and this group grew to become the largest over the study period. The prototypical Census Advocate was motivated more by official tweets and was more active than the prototypical Census Skeptic. The Official Government Agency group was the smallest of the three, but its messages, which primarily promoted completing the Census, appear to have been amplified by Census Advocates, especially celebrities and politicians. The daily size of the Census Advocate group, but not of the other two groups, predicted the 2020 Census online self-completion rate within five days of a tweet being posted. This suggests that the Census social media campaign succeeded in promoting completion, apparently with the help of Census Advocate users who encouraged people to fill out the Census and amplified official tweets. It also indicates that a social media campaign can positively affect public behavior regarding an essential national project such as the Decennial Census.
Title: Tracking Census Online Self-Completion Using Twitter Posts | Journal: Social Science Computer Review | Type: Journal Article
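The abstract reports that the daily size of the Census Advocate group predicted online self-completion within five days, but it does not give the model. As an illustrative sketch only, assuming two aligned daily series (group size and completion rate), the Python below correlates group size on day t with completion on day t + lag for lags 0 to 5. The helper names `pearson` and `lagged_correlations` and the synthetic data are hypothetical, and a plain lagged correlation is a simpler stand-in, not the authors' method.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)


def lagged_correlations(group_size, completion, max_lag=5):
    """Correlate group size on day t with completion on day t + lag.

    Hypothetical helper: both inputs are daily series aligned on the
    same calendar days; returns {lag: correlation} for lag 0..max_lag.
    """
    if len(group_size) != len(completion):
        raise ValueError("series must cover the same days")
    result = {}
    for lag in range(max_lag + 1):
        x = group_size[: len(group_size) - lag]  # drop the last `lag` days
        y = completion[lag:]  # drop the first `lag` days
        result[lag] = pearson(x, y)
    return result


if __name__ == "__main__":
    # Synthetic data: completion trails group size by exactly two days.
    group = [10, 40, 20, 50, 30, 60, 25, 70, 35, 80]
    completion = [0, 0] + group[:-2]
    corr = lagged_correlations(group, completion)
    print(max(corr, key=corr.get))  # best lag → 2
```

Correlations like these are descriptive; a real analysis would also need to account for trends and autocorrelation in both series before treating a lag as predictive.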
Pub Date: 2024-07-24 | DOI: 10.1177/08944393241266220
Fernanda Barzallo, Maria Baldeon-Calisto, Margorie Pérez, Maria Emilia Moscoso, Danny Navarrete, Daniel Riofrío, Pablo Medina-Peréz, Susana K Lai-Yuen, Diego Benítez, Noel Peréz, Ricardo Flores Moyano, Mateo Fierro
Content analysis of political manifestos is necessary to understand a party's policies and proposed actions. However, manually labeling political texts is time-consuming and labor-intensive. Transformer networks have become essential tools for automating this task, but they require extensive datasets to perform well. This can be a limitation in manifesto classification, where publicly available labeled datasets can be scarce. To address this challenge, we developed a Transformer network for manifesto classification using a cross-context (cross-domain) training strategy. Using the Comparative Manifesto Project database, we implemented a fractional factorial experimental design to determine which Spanish-language manifestos form the best training set for labeling Ecuadorian manifestos. We also statistically analyzed which Transformer architectures and preprocessing operations improve model accuracy. The results indicate that a training set combining manifestos from Spain and Uruguay, together with stemming and lemmatization preprocessing, produces the highest classification accuracy. In addition, the DistilBERT and RoBERTa Transformer networks performed statistically similarly and consistently well in manifesto classification. With the cross-context training strategy, DistilBERT and RoBERTa achieve 60.05% and 57.64% accuracy, respectively, in classifying the Ecuadorian manifesto. Finally, we investigated how the composition of the training set affects performance. The experiments show that training DistilBERT solely on Ecuadorian manifestos achieves the highest accuracy and F1-score, and that, in the absence of the Ecuadorian dataset, training the model on the datasets from Spain and Uruguay still achieves competitive performance.
Title: A Transformer Model for Manifesto Classification Using Cross-Context Training: An Ecuadorian Case Study | Journal: Social Science Computer Review | Type: Journal Article
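The abstract does not say which factors or which fraction the design used. The Python below is a minimal, hypothetical sketch of how a fractional factorial design cuts the number of training runs. It builds a 2^(k-1) half fraction in coded units (-1 = excluded or off, +1 = included or on) and estimates a factor's main effect from run-level accuracies. The factor names (`spain`, `uruguay`, `stemming`, `lemmatization`) and the helpers `half_fraction` and `main_effect` are illustrative assumptions, not the authors' design or code.

```python
from itertools import product


def half_fraction(factors):
    """Build a 2^(k-1) half-fraction design in coded units (-1/+1).

    The first k-1 factors form a full factorial; the last factor is set
    to the product of the others (defining relation I = AB...K).
    """
    if len(factors) < 2:
        raise ValueError("need at least two factors")
    base, last = factors[:-1], factors[-1]
    runs = []
    for levels in product((-1, 1), repeat=len(base)):
        run = dict(zip(base, levels))
        sign = 1
        for level in levels:
            sign *= level
        run[last] = sign  # generator: last = product of base factors
        runs.append(run)
    return runs


def main_effect(design, responses, factor):
    """Mean response at the +1 level minus mean response at the -1 level."""
    high = [y for run, y in zip(design, responses) if run[factor] == 1]
    low = [y for run, y in zip(design, responses) if run[factor] == -1]
    if not high or not low:
        raise ValueError("factor must appear at both levels")
    return sum(high) / len(high) - sum(low) / len(low)


if __name__ == "__main__":
    # Hypothetical factors: include a country's manifestos, apply a preprocessing step.
    factors = ["spain", "uruguay", "stemming", "lemmatization"]
    design = half_fraction(factors)
    for run in design:
        print(run)  # 8 training configurations instead of 16
```

With four two-level factors, the half fraction needs 8 training runs instead of 16. The cost is aliasing: with I = ABCD, each main effect is confounded with a three-factor interaction, and two-factor interactions are confounded with each other.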