It's All in the Name: A Character-Based Approach to Infer Religion

IF 5.4; Zone 2, Sociology; Q1, POLITICAL SCIENCE; Political Analysis; Pub Date: 2023-03-23; DOI: 10.1017/pan.2023.6

Rochana Chaturvedi, Sugat Chaturvedi


Citations: 1


It’s All in the Name: A Character-Based Approach to Infer Religion

Abstract

Large-scale microdata on group identity are critical for studies on identity politics and violence but remain largely unavailable for developing countries. We use personal names to infer religion in South Asia—where religion is a salient social division, and yet, disaggregated data on it are scarce. Existing work predicts religion using a dictionary-based method and, therefore, cannot classify unseen names. We provide character-based machine-learning models that can classify unseen names too with high accuracy. Our models are also much faster and, hence, scalable to large datasets. We explain the classification decisions of one of our models using the layer-wise relevance propagation technique. The character patterns learned by the classifier are rooted in the linguistic origins of names. We apply these to infer the religion of electoral candidates using historical data on Indian elections and observe a trend of declining Muslim representation. Our approach can be used to detect identity groups across the world for whom the underlying names might have different linguistic roots.
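The contrast the abstract draws between dictionary lookup and character-based classification can be illustrated with a small sketch. This is not the authors' model: it assumes a simple character n-gram naive Bayes classifier, and the names, group labels (`group_a`, `group_b`), and function names are all hypothetical, chosen only to show why a character-level model can label a name it has never seen while an exact-match dictionary cannot.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(name, n_min=1, n_max=3):
    """Return the character n-grams of a name, padded with boundary markers."""
    padded = "^" + name.lower().strip() + "$"
    return [
        padded[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(padded) - n + 1)
    ]


class CharNgramNaiveBayes:
    """Multinomial naive Bayes over character n-grams (illustrative only)."""

    def fit(self, names, labels):
        self.class_counts = Counter(labels)
        self.ngram_counts = defaultdict(Counter)
        for name, label in zip(names, labels):
            self.ngram_counts[label].update(char_ngrams(name))
        self.vocab = {g for c in self.ngram_counts.values() for g in c}
        return self

    def predict(self, name):
        total = sum(self.class_counts.values())
        scores = {}
        for label, count in self.class_counts.items():
            counts = self.ngram_counts[label]
            # Add-one (Laplace) smoothing so unseen n-grams get nonzero mass.
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(count / total)
            for gram in char_ngrams(name):
                score += math.log((counts[gram] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


def dictionary_lookup(name, table):
    """Exact-match baseline: returns None for any name not in the table."""
    return table.get(name.lower())


# Synthetic toy data (hypothetical names and groups, not real-world data).
train_names = ["talvek", "morvek", "sinvek", "darvek",
               "talomi", "noromi", "kelomi", "basomi"]
train_labels = ["group_a"] * 4 + ["group_b"] * 4

lookup_table = dict(zip(train_names, train_labels))
model = CharNgramNaiveBayes().fit(train_names, train_labels)

# "jurvek" and "fenomi" never appear in the training data.
unseen_dictionary = dictionary_lookup("jurvek", lookup_table)
unseen_a = model.predict("jurvek")
unseen_b = model.predict("fenomi")
```

In this toy setup the dictionary baseline has no entry for either unseen name, whereas the character model can still assign a label because the shared suffixes contribute n-gram evidence. Real name data would be far noisier, and a stronger model would normally replace naive Bayes.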


Source journal

Political Analysis (POLITICAL SCIENCE)

CiteScore

8.80

Self-citation rate

3.70%

Articles published

Journal description: Political Analysis chronicles these exciting developments by publishing the most sophisticated scholarship in the field. It is the place to learn new methods, to find some of the best empirical scholarship, and to publish your best research.