Using Machine Learning to Uncover the Semantics of Concepts: How Well Do Typicality Measures Extracted from a BERT Text Classifier Match Human Judgments of Genre Typicality?

Sociological Science · IF 2.7 · CAS Tier 2 (Sociology) · JCR Q1 (Sociology) · Pub Date: 2023-03-03 · DOI: 10.15195/v10.a3
Gaël Le Mens, Balázs Kovács, Michael T. Hannan, Guillem Pros
Citations: 5

Abstract

Social scientists have long been interested in understanding the extent to which the typicalities of an object in concepts relate to its valuations by social actors. Answering this question has proven to be challenging because precise measurement requires a feature-based description of objects. Yet, such descriptions are frequently unavailable. In this article, we introduce a method to measure typicality based on text data. Our approach involves training a deep-learning text classifier based on the BERT language representation and defining the typicality of an object in a concept in terms of the categorization probability produced by the trained classifier. Model training allows for the construction of a feature space adapted to the categorization task and of a mapping between feature combination and typicality that gives more weight to feature dimensions that matter more for categorization. We validate the approach by comparing the BERT-based typicality measure of book descriptions in literary genres with average human typicality ratings. The obtained correlation is higher than 0.85. Comparisons with other typicality measures used in prior research show that our BERT-based measure better reflects human typicality judgments.
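To make the method concrete, the sketch below is a minimal illustration (not the authors' code) of how a categorization probability from a fine-tuned BERT classifier can serve as a typicality score: the probability the classifier assigns to a genre for a given book description is taken as that description's typicality in the genre. The model checkpoint, genre labels, and example text are assumptions for illustration.

```python
# Minimal sketch of a BERT-based typicality measure, assuming the Hugging Face
# `transformers` library. The label set and example text are hypothetical.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

GENRES = ["mystery", "romance", "science fiction"]  # illustrative genre labels

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(GENRES)
)
# Fine-tune `model` on (book description, genre label) pairs here, e.g. with
# transformers.Trainer; omitted for brevity.

def typicality(description: str, genre: str) -> float:
    """Typicality of `description` in `genre` = the classifier's categorization probability."""
    inputs = tokenizer(description, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = torch.softmax(logits, dim=-1).squeeze(0)
    return probs[GENRES.index(genre)].item()

print(typicality("A detective untangles a string of murders in a quiet village.", "mystery"))
```

Because the classifier is fine-tuned on the genre-categorization task, the learned representation and decision weights emphasize the feature dimensions that matter most for distinguishing genres, which is the property the abstract highlights.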
Source Journal
Sociological Science (Social Sciences, all)
CiteScore: 4.90
Self-citation rate: 2.90%
Articles published: 13
Review time: 6 weeks
About the journal: Sociological Science is an open-access, online, peer-reviewed, international journal for social scientists committed to advancing a general understanding of social processes. Sociological Science welcomes original research and commentary from all subfields of sociology, and does not privilege any particular theoretical or methodological approach.
Latest Articles in This Journal
New OMB's Race and Ethnicity Standards Will Affect How Americans Self-Identify
The Diffusion and Reach of (Mis)Information on Facebook During the U.S. 2020 Election
The Multiracial Complication: The 2020 Census and the Fictitious Multiracial Boom
Opportunities for Faculty Tenure at Globally Ranked Universities: Cross-National Differences by Gender, Fields, and Tenure Status
Some Birds Have Mixed Feathers: Bringing the Multiracial Population into the Study of Race Homophily