How accurate are gender detection tools in predicting the gender for Chinese names? A study with 20,000 given names in Pinyin format

P. Sebo
Journal of the Medical Library Association : JMLA
Published: 2021-10-17
DOI: 10.5195/jmla.2022.1289 (https://doi.org/10.5195/jmla.2022.1289)
Citations: 6

Abstract

Objective: We recently showed that the gender detection tools NamSor, Gender API, and Wiki-Gendersort accurately predicted the gender of individuals with Western given names. Here, we aimed to evaluate the performance of these tools with Chinese given names in Pinyin format.

Methods: We constructed two datasets for the purpose of the study. File #1 was created by randomly drawing 20,000 names from a gender-labeled database of 52,414 Chinese given names in Pinyin format. File #2, which contained 9,077 names, was created by removing from File #1 all unisex names that we were able to identify (i.e., those that were listed in the database as both male and female names). For both files, we recorded the number of correct classifications (correct gender assigned to a name), misclassifications (wrong gender assigned to a name), and nonclassifications (no gender assigned). We then calculated the proportion of misclassifications and nonclassifications (errorCoded).

Results: For File #1, errorCoded was 53% for NamSor, 65% for Gender API, and 90% for Wiki-Gendersort. For File #2, errorCoded was 43% for NamSor, 66% for Gender API, and 94% for Wiki-Gendersort.

Conclusion: We found that all three gender detection tools inaccurately predicted the gender of individuals with Chinese given names in Pinyin format and therefore should not be used in this population.
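The two computational steps described in the Methods (identifying unisex names and computing errorCoded) can be illustrated with a minimal Python sketch. This is not the authors' code. The function names `unisex_names` and `error_coded`, the `(name, gender)` record layout, and the use of `None` for a nonclassification are all assumptions made for illustration, and the records in the usage note are invented toy values, not data from the study.

```python
from collections import Counter, defaultdict


def unisex_names(database):
    """Return names listed in the database under more than one gender.

    `database` is an iterable of (name, gender) pairs. This record
    layout is an assumption for illustration, not the study's format.
    """
    genders_by_name = defaultdict(set)
    for name, gender in database:
        genders_by_name[name].add(gender)
    return {name for name, seen in genders_by_name.items() if len(seen) > 1}


def error_coded(true_labels, predicted_labels):
    """Proportion of misclassifications plus nonclassifications.

    A prediction of None stands for a nonclassification (the tool
    assigned no gender); that encoding is an assumption of this sketch.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("at least one name is required")

    counts = Counter()
    for truth, predicted in zip(true_labels, predicted_labels):
        if predicted is None:
            counts["nonclassified"] += 1
        elif predicted == truth:
            counts["correct"] += 1
        else:
            counts["misclassified"] += 1

    errors = counts["misclassified"] + counts["nonclassified"]
    return errors / len(true_labels)
```

With the toy records `[("wei", "male"), ("wei", "female"), ("fang", "female")]`, `unisex_names` returns `{"wei"}`, so a File #2 style dataset would drop every sampled row whose name is `"wei"`. For true labels `["male", "female", "male", "female"]` and predictions `["male", "male", None, "female"]`, `error_coded` counts one misclassification and one nonclassification out of four names and returns 0.5.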