Shubo Tian, Qingyu Chen, Donald C Comeau, W John Wilbur, Zhiyong Lu
{"title":"2024 年的 PubMed 计算作者:生物医学文献中已消歧作者姓名的开放资源。","authors":"Shubo Tian, Qingyu Chen, Donald C Comeau, W John Wilbur, Zhiyong Lu","doi":"10.1093/bioinformatics/btae672","DOIUrl":null,"url":null,"abstract":"<p><strong>Summary: </strong>Over 55% of author names in PubMed are ambiguous: the same name is shared by different individual researchers. This poses significant challenges on precise literature retrieval for author name queries, a common behavior in biomedical literature search. In response, we present a comprehensive dataset of disambiguated authors. Specifically, we complement the automatic PubMed Computed Authors algorithm with the latest ORCID data for improved accuracy. As a result, the enhanced algorithm achieves high performance in author name disambiguation, and subsequently our dataset contains more than 21 million disambiguated authors for over 35 million PubMed articles and is incrementally updated on a weekly basis. More importantly, we make the dataset publicly available for the community such that it can be utilized in a wide variety of potential applications beyond assisting PubMed's author name queries. Finally, we propose a set of guidelines for best practices of authors pertaining to use of their names.</p><p><strong>Availability and implementation: </strong>The PubMed Computed Authors dataset is publicly available for bulk download at: https://ftp.ncbi.nlm.nih.gov/pub/lu/ComputedAuthors/. Additionally, it is available for query through web API at: https://www.ncbi.nlm.nih.gov/research/bionlp/APIs/authors/.</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11588201/pdf/","citationCount":"0","resultStr":"{\"title\":\"PubMed Computed Authors in 2024: an open resource of disambiguated author names in biomedical literature.\",\"authors\":\"Shubo Tian, Qingyu Chen, Donald C Comeau, W John Wilbur, Zhiyong Lu\",\"doi\":\"10.1093/bioinformatics/btae672\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Summary: </strong>Over 55% of author names in PubMed are ambiguous: the same name is shared by different individual researchers. This poses significant challenges on precise literature retrieval for author name queries, a common behavior in biomedical literature search. In response, we present a comprehensive dataset of disambiguated authors. Specifically, we complement the automatic PubMed Computed Authors algorithm with the latest ORCID data for improved accuracy. As a result, the enhanced algorithm achieves high performance in author name disambiguation, and subsequently our dataset contains more than 21 million disambiguated authors for over 35 million PubMed articles and is incrementally updated on a weekly basis. More importantly, we make the dataset publicly available for the community such that it can be utilized in a wide variety of potential applications beyond assisting PubMed's author name queries. Finally, we propose a set of guidelines for best practices of authors pertaining to use of their names.</p><p><strong>Availability and implementation: </strong>The PubMed Computed Authors dataset is publicly available for bulk download at: https://ftp.ncbi.nlm.nih.gov/pub/lu/ComputedAuthors/. Additionally, it is available for query through web API at: https://www.ncbi.nlm.nih.gov/research/bionlp/APIs/authors/.</p>\",\"PeriodicalId\":93899,\"journal\":{\"name\":\"Bioinformatics (Oxford, England)\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-11-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11588201/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Bioinformatics (Oxford, England)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1093/bioinformatics/btae672\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Bioinformatics (Oxford, England)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1093/bioinformatics/btae672","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
PubMed Computed Authors in 2024: an open resource of disambiguated author names in biomedical literature.
Summary: Over 55% of author names in PubMed are ambiguous: the same name is shared by different individual researchers. This poses significant challenges on precise literature retrieval for author name queries, a common behavior in biomedical literature search. In response, we present a comprehensive dataset of disambiguated authors. Specifically, we complement the automatic PubMed Computed Authors algorithm with the latest ORCID data for improved accuracy. As a result, the enhanced algorithm achieves high performance in author name disambiguation, and subsequently our dataset contains more than 21 million disambiguated authors for over 35 million PubMed articles and is incrementally updated on a weekly basis. More importantly, we make the dataset publicly available for the community such that it can be utilized in a wide variety of potential applications beyond assisting PubMed's author name queries. Finally, we propose a set of guidelines for best practices of authors pertaining to use of their names.
Availability and implementation: The PubMed Computed Authors dataset is publicly available for bulk download at: https://ftp.ncbi.nlm.nih.gov/pub/lu/ComputedAuthors/. Additionally, it is available for query through web API at: https://www.ncbi.nlm.nih.gov/research/bionlp/APIs/authors/.