Expert-guided evaluation of medical research may promote publishing low-quality studies and increase research waste: A comparative analysis of Journal Impact Factor and Polish expert-based journal ranking list
Albert Stachura, Łukasz Banaszek, Paweł K. Włodarski
{"title":"Expert-guided evaluation of medical research may promote publishing low-quality studies and increase research waste: A comparative analysis of Journal Impact Factor and Polish expert-based journal ranking list","authors":"Albert Stachura, Łukasz Banaszek, Paweł K. Włodarski","doi":"10.1111/jebm.12615","DOIUrl":null,"url":null,"abstract":"<p>An ever-growing amount of medical literature has created a need for evaluating scientific merit. The Journal Impact Factor (JIF) is a metric relying on citation count and may indicate the prestige of a scientific journal.<span><sup>1</sup></span> A study quality is often assessed based on a publication venue—hence JIF may be indirectly used to evaluate research. Such an approach is not flawless. Using JIF as a surrogate of a journal's quality has been widely criticized.<span><sup>2</sup></span> In the Leiden Manifesto Hicks et al. advocated for putting more emphasis on qualitative assessment, transparency, robust locally relevant research, and accounting for variation by field of study.<span><sup>3</sup></span></p><p>Despite these criticisms, JIF is still used to assess scientific output.<span><sup>4</sup></span> In the United States and Canada, journal ranking is based on indicators such as JIF, CiteScore, SCImago Journal Rank, or Hirsh index. In some countries, journal rankings have been created to assess the research performance of scientists and institutions. 
Two main approaches are prevalent: (<span>1</span>) based solely on metrics or (<span>2</span>) determined by experts who may (or may not) take such metrics into consideration.<span><sup>5</sup></span> The first model is used, for example, in Turkey (the TÜBİTAK Incentive Program for International Scientific Publications) or China (Chinese Academy of Sciences Journal Ranking List), the second one, for example, in Finland (the Publication Forum Journal list), Norway (the Norwegian Register for Scientific Journals, Series and Publishers), Italy (the Ratings of scientific and class A journals), Denmark (the BFI List of Series), and Poland (Polish Journal Ranking).<span><sup>6</sup></span> Though both models rely to some degree on JIF, the latter is more subjective and likely to be shaped by the national science policy objectives. This significantly increases the risk of politicization, which might lead to adjusting the assigned journal rank to own professional goals of experts involved in producing rankings, potentially creating a conflict of interests.<span><sup>5</sup></span></p><p>Funding, grants, and scholarships are awarded to scientists publishing in top journals from the national ranking lists. In Poland, the evaluation system is based on points awarded by the Ministry of Education and Science (MEiN—<i>pol. Ministerstwo Edukacji i Nauki</i>). The latest edition was released on January 5, 2024, more than 1 year after the 2022 Journal Citation Report had been announced (June 2022).<span><sup>7</sup></span> Since JIF is an imperfect surrogate of journal quality, supplementing assessment systems with expert opinion may potentially help promote good research. The objective of this study is to compare the MEiN ranking system with JIF and discuss the consequences of potential discrepancies between the two models.</p><p>A total of 5326 journals appeared both in JCR Clinical Medicine category and on the MEiN ranking list (Medical sciences category). 
Additionally, 582 (10%) were considered in JCR but not included within the MEiN Medical Sciences category (Tables S1 and S2). Some of the omitted titles had a JIF of over 17. Across ranks, minimal JIFs were low (0–2.2) and variations of JIF values were considerable. In extreme cases, journals with Impact Factor of over 30 were assigned 20 MEiN points, while those with JIF of 2.2 were included in the 200 MEiN points group. The number of journals included within each subsequent rank was not decreasing, as one would expect, but was irregular. More journals were assigned 70 or 100 points as compared to only 40. Ranks 140 and 200 were the most elite comprising a total of 690 journals. Additionally, 2219 journals were assigned MEiN rank but were not listed in JCR Clinical Medicine category. Of them, 1092 (49.2%) were assigned 20 points, 353 (15.9%)—40 points, 355 (16%)—70 points, 243 (10.9%)—100 points, 110 (4.9%)—140 points, and 66 (3%)—200 points. JIF of all journals with Impact Factor lower than 40 was plotted against MEiN ranking and presented in Figure 1.</p><p>Within each JCR Clinical Medicine category, we ranked journals from 1 to <i>n</i> (number of journals within a category) based on their JIF. Therefore, a journal with the highest JIF was assigned a rank of 1, second best a rank of 2, and so on. Later we correlated said ranking within each category with MEiN scores (Tables S3 and S4). The results varied considerably with the lowest correlation coefficient noted for Medical Informatics (<i>r</i> = −0.18) and the highest for Neuroimaging (<i>r</i> = −0.93). In more than half of categories (41/59), the correlation coefficient was weaker than −0.7. It suggests experts scoring was informed by more than just JIF-based prestige of a given journal within a field. What guided their decisions remains unknown. For 14 (24%) specialties from the Clinical Medicine category, no journal was assigned 200 MEiN points. 
Strikingly, some of the best journals included in these groups, not assigned a rank of 200, had JIF of between 3.4 and 30.8.</p><p>It seems clear that some of the most prestigious titles in certain fields were undervalued by the experts assigning MEiN ranks, as explained above. What about journals considered worthy of 200 MEiN points? Here are some familiar titles: <i>The New England Journal of Medicine, The Lancet, JAMA, The BMJ</i>, and <i>Annals of Internal Medicine</i>. However, some lesser-known titles were also included in this group: <i>Bioethics</i> (JIF 2.2) or <i>Application of Clinical Genetics</i> (JIF 3.1). Some journals not listed in the JCR Clinical Medicine group were also assigned 200 points: <i>Journal of Quantitative Criminology</i> or <i>Journal of Anthropological Archaeology</i>. Articles published in any of the above-mentioned journals are therefore assessed to be of equal scientific merit.</p><p>Impact Factor was first introduced by Eugene Garfield in 1975 and was meant to become an indicator of usage of scholarly literature, as well as help identify potential venues for publication, especially for interdisciplinary research.<span><sup>8</sup></span> It has been deemed a valid measure of journal quality among researchers and practicing physicians<span><sup>9</sup></span> but also received criticism as a bibliometric indicator.<span><sup>8</sup></span> But for all these criticisms, JIF remains a widely used bibliometric indicator and introduction of any new metric should mitigate rather than replicate its flaws.</p><p>Any ranking system should ideally aim to promote researchers doing “good” research. 
In his famous editorial, Doug Altman argued that “<i>we need less research, better research, and research done for the right reasons</i>” and “(…) <i>much poor research arises because researchers feel compelled for career reasons to carry out research that they are ill equipped to perform, and nobody stops them</i>.”<span><sup>10</sup></span> The Cochrane Methodology Review Group created a list of outcome measures that would facilitate identifying a good-quality biomedical study. It ideally should be important, useful, relevant, methodologically sound, ethical, complete, and accurate.<span><sup>11</sup></span> Higher JIF is associated with better adherence to reporting guidelines in health care literature<span><sup>12</sup></span>—a surrogate outcome for methodological soundness.<span><sup>11</sup></span></p><p>The process by which the current MEiN ranking list was created lacks transparency. No clear criteria for assigning ranks were published. Much like JIF, MEiN ranking helps identify venues for publication but promotes cleverness rather than skill and hard work. The key question comes to mind: is it policymakers’ job to artificially inflate the value of selected journals and even put them on a par with top medical titles? In our view, more emphasis should be put on improving the quality of peer-review, editorial process, and strict adherence to reporting guidelines to promote transparent, reproducible and locally relevant research. Instead of encouraging scientists to publish poor studies in “high-rank” journals, they should be equipped with skills Altman argued few possessed. Courses in basic medical statistics and statistical inference, research design and critical appraisal should be promoted. The joined effort of policymakers, clinicians, and researchers would likely result in better research addressing significant clinical problems to find better solutions for patients.</p><p>Our analysis had some limitations. 
We only extracted data on journals included in the Medical Sciences category by MEiN. Therefore, some of the titles from JCR Clinical Medicine list that were not present within this MEiN category could be listed elsewhere. We assumed clinical medicine was a narrower term than medical sciences and that the latter should contain all the journals from the former group. As shown above, some journals unrelated to health care sciences were also included by MEiN in the Medical Sciences group. How titles were assigned to categories remains unclear. Another limitation was that authors of this study were Polish scientists who underwent evaluation based on the said ranking and therefore were biased. We aimed to analyze bibliometric data in an objective way and focused on the most extreme outliers, regardless of our own study fields.</p><p>In conclusion, in Poland, medical research assessment is based on a nontransparent and unbalanced system created by experts. It may encourage publishing low-quality research in journals assigned high expert rank but with low JIF. A comprehensive review of the journal ranking list is needed to promote researchers adequately addressing relevant clinical problems. 
It is our hope this analysis will spark discussion in other countries currently using similar expert-based assessment systems.</p><p>The authors declare no conflict of interest.</p><p>The authors have not declared a specific grant for this research from any funding agency in the public, commercial, or not-for-profit sectors.</p>","PeriodicalId":16090,"journal":{"name":"Journal of Evidence‐Based Medicine","volume":null,"pages":null},"PeriodicalIF":3.6000,"publicationDate":"2024-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/jebm.12615","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Evidence‐Based Medicine","FirstCategoryId":"3","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1111/jebm.12615","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"MEDICINE, GENERAL & INTERNAL","Score":null,"Total":0}
Abstract
An ever-growing amount of medical literature has created a need for evaluating scientific merit. The Journal Impact Factor (JIF) is a metric relying on citation counts and may indicate the prestige of a scientific journal.1 Study quality is often assessed based on the publication venue, so JIF may be used indirectly to evaluate research. Such an approach is not flawless. Using JIF as a surrogate for a journal's quality has been widely criticized.2 In the Leiden Manifesto, Hicks et al. advocated putting more emphasis on qualitative assessment, transparency, robust locally relevant research, and accounting for variation by field of study.3
Despite these criticisms, JIF is still used to assess scientific output.4 In the United States and Canada, journal ranking is based on indicators such as JIF, CiteScore, SCImago Journal Rank, or the Hirsch index. In some countries, journal rankings have been created to assess the research performance of scientists and institutions. Two main approaches are prevalent: (1) rankings based solely on metrics, or (2) rankings determined by experts who may (or may not) take such metrics into consideration.5 The first model is used, for example, in Turkey (the TÜBİTAK Incentive Program for International Scientific Publications) and China (the Chinese Academy of Sciences Journal Ranking List); the second, for example, in Finland (the Publication Forum Journal list), Norway (the Norwegian Register for Scientific Journals, Series and Publishers), Italy (the Ratings of scientific and class A journals), Denmark (the BFI List of Series), and Poland (the Polish Journal Ranking).6 Though both models rely to some degree on JIF, the latter is more subjective and likely to be shaped by national science policy objectives. This significantly increases the risk of politicization: experts involved in producing the rankings might adjust assigned journal ranks to their own professional goals, creating a potential conflict of interest.5
Funding, grants, and scholarships are awarded to scientists publishing in top journals from the national ranking lists. In Poland, the evaluation system is based on points awarded by the Ministry of Education and Science (MEiN, pol. Ministerstwo Edukacji i Nauki). The latest edition was released on January 5, 2024, more than a year after the 2022 Journal Citation Reports had been announced (June 2022).7 Since JIF is an imperfect surrogate for journal quality, supplementing assessment systems with expert opinion could potentially help promote good research. The objective of this study is to compare the MEiN ranking system with JIF and discuss the consequences of potential discrepancies between the two models.
A total of 5326 journals appeared both in the JCR Clinical Medicine category and on the MEiN ranking list (Medical Sciences category). Additionally, 582 (10%) were considered in JCR but not included within the MEiN Medical Sciences category (Tables S1 and S2). Some of the omitted titles had a JIF of over 17. Across ranks, minimum JIFs were low (0–2.2) and the spread of JIF values was considerable. In extreme cases, journals with an Impact Factor of over 30 were assigned 20 MEiN points, while journals with a JIF of 2.2 were included in the 200-point group. The number of journals within each subsequent rank did not decrease, as one would expect, but was irregular. More journals were assigned 70 or 100 points than were assigned 40. Ranks 140 and 200 were the most elite, comprising a total of 690 journals. Additionally, 2219 journals were assigned a MEiN rank but were not listed in the JCR Clinical Medicine category. Of them, 1092 (49.2%) were assigned 20 points; 353 (15.9%), 40 points; 355 (16%), 70 points; 243 (10.9%), 100 points; 110 (4.9%), 140 points; and 66 (3%), 200 points. The JIFs of all journals with an Impact Factor below 40 were plotted against MEiN rank and presented in Figure 1.
Within each JCR Clinical Medicine category, we ranked journals from 1 to n (the number of journals within the category) based on their JIF: the journal with the highest JIF was assigned a rank of 1, the second best a rank of 2, and so on. We then correlated this ranking with MEiN points within each category (Tables S3 and S4). The results varied considerably, with the weakest correlation noted for Medical Informatics (r = −0.18) and the strongest for Neuroimaging (r = −0.93). In more than two-thirds of categories (41/59), the correlation coefficient was weaker than −0.7 (i.e., closer to zero). This suggests that the experts' scoring was informed by more than just the JIF-based prestige of a given journal within its field. What guided their decisions remains unknown. For 14 (24%) specialties from the Clinical Medicine category, no journal was assigned 200 MEiN points. Strikingly, some of the best journals in these specialties, though not assigned a rank of 200, had JIFs between 3.4 and 30.8.
It seems clear that some of the most prestigious titles in certain fields were undervalued by the experts assigning MEiN ranks. What about journals considered worthy of 200 MEiN points? Some are familiar titles: The New England Journal of Medicine, The Lancet, JAMA, The BMJ, and Annals of Internal Medicine. However, some lesser-known titles were also included in this group, such as Bioethics (JIF 2.2) and Application of Clinical Genetics (JIF 3.1). Some journals not listed in the JCR Clinical Medicine group were also assigned 200 points, including Journal of Quantitative Criminology and Journal of Anthropological Archaeology. Articles published in any of the above-mentioned journals are therefore assessed to be of equal scientific merit.
The Impact Factor was first introduced by Eugene Garfield in 1975 and was meant to serve as an indicator of the usage of scholarly literature, as well as to help identify potential venues for publication, especially for interdisciplinary research.8 It has been deemed a valid measure of journal quality among researchers and practicing physicians9 but has also received criticism as a bibliometric indicator.8 For all these criticisms, however, JIF remains a widely used bibliometric indicator, and the introduction of any new metric should mitigate rather than replicate its flaws.
Any ranking system should ideally aim to promote researchers doing “good” research. In his famous editorial, Doug Altman argued that “we need less research, better research, and research done for the right reasons” and that “(…) much poor research arises because researchers feel compelled for career reasons to carry out research that they are ill equipped to perform, and nobody stops them.”10 The Cochrane Methodology Review Group created a list of outcome measures to facilitate identifying a good-quality biomedical study: such a study should ideally be important, useful, relevant, methodologically sound, ethical, complete, and accurate.11 A higher JIF is associated with better adherence to reporting guidelines in the health care literature,12 a surrogate outcome for methodological soundness.11
The process by which the current MEiN ranking list was created lacks transparency: no clear criteria for assigning ranks were published. Much like JIF, the MEiN ranking helps identify venues for publication, but it rewards cleverness rather than skill and hard work. A key question comes to mind: is it policymakers’ job to artificially inflate the value of selected journals and even put them on a par with top medical titles? In our view, more emphasis should be put on improving the quality of peer review and the editorial process, and on strict adherence to reporting guidelines, to promote transparent, reproducible, and locally relevant research. Instead of being encouraged to publish poor studies in “high-rank” journals, scientists should be equipped with the skills Altman argued few possessed. Courses in basic medical statistics and statistical inference, research design, and critical appraisal should be promoted. A joint effort of policymakers, clinicians, and researchers would likely result in better research addressing significant clinical problems and better solutions for patients.
Our analysis had some limitations. We extracted data only on journals included in the Medical Sciences category by MEiN; some titles from the JCR Clinical Medicine list that were absent from this MEiN category could therefore be listed elsewhere. We assumed that clinical medicine was a narrower term than medical sciences and that the latter should contain all journals from the former group. As shown above, some journals unrelated to the health care sciences were also included by MEiN in the Medical Sciences group; how titles were assigned to categories remains unclear. Another limitation is that the authors of this study are Polish scientists who underwent evaluation based on the said ranking and may therefore be biased. We aimed to analyze the bibliometric data objectively and focused on the most extreme outliers, regardless of our own study fields.
In conclusion, medical research assessment in Poland is based on a nontransparent and unbalanced system created by experts. It may encourage publishing low-quality research in journals assigned a high expert rank but with a low JIF. A comprehensive review of the journal ranking list is needed to reward researchers who adequately address relevant clinical problems. It is our hope that this analysis will spark discussion in other countries currently using similar expert-based assessment systems.
The authors declare no conflict of interest.
The authors have not declared a specific grant for this research from any funding agency in the public, commercial, or not-for-profit sectors.