Are the confidence scores of reviewers consistent with the review content? Evidence from top conference proceedings in AI

Scientometrics | IF 3.5 | CAS Tier 3 (Management Science) | JCR Q2, COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2024-06-20 | DOI: 10.1007/s11192-024-05070-8
Wenqing Wu, Haixu Xi, Chengzhi Zhang
{"title":"Are the confidence scores of reviewers consistent with the review content? Evidence from top conference proceedings in AI","authors":"Wenqing Wu, Haixu Xi, Chengzhi Zhang","doi":"10.1007/s11192-024-05070-8","DOIUrl":null,"url":null,"abstract":"<p>Peer review is a critical process used in academia to assess the quality and validity of research articles. Top-tier conferences in the field of artificial intelligence (e.g. ICLR and ACL et al.) require reviewers to provide confidence scores to ensure the reliability of their review reports. However, existing studies on confidence scores have neglected to measure the consistency between the comment text and the confidence score in a more refined way, which may overlook more detailed details (such as aspects) in the text, leading to incomplete understanding of the results and insufficient objective analysis of the results. In this work, we propose assessing the consistency between the textual content of the review reports and the assigned scores at a fine-grained level, including word, sentence and aspect levels. The data used in this paper is derived from the peer review comments of conferences in the fields of deep learning and natural language processing. We employed deep learning models to detect hedge sentences and their corresponding aspects. Furthermore, we conducted statistical analyses of the length of review reports, frequency of hedge word usage, number of hedge sentences, frequency of aspect mentions, and their associated sentiment to assess the consistency between the textual content and confidence scores. Finally, we performed correlation analysis, significance tests and regression analysis on the data to examine the impact of confidence scores on the outcomes of the papers. The results indicate that textual content of the review reports and their confidence scores have high level of consistency at the word, sentence, and aspect levels. The regression results reveal a negative correlation between confidence scores and paper outcomes, indicating that higher confidence scores given by reviewers were associated with paper rejection. This indicates that current overall assessment of the paper’s content and quality by the experts is reliable, making the transparency and fairness of the peer review process convincing. We release our data and associated codes at https://github.com/njust-winchy/confidence_score.</p>","PeriodicalId":21755,"journal":{"name":"Scientometrics","volume":"62 1","pages":""},"PeriodicalIF":3.5000,"publicationDate":"2024-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Scientometrics","FirstCategoryId":"91","ListUrlMain":"https://doi.org/10.1007/s11192-024-05070-8","RegionNum":3,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}
Citations: 0

Abstract

Peer review is a critical process used in academia to assess the quality and validity of research articles. Top-tier conferences in artificial intelligence (e.g., ICLR and ACL) require reviewers to provide confidence scores to ensure the reliability of their review reports. However, existing studies of confidence scores have not measured the consistency between the review text and the confidence score in a fine-grained way; coarse measures can miss details in the text (such as the aspects a review discusses), leading to an incomplete understanding and an insufficiently objective analysis of the results. In this work, we propose assessing the consistency between the textual content of review reports and the assigned confidence scores at a fine-grained level: the word, sentence, and aspect levels. The data used in this paper are drawn from the peer review comments of conferences in deep learning and natural language processing. We employed deep learning models to detect hedge sentences and their corresponding aspects. Furthermore, we conducted statistical analyses of the length of review reports, the frequency of hedge-word usage, the number of hedge sentences, the frequency of aspect mentions, and their associated sentiment to assess the consistency between the textual content and the confidence scores. Finally, we performed correlation analysis, significance tests, and regression analysis to examine the impact of confidence scores on paper outcomes. The results indicate that the textual content of the review reports and their confidence scores are highly consistent at the word, sentence, and aspect levels. The regression results reveal a negative correlation between confidence scores and paper outcomes: higher confidence scores given by reviewers were associated with paper rejection. This suggests that the experts' overall assessment of a paper's content and quality is reliable, supporting the transparency and fairness of the peer review process. We release our data and associated code at https://github.com/njust-winchy/confidence_score.
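The authors' actual pipeline and data are in the repository linked above. For intuition only, the following is a minimal, self-contained sketch of the kind of word-level consistency check and outcome regression the abstract describes; the hedge-cue lexicon, column names, and toy data below are hypothetical, and the paper uses deep learning models (not a word list) to detect hedging.

```python
# Illustrative sketch only -- not the authors' code. Hedge cues,
# column names, and the toy data are hypothetical.
import pandas as pd
from scipy.stats import spearmanr
import statsmodels.api as sm

# Toy hedge lexicon; the paper detects hedge sentences with deep
# learning models rather than a word list.
HEDGE_CUES = {"may", "might", "possibly", "perhaps", "seems", "unclear"}

def hedge_count(text: str) -> int:
    """Count hedge-cue tokens in a review comment."""
    return sum(tok.strip(".,;:") in HEDGE_CUES for tok in text.lower().split())

# Hypothetical reviews: comment text, reviewer confidence (1-5),
# and the paper's final outcome (1 = accepted).
reviews = pd.DataFrame({
    "text": [
        "The approach may be useful, but the evidence seems weak and results are unclear.",
        "Strong paper with rigorous experiments and clear contributions.",
        "Perhaps promising, though the ablation might be incomplete.",
        "Solid work; the scalability claim is possibly optimistic.",
        "I am unclear on the setup; the gains might possibly vanish.",
        "Interesting idea, but the novelty seems limited.",
        "Convincing results and a thorough, well-executed evaluation.",
        "Good contribution overall; one baseline may be missing.",
    ],
    "confidence": [2, 5, 3, 4, 2, 3, 5, 4],
    "accepted":   [1, 0, 1, 0, 1, 0, 1, 1],
})
reviews["hedges"] = reviews["text"].map(hedge_count)

# Word-level consistency: more hedging should accompany lower confidence,
# so a negative rank correlation supports text-score consistency.
rho, p = spearmanr(reviews["confidence"], reviews["hedges"])
print(f"Spearman rho = {rho:.2f} (p = {p:.3f})")

# Outcome analysis: logistic regression of acceptance on confidence.
# In the paper's data the coefficient is negative, i.e. higher
# confidence scores are associated with rejection.
X = sm.add_constant(reviews[["confidence"]])
print(sm.Logit(reviews["accepted"], X).fit(disp=0).summary())
```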


Source Journal
Scientometrics (Management Science – Computer Science, Interdisciplinary Applications)
CiteScore: 7.20
Self-citation rate: 17.90%
Articles per year: 351
Review time: 1.5 months
Journal Introduction
Scientometrics aims at publishing original studies, short communications, preliminary reports, review papers, letters to the editor and book reviews on scientometrics. The topics covered are results of research concerned with the quantitative features and characteristics of science. Emphasis is placed on investigations in which the development and mechanism of science are studied by means of (statistical) mathematical methods. The Journal also provides the reader with important up-to-date information about international meetings and events in scientometrics and related fields. Appropriate bibliographic compilations are published as a separate section.

Due to its fully interdisciplinary character, Scientometrics is indispensable to research workers and research administrators throughout the world. It provides valuable assistance to librarians and documentalists in central scientific agencies, ministries, research institutes and laboratories.

Scientometrics includes the Journal of Research Communication Studies. Consequently its aims and scope cover those of the latter, namely, to bring the results of research investigations together in one place, in such a form that they will be of use not only to the investigators themselves but also to the entrepreneurs and research workers who form the object of these studies.
Latest Articles in This Journal
Evaluating the wisdom of scholar crowds from the perspective of knowledge diffusion
Automatic gender detection: a methodological procedure and recommendations to computationally infer the gender from names with ChatGPT and gender APIs
An integrated indicator for evaluating scientific papers: considering academic impact and novelty
Measuring hotness transfer of individual papers based on citation relationship
Prevalence and characteristics of graphical abstracts in a specialist pharmacology journal