{"title":"审稿人的信心分数与审稿内容一致吗?来自人工智能顶级会议论文集的证据","authors":"Wenqing Wu, Haixu Xi, Chengzhi Zhang","doi":"10.1007/s11192-024-05070-8","DOIUrl":null,"url":null,"abstract":"<p>Peer review is a critical process used in academia to assess the quality and validity of research articles. Top-tier conferences in the field of artificial intelligence (e.g. ICLR and ACL et al.) require reviewers to provide confidence scores to ensure the reliability of their review reports. However, existing studies on confidence scores have neglected to measure the consistency between the comment text and the confidence score in a more refined way, which may overlook more detailed details (such as aspects) in the text, leading to incomplete understanding of the results and insufficient objective analysis of the results. In this work, we propose assessing the consistency between the textual content of the review reports and the assigned scores at a fine-grained level, including word, sentence and aspect levels. The data used in this paper is derived from the peer review comments of conferences in the fields of deep learning and natural language processing. We employed deep learning models to detect hedge sentences and their corresponding aspects. Furthermore, we conducted statistical analyses of the length of review reports, frequency of hedge word usage, number of hedge sentences, frequency of aspect mentions, and their associated sentiment to assess the consistency between the textual content and confidence scores. Finally, we performed correlation analysis, significance tests and regression analysis on the data to examine the impact of confidence scores on the outcomes of the papers. The results indicate that textual content of the review reports and their confidence scores have high level of consistency at the word, sentence, and aspect levels. The regression results reveal a negative correlation between confidence scores and paper outcomes, indicating that higher confidence scores given by reviewers were associated with paper rejection. This indicates that current overall assessment of the paper’s content and quality by the experts is reliable, making the transparency and fairness of the peer review process convincing. We release our data and associated codes at https://github.com/njust-winchy/confidence_score.</p>","PeriodicalId":21755,"journal":{"name":"Scientometrics","volume":"62 1","pages":""},"PeriodicalIF":3.5000,"publicationDate":"2024-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Are the confidence scores of reviewers consistent with the review content? Evidence from top conference proceedings in AI\",\"authors\":\"Wenqing Wu, Haixu Xi, Chengzhi Zhang\",\"doi\":\"10.1007/s11192-024-05070-8\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>Peer review is a critical process used in academia to assess the quality and validity of research articles. Top-tier conferences in the field of artificial intelligence (e.g. ICLR and ACL et al.) require reviewers to provide confidence scores to ensure the reliability of their review reports. However, existing studies on confidence scores have neglected to measure the consistency between the comment text and the confidence score in a more refined way, which may overlook more detailed details (such as aspects) in the text, leading to incomplete understanding of the results and insufficient objective analysis of the results. 
In this work, we propose assessing the consistency between the textual content of the review reports and the assigned scores at a fine-grained level, including word, sentence and aspect levels. The data used in this paper is derived from the peer review comments of conferences in the fields of deep learning and natural language processing. We employed deep learning models to detect hedge sentences and their corresponding aspects. Furthermore, we conducted statistical analyses of the length of review reports, frequency of hedge word usage, number of hedge sentences, frequency of aspect mentions, and their associated sentiment to assess the consistency between the textual content and confidence scores. Finally, we performed correlation analysis, significance tests and regression analysis on the data to examine the impact of confidence scores on the outcomes of the papers. The results indicate that textual content of the review reports and their confidence scores have high level of consistency at the word, sentence, and aspect levels. The regression results reveal a negative correlation between confidence scores and paper outcomes, indicating that higher confidence scores given by reviewers were associated with paper rejection. This indicates that current overall assessment of the paper’s content and quality by the experts is reliable, making the transparency and fairness of the peer review process convincing. We release our data and associated codes at https://github.com/njust-winchy/confidence_score.</p>\",\"PeriodicalId\":21755,\"journal\":{\"name\":\"Scientometrics\",\"volume\":\"62 1\",\"pages\":\"\"},\"PeriodicalIF\":3.5000,\"publicationDate\":\"2024-06-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Scientometrics\",\"FirstCategoryId\":\"91\",\"ListUrlMain\":\"https://doi.org/10.1007/s11192-024-05070-8\",\"RegionNum\":3,\"RegionCategory\":\"管理学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Scientometrics","FirstCategoryId":"91","ListUrlMain":"https://doi.org/10.1007/s11192-024-05070-8","RegionNum":3,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}
Are the confidence scores of reviewers consistent with the review content? Evidence from top conference proceedings in AI
Peer review is a critical process used in academia to assess the quality and validity of research articles. Top-tier conferences in artificial intelligence (e.g., ICLR and ACL) require reviewers to provide confidence scores to ensure the reliability of their review reports. However, existing studies on confidence scores have not measured the consistency between the review text and the confidence score in a fine-grained way; coarse-grained measures can miss finer signals in the text (such as the aspects a reviewer discusses), leading to an incomplete understanding and an insufficiently objective analysis of the results. In this work, we propose assessing the consistency between the textual content of review reports and the assigned confidence scores at a fine-grained level: the word, sentence, and aspect levels. The data used in this paper are derived from peer review comments of conferences in the fields of deep learning and natural language processing. We employed deep learning models to detect hedge sentences and their corresponding aspects. Furthermore, we conducted statistical analyses of the length of review reports, the frequency of hedge word usage, the number of hedge sentences, the frequency of aspect mentions, and their associated sentiment to assess the consistency between the textual content and the confidence scores. Finally, we performed correlation analysis, significance tests, and regression analysis on the data to examine the impact of confidence scores on paper outcomes. The results indicate that the textual content of the review reports and their confidence scores show a high level of consistency at the word, sentence, and aspect levels. The regression results reveal a negative correlation between confidence scores and paper outcomes, indicating that higher reviewer confidence scores were associated with paper rejection. These findings suggest that the experts' overall assessments of a paper's content and quality are reliable, supporting the transparency and fairness of the peer review process. We release our data and associated code at https://github.com/njust-winchy/confidence_score.
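To make the described analysis concrete, the sketch below shows a simplified, word-level version of it in Python. It is a minimal illustration under stated assumptions, not the authors' pipeline: the paper detects hedge sentences and aspects with deep learning models, whereas this sketch substitutes a small illustrative hedge lexicon, and the input file reviews.csv (with columns review_text, confidence, and accepted) is a hypothetical data layout.

    # Minimal sketch of a word-level consistency and outcome analysis.
    # Assumptions (not from the paper): a CSV named "reviews.csv" with one
    # row per review, columns "review_text" (str), "confidence" (int score),
    # and "accepted" (0/1 paper outcome); a toy hedge lexicon instead of the
    # authors' learned hedge detector.
    import re

    import pandas as pd
    from scipy.stats import spearmanr
    import statsmodels.api as sm

    HEDGE_WORDS = {"may", "might", "could", "possibly", "perhaps",
                   "seems", "suggests", "likely", "unclear"}  # illustrative subset

    def hedge_frequency(text: str) -> float:
        """Share of tokens that are hedge words (a word-level signal)."""
        tokens = re.findall(r"[a-z']+", text.lower())
        if not tokens:
            return 0.0
        return sum(t in HEDGE_WORDS for t in tokens) / len(tokens)

    df = pd.read_csv("reviews.csv")  # hypothetical file name
    df["hedge_freq"] = df["review_text"].map(hedge_frequency)

    # Word-level consistency: do more heavily hedged reviews carry
    # lower confidence scores?
    rho, p = spearmanr(df["hedge_freq"], df["confidence"])
    print(f"Spearman rho={rho:.3f}, p={p:.4f}")

    # Outcome analysis: logistic regression of acceptance on confidence.
    X = sm.add_constant(df[["confidence"]])
    result = sm.Logit(df["accepted"], X).fit(disp=0)
    print(result.summary())

A negative Spearman correlation here would mirror the word-level finding that hedged language tracks lower confidence, and a negative coefficient on the confidence term would mirror the reported association between higher confidence scores and rejection.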
Journal introduction:
Scientometrics aims at publishing original studies, short communications, preliminary reports, review papers, letters to the editor and book reviews on scientometrics. The topics covered are results of research concerned with the quantitative features and characteristics of science. Emphasis is placed on investigations in which the development and mechanism of science are studied by means of (statistical) mathematical methods.
The Journal also provides the reader with important up-to-date information about international meetings and events in scientometrics and related fields. Appropriate bibliographic compilations are published as a separate section. Due to its fully interdisciplinary character, Scientometrics is indispensable to research workers and research administrators throughout the world. It provides valuable assistance to librarians and documentalists in central scientific agencies, ministries, research institutes and laboratories.
Scientometrics includes the Journal of Research Communication Studies. Consequently, its aims and scope cover those of the latter, namely, to bring the results of research investigations together in one place, in such a form that they will be of use not only to the investigators themselves but also to the entrepreneurs and research workers who form the object of these studies.