Towards a corpus for credibility assessment in software practitioner blog articles

Ashley Williams, M. Shardlow, A. Rainer
{"title":"面向软件从业者博客文章可信度评估的语料库","authors":"Ashley Williams, M. Shardlow, A. Rainer","doi":"10.1145/3463274.3463330","DOIUrl":null,"url":null,"abstract":"Background: Blogs are a source of grey literature which are widely adopted by software practitioners for disseminating opinion and experience. Analysing such articles can provide useful insights into the state–of–practice for software engineering research. However, there are challenges in identifying higher quality content from the large quantity of articles available. Credibility assessment can help in identifying quality content, though there is a lack of existing corpora. Credibility is typically measured through a series of conceptual criteria, with ’argumentation’ and ’evidence’ being two important criteria. Objective: We create a corpus labelled for argumentation and evidence that can aid the credibility community. The corpus consists of articles from the blog of a single software practitioner and is publicly available. Method: Three annotators label the corpus with a series of conceptual credibility criteria, reaching an agreement of 0.82 (Fleiss’ Kappa). We present preliminary analysis of the corpus by using it to investigate the identification of claim sentences (one of our ten labels). Results: We train four systems (Bert, KNN, Decision Tree and SVM) using three feature sets (Bag of Words, Topic Modelling and InferSent), achieving an F1 score of 0.64 using InferSent and a Linear SVM. Conclusions: Our preliminary results are promising, indicating that the corpus can help future studies in detecting the credibility of grey literature. Future research will investigate the degree to which the sentence level annotations can infer the credibility of the overall document.","PeriodicalId":328024,"journal":{"name":"Proceedings of the 25th International Conference on Evaluation and Assessment in Software Engineering","volume":"2 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"Towards a corpus for credibility assessment in software practitioner blog articles\",\"authors\":\"Ashley Williams, M. Shardlow, A. Rainer\",\"doi\":\"10.1145/3463274.3463330\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Background: Blogs are a source of grey literature which are widely adopted by software practitioners for disseminating opinion and experience. Analysing such articles can provide useful insights into the state–of–practice for software engineering research. However, there are challenges in identifying higher quality content from the large quantity of articles available. Credibility assessment can help in identifying quality content, though there is a lack of existing corpora. Credibility is typically measured through a series of conceptual criteria, with ’argumentation’ and ’evidence’ being two important criteria. Objective: We create a corpus labelled for argumentation and evidence that can aid the credibility community. The corpus consists of articles from the blog of a single software practitioner and is publicly available. Method: Three annotators label the corpus with a series of conceptual credibility criteria, reaching an agreement of 0.82 (Fleiss’ Kappa). We present preliminary analysis of the corpus by using it to investigate the identification of claim sentences (one of our ten labels). 
Results: We train four systems (Bert, KNN, Decision Tree and SVM) using three feature sets (Bag of Words, Topic Modelling and InferSent), achieving an F1 score of 0.64 using InferSent and a Linear SVM. Conclusions: Our preliminary results are promising, indicating that the corpus can help future studies in detecting the credibility of grey literature. Future research will investigate the degree to which the sentence level annotations can infer the credibility of the overall document.\",\"PeriodicalId\":328024,\"journal\":{\"name\":\"Proceedings of the 25th International Conference on Evaluation and Assessment in Software Engineering\",\"volume\":\"2 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-06-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 25th International Conference on Evaluation and Assessment in Software Engineering\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3463274.3463330\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 25th International Conference on Evaluation and Assessment in Software Engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3463274.3463330","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 3

Abstract

Background: Blogs are a source of grey literature that is widely adopted by software practitioners for disseminating opinion and experience. Analysing such articles can provide useful insights into the state of practice for software engineering research. However, there are challenges in identifying higher-quality content among the large quantity of articles available. Credibility assessment can help in identifying quality content, though there is a lack of existing corpora. Credibility is typically measured through a series of conceptual criteria, with 'argumentation' and 'evidence' being two important criteria.
Objective: We create a corpus labelled for argumentation and evidence that can aid the credibility community. The corpus consists of articles from the blog of a single software practitioner and is publicly available.
Method: Three annotators label the corpus with a series of conceptual credibility criteria, reaching an agreement of 0.82 (Fleiss' Kappa). We present a preliminary analysis of the corpus by using it to investigate the identification of claim sentences (one of our ten labels).
Results: We train four systems (BERT, KNN, Decision Tree and SVM) using three feature sets (Bag of Words, Topic Modelling and InferSent), achieving an F1 score of 0.64 using InferSent and a Linear SVM.
Conclusions: Our preliminary results are promising, indicating that the corpus can help future studies in detecting the credibility of grey literature. Future research will investigate the degree to which the sentence-level annotations can infer the credibility of the overall document.
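
The abstract reports inter-annotator agreement of 0.82 as Fleiss' Kappa. As a minimal sketch of how such a figure is computed, the snippet below uses statsmodels on a hypothetical label matrix (the annotation data shown is illustrative, not the authors'):

```python
# A minimal sketch of computing Fleiss' Kappa for three annotators.
# The label matrix is hypothetical; the paper's own data is not shown here.
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# Hypothetical annotations: rows are sentences, columns are the three
# annotators, values are label ids (e.g. 0 = other, 1 = claim).
annotations = np.array([
    [1, 1, 1],
    [0, 0, 1],
    [1, 1, 0],
    [0, 0, 0],
])

# aggregate_raters converts per-rater labels into per-subject category
# counts, which is the input format fleiss_kappa expects.
counts, _ = aggregate_raters(annotations)
print(f"Fleiss' Kappa: {fleiss_kappa(counts):.2f}")
```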
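
For the claim-sentence experiment, the best reported combination is InferSent embeddings with a Linear SVM (F1 = 0.64). The sketch below mirrors that setup in scikit-learn using bag-of-words features as a stand-in for InferSent (which requires separately downloaded model weights); the sentences and labels are illustrative placeholders, not the corpus:

```python
# A hedged sketch of claim-sentence classification: a linear SVM over
# bag-of-words features, standing in for the paper's InferSent embeddings.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Illustrative placeholder data (1 = claim sentence, 0 = other).
sentences = [
    "Microservices always scale better than monoliths.",  # claim
    "We deployed the service on Tuesday.",                # other
    "Static typing prevents most production bugs.",       # claim
    "The config file lives in the repo root.",            # other
] * 10  # repeated so the toy split has enough samples per class
labels = [1, 0, 1, 0] * 10

X_train, X_test, y_train, y_test = train_test_split(
    sentences, labels, test_size=0.25, random_state=0, stratify=labels)

# Bag-of-words features feeding a linear SVM, mirroring one of the
# paper's feature-set/classifier combinations.
model = make_pipeline(CountVectorizer(), LinearSVC())
model.fit(X_train, y_train)
print(f"F1: {f1_score(y_test, model.predict(X_test)):.2f}")
```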