Too long; didn’t read: Automatic summarization of GitHub README.MD with Transformers

Thu T. H. Doan, P. Nguyen, Juri Di Rocco, Davide Di Ruscio
{"title":"Too long; didn’t read: Automatic summarization of GitHub README.MD with Transformers","authors":"Thu T. H. Doan, P. Nguyen, Juri Di Rocco, Davide Di Ruscio","doi":"10.1145/3593434.3593448","DOIUrl":null,"url":null,"abstract":"The ability to allow developers to share their source code and collaborate on software projects has made GitHub a widely used open source platform. Each repository in GitHub is generally equipped with a README.MD file to exhibit an overview of the main functionalities. Nevertheless, while offering useful information, README.MD is usually lengthy, requiring time and effort to read and comprehend. Thus, besides README.MD, GitHub also allows its users to add a short description called “About,” giving a brief but informative summary about the repository. This enables visitors to quickly grasp the main content and decide whether to continue reading. Unfortunately, due to various reasons–not excluding laziness–oftentimes this field is left blank by developers. This paper proposes GitSum as a novel approach to the summarization of README.MD. GitSum is built on top of BART and T5, two cutting-edge deep learning techniques, learning from existing data to perform recommendations for repositories with a missing description. We test its performance using two datasets collected from GitHub. The evaluation shows that GitSum can generate relevant predictions, outperforming a well-established baseline.","PeriodicalId":178596,"journal":{"name":"Proceedings of the 27th International Conference on Evaluation and Assessment in Software Engineering","volume":"9 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 27th International Conference on Evaluation and Assessment in Software Engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3593434.3593448","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

The ability to allow developers to share their source code and collaborate on software projects has made GitHub a widely used open source platform. Each repository in GitHub is generally equipped with a README.MD file to exhibit an overview of the main functionalities. Nevertheless, while offering useful information, README.MD is usually lengthy, requiring time and effort to read and comprehend. Thus, besides README.MD, GitHub also allows its users to add a short description called “About,” giving a brief but informative summary about the repository. This enables visitors to quickly grasp the main content and decide whether to continue reading. Unfortunately, due to various reasons–not excluding laziness–oftentimes this field is left blank by developers. This paper proposes GitSum as a novel approach to the summarization of README.MD. GitSum is built on top of BART and T5, two cutting-edge deep learning techniques, learning from existing data to perform recommendations for repositories with a missing description. We test its performance using two datasets collected from GitHub. The evaluation shows that GitSum can generate relevant predictions, outperforming a well-established baseline.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
太长时间;没有读:GitHub README的自动摘要。带变压器的MD
允许开发人员共享源代码并在软件项目上进行协作的能力使GitHub成为一个广泛使用的开源平台。GitHub中的每个存储库通常都配有一个README。MD文件来展示主要功能的概述。然而,在提供有用信息的同时,README。MD通常很长,需要时间和精力来阅读和理解。因此,除了README之外。此外,GitHub还允许用户添加一个名为“About”的简短描述,提供一个关于存储库的简短但信息丰富的摘要。这使访问者能够快速掌握主要内容,并决定是否继续阅读。不幸的是,由于各种原因(不排除懒惰),开发人员经常将此字段留白。本文提出GitSum作为一种新的方法来总结README.MD。GitSum建立在BART和T5这两种尖端的深度学习技术之上,从现有数据中学习,为缺少描述的存储库执行推荐。我们使用从GitHub收集的两个数据集来测试它的性能。评估表明,GitSum可以生成相关的预测,优于已建立的基线。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Classification-based Static Collection Selection for Java: Effectiveness and Adaptability Decentralised Autonomous Organisations for Public Procurement Analyzing the Resource Usage Overhead of Mobile App Development Frameworks Investigation of Security-related Commits in Android Apps Exploring the UK Cyber Skills Gap through a mapping of active job listings to the Cyber Security Body of Knowledge (CyBOK)
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1