Challenges in Analyzing Software Documentation in Portuguese

Christoph Treude, C. Prolo, Fernando Marques Figueira Filho
Published in: 2015 29th Brazilian Symposium on Software Engineering (2015-09-01)
DOI: 10.1109/SBES.2015.27 (https://doi.org/10.1109/SBES.2015.27)
Citations: 8

Abstract

Many tools that automatically analyze, summarize, or transform software artifacts rely on natural language processing tools to interpret text written by software developers, such as documentation, code comments, commit messages, and bug reports. Processing such text is challenging because of unique characteristics not found in other texts, such as the presence of code terms and the systematic use of incomplete sentences. In addition, texts produced by Portuguese-speaking developers mix languages, since many keywords and programming concepts are referred to by their English names. In this paper, we provide empirical insights into the challenges of analyzing software artifacts written in Portuguese. We analyzed 100 question titles from the Portuguese version of Stack Overflow with two Portuguese language tools and identified multiple problems, which resulted in very few sentences being tagged completely correctly. Based on these results, we propose heuristics to improve the analysis of natural language text produced by software developers in Portuguese.
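To make the mixed-language problem concrete, the following is a minimal illustrative sketch of one possible pre-processing step: labelling code-like tokens and English programming terms in a question title before the remaining words are handed to a Portuguese tagger. It is not the heuristics proposed in the paper, which the abstract does not spell out. The function name `pre_tag`, the term list `ENGLISH_TERMS`, the `CODE_LIKE` pattern, and the example title are all assumptions made for illustration.

```python
import re

# Hypothetical, tiny list of English programming terms (an assumption for
# illustration); a realistic list would be much larger.
ENGLISH_TERMS = {"string", "array", "loop", "thread", "commit", "framework"}

# Tokens that look like code: camelCase, snake_case, calls such as foo(),
# or dotted names such as os.path.
CODE_LIKE = re.compile(
    r"^(?:[a-z]+[A-Z]\w*|\w+_\w+|\w+\(\)|\w+(?:\.\w+)+)$"
)


def pre_tag(title):
    """Label tokens that a Portuguese tagger is likely to mishandle.

    Returns (token, label) pairs, where label is "CODE", "ENGLISH", or
    None for tokens left to the Portuguese tagger.
    """
    labelled = []
    for token in title.split():
        # Drop trailing punctuation, but keep dots so dotted names survive.
        word = token.strip("?!,;:")
        if not word:
            continue
        if CODE_LIKE.match(word):
            labelled.append((word, "CODE"))
        elif word.lower() in ENGLISH_TERMS:
            labelled.append((word, "ENGLISH"))
        else:
            labelled.append((word, None))
    return labelled


if __name__ == "__main__":
    # Made-up title in the style of a Stack Overflow question.
    for pair in pre_tag("Como converter uma string para int usando parseInt()?"):
        print(pair)
```

Only the tokens labelled `None` would then be passed to a Portuguese part-of-speech tagger. Short words such as `int` show the limits of a fixed word list, which is the kind of gap a fuller set of heuristics would need to address.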