用语义聚类丰富逆向工程

Adrian Kuhn, Stéphane Ducasse, Tudor Gîrba
{"title":"用语义聚类丰富逆向工程","authors":"Adrian Kuhn, Stéphane Ducasse, Tudor Gîrba","doi":"10.1109/WCRE.2005.16","DOIUrl":null,"url":null,"abstract":"Understanding a software system by just analyzing the structure of the system reveals only half of the picture, since the structure tells us only how the code is working but not what the code is about. What the code is about can be found in the semantics of the source code: names of identifiers, comments etc. In this paper, we analyze how these terms are spread over the source artifacts using latent semantic indexing, an information retrieval technique. We use the assumption that parts of the system that use similar terms are related. We cluster artifacts that use similar terms, and we reveal the most relevant terms for the computed clusters. Our approach works at the level of the source code which makes it language independent. Nevertheless, we correlated the semantics with structural information and we applied it at different levels of abstraction (e.g. classes, methods). We applied our approach on three large case studies and we report the results we obtained.","PeriodicalId":119724,"journal":{"name":"12th Working Conference on Reverse Engineering (WCRE'05)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2005-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"125","resultStr":"{\"title\":\"Enriching reverse engineering with semantic clustering\",\"authors\":\"Adrian Kuhn, Stéphane Ducasse, Tudor Gîrba\",\"doi\":\"10.1109/WCRE.2005.16\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Understanding a software system by just analyzing the structure of the system reveals only half of the picture, since the structure tells us only how the code is working but not what the code is about. What the code is about can be found in the semantics of the source code: names of identifiers, comments etc. In this paper, we analyze how these terms are spread over the source artifacts using latent semantic indexing, an information retrieval technique. We use the assumption that parts of the system that use similar terms are related. We cluster artifacts that use similar terms, and we reveal the most relevant terms for the computed clusters. Our approach works at the level of the source code which makes it language independent. Nevertheless, we correlated the semantics with structural information and we applied it at different levels of abstraction (e.g. classes, methods). We applied our approach on three large case studies and we report the results we obtained.\",\"PeriodicalId\":119724,\"journal\":{\"name\":\"12th Working Conference on Reverse Engineering (WCRE'05)\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2005-11-07\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"125\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"12th Working Conference on Reverse Engineering (WCRE'05)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/WCRE.2005.16\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"12th Working Conference on Reverse Engineering (WCRE'05)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/WCRE.2005.16","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 125

摘要

通过分析系统的结构来理解一个软件系统只揭示了一半的情况,因为结构只告诉我们代码是如何工作的,而不是代码是关于什么的。代码的内容可以在源代码的语义中找到:标识符的名称,注释等。在本文中,我们分析了这些术语是如何使用潜在语义索引(一种信息检索技术)在源工件上传播的。我们假设系统中使用相似术语的部分是相关的。我们对使用相似术语的工件进行聚类,并为计算出的聚类揭示最相关的术语。我们的方法在源代码级别上工作,这使得它与语言无关。然而,我们将语义与结构信息关联起来,并将其应用于不同的抽象级别(例如类、方法)。我们将我们的方法应用于三个大型案例研究,并报告了我们获得的结果。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Enriching reverse engineering with semantic clustering
Understanding a software system by just analyzing the structure of the system reveals only half of the picture, since the structure tells us only how the code is working but not what the code is about. What the code is about can be found in the semantics of the source code: names of identifiers, comments etc. In this paper, we analyze how these terms are spread over the source artifacts using latent semantic indexing, an information retrieval technique. We use the assumption that parts of the system that use similar terms are related. We cluster artifacts that use similar terms, and we reveal the most relevant terms for the computed clusters. Our approach works at the level of the source code which makes it language independent. Nevertheless, we correlated the semantics with structural information and we applied it at different levels of abstraction (e.g. classes, methods). We applied our approach on three large case studies and we report the results we obtained.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Workshop on Program Comprehension through Dynamic Analysis (PCODA ‘05) When functions change their names: automatic detection of origin relationships Source versus object code extraction for recovering software architecture Symbolic interpretation of legacy assembly language Multiple layer clustering of large software systems
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1