Program comprehension techniques can ease software maintenance and evolution. Understanding a software system typically requires a combination of static and dynamic analysis techniques. The aim of this workshop is to gather researchers working in the area of program comprehension, with an emphasis on dynamic analysis. We are interested in investigating how dynamic analysis techniques are used, or can be used, to enable better comprehension of a software system.
O. Greevy, A. Hamou-Lhadj, A. Zaidman, "Workshop on Program Comprehension through Dynamic Analysis (PCODA '05)," 12th Working Conference on Reverse Engineering (WCRE'05). DOI: 10.1109/WCRE.2005.35
Chuck is a new code browser that allows navigation of a code base along semantic structures, such as data-flow and higher-order control-flow relationships. Employing the fast DDP type inferencer, it is effective on dynamically typed code bases an order of magnitude larger than the code bases supported by previous browsers. Chuck supports the full Smalltalk language, and is now shipped as a standard component of the Squeak open-source Smalltalk system, where it routinely works with code bases exceeding 300,000 lines of code. Chuck's implementation is tuned for interactive use, and is transparently integrated with the Squeak system's existing code-browsing tools. Thus, it provides semantic navigation of a live code base that is still being edited without requiring long pauses for reanalysis due to edits of the code.
S. Spoon, O. Shivers, "Semantic navigation of large code bases in higher-order, dynamically typed languages," 12th Working Conference on Reverse Engineering (WCRE'05). DOI: 10.1109/WCRE.2005.29
It is commonly understood that identifying the same entity (such as a module, file, or function) between revisions is important for software-evolution analyses. Most software evolution researchers use entity names, such as file names and function names, as entity identifiers, on the assumption that each entity is uniquely identifiable by its name. Unfortunately, names change over time. In this paper, we propose an automated algorithm that identifies entity mappings at the function level across revisions, even when an entity's name changes in the new revision. The algorithm is based on computing function similarities: we introduce eight similarity factors to determine whether a function is renamed from another function. To find out which similarity factors are dominant, a significance analysis is performed on each factor. To validate our algorithm, and for the factor significance analysis, ten human judges manually identified renamed entities across revisions for two open source projects: Subversion and Apache2. Using the manually identified result set, we trained weights for each similarity factor and measured the accuracy of our algorithm. We also computed the accuracies among the human judges and found our algorithm's accuracy to be better than their average. We further show that weights trained for the similarity factors on one period of one project are reusable for other periods and/or other projects. Finally, we evaluated all possible factor combinations and computed the accuracy of each. We found that adding more factors does not necessarily improve the accuracy of origin detection.
Sunghun Kim, Kai Pan, E. J. Whitehead, "When functions change their names: automatic detection of origin relationships," 12th Working Conference on Reverse Engineering (WCRE'05). DOI: 10.1109/WCRE.2005.33
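The abstract above does not enumerate the eight similarity factors, so the following is only a minimal sketch of the general idea — weighting and thresholding a handful of per-function similarity factors to map renamed functions across revisions. The two factors shown (name similarity and body-text similarity), the weights, and the threshold are all illustrative assumptions, not the paper's calibrated values:

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """String similarity in [0, 1]; a stand-in for one similarity factor."""
    return SequenceMatcher(None, a, b).ratio()

def rename_score(old_fn: dict, new_fn: dict, weights=(0.4, 0.6)) -> float:
    """Weighted combination of two illustrative factors:
    function-name similarity and body-text similarity."""
    w_name, w_body = weights
    return (w_name * similarity(old_fn["name"], new_fn["name"])
            + w_body * similarity(old_fn["body"], new_fn["body"]))

def detect_origin(removed: list, added: list, threshold=0.7) -> dict:
    """For each function added in the new revision, pick the removed
    function with the highest combined score, if it clears the threshold."""
    mapping = {}
    for new_fn in added:
        best = max(removed, key=lambda old_fn: rename_score(old_fn, new_fn))
        if rename_score(best, new_fn) >= threshold:
            mapping[new_fn["name"]] = best["name"]
    return mapping
```

In this sketch, a function whose body is unchanged but whose name was edited still scores highly because the body factor dominates — which is the intuition behind combining several factors rather than relying on names alone.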
Today, software-intensive systems are increasingly developed using product line approaches. These approaches require the definition of a product line architecture that implicitly or explicitly specifies some degree of variability, which is used to instantiate concrete software product instances. A product line approach not only implies reuse of architecture-level design knowledge; it also facilitates reuse of implementation-level artefacts, such as source code and executable components. The use of software product lines can significantly reduce the cost of developing new products.
B. Graaf, L. O'Brien, Rafael Capilla, "Reengineering towards Product Lines (R2PL 2005)," 12th Working Conference on Reverse Engineering (WCRE'05). DOI: 10.1109/WCRE.2005.26
Accurate cost estimation is an essential prerequisite to undertaking a reengineering project. Existing systems are usually reengineered because it is cheaper to reengineer them than to redevelop or replace them. To make this decision, however, management must know what the reengineering will cost. This contribution describes an eight-step, tool-supported process for calculating the time and cost required to reengineer an existing system. The process is derived from the author's 20 years of experience in estimating reengineering projects and has been validated by several real-life field experiments, in which it has been refined and calibrated.
H. Sneed, "Estimating the costs of a reengineering project," 12th Working Conference on Reverse Engineering (WCRE'05). DOI: 10.1109/WCRE.2005.18
In recent years, code obfuscation has attracted attention as a low cost approach to improving software security by making it difficult for attackers to understand the inner workings of proprietary software systems. This paper examines techniques for automatic deobfuscation of obfuscated programs, as a step towards reverse engineering such programs. Our results indicate that much of the effects of code obfuscation, designed to increase the difficulty of static analyses, can be defeated using simple combinations of straightforward static and dynamic analyses. Our results have applications to both software engineering and software security. In the context of software engineering, we show how dynamic analyses can be used to enhance reverse engineering, even for code that has been designed to be difficult to reverse engineer. For software security, our results serve as an attack model for code obfuscators, and can help with the development of obfuscation techniques that are more resilient to straightforward reverse engineering.
Sharath K. Udupa, S. Debray, Matias Madou, "Deobfuscation: reverse engineering obfuscated code," 12th Working Conference on Reverse Engineering (WCRE'05). DOI: 10.1109/WCRE.2005.13
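To illustrate how a dynamic analysis can feed a static one in this setting (an illustrative sketch, not the paper's actual technique): one common obfuscation inserts opaque predicates, branches whose outcome is constant but hard to prove statically. A dynamic pass can flag branches observed going the same way on every run, handing a static pass candidates to prove constant and simplify away:

```python
def constant_branches(trace):
    """Dynamic step: given a trace of (branch_id, taken) events recorded
    over one or more instrumented runs, return the branches that were
    observed going only one way, with that constant outcome. These are
    candidates for obfuscator-inserted opaque predicates, which a static
    pass can then attempt to verify and remove along with the dead arm."""
    outcomes = {}
    for branch_id, taken in trace:
        outcomes.setdefault(branch_id, set()).add(taken)
    return {b: seen.pop() for b, seen in outcomes.items() if len(seen) == 1}
```

Note that a single-direction observation is only evidence, not proof — which is why the sketch treats the dynamic result as candidates for a subsequent static check rather than as grounds for deletion on its own.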
The paper introduces a number of mapping rules for reverse engineering UML class models from C++ source code. The mappings focus on accurately identifying such elements as relationship types, multiplicities, and aggregation semantics. They are based on domain knowledge of the C++ language and common programming conventions and idioms. An application implementing these heuristics is used to reverse engineer a moderately sized open-source C++ application, and the resultant class model is compared against those produced by other UML reverse engineering applications. The comparison shows that the presented mapping rules effectively produce meaningful, semantically accurate UML models.
A. Sutton, Jonathan I. Maletic, "Mappings for accurately reverse engineering UML class models from C++," 12th Working Conference on Reverse Engineering (WCRE'05). DOI: 10.1109/WCRE.2005.21
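Heuristics of this kind lean on C++ idioms: how a data member is declared hints at the UML relationship it implements. The sketch below shows the flavor of such rules; the specific patterns and labels are illustrative assumptions, not the paper's actual rule set:

```python
import re

def classify_member(decl: str) -> str:
    """Classify a C++ data-member declaration into a UML relationship
    using common idioms (illustrative, not the paper's full mapping):
    - a container of pointers suggests a one-to-many association,
    - a pointer or reference member suggests an association,
    - a by-value member of class type suggests composition (an owned part)."""
    decl = decl.strip().rstrip(";")
    if re.search(r"(vector|list|set)\s*<\s*\w+\s*\*\s*>", decl):
        return "association (1..*)"
    if "*" in decl or "&" in decl:
        return "association"
    return "composition"
```

Real tools must of course also resolve typedefs, smart pointers, and templates before applying rules like these, which is where the paper's domain knowledge of conventions and idioms comes in.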
Amorphous slicing is an automated source code extraction technique with applications in many areas of software engineering, including comprehension, reuse, testing, and reverse engineering. Algorithms for syntax-preserving slicing are well established, but amorphous slicing is harder because it requires arbitrary transformation; finding good general-purpose amorphous slicing algorithms therefore remains as hard as general program transformation. In this paper we show how amorphous slices can be computed using search techniques. The paper presents results from a set of experiments designed to explore the application of genetic algorithms, hill climbing, random search, and systematic search to a set of six subject programs. As a benchmark, the results are compared to those from an existing analytical algorithm for amorphous slicing, which was written specifically to perform well with the sorts of program under consideration. The results, while tentative at this stage, give grounds for optimism. The search techniques were able to reduce the size of the programs under consideration in all cases, sometimes equaling the performance of the specifically tailored analytic algorithm. In one case, the search techniques performed better, highlighting a fault in the existing algorithm.
D. Fatiregun, M. Harman, R. Hierons, "Search-based amorphous slicing," 12th Working Conference on Reverse Engineering (WCRE'05). DOI: 10.1109/WCRE.2005.28
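The search formulation can be made concrete with a toy version of one of the techniques the paper explores, hill climbing. This sketch is deliberately simplified to statement deletion (so it is closer to syntax-preserving slicing than to the arbitrary transformations amorphous slicing allows): a candidate is accepted whenever removing a statement leaves the slicing criterion's value unchanged. All names here are illustrative:

```python
def behaviour(statements, var="result"):
    """Execute the statements and return the value of the slicing-criterion
    variable, or None if the reduced program no longer runs."""
    env = {}
    try:
        exec("\n".join(statements), {}, env)
        return env.get(var)
    except Exception:
        return None

def hill_climb_slice(program, var="result"):
    """Greedy hill climbing over the space of statement subsets:
    repeatedly delete any single statement whose removal preserves
    the criterion's value, until no such deletion exists (a local optimum)."""
    target = behaviour(program, var)
    current = list(program)
    improved = True
    while improved:
        improved = False
        for i in range(len(current)):
            candidate = current[:i] + current[i + 1:]
            if behaviour(candidate, var) == target:
                current = candidate
                improved = True
                break
    return current
```

A genetic algorithm would explore the same fitness landscape (smaller program, behaviour preserved) with a population and crossover instead of single greedy deletions, which is what lets it escape the local optima this climber can get stuck in.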
Software decay is a phenomenon that plagues aging software systems. While recent years have seen significant progress in the automatic detection of "code smells" on the one hand, and in code refactorings on the other, we claim that existing restructuring practices are seriously hampered by their symptomatic and informal (non-repeatable) nature. This paper makes a clear distinction between structural problems and structural symptoms (also known as code smells), and presents a novel, causal approach to restructuring object-oriented systems. Our approach is based on two innovations: the encapsulation of correlated symptoms and additional contextual information into higher-level design problems, and the univocal, explicit mapping of problems to unique refactoring solutions. Due to its explicit, repeatable nature, the approach shows high potential for increased levels of automation in the restructuring process, and consequently for a decrease in maintenance costs.
A. Trifu, Radu Marinescu, "Diagnosing design problems in object oriented systems," 12th Working Conference on Reverse Engineering (WCRE'05). DOI: 10.1109/WCRE.2005.15
Understanding a software system by analyzing only its structure reveals just half of the picture, since the structure tells us how the code works but not what the code is about. What the code is about can be found in the semantics of the source code: the names of identifiers, comments, etc. In this paper, we analyze how these terms are spread over the source artifacts using latent semantic indexing, an information retrieval technique. We work from the assumption that parts of the system that use similar terms are related. We cluster artifacts that use similar terms, and we reveal the most relevant terms for the computed clusters. Our approach works at the level of the source code, which makes it language-independent. Nevertheless, we correlated the semantics with structural information and applied the approach at different levels of abstraction (e.g., classes, methods). We applied our approach to three large case studies and report the results we obtained.
Adrian Kuhn, Stéphane Ducasse, Tudor Gîrba, "Enriching reverse engineering with semantic clustering," 12th Working Conference on Reverse Engineering (WCRE'05). DOI: 10.1109/WCRE.2005.16
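The core LSI machinery behind this kind of semantic clustering is compact: build a term-by-artifact matrix from identifier and comment terms, reduce it with a truncated SVD, and compare artifacts by cosine similarity in the reduced space. This is a generic sketch of that pipeline (using NumPy), not the paper's implementation; the rank k and the whitespace tokenization are assumptions:

```python
import numpy as np

def lsi_similarity(docs, k=2):
    """Latent semantic indexing over source artifacts:
    build a term-document matrix of term counts, reduce it to rank k
    via SVD, and return pairwise cosine similarities between documents.
    Artifacts that use similar terms end up close even without exact overlap."""
    vocab = sorted({t for d in docs for t in d.split()})
    A = np.array([[d.split().count(t) for d in docs] for t in vocab], float)
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    docs_k = (np.diag(s[:k]) @ Vt[:k]).T          # documents in latent space
    norms = np.linalg.norm(docs_k, axis=1, keepdims=True)
    docs_k = docs_k / np.where(norms == 0, 1, norms)
    return docs_k @ docs_k.T
```

Clustering then amounts to grouping rows of this similarity matrix (e.g. with hierarchical clustering) and labeling each cluster with its highest-weight terms.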