Shaowei Wang, D. Lo, Zhenchang Xing, Lingxiao Jiang
Many software maintenance activities need to find code units (functions, files, etc.) that implement a certain concern (features, bugs, etc.). To facilitate such activities, many approaches have been proposed to automatically link code units with concerns described in natural language; this task is termed concern localization and often employs Information Retrieval (IR) techniques. There has not been a study that evaluates and compares the effectiveness of the latest IR techniques on a large dataset. This study fills this gap by investigating ten IR techniques, some of which are new and have not been used for concern localization, on a Linux kernel dataset. The Linux kernel dataset contains more than 1,500 concerns that are linked to over 85,000 C functions. We have evaluated the effectiveness of the ten techniques on recovering the links between the concerns and the implementing functions and ranked the IR techniques based on their precision on concern localization. Keywords: concern localization; information retrieval; Linux kernel; mean average precision
{"title":"Concern Localization using Information Retrieval: An Empirical Study on Linux Kernel","authors":"Shaowei Wang, D. Lo, Zhenchang Xing, Lingxiao Jiang","doi":"10.1109/WCRE.2011.72","DOIUrl":"https://doi.org/10.1109/WCRE.2011.72","url":null,"abstract":"Many software maintenance activities need to find code units (functions, files, etc.) that implement a certain concern (features, bugs, etc.). To facilitate such activities, many approaches have been proposed to automatically link code units with concerns described in natural languages, which are termed as concern localization and often employ Information Retrieval (IR) techniques. There has not been a study that evaluates and compares the effectiveness of latest IR techniques on a large dataset. This study fills this gap by investigating ten IR techniques, some of which are new and have not been used for concern localization, on a Linux kernel dataset. The Linux kernel dataset contains more than 1,500 concerns that are linked to over 85,000 C functions. We have evaluated the effectiveness of the ten techniques on recovering the links between the concerns and the implementing functions and ranked the IR techniques based on their precisions on concern localization. Keywords-concern localization; information retrieval; Linux kernel; mean average precision;","PeriodicalId":350863,"journal":{"name":"2011 18th Working Conference on Reverse Engineering","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131009516","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
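The keywords of the abstract above name mean average precision (MAP) as the ranking metric. A minimal sketch of how MAP is commonly computed is given here; the function names and the toy concern and function identifiers are illustrative assumptions, not taken from the paper.

```python
from typing import Dict, List, Set


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Average precision of one ranked result list against its relevant set."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            # Precision at this rank, counted only where a relevant item occurs.
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(
    rankings: Dict[str, List[str]], gold: Dict[str, Set[str]]
) -> float:
    """Mean of the per-concern average precision values."""
    if not rankings:
        return 0.0
    total = sum(
        average_precision(ranked, gold.get(concern, set()))
        for concern, ranked in rankings.items()
    )
    return total / len(rankings)


# Hypothetical data: two concerns, each with a ranked list of candidate functions.
rankings = {
    "concern_a": ["f1", "f2", "f3", "f4"],
    "concern_b": ["g1", "g2"],
}
gold = {
    "concern_a": {"f1", "f3"},
    "concern_b": {"g2"},
}
map_score = mean_average_precision(rankings, gold)
```

In this toy data, concern_a scores (1/1 + 2/3) / 2 and concern_b scores 1/2, so the mean is 2/3. An IR technique that places the implementing functions nearer the top of each list obtains a higher MAP.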
While the reconstruction of the control-flow graph of a binary has received wide attention, categorizing code into defect-free and possibly incorrect remains a challenge for current static analyses. We present the intermediate language RREIL and a corresponding analysis framework that is able to infer precise numeric information on variables without resorting to an expensive analysis at the bit-level. Specifically, we propose a hierarchy of three interfaces to abstract domains, namely for inferring memory layout, bit-level information, and numeric information. Our framework can be easily enriched with new abstract domains at each level. We demonstrate the extensibility of our framework by detailing a novel acceleration technique (a so-called widening) as an abstract domain that helps to find precise fixpoints of loops.
{"title":"Precise Static Analysis of Binaries by Extracting Relational Information","authors":"Alexander Sepp, B. Mihaila, A. Simon","doi":"10.1109/WCRE.2011.50","DOIUrl":"https://doi.org/10.1109/WCRE.2011.50","url":null,"abstract":"While the reconstruction of the control-flow graph of a binary has received wide attention, the challenge of categorizing code into defect-free and possibly incorrect remains a challenge for current static analyses. We present the intermediate language RREIL and a corresponding analysis framework that is able to infer precise numeric information on variables without resorting to an expensive analysis at the bit-level. Specifically, we propose a hierarchy of three interfaces to abstract domains, namely for inferring memory layout, bit-level information and numeric information. Our framework can be easily enriched with new abstract domains at each level. We demonstrate the extensibility of our framework by detailing a novel acceleration technique (a so-called widening) as an abstract domain that helps to find precise fix points of loops.","PeriodicalId":350863,"journal":{"name":"2011 18th Working Conference on Reverse Engineering","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125591374","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
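The abstract above refers to widening as an acceleration technique for finding fixpoints of loops. As a rough illustration of the general idea only, the following sketch applies the textbook interval widening operator to a simple counter loop. It is not the novel widening domain of the paper, and the class and function names are assumptions made for this example.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float

    def join(self, other: "Interval") -> "Interval":
        """Least upper bound: the smallest interval containing both."""
        return Interval(min(self.lo, other.lo), max(self.hi, other.hi))

    def widen(self, other: "Interval") -> "Interval":
        """Standard interval widening: any bound that grows jumps to infinity."""
        lo = self.lo if other.lo >= self.lo else -math.inf
        hi = self.hi if other.hi <= self.hi else math.inf
        return Interval(lo, hi)


def analyze_counter_loop(start: int, step: int) -> Interval:
    """Over-approximate i at the loop head of: i = start; while ...: i += step."""
    state = Interval(start, start)
    while True:
        after_body = Interval(state.lo + step, state.hi + step)
        # Widening guarantees termination even though the loop is unbounded.
        nxt = state.widen(state.join(after_body))
        if nxt == state:
            return state
        state = nxt


up = analyze_counter_loop(0, 1)
down = analyze_counter_loop(10, -2)
```

Without widening, the iteration for the incrementing loop would pass through [0, 1], [0, 2], and so on without ever stabilizing. Widening trades that precision for termination, which is why refined widening strategies aim to recover tighter fixpoints.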
Requirements traceability ensures that source code is consistent with documentation and that all requirements have been implemented. During software evolution, as features are added, removed, or modified, the code drifts away from its original requirements. Thus traceability recovery approaches become necessary to re-establish the traceability relations between requirements and source code. This paper presents an approach (Coparvo) complementary to existing traceability recovery approaches for object-oriented programs. Coparvo reduces false positive links recovered by traditional traceability recovery processes, thus reducing the manual validation effort. Coparvo assumes that information extracted from different entities (i.e., class names, comments, class variables, or method signatures) constitutes different information sources, that these sources may have different levels of reliability in requirements traceability, and that each information source may act as a different expert recommending traceability links. We applied Coparvo on three data sets, Pooka, SIP Communicator, and iTrust, to filter out false positive links recovered via an information retrieval approach, i.e., the vector space model. The results show that Coparvo significantly improves the accuracy of the recovered links and also reduces by up to 83% the effort required to manually remove false positive links.
{"title":"Requirements Traceability for Object Oriented Systems by Partitioning Source Code","authors":"Nasir Ali, Yann-Gaël Guéhéneuc, G. Antoniol","doi":"10.1109/WCRE.2011.16","DOIUrl":"https://doi.org/10.1109/WCRE.2011.16","url":null,"abstract":"Requirements traceability ensures that source code is consistent with documentation and that all requirements have been implemented. During software evolution, as features are added, removed, or modified, the code drifts away from its original requirements. Thus traceability recovery approaches become necessary to re-establish the traceability relations between requirements and source code. This paper presents an approach (Coparvo) complementary to existing traceability recovery approaches for object-oriented programs. Coparvo reduces false positive links recovered by traditional traceability recovery processes, thus reducing the manual validation effort. Coparvo assumes that information extracted from different entities (i.e., class names, comments, class variables, or method signatures) constitutes different information sources, that these sources may have different levels of reliability in requirements traceability, and that each information source may act as a different expert recommending traceability links. We applied Coparvo on three data sets, Pooka, SIP Communicator, and iTrust, to filter out false positive links recovered via an information retrieval approach, i.e., the vector space model. The results show that Coparvo significantly improves the accuracy of the recovered links and also reduces by up to 83% the effort required to manually remove false positive links.","PeriodicalId":350863,"journal":{"name":"2011 18th Working Conference on Reverse Engineering","volume":"131 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132508505","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A. Fokin, Egor Derevenetc, A. Chernov, K. Troshina
Decompilation is the reconstruction of a program in a high-level language from a program in a low-level language. Typical applications of decompilation are software security assessment, malware analysis, error correction, and reverse engineering for interoperability. Native code decompilation is traditionally considered in the context of the C programming language. C++ presents new challenges for decompilation, since the rules of translation from C++ to assembly language are far more complex than those of C. In addition, when decompiling a program that was originally written in C++, reconstruction of C++-specific constructs is desired. In this paper we discuss new methods that allow partial recovery of C++-specific language constructs from low-level code, provided that this code was obtained from a C++ compiler. The challenges that arise when decompiling such code are described. These challenges include reconstruction of polymorphic classes, class hierarchies, member functions, and exception handling constructs. An approach to decompilation that is used to overcome these challenges is presented. SmartDec, a native-code-to-C++ decompiler that is being developed by the authors at Select LTD, is presented. It reconstructs expressions, function arguments, local and global variables, integral and composite types, loops and compound conditional statements, C++ class hierarchies, and exception handling constructs. An empirical study of the decompiler is provided.
{"title":"SmartDec: Approaching C++ Decompilation","authors":"A. Fokin, Egor Derevenetc, A. Chernov, K. Troshina","doi":"10.1109/WCRE.2011.49","DOIUrl":"https://doi.org/10.1109/WCRE.2011.49","url":null,"abstract":"Decompilation is the reconstruction of a program in a high-level language from a program in a low-level language. Typical applications of decompilation are software security assessment, malware analysis, error correction, and reverse engineering for interoperability. Native code decompilation is traditionally considered in the context of the C programming language. C++ presents new challenges for decompilation, since the rules of translation from C++ to assembly language are far more complex than those of C. In addition, when decompiling a program that was originally written in C++, reconstruction of C++-specific constructs is desired. In this paper we discuss new methods that allow partial recovery of C++-specific language constructs from low-level code, provided that this code was obtained from a C++ compiler. The challenges that arise when decompiling such code are described. These challenges include reconstruction of polymorphic classes, class hierarchies, member functions, and exception handling constructs. An approach to decompilation that is used to overcome these challenges is presented. SmartDec, a native-code-to-C++ decompiler that is being developed by the authors at Select LTD, is presented. It reconstructs expressions, function arguments, local and global variables, integral and composite types, loops and compound conditional statements, C++ class hierarchies, and exception handling constructs. An empirical study of the decompiler is provided.","PeriodicalId":350863,"journal":{"name":"2011 18th Working Conference on Reverse Engineering","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130427038","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Because licensing an open source software (OSS) product restricts its reuse, the developer of the product has to consider the impact on reuse when choosing the license. However, to the best of our knowledge, there are no quantitative studies on the impact of software licenses on software reuse. To identify the impact, this paper presents a quantitative investigation into the relationship between the software license and copy-and-paste reuse on actual OSS products. The results show that the license of a product affects the frequency of reuse. On the other hand, copy-and-paste reuse occurs mostly in the source files distributed under the same license.
{"title":"An Investigation into the Impact of Software Licenses on Copy-and-paste Reuse among OSS Projects","authors":"Yu Kashima, Yasuhiro Hayase, Norihiro Yoshida, Yuki Manabe, Katsuro Inoue","doi":"10.1109/WCRE.2011.14","DOIUrl":"https://doi.org/10.1109/WCRE.2011.14","url":null,"abstract":"Because licensing an open source software (OSS) product restricts its reuse, the developer of the product has to consider the impact on reuse when choosing the license. However, to the best of our knowledge, there are no quantitative studies on the impact of software licenses on software reuse. To identify the impact, this paper presents a quantitative investigation into the relationship between the software license and copy-and-paste reuse on actual OSS products. The results show that the license of a product affects the frequency of reuse. On the other hand, copy-and-paste reuse occurs mostly in the source files distributed under the same license.","PeriodicalId":350863,"journal":{"name":"2011 18th Working Conference on Reverse Engineering","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133190297","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Christoph Treude, Fernando Marques Figueira Filho, M. Storey, M. Salois
Illegal cyberspace activities are increasing rapidly and many software engineers are using reverse engineering methods to respond to attacks. The security-sensitive nature of these tasks, such as the understanding of malware or the decryption of encrypted content, brings unique challenges to reverse engineering: work has to be done offline, files can rarely be shared, time pressure is immense, and there is a lack of tool and process support for capturing and sharing the knowledge obtained while trying to understand plain assembly code. To help us gain an understanding of this reverse engineering work, we report on an exploratory study done in a security context at a research and development government organization to explore their work processes, tools, and artifacts. In this paper, we identify challenges, such as the management and navigation of a myriad of artifacts, and we conclude by offering suggestions for tool and process improvements.
{"title":"An Exploratory Study of Software Reverse Engineering in a Security Context","authors":"Christoph Treude, Fernando Marques Figueira Filho, M. Storey, M. Salois","doi":"10.1109/WCRE.2011.30","DOIUrl":"https://doi.org/10.1109/WCRE.2011.30","url":null,"abstract":"Illegal cyberspace activities are increasing rapidly and many software engineers are using reverse engineering methods to respond to attacks. The security-sensitive nature of these tasks, such as the understanding of malware or the decryption of encrypted content, brings unique challenges to reverse engineering: work has to be done offline, files can rarely be shared, time pressure is immense, and there is a lack of tool and process support for capturing and sharing the knowledge obtained while trying to understand plain assembly code. To help us gain an understanding of this reverse engineering work, we report on an exploratory study done in a security context at a research and development government organization to explore their work processes, tools, and artifacts. In this paper, we identify challenges, such as the management and navigation of a myriad of artifacts, and we conclude by offering suggestions for tool and process improvements.","PeriodicalId":350863,"journal":{"name":"2011 18th Working Conference on Reverse Engineering","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114876417","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Software reuse approaches, such as software product lines, can help to achieve considerable effort and cost savings when developing families of software systems with a significant overlap in functionality. In practice, however, the need for strategic reuse often becomes apparent only after a number of product variants have already been delivered. Hence, a reuse approach has to be introduced afterwards. To plan for such a reuse introduction, it is crucial to have precise information about the distribution of commonality and variability in the source code of each system variant. However, this information is often not available because each variant has evolved independently over time and the source code does not exhibit explicit variation points. In this paper, we present Variant Analysis, a scalable reverse engineering technique that aims at delivering exactly this information. It supports simultaneous analysis of multiple source code variants and enables easy interpretation of the analysis results. We demonstrate the technique by applying it to a large industrial software system with four variants.
{"title":"Analyzing the Source Code of Multiple Software Variants for Reuse Potential","authors":"Slawomir Duszynski, J. Knodel, Martin Becker","doi":"10.1109/WCRE.2011.44","DOIUrl":"https://doi.org/10.1109/WCRE.2011.44","url":null,"abstract":"Software reuse approaches, such as software product lines, can help to achieve considerable effort and cost savings when developing families of software systems with a significant overlap in functionality. In practice, however, the need for strategic reuse often becomes apparent only after a number of product variants have already been delivered. Hence, a reuse approach has to be introduced afterwards. To plan for such a reuse introduction, it is crucial to have precise information about the distribution of commonality and variability in the source code of each system variant. However, this information is often not available because each variant has evolved independently over time and the source code does not exhibit explicit variation points. In this paper, we present Variant Analysis, a scalable reverse engineering technique that aims at delivering exactly this information. It supports simultaneous analysis of multiple source code variants and enables easy interpretation of the analysis results. We demonstrate the technique by applying it to a large industrial software system with four variants.","PeriodicalId":350863,"journal":{"name":"2011 18th Working Conference on Reverse Engineering","volume":"209 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115089043","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Feature-centric comprehension of source code is essential during software evolution. However, such comprehension is oftentimes difficult to achieve due to the lack of correspondence between functional features and structural units of object-oriented programs. We present a tool for feature-centric analysis of legacy Java programs called Featureous that addresses this issue. Featureous allows a programmer to easily establish feature-code traceability links and to analyze their characteristics using a number of visualizations. Featureous is an extension to the NetBeans IDE, and can itself be extended by third-party plugins.
{"title":"Understanding Legacy Features with Featureous","authors":"Andrzej Olszak, B. Jørgensen","doi":"10.1109/WCRE.2011.64","DOIUrl":"https://doi.org/10.1109/WCRE.2011.64","url":null,"abstract":"Feature-centric comprehension of source code is essential during software evolution. However, such comprehension is oftentimes difficult to achieve due to the lack of correspondence between functional features and structural units of object-oriented programs. We present a tool for feature-centric analysis of legacy Java programs called Featureous that addresses this issue. Featureous allows a programmer to easily establish feature-code traceability links and to analyze their characteristics using a number of visualizations. Featureous is an extension to the NetBeans IDE, and can itself be extended by third-party plugins.","PeriodicalId":350863,"journal":{"name":"2011 18th Working Conference on Reverse Engineering","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127608853","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Software development today depends largely on the use of API libraries, frameworks, and reusable components. However, API usability issues often increase the development cost (e.g., time, effort) and lower code quality. In this regard, we study 1,513 bug-posts across five different bug repositories, using both qualitative and quantitative analysis. We identify the API usability issues that are reflected in the bug-posts from the API users, and distinguish the relative significance of the usability factors. Moreover, from the lessons learned by manual investigation of the bug-posts, we provide further insight into the most frequent API usability issues.
{"title":"Useful, But Usable? Factors Affecting the Usability of APIs","authors":"M. Zibran, Farjana Z. Eishita, C. Roy","doi":"10.1109/WCRE.2011.26","DOIUrl":"https://doi.org/10.1109/WCRE.2011.26","url":null,"abstract":"Software development today has been largely dependent on the use of API libraries, frameworks, and reusable components. However, the API usability issues often increase the development cost (e.g., time, effort) and lower code quality. In this regard, we study 1,513 bug-posts across five different bug repositories, using both qualitative and quantitative analysis. We identify the API usability issues that are reflected in the bug-posts from the API users, and distinguish relative significance of the usability factors. Moreover, from the lessons learned by manual investigation of the bug-posts, we provide further insight into the most frequent API usability issues.","PeriodicalId":350863,"journal":{"name":"2011 18th Working Conference on Reverse Engineering","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128372301","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In the long-term evolution of software systems, various maintenance activities such as functionality extension, bug fixing, and refactoring may positively or negatively affect the quality of design and implementation. Quality degradation caused by negative effects may accumulate and cause serious difficulties for future maintenance of the software if it is not addressed properly in time. In this paper, we propose an approach for monitoring the degradation trends of software design during evolution and providing useful feedback for evolution decisions. The approach is based on the assumption that the deviations between different modularity views and their trends in evolution can be used to monitor the degradation trends of design. Currently, our approach considers three modularity views, namely package view, structural cluster view, and semantic cluster view. Package view denotes the package structure reflecting the desired modularity view; structural cluster view and semantic cluster view are the modularity views extracted from the implementation by software clustering based on formal information and non-formal information, respectively. Then, based on the three modularity views extracted from each version, our approach calculates the similarity between different views as the measurement of modularity deviations, and analyzes the deviation trends over a series of versions. We conduct an empirical study on three open-source systems, which confirms that continuous monitoring of deviation trends of modularity views can provide useful feedback for future evolution decisions.
{"title":"Monitoring Software Quality Evolution by Analyzing Deviation Trends of Modularity Views","authors":"Tianmei Zhu, Yijian Wu, Xin Peng, Zhenchang Xing, Wenyun Zhao","doi":"10.1109/WCRE.2011.35","DOIUrl":"https://doi.org/10.1109/WCRE.2011.35","url":null,"abstract":"In the long-term evolution of software systems, various maintenance activities such as functionality extension, bug fixing, and refactoring may positively or negatively affect the quality of design and implementation. Quality degradation caused by negative effects may accumulate and cause serious difficulties for future maintenance of the software if it is not addressed properly in time. In this paper, we propose an approach for monitoring the degradation trends of software design during evolution and providing useful feedback for evolution decisions. The approach is based on the assumption that the deviations between different modularity views and their trends in evolution can be used to monitor the degradation trends of design. Currently, our approach considers three modularity views, namely package view, structural cluster view, and semantic cluster view. Package view denotes the package structure reflecting the desired modularity view; structural cluster view and semantic cluster view are the modularity views extracted from the implementation by software clustering based on formal information and non-formal information, respectively. Then, based on the three modularity views extracted from each version, our approach calculates the similarity between different views as the measurement of modularity deviations, and analyzes the deviation trends over a series of versions. We conduct an empirical study on three open-source systems, which confirms that continuous monitoring of deviation trends of modularity views can provide useful feedback for future evolution decisions.","PeriodicalId":350863,"journal":{"name":"2011 18th Working Conference on Reverse Engineering","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134369510","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
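The abstract above describes computing the similarity between modularity views and tracking the resulting deviation over versions, but it does not name the similarity measure. The sketch below therefore swaps in a simple stand-in, the Jaccard index over co-located entity pairs, purely as an assumption. The view contents and all function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

View = Dict[str, str]  # maps an entity (e.g. a class) to its module or cluster


def co_membership_pairs(view: View) -> Set[FrozenSet[str]]:
    """All unordered entity pairs that the view places in the same module."""
    groups: Dict[str, List[str]] = {}
    for entity, module in view.items():
        groups.setdefault(module, []).append(entity)
    pairs: Set[FrozenSet[str]] = set()
    for members in groups.values():
        for a, b in combinations(sorted(members), 2):
            pairs.add(frozenset((a, b)))
    return pairs


def view_similarity(v1: View, v2: View) -> float:
    """Jaccard index of co-membership pairs (assumed measure, not the paper's)."""
    p1, p2 = co_membership_pairs(v1), co_membership_pairs(v2)
    if not p1 and not p2:
        return 1.0
    return len(p1 & p2) / len(p1 | p2)


def deviation_trend(versions: List[Tuple[View, View]]) -> List[float]:
    """Deviation (1 - similarity) between two views for each version in order."""
    return [1.0 - view_similarity(a, b) for a, b in versions]


# Hypothetical views of four classes in two successive versions.
package_view = {"A": "p1", "B": "p1", "C": "p2", "D": "p2"}
cluster_v1 = dict(package_view)  # clustering agrees with packages
cluster_v2 = {"A": "c1", "B": "c1", "C": "c1", "D": "c2"}  # C drifted
trend = deviation_trend([(package_view, cluster_v1), (package_view, cluster_v2)])
```

Here the deviation rises from 0.0 to 0.75 between the two versions, the kind of upward trend that, under the assumption stated in the abstract, would signal design degradation.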