Code clones have been a hot topic in software engineering for decades. Thanks to the rapid development of clone detection techniques, finding code clones in software systems is no longer difficult, but managing the large number of detected clones remains an open problem. Refactoring is the usual way to eliminate clones and thereby reduce the threat they pose to software maintenance. In some situations, however, a clone group contains several different code variants residing in different locations, which makes refactoring complicated because their differences must be analyzed and reconciled first. We therefore need an approach to recognize clone groups that are easy to refactor or eliminate. In this paper, we first collected large-scale datasets from three different domains and studied the distribution of four metrics of code clones. We found that each metric follows a certain distribution pattern: intra-file clones account for approximately 50%, and Type-3 clones account for more than 45%. However, these metrics alone cannot determine the complexity of a clone group. Based on these findings, we propose a classification approach that helps developers distinguish clone groups that are easy to eliminate by refactoring from those that are hard to refactor. We propose four clone feature entropy measures based on information entropy theory: variant entropy, distribution entropy, relation entropy, and syntactic entropy. We then calculate the fused clone entropy, which is the weighted sum of these four feature entropies. Finally, we use the four feature entropies and the fused entropy to classify or rank code clone groups. Experiments on three different application domains show that the proposed clone feature entropy can help developers identify clone groups that are easy to eliminate by refactoring.
Manual validation also reveals that the complexity of clone groups is not solely dependent on the number of clone instances. This approach provides a new way to manage code clones and offers some useful ideas for future clone maintenance research.
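The abstract does not give the exact formulas, so the following Python sketch only illustrates the general idea under stated assumptions. Each feature entropy is taken here as the Shannon entropy H = -Σ p_i log2 p_i of some categorical labeling of a clone group's instances, and the fused entropy is a weighted sum of the four values. The label choices (variant IDs, file paths, intra/inter-file relations, clone types), the equal weights, and the rule that a lower fused entropy means an easier refactoring are all hypothetical illustrations, not the paper's definitions.

```python
import math
from collections import Counter
from typing import Dict, Hashable, Iterable, List, Tuple

# The four feature names come from the abstract; how each one is labeled is assumed.
FEATURES = ("variant", "distribution", "relation", "syntactic")


def shannon_entropy(labels: Iterable[Hashable]) -> float:
    """Shannon entropy (in bits) of the empirical distribution of the labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    h = 0.0
    for c in counts.values():
        p = c / total
        h -= p * math.log2(p)
    return h


def fused_entropy(entropies: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of the four per-feature entropies (weights are hypothetical)."""
    return sum(weights[f] * entropies[f] for f in FEATURES)


def rank_clone_groups(
    groups: Dict[str, Dict[str, List[Hashable]]], weights: Dict[str, float]
) -> List[Tuple[str, float]]:
    """Return (group, fused entropy) pairs, lowest fused entropy first.

    Assumption: lower entropy = more homogeneous group = easier to refactor.
    """
    scored = []
    for name, labels_per_feature in groups.items():
        entropies = {f: shannon_entropy(labels_per_feature[f]) for f in FEATURES}
        scored.append((name, fused_entropy(entropies, weights)))
    return sorted(scored, key=lambda item: item[1])


# Hypothetical example: one labeling per feature, one label per clone instance.
groups = {
    "g2": {
        "variant": ["v1", "v2", "v3", "v4"],             # four distinct variants
        "distribution": ["a.c", "a.c", "b.c", "c.c"],   # file of each instance
        "relation": ["intra", "inter", "inter", "inter"],
        "syntactic": ["Type-3", "Type-3", "Type-2", "Type-3"],
    },
    "g1": {
        "variant": ["v1"] * 4,
        "distribution": ["a.c"] * 4,
        "relation": ["intra"] * 4,
        "syntactic": ["Type-1"] * 4,
    },
}
weights = {f: 0.25 for f in FEATURES}  # equal weights, chosen only for illustration

for name, score in rank_clone_groups(groups, weights):
    print(f"{name}: fused entropy = {score:.3f}")
```

With these made-up inputs, the fully homogeneous group `g1` scores 0 and is ranked ahead of `g2`, whose instances differ in variant, location, relation, and clone type. A real implementation would substitute the paper's own feature definitions and weights.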
