Apples, Oranges & Fruits – Understanding Similarity of Software Repositories Through The Lens of Dissimilar Artifacts
A. Rao, S. Chimalakonda
2022 IEEE International Conference on Software Maintenance and Evolution (ICSME), 2022-10-01
DOI: https://doi.org/10.1109/ICSME55016.2022.00044
Open-source repositories have made it easier for developers to reuse existing software artifacts when developing and maintaining new or similar kinds of software. However, finding similar repositories is challenging because the notion of similarity depends on context, and most existing approaches find similar repositories by comparing artifacts of the same kind. This paper asks whether dissimilar artifacts can serve as one of the criteria for finding similar repositories. Although two similar artifacts may differ from each other, two dissimilar artifacts may also share similarities. We make the notion of similarity concrete by defining two categories of similar repositories. Four text-based artifacts are selected for the experiment: pull requests, issues, commits, and readme files. Textual similarity is computed between artifacts of different types. The results show that similarity does exist between dissimilar artifacts: we observed that 10-20% of dissimilar artifact pairs could be used when searching for similar repositories. These preliminary results suggest that dissimilar artifacts can also be considered when searching for similar repositories, and they motivate further research.
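The idea of comparing different artifact types across two repositories can be illustrated with a minimal sketch. The abstract does not say which textual similarity measure the authors used, so the cosine similarity over term-frequency vectors below is an assumption, as are the function names, the dictionary layout, and the sample texts; this is not the paper's implementation.

```python
import math
import re
from collections import Counter
from itertools import product

# The four text-based artifact types named in the abstract.
# The dictionary keys are hypothetical labels chosen for this sketch.
ARTIFACT_TYPES = ("pull_requests", "issues", "commits", "readme")


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the term-frequency vectors of two texts."""
    freq_a, freq_b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    dot = sum(freq_a[t] * freq_b[t] for t in freq_a.keys() & freq_b.keys())
    norm_a = math.sqrt(sum(c * c for c in freq_a.values()))
    norm_b = math.sqrt(sum(c * c for c in freq_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty artifact is similar to nothing
    return dot / (norm_a * norm_b)


def cross_artifact_similarity(repo_a, repo_b):
    """Score every pair of *different* artifact types across two repositories.

    Each repository is a dict mapping an artifact type to its concatenated text.
    Pairs of the same type (readme vs. readme, and so on) are skipped.
    """
    scores = {}
    for type_a, type_b in product(ARTIFACT_TYPES, repeat=2):
        if type_a == type_b:
            continue
        if type_a in repo_a and type_b in repo_b:
            scores[(type_a, type_b)] = cosine_similarity(
                repo_a[type_a], repo_b[type_b]
            )
    return scores


if __name__ == "__main__":
    # Made-up sample texts, for illustration only.
    repo_a = {"readme": "A fast JSON parser for streaming data"}
    repo_b = {
        "issues": "Parser crashes on streaming JSON input",
        "commits": "Fix typo in build script",
    }
    for pair, score in cross_artifact_similarity(repo_a, repo_b).items():
        print(pair, round(score, 3))
```

A pair such as (readme, issues) with a high score would be a candidate signal that the two repositories are related, even though the artifacts compared are of different kinds. Any threshold for "high" would have to be chosen empirically.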