{"title":"Using Dimensionality Reduction Techniques to Understand the Sources of Software Complexity","authors":"B. Johnson, R. Simha","doi":"10.1109/ICCSM57214.2022.00009","journal":{"name":"2022 6th International Conference on Computer, Software and Modeling (ICCSM)"},"publicationDate":"2022-07-01","ListUrlMain":"https://doi.org/10.1109/ICCSM57214.2022.00009","platform":"Semanticscholar"}
Using Dimensionality Reduction Techniques to Understand the Sources of Software Complexity
Despite significant work in the area of software complexity, there are still numerous unanswered questions about the sources and locations of complexity and its relationship to software design and programming language features. In this paper, we attempt to illuminate these questions by applying code-agnostic statistical dimensionality reduction techniques to a large dataset of 3000 popular open source Java programs. We analyze our set of projects to determine key attributes of Java program composition and complexity, using standard metrics from previous work. We apply two proven dimensionality reduction techniques, Principal Component Analysis (PCA) and t-Distributed Stochastic Neighbor Embedding (t-SNE), to explore the relationships between complexity models and program composition. We find support for three primary sources of Java software complexity and note that particular projects are most often associated primarily with one variety. Our results have potential implications for source code analysis and programming language design.
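To make the PCA step mentioned in the abstract concrete, the following is a minimal sketch of extracting the first principal component from a table of per-project metrics. It is not the authors' pipeline: the project names, the three metric columns, and all values are hypothetical and chosen only for illustration, and the power-iteration routine stands in for whatever PCA implementation the paper actually used. t-SNE is not shown, since in practice it would rely on an external library.

```python
import math

# Hypothetical per-project metrics (NOT data from the paper):
# [lines of code, mean cyclomatic complexity, mean inheritance depth]
projects = {
    "proj_a": [1200, 2.1, 1.0],
    "proj_b": [5400, 3.4, 1.5],
    "proj_c": [800, 1.8, 2.5],
    "proj_d": [15000, 5.2, 2.0],
    "proj_e": [3000, 2.9, 3.5],
}

def standardize(rows):
    """Center each column to mean 0 and scale it to unit population std."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    # A constant column has std 0; fall back to 1.0 to avoid division by zero.
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / n) or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]

def covariance(rows):
    """Covariance matrix of already-centered rows (population form)."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / n for j in range(d)]
            for i in range(d)]

def mat_vec(m, v):
    """Multiply a square matrix by a vector."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

def first_component(cov, iterations=500):
    """Power iteration: unit eigenvector and eigenvalue of the largest eigenvalue.

    Assumes the uniform start vector is not orthogonal to that eigenvector,
    which holds for the sample data above.
    """
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = mat_vec(cov, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, mat_vec(cov, v)))
    return v, eigenvalue

scaled = standardize(list(projects.values()))
cov = covariance(scaled)
component, eigenvalue = first_component(cov)

# Share of total variance captured by the first component.
explained = eigenvalue / sum(cov[i][i] for i in range(len(cov)))

# Project each standardized project onto the first component.
scores = {name: sum(a * b for a, b in zip(row, component))
          for name, row in zip(projects, scaled)}

for name, score in scores.items():
    print(f"{name}: {score:+.3f}")
print(f"explained variance ratio: {explained:.3f}")
```

The metrics are standardized first because they sit on very different scales; without that step, lines of code would dominate the component simply by magnitude. The loadings in `component` indicate which metrics vary together, which is the kind of grouping a study of complexity sources would then interpret.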