Assessing Developer Expertise from the Statistical Distribution of Programming Syntax Patterns
Arghavan Moradi Dakhel, M. Desmarais, Foutse Khomh
Proceedings of the 25th International Conference on Evaluation and Assessment in Software Engineering
Published: 2021-06-18
DOI: https://doi.org/10.1145/3463274.3463343
Citations: 7
Abstract
Accurate assessment of developer expertise is crucial when assigning an individual to a task or, more generally, to a project that requires an adequate level of knowledge. Potential programmers can come from a large pool, so automatic means of assessing expertise from written programs would be highly valuable in this context. Previous work towards this goal has generally used heuristics such as the Line 10 Rule, or linguistic information in source files such as comments or identifiers, to represent the knowledge of developers and evaluate their expertise. In this paper, we focus on mastery of syntactic patterns as evidence of programming knowledge and propose a theoretical definition of programming knowledge based on the distribution of Syntax Patterns (SPs) in source code, namely Zipf’s law. We first validate the model and its scalability on synthetic data of “Expert” and “Novice” programmers. This provides a ground truth and allows us to explore the space of validity of the model. Then, we assess the performance of the model on real data from programmers. The results show that our proposed approach outperforms recent state-of-the-art approaches for the task of classifying programming experts.
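To make the idea of a Zipf-like distribution of syntax patterns concrete, the following is a minimal sketch and not the authors' model. Zipf's law states that the frequency of the item at rank r is roughly proportional to r^(-s) for some exponent s, so log-frequency falls approximately linearly with log-rank. The sketch assumes, purely for illustration, that a "syntax pattern" can be approximated by a Python AST node type; the abstract does not say how SPs are extracted. The function names `syntax_pattern_counts` and `zipf_slope` are hypothetical.

```python
import ast
import math
from collections import Counter


def syntax_pattern_counts(source: str) -> Counter:
    """Count AST node types in `source`.

    Assumption: node type names stand in for syntax patterns (SPs);
    the paper's actual SP definition is not given in the abstract.
    """
    tree = ast.parse(source)
    return Counter(type(node).__name__ for node in ast.walk(tree))


def zipf_slope(counts: Counter) -> float:
    """Least-squares slope of log(frequency) against log(rank).

    A slope near -1 corresponds to the classic Zipf exponent s = 1.
    """
    freqs = sorted(counts.values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct patterns")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(freq) for freq in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


if __name__ == "__main__":
    sample = (
        "def total(values):\n"
        "    result = 0\n"
        "    for v in values:\n"
        "        if v > 0:\n"
        "            result += v\n"
        "    return result\n"
    )
    counts = syntax_pattern_counts(sample)
    print(counts.most_common(3))
    print("fitted log-log slope:", zipf_slope(counts))
```

A fitted slope on its own is only a descriptive statistic. How such a distribution is turned into an expertise score or a classifier is specific to the paper and is not reproduced here.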