Software systems play important roles in business, scientific research, and everyday life. Improving both software productivity and quality is critical, and remains a major challenge for software engineering researchers and practitioners. In recent years, software mining has emerged as a promising means to address these challenges. It has been successfully applied to discover knowledge from software artifacts (e.g., specifications, source code, documentation, execution logs, and bug reports) in order to improve software quality and the development process (e.g., to gain insight into the causes of poor software quality, to help software engineers locate and identify problems quickly, and to help managers optimize resources for better productivity). Software mining has attracted much attention in both the software engineering and data mining communities. The First International Workshop on Software Mining (SoftwareMining-2012) aims to bridge research in the data mining and software engineering communities by providing an open and interactive forum for researchers interested in software mining to discuss the methodologies and technical foundations of software mining, approaches and techniques for mining various types of software-related data, and applications of data mining to facilitate specialized tasks in software engineering. Participants with diverse backgrounds in either data mining or software engineering can benefit from this workshop by sharing their expertise, exchanging ideas, and discussing new research results.
Proceedings of the First International Workshop on Software Mining. Ming Li, Hongyu Zhang, D. Lo. Published 2012-08-12. DOI: https://doi.org/10.1145/2384416
Complex software systems are among the most sophisticated human-made systems, yet little is known about the actual structure of 'good' software. Here we study different software systems developed in Java from the perspective of network science. The study reveals that network theory can provide a prominent set of techniques for the exploratory analysis of large, complex software systems. We further identify several applications in software engineering and propose different network-based quality indicators that address software design, efficiency, reusability, vulnerability, controllability, and other properties. We also highlight various interesting findings; for example, software systems are highly vulnerable to processes like bug propagation, yet they are not easily controllable.
Software systems through complex networks science: review, analysis and applications. L. Šubelj, M. Bajec. In Proceedings of the First International Workshop on Software Mining, 2012-08-12. DOI: https://doi.org/10.1145/2384416.2384418
Source-code or program identifiers are sequences of characters consisting of one or more tokens representing domain concepts. Splitting or tokenizing identifiers that do not contain explicit markers or clues (such as camel-casing or an underscore used as a token separator) is a technically challenging problem. In this paper, we present a technique for automatic tokenization and splitting of source-code identifiers using Yahoo web search and image search similarity distance. We present an algorithm that decides the split position based on various factors, such as the conceptual correlation and semantic relatedness between the left and right split strings of a given identifier, the popularity of each token, and its length. The number of hits or search results returned by the web and image search engines serves as a proxy for measures such as term popularity and correlation. We perform a series of experiments to validate the proposed approach and present performance results.
Source code identifier splitting using Yahoo image and web search engine. A. Sureka. In Proceedings of the First International Workshop on Software Mining, 2012-08-12. DOI: https://doi.org/10.1145/2384416.2384417
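The split-position idea in the identifier-splitting abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the `HITS` table is a hypothetical, hand-made stand-in for search-engine result counts, the scoring formula (log-scaled popularity weighted by token length) is an assumption chosen for illustration, and the semantic-relatedness term between the two halves is omitted.

```python
import math

# Hypothetical hit counts standing in for search-engine result counts.
# A real system would query a search service; these numbers are invented.
HITS = {
    "file": 9_000_000,
    "name": 12_000_000,
    "fil": 40_000,
    "ename": 5_000,
    "me": 50_000_000,
}


def popularity(token):
    """Log-scaled popularity of a token; unknown tokens score zero."""
    return math.log1p(HITS.get(token, 0))


def best_split(identifier, min_len=2):
    """Return the (left, right) split with the highest score, or None.

    Assumed scoring: popularity of each half weighted by its length,
    so that very short but very common fragments do not dominate.
    """
    best_pair, best_score = None, float("-inf")
    for i in range(min_len, len(identifier) - min_len + 1):
        left, right = identifier[:i], identifier[i:]
        score = popularity(left) * len(left) + popularity(right) * len(right)
        if score > best_score:
            best_pair, best_score = (left, right), score
    return best_pair


split = best_split("filename")
print(split)
```

With the invented counts above, the split after `file` scores highest because both halves are popular and of reasonable length. Identifiers with more than two tokens would need the split applied recursively, which this sketch leaves out.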
Open source software has become an indispensable basis for both individual and industrial software engineering. Open source communities use various labeling mechanisms, such as categories, keywords, and tags, to annotate projects and facilitate the discovery of software. However, because a large amount of software carries few or no labels, or carries labels drawn from different ontology spaces, it is still hard to retrieve potentially topic-relevant software. This paper highlights the valuable semantic information in project descriptions and labels, and proposes labeled software topic detection (LSTD), a hybrid approach that combines topic models and ranking mechanisms to detect and enrich the topics of software by mining a large number of textual software profiles. The approach can be employed for software categorization and tag recommendation. LSTD uses labeled LDA to capture the semantic correlations between labels and descriptions and then constructs the label-based topic-word matrix. Based on the generated matrix and the generality of labels, LSTD uses a simple yet efficient algorithm to detect the latent topics of software, which are expressed as relevant and popular labels. Comprehensive evaluations are conducted on large-scale datasets from representative open source communities, and the results validate the effectiveness of LSTD.
Labeled topic detection of open source software from mining mass textual project profiles. Tao Wang, Gang Yin, Xiang Li, Huaimin Wang. In Proceedings of the First International Workshop on Software Mining, 2012-08-12. DOI: https://doi.org/10.1145/2384416.2384419
Hao-Ji Hu, Jun Fang, Zhengcai Lu, Fengfei Zhao, Zheng Qin
UML class diagram layout is an important task in software visualization, because it improves people's comprehension of the system. In this paper, we describe a novel UML class diagram layout algorithm, called the rank-directed method, which captures the differences in relationships among classes and stresses significant classes. As a layout algorithm, the rank-directed method supports clustering of classes according to their inherent characteristics. To recognize the significance of classes, we apply the PageRank algorithm by abstracting relationships among classes as links among web pages. We assume that important classes have more relationships with other classes. To emphasize the important classes, the rank-directed method adopts a subgraph layout method based on clustering of classes. We have developed a UML class diagram layout platform to evaluate our method. Our evaluation shows that the rank-directed method can effectively recognize the important classes and lay out the class diagram with higher readability than traditional layout methods.
Rank-directed layout of UML class diagrams. Hao-Ji Hu, Jun Fang, Zhengcai Lu, Fengfei Zhao, Zheng Qin. In Proceedings of the First International Workshop on Software Mining, 2012-08-12. DOI: https://doi.org/10.1145/2384416.2384420
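The ranking step described in the rank-directed layout abstract, treating class relationships like links between web pages, can be sketched with a plain power-iteration PageRank. This is only an illustration under assumptions: the class names and edges are hypothetical, every relationship is treated as a directed edge from the referring class to the referenced class, and the damping factor of 0.85 is the conventional default rather than a value taken from the paper.

```python
def pagerank(edges, damping=0.85, iterations=100, tol=1e-10):
    """Power-iteration PageRank over a directed graph given as (src, dst) pairs."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by classes without outgoing relationships is spread evenly.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n for node in nodes
        }
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        change = sum(abs(new_rank[node] - rank[node]) for node in nodes)
        rank = new_rank
        if change < tol:
            break
    return rank


# Hypothetical class relationships: (referring class, referenced class).
edges = [
    ("OrderService", "Order"),
    ("OrderService", "Customer"),
    ("Invoice", "Order"),
    ("Order", "Customer"),
    ("Order", "LineItem"),
]
ranks = pagerank(edges)
most_important = max(ranks, key=ranks.get)
print(most_important, round(ranks[most_important], 3))
```

In this toy graph `Customer` ranks highest because it is referenced both directly and by the well-referenced `Order` class. A layout tool could then use such scores to decide which classes to place prominently; how the paper maps scores to positions is not specified in the abstract.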