Latest articles from Proceedings of the First International Workshop on Software Mining

Proceedings of the First International Workshop on Software Mining

Proceedings of the First International Workshop on Software Mining

Pub Date : 2012-08-12 DOI: 10.1145/2384416

Ming Li, Hongyu Zhang, D. Lo

Software systems play important roles in business, scientific research, and everyday life. Improving both software productivity and quality is critical, and doing so remains a major challenge for software engineering researchers and practitioners. In recent years, software mining has emerged as a promising means of addressing these challenges. It has been successfully applied to discover knowledge from software artifacts (e.g., specifications, source code, documentation, execution logs, and bug reports) in order to improve software quality and the development process (e.g., to gain insight into the causes of poor software quality, to help software engineers locate and identify problems quickly, and to help managers optimize resources for better productivity). Software mining has attracted much attention in both the software engineering and data mining communities. The First International Workshop on Software Mining (SoftwareMining-2012) aims to bridge research in the data mining and software engineering communities by providing an open and interactive forum where researchers interested in software mining can discuss the methodologies and technical foundations of software mining, approaches and techniques for mining various types of software-related data, and applications of data mining that facilitate specialized tasks in software engineering. Participants with diverse backgrounds in either data mining or software engineering can benefit from this workshop by sharing their expertise, exchanging ideas, and discussing new research results.

Citations: 2

Software systems through complex networks science: review, analysis and applications

Proceedings of the First International Workshop on Software Mining

Pub Date : 2012-08-12 DOI: 10.1145/2384416.2384418

L. Šubelj, M. Bajec

Complex software systems are among the most sophisticated human-made systems, yet little is known about the actual structure of 'good' software. Here we study different software systems developed in Java from the perspective of network science. The study reveals that network theory can provide a prominent set of techniques for the exploratory analysis of large, complex software systems. We further identify several applications in software engineering and propose different network-based quality indicators that address software design, efficiency, reusability, vulnerability, controllability, and other properties. We also highlight various interesting findings: for example, software systems are highly vulnerable to processes such as bug propagation, yet they are not easily controllable.

Citations: 60

Source code identifier splitting using Yahoo image and web search engine

Proceedings of the First International Workshop on Software Mining

Pub Date : 2012-08-12 DOI: 10.1145/2384416.2384417

A. Sureka

Source-code or program identifiers are sequences of characters consisting of one or more tokens that represent domain concepts. Splitting or tokenizing identifiers that do not contain explicit markers or clues (such as camel-casing or an underscore used as a token separator) is a technically challenging problem. In this paper, we present a technique for automatic tokenization and splitting of source-code identifiers using Yahoo web search and image search similarity distance. We present an algorithm that decides the split position based on various factors, such as the conceptual correlation and semantic relatedness between the left and right split strings of a given identifier, the popularity of each token, and its length. The number of hits or search results returned by the web and image search engines serves as a proxy for measures such as term popularity and correlation. We perform a series of experiments to validate the proposed approach and present performance results.
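The general idea of choosing a split position from hit counts can be illustrated with a minimal sketch. It is not the paper's algorithm: the `HITS` table is a hypothetical stand-in for search-engine results (the numbers are made up), and the scoring formula in `split_score`, as well as the `min_len` cutoff used as a crude length factor, are assumptions chosen only for illustration.

```python
import math

# Hypothetical hit counts standing in for search-engine results (made-up numbers).
HITS = {
    "file": 9_000_000, "name": 8_000_000, "file name": 3_000_000,
    "fil": 40_000, "ename": 5_000, "fil ename": 10,
    "filen": 2_000, "ame": 30_000, "filen ame": 5,
}


def hits(query):
    """Return the hit count for a query; unknown queries count as zero."""
    return HITS.get(query, 0)


def split_score(left, right):
    """Score a candidate split (assumed formula, not taken from the paper).

    Popular tokens that also co-occur often as a phrase score higher.
    """
    popularity = math.log1p(hits(left)) + math.log1p(hits(right))
    cooccurrence = math.log1p(hits(f"{left} {right}"))
    return popularity + cooccurrence


def best_split(identifier, min_len=2):
    """Try every split position and return the highest-scoring (left, right) pair."""
    candidates = [
        (identifier[:i], identifier[i:])
        for i in range(min_len, len(identifier) - min_len + 1)
    ]
    return max(candidates, key=lambda pair: split_score(*pair))


print(best_split("filename"))
```

In a real setting the `hits` function would query a search service; here it only reads the table, so the example runs offline.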

Citations: 6

Labeled topic detection of open source software from mining mass textual project profiles

Proceedings of the First International Workshop on Software Mining

Pub Date : 2012-08-12 DOI: 10.1145/2384416.2384419

Tao Wang, Gang Yin, Xiang Li, Huaimin Wang

Open source software has become an indispensable basis for both individual and industrial software engineering. Open source communities use various labeling mechanisms, such as categories, keywords, and tags, to annotate projects and make software easier to discover. However, because a large amount of software carries few or no labels, and existing labels come from different ontology spaces, it is still hard to retrieve potentially topic-relevant software. This paper highlights the valuable semantic information in project descriptions and labels, and proposes labeled software topic detection (LSTD), a hybrid approach that combines topic models and ranking mechanisms to detect and enrich the topics of software by mining a large number of textual software profiles. The detected topics can be used for software categorization and tag recommendation. LSTD uses labeled LDA to capture the semantic correlations between labels and descriptions and then constructs a label-based topic-word matrix. Based on the generated matrix and the generality of labels, LSTD designs a simple yet efficient algorithm to detect the latent topics of software, which are expressed as relevant and popular labels. Comprehensive evaluations are conducted on large-scale datasets from representative open source communities, and the results validate the effectiveness of LSTD.
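The step of combining a label-word matrix with label generality can be pictured with a small sketch. Everything in it is assumed for illustration: the `LABEL_WORD` weights stand in for a matrix that labeled LDA would learn, `LABEL_COUNT` is a hypothetical proxy for label generality, and the scoring rule in `rank_labels` is not the ranking used by LSTD.

```python
import math

# Hypothetical label-word weights, standing in for a matrix learned by labeled LDA.
LABEL_WORD = {
    "database": {"sql": 0.30, "query": 0.25, "storage": 0.20},
    "web": {"http": 0.35, "server": 0.25, "browser": 0.20},
    "orm": {"sql": 0.20, "mapping": 0.40, "object": 0.25},
}

# Hypothetical number of projects carrying each label (a rough proxy for generality).
LABEL_COUNT = {"database": 5000, "web": 8000, "orm": 300}


def rank_labels(description, top_k=2):
    """Rank labels for a project description (assumed scoring rule).

    Relevance is the summed weight of the description's words under each label;
    it is multiplied by the log of the label's popularity.
    """
    words = description.lower().split()
    scores = {}
    for label, weights in LABEL_WORD.items():
        relevance = sum(weights.get(word, 0.0) for word in words)
        if relevance > 0:
            scores[label] = relevance * math.log(LABEL_COUNT[label])
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


print(rank_labels("fast sql query engine"))
```

Labels with no matching words are dropped rather than ranked last, so a description unrelated to every label yields an empty list.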

Citations: 21

Rank-directed layout of UML class diagrams

Proceedings of the First International Workshop on Software Mining

Pub Date : 2012-08-12 DOI: 10.1145/2384416.2384420

Hao-Ji Hu, Jun Fang, Zhengcai Lu, Fengfei Zhao, Zheng Qin

UML class diagram layout is an important task in software visualization because it helps people comprehend systems. In this paper, we describe a novel UML class diagram layout algorithm, called the rank-directed method, which captures the differences in relationships among classes and emphasizes significant classes. As a layout algorithm, the rank-directed method supports clustering classes according to their inherent characteristics. To recognize the significance of classes, we apply the PageRank algorithm, abstracting relationships among classes as links among web pages. We assume that important classes have more relationships with other classes. To emphasize the important classes, the rank-directed method adopts a subgraph layout method based on the clustering of classes. We have developed a UML class diagram layout platform to evaluate our method. Our evaluation shows that the rank-directed method can effectively recognize the important classes and lay out the class diagram with higher readability than traditional layout methods.
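The idea of ranking classes by treating their relationships as links can be sketched with a plain power-iteration PageRank. This is a generic illustration, not the authors' implementation: the `CLASS_GRAPH` example, the uniform treatment of all relationship types, and the damping factor of 0.85 are assumptions.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Compute PageRank by power iteration.

    `graph` maps each class name to the list of classes it references.
    Every referenced class must also appear as a key.
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node, targets in graph.items():
            if targets:
                # Split this class's rank evenly among the classes it references.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A class with no outgoing references spreads its rank evenly.
                share = damping * rank[node] / n
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical class diagram: an edge A -> B means class A references class B.
CLASS_GRAPH = {
    "Order": ["Customer", "Product"],
    "Invoice": ["Order", "Customer"],
    "Customer": [],
    "Product": [],
    "Cart": ["Product", "Customer"],
}

ranks = pagerank(CLASS_GRAPH)
most_important = max(ranks, key=ranks.get)
print(most_important)
```

A layout step could then place or size classes according to `ranks`; that part is outside the scope of this sketch.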

Citations: 11
