In-Memory Database Support for Source Code Search and Analytics

O. Panchenko
DOI: 10.1109/WCRE.2011.60 (https://doi.org/10.1109/WCRE.2011.60)
Published in: 2011 18th Working Conference on Reverse Engineering
Publication date: 2011-10-17
Citations: 0

Abstract

Software engineers have to deal with a large amount of information about source code. Appropriate tools could help them handle it, but existing tools cannot process and present this much information adequately. With the advent of in-memory column-oriented databases, the performance of some data-intensive applications could be significantly improved. This has resulted in a completely new user experience for those applications and has enabled new use cases. This PhD thesis investigates the applicability of in-memory column-oriented databases to supporting daily software engineering activities. The major research question addressed in this thesis is as follows: does in-memory column-oriented database technology provide the necessary performance advantages for working interactively with large amounts of fine-grained structural information about source code? To investigate this research question, two scenarios have been selected that particularly suffer from low performance.

The first scenario is source code search. Existing source code repositories contain a large amount of structural data, such as interface definitions, abstract syntax trees, and call graphs. Existing tools have solved the performance problems by reducing the amount of data through a coarse-grained representation, by preparing answers to developers' questions in advance, or by reducing the scope of search. All of these alternatives cost developers productivity.

The second scenario is source code analytics. To complete reverse engineering tasks, software engineers are often required to analyze a number of atomic facts that have been extracted from source code. Examples of such atomic facts are occurrences of certain syntactic patterns in code, software product metrics, and violations of development guidelines. Each fact typically has several characteristics, such as the type of the fact, the location in the code where it was found, and some attributes. The analysis of large software systems in particular requires the ability to process a large number of such facts efficiently.

Industrial experiments conducted for this thesis showed that in-memory technology provides performance gains that improve developers' productivity and enable scenarios that were previously not possible. This thesis spans both software engineering and database technology. From the viewpoint of software engineering, it seeks a way to support developers in dealing with a large amount of structural data. From the viewpoint of database technology, source code search and analytics are domains for studying fundamental issues of storing and querying structural data.
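
As an illustration only, the fact records described in the abstract (a type, a location in the code, and some attributes) could be held in a column-oriented layout along the lines of the sketch below. The class `FactColumns`, its method names, and the sample file names and rules are hypothetical and are not taken from the thesis; plain Python lists stand in for a real in-memory column store.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class FactColumns:
    """Hypothetical column-oriented fact table.

    Each characteristic is stored in its own list; row i of the table
    is the i-th element of every list.
    """
    fact_type: list = field(default_factory=list)
    file: list = field(default_factory=list)
    line: list = field(default_factory=list)
    attributes: list = field(default_factory=list)

    def add(self, fact_type, file, line, **attributes):
        # Append one value per column so the columns stay aligned.
        self.fact_type.append(fact_type)
        self.file.append(file)
        self.line.append(line)
        self.attributes.append(attributes)

    def count_by_type(self):
        # An aggregate that scans only the fact_type column.
        return Counter(self.fact_type)

    def locations_of(self, wanted_type):
        # Filter on one column, then read the matching rows
        # from the two location columns.
        return [
            (self.file[i], self.line[i])
            for i, t in enumerate(self.fact_type)
            if t == wanted_type
        ]


# Made-up sample facts of the kinds the abstract lists.
facts = FactColumns()
facts.add("guideline_violation", "billing.py", 42, rule="no-global-state")
facts.add("metric", "billing.py", 1, name="cyclomatic_complexity", value=17)
facts.add("guideline_violation", "report.py", 7, rule="no-global-state")

print(facts.count_by_type())
print(facts.locations_of("guideline_violation"))
```

The point of the layout is that a query such as counting facts per type touches a single column rather than every field of every record, which is the general property of column-oriented storage that the abstract relies on.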