This research paper describes an ongoing effort to design, develop, and improve malicious-application detection algorithms. This work looks specifically at improving a cosine-similarity information retrieval technique to enhance detection of known malicious applications and variants of known malicious applications by applying the feature extraction technique known as randomized projection. Document similarity techniques, such as cosine similarity, have been used with great success in several document retrieval applications. Following a standard information retrieval methodology, software in machine-readable format can be regarded as documents in the corpus. These "documents" may or may not have known malicious functionality. The query is software, again in machine-readable format, that contains a certain type of malicious software. This methodology provides the ability to search the corpus with a query and retrieve/identify potentially malicious software as well as other instances of the same type of vulnerability. Retrieval is based on the similarity of the query to a given document in the corpus. There have been several efforts, including mutual information and randomized projections, to overcome what is known as 'the curse of dimensionality' that can occur with this type of information retrieval technique. Randomized projections are used to create a low-dimensional embedding of the high-dimensional data. Results from experimentation have shown promise over previously published efforts.
{"title":"Aiding prediction algorithms in detecting high-dimensional malicious applications using a randomized projection technique","authors":"T. Atkison","doi":"10.1145/1900008.1900117","DOIUrl":"https://doi.org/10.1145/1900008.1900117","url":null,"abstract":"This research paper describes an on-going effort to design, develop and improve upon malicious application detection algorithms. This work looks specifically at improving a cosine similarity, information retrieval technique to enhance detection of known and variances of known malicious applications by applying the feature extraction technique known as randomized projection. Document similarity techniques, such as cosine similarity, have been used with great success in several document retrieval applications. By following a standard information retrieval methodology, software, in machine readable format, can be regarded as documents in the corpus. These \"documents\" may or may not have a known malicious functionality. The query is software, again in machine readable format, which contains a certain type of malicious software. This methodology provides an ability to search the corpus with a query and retrieve/identify potentially malicious software as well as other instances of the same type of vulnerability. Retrieval is based on the similarity of the query to a given document in the corpus. There have been several efforts to overcome what is known as 'the curse of dimensionality' that can occur with the use of this type of information retrieval technique including mutual information and randomized projections. Randomized projections are used to create a low-order embedding of the high dimensional data. 
Results from experimentation have shown promise over previously published efforts.","PeriodicalId":333104,"journal":{"name":"ACM SE '10","volume":"84 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133101431","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
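The retrieval step the abstract describes — project high-dimensional feature vectors down with a random matrix, then rank corpus documents by cosine similarity to the query — can be sketched as follows. This is a toy illustration with synthetic vectors, not the paper's actual feature set or parameters:

```python
import numpy as np

def random_projection_matrix(d, k, seed=0):
    # Gaussian entries scaled by 1/sqrt(k): pairwise angles and distances
    # are approximately preserved (Johnson-Lindenstrauss lemma).
    rng = np.random.default_rng(seed)
    return rng.normal(0.0, 1.0 / np.sqrt(k), size=(d, k))

def cosine_similarity(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

# Toy corpus: each row stands in for a high-dimensional feature vector
# extracted from one binary; the query is a slightly perturbed "variant"
# of document 2. All sizes and seeds here are illustrative.
d, k = 10_000, 64
rng = np.random.default_rng(1)
corpus = rng.random((5, d))
query = corpus[2] + 0.01 * rng.random(d)

R = random_projection_matrix(d, k)
corpus_low = corpus @ R            # low-dimensional embeddings
query_low = query @ R

scores = [cosine_similarity(query_low, doc) for doc in corpus_low]
best = int(np.argmax(scores))      # index of the retrieved document
```

Ranking in the k-dimensional projected space rather than the original d-dimensional space is what sidesteps the curse of dimensionality the abstract mentions.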
Collaborative knowledge production in open source science communities such as Open Biomedical Ontologies (OBO) is poorly understood. In this paper, we present the components of a software system used to conduct computational ethnography and examine the growth and evolution of the OBO community. OBO comprises a global network of communities engaged in developing formal ontologies to standardize data acquisition and use in the health sciences community. The process involved in collecting and parsing open source data is presented, along with a discussion of the multiple socio-technical networks generated from the raw data. We evaluate the characteristics and topological change of the structure of a selected subdomain within the OBO community in terms of social network metrics.
{"title":"SciBrowser: a computational ethnography tool to explore open source science communities","authors":"Michael Arnold, Damodar Shenviwagle, L. Yilmaz","doi":"10.1145/1900008.1900045","DOIUrl":"https://doi.org/10.1145/1900008.1900045","url":null,"abstract":"Collaborative knowledge production in open source science communities such as Open Biomedical Ontologies (OBO) is poorly understood. In this paper, we present the components of a software system that is used to conduct computational ethnography and examine the growth and evolution of the OBO community. OBO is comprised of a global network of communities that are engaged in developing formal ontologies to standardize data acquisition and use in the health sciences community. The process involved in collecting and parsing open source data is presented along with a discussion of multiple socio-technical networks generated from the raw data. We evaluate the characteristics and topological change of the structure of a selected subdomain within the OBO community in terms of social network metrics.","PeriodicalId":333104,"journal":{"name":"ACM SE '10","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130293134","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Inadequate use of project management techniques in software development can be traced to the lack of efficient education strategies for managers [1]. Software development processes are complex, and it is therefore hard to predict how changes made to one part of the process will affect its overall outcome. Introducing change into the process is often time consuming, and there is no assurance that the change implemented will result in an improvement. Simulation of the software development process provides an easy way for managers to test different configurations of the process and understand the effects of various policies. Using agent-directed simulation to mimic the software development process at the individual level would also enable us to introduce a new phase of software development without having to change the simulation code. The simulation would start with a given number of agents initialized by the user. At any point in time, the user may change the number of developers or assign developers to different phases of software development depending on their performance and capabilities.
{"title":"A flexible model for simulation of software development process","authors":"R. Agarwal, D. Umphress","doi":"10.1145/1900008.1900064","DOIUrl":"https://doi.org/10.1145/1900008.1900064","url":null,"abstract":"Inadequate use of project management techniques in software development can be traced to the lack of efficient education strategies for managers [1]. Software development processes are complex and therefore it is hard to predict how changes made to some part of the process can affect the overall outcome of the process. Introducing change in the process is often time consuming and there is no assurance that the change implemented will result in an improvement. Simulation of software development process provides an easy way for managers to test the different configurations of the process and understand the effects of various policies.\u0000 Using agent directed simulation to mimic the software development process at the individual level also would enable us to introduce a new phase of software development without having to change the simulation code. This simulation would start with a given number of agents initialized by the user. At any point of time, the user may change the number of developers or assign developers on different phases of the software development depending on their performance and capabilities.","PeriodicalId":333104,"journal":{"name":"ACM SE '10","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116278274","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
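The agent-level setup the abstract outlines — developer agents assigned to phases, reassignable by the user at any simulated tick — might be sketched like this. All names, phases, and numbers are illustrative, not from the paper:

```python
from dataclasses import dataclass, field

PHASES = ["design", "coding", "testing"]   # phases are configurable data

@dataclass
class Developer:
    name: str
    phase: str
    productivity: float = 1.0   # tasks finished per tick

@dataclass
class Project:
    # Remaining work per phase; 10 task-units each to start.
    backlog: dict = field(default_factory=lambda: {p: 10.0 for p in PHASES})

def tick(project, developers):
    # Each agent reduces the backlog of its currently assigned phase.
    for dev in developers:
        remaining = project.backlog[dev.phase]
        project.backlog[dev.phase] = max(0.0, remaining - dev.productivity)

devs = [Developer("d1", "design"), Developer("d2", "coding")]
project = Project()
for _ in range(3):
    tick(project, devs)

# Reassign an agent mid-run, as the abstract describes -- no change to
# the simulation code itself is needed.
devs[0].phase = "testing"
tick(project, devs)
```

Because phases are plain data rather than hard-coded stages, adding a new phase means appending to `PHASES`, which is the flexibility the abstract claims.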
Model-Driven Engineering (MDE) has emerged as a promising paradigm in software engineering by emphasizing the use of models not just for documentation and communication purposes, but as first-class artifacts to be transformed into other work products (e.g., other models, source code, and test scripts). MDE supports full-scale round-trip engineering, from idea inception to operationalization. Historically, models have been developed using general-purpose modeling languages, such as the Unified Modeling Language (UML). A more recent trend is to use domain-specific modeling languages (DSMLs), which assist domain experts in working within their own problem space without being concerned about technical details of the solution space (e.g., programming languages and middleware). DSMLs also provide an accessible way to communicate with stakeholders who are not familiar with fast-changing technologies. This introductory tutorial will present a summary of the areas represented by MDE and offer some insight into the benefits of using DSMLs in both research and teaching.
{"title":"Model-driven engineering: raising the abstraction level through domain-specific modeling","authors":"J. Gray, Jules White, A. Gokhale","doi":"10.1145/1900008.1900010","DOIUrl":"https://doi.org/10.1145/1900008.1900010","url":null,"abstract":"Model-Driven Engineering (MDE) has emerged as a promising paradigm in software engineering by emphasizing the use of models not just for documentation and communication purposes, but as first-class artifacts to be transformed into other work products (e.g., other models, source code, and test scripts). MDE supports full-scale round-trip engineering, from idea inception to operationalization. Historically, models have been developed using general-purpose modeling languages, such as the Unified Modeling Language (UML). A more recent trend is to use domain-specific modeling languages (DSMLs), which assist domain experts in working within their own problem space without being concerned about technical details of the solution space (e.g., programming languages and middleware). DSMLs also provide an accessible way to communicate with stakeholders who are not familiar with the fast changing technologies. This introductory tutorial will present a summary of the areas represented by MDE and offer some insight into the benefits of using DSMLs in both research and teaching.","PeriodicalId":333104,"journal":{"name":"ACM SE '10","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125099079","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hong Lin, J. Rushing, T. Berendes, Cara Stein, S. Graves
Spyglass is an ontology-based information retrieval system designed to help analysts explore very large collections of unstructured text documents. The tool includes two main components: a server and a client. The server is a web-based service that uses a specific domain ontology to index a collection of documents, answer queries from the client, and provide retrieval and visualization services based on the ontology and the resulting index. The client is a graphical user interface that allows analysts to explore the document collections, query single or multiple entities of interest from the ontology, and retrieve the documents relevant to the query. This paper presents Spyglass's rich set of visualization tools.
{"title":"Visualizations for the spyglass ontology-based information analysis and retrieval system","authors":"Hong Lin, J. Rushing, T. Berendes, Cara Stein, S. Graves","doi":"10.1145/1900008.1900061","DOIUrl":"https://doi.org/10.1145/1900008.1900061","url":null,"abstract":"Spyglass is an ontology-based information retrieval system designed to help analysts explore very large collections of unstructured text documents. The tool includes two main components: server and client. The server is a web-based service that uses a specific domain ontology to index a collection of documents, answer queries from the client, and provide retrieval and visualization services based on the ontology and the resulting index. The client is a graphical user interface which allows analysts to explore the document collections, query single or multiple entities of interest of the ontology and retrieve the documents relevant to the query. The rich set of visualization tools in Spyglass will be presented in this paper.","PeriodicalId":333104,"journal":{"name":"ACM SE '10","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130189786","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Recent discoveries using rule-based classifiers and pre-learning data clustering have helped improve classification accuracy in predictive modeling tasks. This research introduces an approach that combines these techniques and studies its predictive effects. The algorithm presented here, a Clustering Rule-based Algorithm (CRA), first clusters the original training set using an Expectation Maximization (EM) algorithm. Then, a separate Classification and Regression Tree (CART) is trained on each individual cluster. To obtain an upper bound on accuracy, each test instance is evaluated against all of the rules produced by the separate trees, to determine whether any tree produces a rule that correctly classifies the instance. This study reveals that an upper-bound predictive accuracy of 100% was achievable. Moreover, the approach exploits the advantages of both supervised and unsupervised learning to produce a more powerful and more accurate predictive model.
{"title":"A clustering rule-based approach to predictive modeling","authors":"Philicity Williams, C. Soares, J. Gilbert","doi":"10.1145/1900008.1900071","DOIUrl":"https://doi.org/10.1145/1900008.1900071","url":null,"abstract":"Recent discoveries using rule-based classifiers and pre-learning data clustering have helped improve classification accuracy in predictive modeling tasks. This research introduces a unique approach which combines the above techniques and studies its predictive effects. The algorithm presented in this research, a Clustering Rule-based Algorithm (CRA), first clusters the original training set using an Expectation Maximization (EM) algorithm. Then, a separate Classification and Regression Tree (CART) is trained on each individual cluster. To obtain an upper-bound on accuracy, each test instance is evaluated against all of the rules produced by each separate Tree, to determine if there exists a rule produced by one of the Trees which correctly classifies the test instance. This study reveals that a predictive accuracy of 100% was achievable. Moreover, this approach exploits the advantages of supervised and unsupervised learning to produce a more powerful and more accurate predictive model.","PeriodicalId":333104,"journal":{"name":"ACM SE '10","volume":"87 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130677581","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
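A minimal sketch of the CRA pipeline using scikit-learn, where `GaussianMixture` stands in for the EM clustering step and `DecisionTreeClassifier` (a CART-style learner) for the per-cluster trees; the oracle-style upper bound counts an instance as correct if any tree classifies it correctly. The dataset and parameters are illustrative, not the paper's:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.mixture import GaussianMixture
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Step 1: cluster the training set with EM (Gaussian mixture model).
gmm = GaussianMixture(n_components=3, random_state=0).fit(X)
clusters = gmm.predict(X)

# Step 2: train one CART-style tree per cluster.
trees = {
    c: DecisionTreeClassifier(random_state=0).fit(X[clusters == c], y[clusters == c])
    for c in np.unique(clusters)
}

# Upper-bound evaluation: an instance counts as correct if ANY tree
# classifies it correctly (an oracle-style bound, not deployable accuracy).
hits = sum(
    any(t.predict(x.reshape(1, -1))[0] == label for t in trees.values())
    for x, label in zip(X, y)
)
upper_bound = hits / len(y)
```

Evaluated on the training data as here, the bound is trivially high; the point of the sketch is the pipeline shape — unsupervised clustering feeding separate supervised learners.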
In this paper, we describe an ontology-driven pattern disambiguation process for Rote Extractors. Our approach can generate lexical patterns for a particular relation from unrestricted text; these patterns can then be used to recognize concepts that share the same relation in other text. We evaluate our approach both with and without the ontology. The results show that our approach can dramatically improve the performance of existing pattern-based Rote Extractors.
{"title":"An ontology-driven rote extractor for pattern disambiguation","authors":"Sheng Yin, I. Arpinar","doi":"10.1145/1900008.1900049","DOIUrl":"https://doi.org/10.1145/1900008.1900049","url":null,"abstract":"In this paper, we describe an ontology-driven pattern disambiguation process for Rote Extractors. Our approach can generate lexical patterns for a particular relation from unrestricted text. Then patterns can be used to recognize concepts, which have the same relation in other text. We test our experiments with/without the ontology. The results show that our approach can dramatically improve the performance of existing pattern-based Rote Extractors.","PeriodicalId":333104,"journal":{"name":"ACM SE '10","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125793755","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
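The rote-extractor idea — bootstrap a lexical pattern from a seed pair, then reuse the pattern to extract new pairs for the same relation — can be illustrated in a few lines. The text, seed, and relation are toy examples, and the ontology-based disambiguation step is omitted:

```python
import re

# Toy corpus and a seed pair for the "capital-of" relation.
text = ("Paris is the capital of France. Rome is the capital of Italy. "
        "Berlin is the capital of Germany.")
seed = ("Paris", "France")

# Learn the lexical pattern: the text spanning the seed pair.
left, right = map(re.escape, seed)
match = re.search(f"{left}(.+?){right}", text)
pattern = match.group(1)   # the infix, e.g. " is the capital of "

# Apply the learned pattern to extract new pairs in the same relation.
new_pairs = re.findall(rf"(\w+){re.escape(pattern)}(\w+)", text)
```

Without a filtering step, patterns learned this way are noisy and ambiguous, which is exactly the gap the paper's ontology-driven disambiguation targets.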
Ruben Ramirez-Padron, Feras A. Batarseh, K. Heyne, A. Wu, Avelino J. Gonzalez
Genetic algorithms (GAs) are probabilistic search techniques inspired by natural evolution. Selection schemes are used by GAs to choose the individuals from a population that breed the next generation. Proportionate, ranking, and tournament selection are standard selection schemes; they focus on choosing individuals with high fitness values. The Fitness Uniform Selection Scheme (FUSS) is a recently proposed selection scheme that instead focuses on fitness diversity. FUSS has shown better performance than standard selection schemes on deceptive and NP-complete problems. In general, it is difficult to determine whether a real-life problem is deceptive, yet there has been no information about the relative performance of FUSS on non-deceptive problems. In this paper, the standard selection schemes mentioned above were compared to FUSS on two non-deceptive problems. A GA using FUSS was able to find high-fitness solutions faster than expected. Consequently, FUSS could be a good first-choice selection scheme regardless of whether the problem at hand is deceptive.
{"title":"On the performance of fitness uniform selection for non-deceptive problems","authors":"Ruben Ramirez-Padron, Feras A. Batarseh, K. Heyne, A. Wu, Avelino J. Gonzalez","doi":"10.1145/1900008.1900053","DOIUrl":"https://doi.org/10.1145/1900008.1900053","url":null,"abstract":"Genetic algorithms (GAs) are probabilistic search techniques inspired by natural evolution. Selection schemes are used by GAs to choose individuals from a population to breed the next generation. Proportionate, ranking and tournament selection are standard selection schemes. They focus on choosing individuals with high fitness values. Fitness Uniform Selection Scheme (FUSS) is a recently proposed selection scheme that focuses on fitness diversity. FUSS have shown better performance than standard selection schemes for deceptive and NP-complete problems. In general, it is difficult to determine whether a real-life problem is deceptive or not. However, there is no information about the relative performance of FUSS on non-deceptive problems. In this paper, the standard selection schemes mentioned above were compared to FUSS on two non-deceptive problems. A GA using FUSS was able to find high-fitness solutions faster than expected. Consequently, FUSS could be a good first-choice selection scheme regardless of whether a problem at hand is deceptive or not.","PeriodicalId":333104,"journal":{"name":"ACM SE '10","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126998926","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
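FUSS itself is simple to state: draw a target fitness uniformly between the population's minimum and maximum fitness, then select the individual whose fitness is nearest that target, which favors individuals at rare fitness levels. A minimal sketch (ties broken arbitrarily; the bitstring problem is illustrative):

```python
import random

def fuss_select(population, fitness, rng=random):
    # Fitness Uniform Selection Scheme: sample a target fitness uniformly
    # over the population's fitness range, then return the individual
    # whose fitness is closest to the target.
    fits = [fitness(ind) for ind in population]
    target = rng.uniform(min(fits), max(fits))
    return min(population, key=lambda ind: abs(fitness(ind) - target))

# Toy usage: bitstrings scored by their number of ones.
rng = random.Random(42)
pop = [[rng.randint(0, 1) for _ in range(8)] for _ in range(20)]
selected = fuss_select(pop, sum, rng)
```

Unlike proportionate or tournament selection, the chance of selecting an individual here depends on how crowded its fitness level is, not on how high its fitness is.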
In a global economy, there is a clear need for remote access to information. Organizations provide this access to their customers via Service-Oriented Architectures (SOA) or Web Services (WS). These technologies use the Extensible Markup Language (XML) to exchange information. In this paper, the Simple XML Messaging Framework is presented. The framework can be used to exchange information in either a SOA or a WS environment. A prototype has been implemented and used in a course at the University of South Carolina Upstate to simulate a SOA environment. An analysis from this course is presented.
{"title":"Simple XML messaging framework","authors":"T. Toland","doi":"10.1145/1900008.1900149","DOIUrl":"https://doi.org/10.1145/1900008.1900149","url":null,"abstract":"In a global economy, there is clearly a need to have remote access to information. Organizations provide this access to their customers via Service-Oriented Architecture (SOA) or Web Services (WS). These technologies use the Extensible Markup Language (XML) to exchange information. In this paper, the Simple XML Messaging Framework is presented. This framework can be used to exchange information in either a SOA or a WS environment. A prototype has been implemented and used in a course at the University of South Carolina Upstate to simulate a SOA environment. An analysis from this course is presented.","PeriodicalId":333104,"journal":{"name":"ACM SE '10","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125875466","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
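A hypothetical flavor of such an exchange — not the paper's actual framework or message schema — using only the Python standard library to build and parse a small XML envelope:

```python
import xml.etree.ElementTree as ET

def build_message(operation, payload):
    # Wrap a request in a minimal XML envelope (illustrative element names).
    msg = ET.Element("message")
    ET.SubElement(msg, "operation").text = operation
    ET.SubElement(msg, "payload").text = payload
    return ET.tostring(msg, encoding="unicode")

def parse_message(xml_text):
    # Recover the operation and payload on the receiving side.
    root = ET.fromstring(xml_text)
    return root.findtext("operation"), root.findtext("payload")

wire = build_message("getQuote", "IBM")
op, body = parse_message(wire)   # round-trips through the XML envelope
```

In a SOA or WS setting, the `wire` string would travel over HTTP between client and service; the framework's value is agreeing on the envelope so both sides can parse it.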
The Snake-in-the-Box problem finds its roots in graph theory and computer science with applications in electrical engineering, coding theory, analog-to-digital conversion, disjunctive normal form simplification, electronic combination locking, and computer network topologies. This fascinating problem has puzzled scholars for over 50 years, but research has produced steady progress. Here, we provide an overview of the problem, its history, and current research, including a potential solution.
{"title":"The Snake-in-the-Box problem","authors":"K. Krafka, W. Potter, T. Horton","doi":"10.1145/1900008.1900079","DOIUrl":"https://doi.org/10.1145/1900008.1900079","url":null,"abstract":"The Snake-in-the-Box problem finds its roots in graph theory and computer science with applications in electrical engineering, coding theory, analog-to-digital conversion, disjunctive normal form simplification, electronic combination locking, and computer network topologies. This fascinating problem has puzzled scholars for over 50 years, but research has produced steady progress. Here, we provide an overview of the problem, its history, and current research, including a potential solution.","PeriodicalId":333104,"journal":{"name":"ACM SE '10","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125563281","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
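The object of the search is a "snake": an induced path in the d-dimensional hypercube, i.e., a path in which no two non-consecutive nodes are adjacent in the cube. A brute-force depth-first search, feasible only for small d (serious solvers add heavy pruning and heuristics), can be sketched as:

```python
def neighbors(node, d):
    # Hypercube neighbors differ in exactly one bit.
    return [node ^ (1 << i) for i in range(d)]

def longest_snake(d):
    best = []

    def extend(path, used):
        nonlocal best
        if len(path) > len(best):
            best = path[:]
        head = path[-1]
        for nxt in neighbors(head, d):
            if nxt in used:
                continue
            # Snake condition: the new node may touch no earlier snake
            # node except the current head.
            if any(nb in used and nb != head for nb in neighbors(nxt, d)):
                continue
            path.append(nxt)
            used.add(nxt)
            extend(path, used)
            used.remove(nxt)
            path.pop()

    # By symmetry of the hypercube, starting at node 0 loses no generality.
    extend([0], {0})
    return best   # node sequence; the snake's length is len(best) - 1 edges

snake = longest_snake(3)   # the longest snake in the 3-cube has 4 edges
```

The search space grows explosively with d, which is why exact maximum snake lengths are known only for small dimensions and the general problem has resisted solution for decades.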