
Proceedings of the ACMSE 2018 Conference: Latest Publications

Mining positive and negative association rules in Hadoop's MapReduce environment
Pub Date : 2018-03-29 DOI: 10.1145/3190645.3190701
S. Bagui, Probal Chandra Dhar
In this paper, we mine positive and negative rules from Big Data in Hadoop's MapReduce environment. Positive association rule mining finds items that are positively correlated, whereas negative association rule mining finds items that are negatively correlated. Positive association rule mining has traditionally been used to mine association rules, but negative association rule mining also has many applications, including the building of efficient decision support systems, crime data analysis [2], the health care sector [1], etc. In this paper, we mine positive and negative association rules with the Apriori algorithm in the Big Data setting using Hadoop's MapReduce environment. A positive association rule has the form X → Y, which has support s in a transaction set D if s% of the transactions in D contain X ∪ Y. A negative association rule has the form X → ¬Y, ¬X → Y, or ¬X → ¬Y, where X ∩ Y = Ø. X → ¬Y refers to X occurring in the absence of Y; ¬X → Y refers to Y occurring in the absence of X; ¬X → ¬Y means neither X nor Y. For positive association rules, Support(X → Y) is the percentage of transactions in the dataset in which itemsets X and Y co-occur, and Confidence(X → Y) is taken to be the conditional probability P(Y|X), that is, the percentage of transactions containing X that also contain Y. Support thresholds for the negative association rules take the form Supp(X → ¬Y) > min_supp, Supp(¬X → Y) > min_supp, and Supp(¬X → ¬Y) > min_supp; confidence thresholds take the form Conf(X → ¬Y) > min_conf, Conf(¬X → Y) > min_conf, and Conf(¬X → ¬Y) > min_conf. In MapReduce, we scan the dataset and create 1-itemsets in one MapReduce job and then use these 1-itemsets to create 2-itemsets in another MapReduce job. In the last map job, the positive and negative association rules are derived and their support, confidence, and lift are computed. Therefore, in essence, we use three map and two reduce jobs. The main contribution of this work is in presenting how the Apriori algorithm can be used to extract negative association rules from Big Data and how it can be executed efficiently on MapReduce.
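A minimal, single-machine sketch of the quantities defined above may make them concrete: it derives support, confidence, and lift for the four rule forms from 1-itemset and 2-itemset counts. It is not the authors' Hadoop/MapReduce implementation, and the function name, toy transactions, and thresholds are invented for the example.

```python
from collections import Counter
from itertools import combinations

def pair_rules(transactions, min_supp=0.2, min_conf=0.5):
    """Positive and negative rules over item pairs, with support, confidence, lift."""
    n = len(transactions)
    ones, twos = Counter(), Counter()          # 1-itemset and 2-itemset counts
    for t in transactions:
        items = sorted(set(t))
        ones.update(items)                     # first pass: 1-itemsets
        twos.update(combinations(items, 2))    # second pass: 2-itemsets
    rules = []
    for (x, y), cxy in twos.items():
        px, py, pxy = ones[x] / n, ones[y] / n, cxy / n
        # (antecedent, consequent, support of the rule, P(antecedent), P(consequent))
        forms = [
            (x,       y,       pxy,               px,     py),      # X -> Y
            (x,       "¬" + y, px - pxy,          px,     1 - py),  # X -> ¬Y
            ("¬" + x, y,       py - pxy,          1 - px, py),      # ¬X -> Y
            ("¬" + x, "¬" + y, 1 - px - py + pxy, 1 - px, 1 - py),  # ¬X -> ¬Y
        ]
        for a, c, supp, pa, pc in forms:
            if pa == 0 or pc == 0:
                continue
            conf = supp / pa                   # P(consequent | antecedent)
            lift = conf / pc
            if supp >= min_supp and conf >= min_conf:
                rules.append((f"{a} -> {c}", round(supp, 3), round(conf, 3), round(lift, 3)))
    return rules

demo = [["milk", "bread"], ["milk"], ["bread", "eggs"], ["milk", "bread", "eggs"]]
for rule in pair_rules(demo, min_supp=0.25, min_conf=0.6):
    print(rule)
```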
Citations: 3
Fast and accurate volume data curvature determination using GPGPU computation
Pub Date : 2018-03-29 DOI: 10.1145/3190645.3190681
Jacob D. Hauenstein, Timothy S Newman
A methodology for fast determination of a key shape feature in volume datasets using a GPU is described. The shape feature, surface curvature, which is a valuable descriptor for structure classification and dataset registration applications, can be time-consuming to determine reliably by conventional serial computing. The techniques here use parallel processing on a commodity GPU to achieve 100-fold (and above) improvements (for moderate-sized datasets) over conventional serial processing for curvature determination. Techniques for one class of curvature determination methods are detailed, including methods well-suited to datasets acquired by medical scanners.
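The abstract gives no formulas, so the sketch below is only a hedged illustration of the kind of data-parallel array arithmetic involved: mean curvature of a volume's isosurfaces from finite-difference derivatives, using the textbook implicit-surface expression rather than the paper's techniques. Swapping numpy for cupy, which mirrors the NumPy API, would run the same operations on a GPU.

```python
import numpy as np   # replacing numpy with cupy (same array API) moves this to the GPU

def mean_curvature(vol, eps=1e-12):
    """Mean curvature of the level sets of a 3D scalar volume, via the standard
    implicit-surface formula H = (|g|^2 tr(Hess) - g^T Hess g) / (2 |g|^3)."""
    gx, gy, gz = np.gradient(vol)          # first derivatives (finite differences)
    gxx, gxy, gxz = np.gradient(gx)        # second derivatives
    _, gyy, gyz = np.gradient(gy)
    _, _, gzz = np.gradient(gz)
    g2 = gx**2 + gy**2 + gz**2
    num = g2 * (gxx + gyy + gzz) - (
        gx*gx*gxx + gy*gy*gyy + gz*gz*gzz
        + 2*gx*gy*gxy + 2*gx*gz*gxz + 2*gy*gz*gyz)
    return num / (2 * np.maximum(g2, eps) ** 1.5)

# sanity check: the level sets of a distance field are spheres, so |H| is about 1/radius
x, y, z = np.mgrid[-20:21, -20:21, -20:21].astype(float)
H = mean_curvature(np.sqrt(x**2 + y**2 + z**2))
print(H[20, 20, 35])   # voxel at radius 15 -> roughly 1/15
```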
Citations: 0
3D pose reconstruction method for CG designers
Pub Date : 2018-03-29 DOI: 10.1145/3190645.3190703
Kazumoto Tanaka
This paper proposes a method for reconstructing the 3D poses of drawing-dolls from their images on photographs.
Citations: 0
Predicting NFRs in the early stages of agile software engineering
Pub Date : 2018-03-29 DOI: 10.1145/3190645.3190716
Richard R. Maiti, A. Krasnov
Non-functional requirements (NFRs) are often overlooked, whereas functional requirements (FRs) take center stage in developing agile software. Research has shown that ignoring NFRs can have negative impacts on the software and can cost more to fix at later stages. This research extends the Capture Elicit Prioritize (CEP) methodology to predict NFRs in the early stages of agile software development. Research in other fields, such as medicine, has shown that historical data can be beneficial in the long run; in the medical field, historical data has proven useful in determining patient treatments. The Capture Elicit Prioritize (CEP) methodology extended the NERV and NORMAP methodologies in previous research. The CEP methodology identified 56 out of 57 requirement sentences and elicited 98.24% of the baseline, an improvement of 10.53% over the NORMAP methodology and 1.75% over the NERV methodology. The NFR count for the CEP methodology was 86 out of 88 NFRs, an improvement of 12.49% over the NORMAP methodology and 4.55% over the NERV methodology. The CEP methodology was applied to the EU eProcurement requirements document. Its capture step gathers potential NFRs from requirements images using OCR. The elicit step uses NFR Locator Plus to take sentences from documents and place them in distinct categories; the NFR categories are defined from Chung's NFR framework using a set of keywords for training to locate NFRs. The e αβγ-framework is used to prioritize the NFRs. This work utilizes the data from previous research on the CEP methodology and extends CEP with a decision tree to predict future NFRs. A simple decision tree makes a prediction from past NFR data: if a certain NFR appears three or more times in the requirements document, it is most likely to appear in the next iteration of the software requirements specification; if it appears exactly three times, it is likely to appear in the next iteration; if it appears only once or twice, it is not likely to appear in future iterations. A path can be traced from the root of the decision tree to a leaf (yes or no) that determines whether the NFR will appear in future iterations. This research showed that using the available data can be beneficial for the next iteration of software development, and that historical metadata, combined with a decision tree over NFRs that appear multiple times in the EU procurement documents, can help predict the NFRs of the next software development iteration. The NFRs Availability, Compliance, Confidentiality, Documentation, Performance, Security, and Usability were found, and these NFRs are most likely to appear in the next iteration of the EU procurement software.
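The frequency thresholds quoted above overlap slightly ("three or more times" versus "exactly three"), so the sketch below is just one reading of the stated rule, with invented category counts and illustrative labels; it is not the paper's actual decision tree.

```python
from collections import Counter

def predict_next_iteration(nfr_counts):
    """Map each NFR category's occurrence count in the current requirements
    document to a likelihood label for the next iteration."""
    prediction = {}
    for nfr, count in nfr_counts.items():
        if count > 3:
            prediction[nfr] = "most likely"
        elif count == 3:
            prediction[nfr] = "likely"
        elif count >= 1:               # one or two occurrences
            prediction[nfr] = "not likely"
        else:
            prediction[nfr] = "no evidence"
    return prediction

counts = Counter({"Security": 5, "Usability": 3, "Documentation": 1})
print(predict_next_iteration(counts))
```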
Citations: 0
Using software birthmarks and clustering to identify similar classes and major functionalities
Pub Date : 2018-03-29 DOI: 10.1145/3190645.3190677
Matt Beck, J. Walden
Software birthmarks are a class of software metrics designed to identify copies of software. An article published in 2006 examined additional applications of software birthmarks. The article described an experiment using software birthmarks to identify similar classes and major functionalities in software applications. This study replicates and extends that experiment, using a modern software birthmark tool and a larger dataset, while improving the precision of the research questions and methodologies used in the original article. We found that one of the conclusions of the original article could be replicated while the other conclusion could not. While software birthmarks provide an effective method for identifying similar class files, they do not offer a reliable, objective, and generalizable method for finding major functionalities in a software release.
Citations: 3
Characterization of differentially private logistic regression
Pub Date : 2018-03-29 DOI: 10.1145/3190645.3190682
S. Suthaharan
The purpose of this paper is to present an approach that can help data owners select suitable values for the privacy parameter of a differentially private logistic regression (DPLR), whose main intention is to achieve a balance between privacy strength and classification accuracy. The proposed approach implements a supervised learning technique and a feature extraction technique to address this challenging problem and generate solutions. The supervised learning technique selects subspaces from a training data set and generates DPLR classifiers for a range of values of the privacy parameter. The feature extraction technique transforms an original subspace into a differentially private subspace by querying the original subspace multiple times using the DPLR model and the privacy parameter values selected by the supervised learning module. The proposed approach then employs a signal processing measure, the signal-interference-ratio, to quantify the privacy level of the differentially private subspaces; hence, it allows data owners to learn the privacy level that the DPLR models can provide for a given subspace and a given classification accuracy.
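The paper's subspace-querying and signal-interference-ratio machinery is not reproduced here. The sketch below only illustrates, generically, the privacy/accuracy trade-off the abstract refers to, by perturbing a scikit-learn logistic regression with Laplace noise that shrinks as the privacy parameter epsilon grows; the noise scale is illustrative, not a calibrated sensitivity bound.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# synthetic data stands in for a data owner's subspace
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
rng = np.random.default_rng(0)

for eps in [0.1, 0.5, 1.0, 5.0]:
    # output perturbation: smaller epsilon -> larger Laplace noise on the weights
    noisy = clf.coef_ + rng.laplace(scale=1.0 / eps, size=clf.coef_.shape)
    scores = X_te @ noisy.ravel() + clf.intercept_
    acc = ((scores > 0).astype(int) == y_te).mean()
    print(f"epsilon={eps:>4}: accuracy={acc:.3f}")
```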
Citations: 2
Prioritized task scheduling in fog computing
Pub Date : 2018-03-29 DOI: 10.1145/3190645.3190699
Tejas Choudhari, M. Moh, Teng-Sheng Moh
Fog computing, similar to edge computing, has been proposed as a model that introduces a virtualized layer between the end users and the back-end cloud data centers. Fog computing has attracted much attention due to the recent rapid deployment of smart devices and Internet-of-Things (IoT) systems, which often require real-time, stringent-delay services. The fog layer placed between the client and cloud layers aims to reduce delay in terms of transmission and processing times, as well as the overall cost. To support the increasing number of IoT and smart devices, and to improve performance and reduce cost, this paper proposes a task scheduling algorithm in the fog layer based on priority levels. The proposed architecture, queueing and priority models, priority assignment module, and the priority-based task scheduling algorithms are carefully described. Performance evaluation shows that, compared with existing task scheduling algorithms, the proposed algorithm reduces the overall response time and notably decreases the total cost. We believe that this work is significant to the emerging fog computing technology, and that the priority-based algorithm is useful to a wide range of application domains.
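The paper's queueing model, priority-assignment module, and cost formulas are not given in this abstract, so the sketch below only illustrates the core idea of dispatching tasks from a priority queue to whichever fog node frees up first; task names, priorities, and service times are invented.

```python
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class Task:
    priority: int                       # lower value = more urgent
    name: str = field(compare=False)
    work_ms: int = field(compare=False)

def schedule(tasks, fog_nodes):
    """Dispatch tasks in priority order to the fog node that frees up first."""
    ready = list(tasks)
    heapq.heapify(ready)                              # priority queue of tasks
    nodes = [(0, n) for n in fog_nodes]               # (time node becomes free, node id)
    heapq.heapify(nodes)
    plan = []
    while ready:
        task = heapq.heappop(ready)
        free_at, node = heapq.heappop(nodes)
        finish = free_at + task.work_ms
        plan.append((task.name, node, free_at, finish))
        heapq.heappush(nodes, (finish, node))
    return plan

tasks = [Task(2, "sensor-batch", 40), Task(0, "fall-alert", 5), Task(1, "video-frame", 20)]
for name, node, start, end in schedule(tasks, ["fog-A", "fog-B"]):
    print(f"{name:>12} on {node}: {start}-{end} ms")
```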
Citations: 117
Multi-source data analysis and evaluation of machine learning techniques for SQL injection detection
Pub Date : 2018-03-29 DOI: 10.1145/3190645.3190670
Kevin Ross, M. Moh, Teng-Sheng Moh, Jason Yao
SQL injection continues to be one of the most damaging security exploits in terms of personal information exposure as well as monetary loss. Injection attacks are the number one vulnerability in the most recent OWASP Top 10 report, and the number of these attacks continues to increase. Traditional defense strategies often involve static, signature-based IDS (Intrusion Detection System) rules, which are mostly effective only against previously observed attacks but not unknown, or zero-day, attacks. Much current research involves the use of machine learning techniques, which are able to detect unknown attacks but, depending on the algorithm, can be costly in terms of performance. In addition, most current intrusion detection strategies involve collecting the traffic coming into the web application either from a network device or from the web application host, while other strategies collect data from the database server logs. In this project, we collect traffic from two points: at the web application host, and at a Datiphy appliance node located between the webapp host and the associated MySQL database server. In our analysis of these two datasets, and a third dataset that correlates the two, we have been able to demonstrate that the accuracy obtained on the correlated dataset with algorithms such as rule-based learners and decision trees is nearly the same as that of a neural network algorithm, but with greatly improved performance.
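Neither the collected traffic nor the engineered features appear in this listing, so the sketch below uses synthetic labeled data purely to illustrate the kind of comparison described: a rule-like decision tree versus a small neural network, scored on accuracy and training time with scikit-learn.

```python
import time
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

# synthetic, imbalanced stand-in for labeled benign/attack request features
X, y = make_classification(n_samples=5000, n_features=20, weights=[0.9, 0.1], random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=1)

for model in (DecisionTreeClassifier(max_depth=8),
              MLPClassifier(hidden_layer_sizes=(32,), max_iter=500)):
    start = time.perf_counter()
    model.fit(X_tr, y_tr)
    elapsed = time.perf_counter() - start
    acc = accuracy_score(y_te, model.predict(X_te))
    print(f"{type(model).__name__:>22}: accuracy={acc:.3f}, train time={elapsed:.2f}s")
```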
Citations: 19
Mitigating IoT insecurity with inoculation epidemics
Pub Date : 2018-03-29 DOI: 10.1145/3190645.3190678
James A. Jerkins, Jillian Stupiansky
Compromising IoT devices to build botnets and disrupt critical infrastructure is an existential threat. Refrigerators, washing machines, DVRs, security cameras, and other consumer goods are high value targets for attackers due to inherent security weaknesses, a lack of consumer security awareness, and an absence of market forces or regulatory requirements to motivate IoT security. As a result of the deficiencies, attackers have quickly assembled large scale botnets of IoT devices to disable Internet infrastructure and deny access to dominant web properties with near impunity. IoT malware is often transmitted from host to host similar to how biological viruses spread in populations. Both biological viruses and computer malware may exhibit epidemic characteristics when spreading in populations of vulnerable hosts. Vaccines are used to stimulate resistance to biological viruses by inoculating a sufficient number of hosts in the vulnerable population to limit the spread of the biological virus and prevent epidemics. Inoculation programs may be viewed as a human instigated epidemic that spreads a vaccine in order to mitigate the damage from a biological virus. In this paper we propose a technique to create an inoculation epidemic for IoT devices using a novel variation of a SIS epidemic model and show experimental results that indicate utility of the approach.
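The authors' modified SIS model is not given in the abstract; the sketch below is only the textbook SIS compartments plus a simple inoculation flow from susceptible to protected devices, with invented parameter values, to show the style of dynamics being described.

```python
def sis_with_inoculation(n=10000, i0=10, beta=0.4, gamma=0.05, vacc=0.02, steps=100):
    """Discrete-time SIS dynamics with an added susceptible -> protected flow."""
    s, i, protected_ = n - i0, i0, 0
    history = []
    for _ in range(steps):
        new_infections = beta * s * i / n       # S -> I (malware spread)
        recoveries = gamma * i                  # I -> S (classic SIS term)
        inoculated = vacc * s                   # S -> protected (the "vaccine")
        s = s - new_infections + recoveries - inoculated
        i = i + new_infections - recoveries
        protected_ += inoculated
        history.append((round(s), round(i), round(protected_)))
    return history

for t, (s, i, p) in enumerate(sis_with_inoculation()[::20]):
    print(f"step {t*20:>3}: susceptible={s:>5} infected={i:>5} protected={p:>5}")
```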
Citations: 8
Comprehension and application of design patterns by novice software engineers: an empirical study of undergraduate software engineering and computer science students
Pub Date : 2018-03-29 DOI: 10.1145/3190645.3190686
Jonathan W. Lartigue, Richard O. Chapman
Although there has been a large body of work cataloguing design patterns since their introduction, there is a limited amount of detailed, empirical evidence on pattern use and application. Those studies that have collected experimental data generally focus on experienced, professional software engineers or graduate-level computer science and software engineering students. Although the value of design patterns in general is still widely debated, many experts have concluded that the use of design patterns is beneficial for experienced software engineers and architects. But it is still unclear whether the benefits of design patterns translate equally to young, inexperienced software engineers. To assess this, we conducted a controlled experiment to evaluate the comparative performance in targeted tasks of novice software engineers, represented here by undergraduate students about to earn a bachelor's degree in an ABET-accredited computer science or software engineering program. We assessed the ability of subjects to recognize, comprehend, and refactor software containing a number of design patterns. We also collected subjective data measuring the subjects' preferences for or against pattern use. Although the experiment results are mixed, depending on the complexity of the pattern involved, we observe that novice software engineers can recognize and understand software containing some design patterns, but that the benefits of pattern use, in terms of refactoring time, depend on the complexity of the pattern. We conclude that, while simpler patterns show benefits, more complex design patterns may be an impediment for novice developers.
Citations: 8