
Proceedings of the ACMSE 2018 Conference: latest publications

Cloud computing meets 5G networks: efficient cache management in cloud radio access networks
Pub Date : 2018-03-29 DOI: 10.1145/3190645.3190674
Gurpreet Kaur, M. Moh
Advances in cellular network technology continue to address increasing demands from the growing number of Internet of Things (IoT) devices. IoT has brought forth countless new devices competing for service on cellular networks. The latest development in cellular technology is the 5th Generation Cloud Radio Access Network, or 5G C-RAN, which applies cloud computing technology to the RAN architecture for better resource utilization and increased flexibility and scalability. Because a cache is included in each VM to speed cellular network services, efficient cache management schemes are needed, which ultimately provide better user experiences. This paper designs new cache management schemes and evaluates their performance. The new algorithms include a probability-based scoring scheme, a hierarchical (tiered) approach, and enhancements to previously existing approaches. Performance evaluation shows that some of the new schemes, while simple in design, offer high cache hit ratios, low request-service latency, preferential treatment based on users' service levels, and reduced network traffic compared with other existing and classic caching mechanisms. We believe that this work is important in advancing 5G technology to support IoT services, and is also useful to other cache management systems.
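The abstract does not give the exact scoring formula, but the probability-based scoring idea can be sketched as follows: each cached entry carries a score mixing access frequency and recency, and the lowest-scoring entry is evicted when the cache is full. The class name, `alpha` weight, and score formula here are illustrative assumptions, not the authors' scheme.

```python
class ScoredCache:
    """Hypothetical sketch of a score-based cache. The score mixes
    access frequency (hit count) and recency (logical clock); the
    lowest-scoring entry is evicted on overflow."""

    def __init__(self, capacity, alpha=0.7):
        self.capacity = capacity
        self.alpha = alpha        # illustrative weight: frequency vs. recency
        self.store = {}           # key -> (value, hits, last_access)
        self.clock = 0            # logical time, advanced on every operation

    def _score(self, hits, last_access):
        recency = 1.0 / (1 + self.clock - last_access)
        return self.alpha * hits + (1 - self.alpha) * recency

    def get(self, key):
        self.clock += 1
        if key in self.store:
            value, hits, _ = self.store[key]
            self.store[key] = (value, hits + 1, self.clock)
            return value
        return None               # cache miss

    def put(self, key, value):
        self.clock += 1
        if key not in self.store and len(self.store) >= self.capacity:
            # evict the entry with the lowest combined score
            victim = min(self.store,
                         key=lambda k: self._score(self.store[k][1],
                                                   self.store[k][2]))
            del self.store[victim]
        self.store[key] = (value, 1, self.clock)
```

With `alpha` near 1 the cache behaves like LFU; near 0 it approaches LRU, which is the kind of tunable trade-off a scoring scheme enables.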
Citations: 9
Towards reproducible research: automatic classification of empirical requirements engineering papers
Pub Date : 2018-03-29 DOI: 10.1145/3190645.3190689
Clinton Woodson, J. Hayes, S. Griffioen
Research must be reproducible in order to make an impact on science and to contribute to the body of knowledge in our field. Yet studies have shown that 70% of research from academic labs cannot be reproduced. In software engineering, and more specifically requirements engineering (RE), reproducible research is rare, with datasets not always available or methods not fully described. This lack of reproducible research hinders progress, with researchers having to replicate an experiment from scratch. A researcher starting out in RE has to sift through conference papers, identify those that are empirical, and then examine the data available from each empirical paper (if any) to make a preliminary determination of whether the paper can be reproduced. This paper addresses two parts of that problem: identifying RE papers, and identifying empirical papers within the RE papers. Recent RE and empirical conference papers were used to learn features and to build an automatic classifier to identify RE and empirical papers. We introduce the Empirical Requirements Research Classifier (ERRC) method, which uses natural language processing and machine learning to perform supervised classification of conference papers. We compare our method to a baseline keyword-based approach. To evaluate our approach, we examine sets of papers from the IEEE Requirements Engineering conference and the IEEE International Symposium on Software Testing and Analysis. We found that the ERRC method performed better than the baseline method in all but a few cases.
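The keyword-based baseline the paper compares against can be sketched as below. The keyword list and threshold are illustrative assumptions; the authors' actual keyword set is not given in the abstract.

```python
# Hypothetical keyword-based baseline: flag a paper as empirical if its
# abstract contains at least `threshold` terms from a hand-picked list.
EMPIRICAL_KEYWORDS = {"experiment", "dataset", "participants",
                      "evaluation", "case study", "survey", "measured"}

def keyword_classify(abstract, threshold=2):
    """Return True if the abstract matches >= threshold empirical keywords."""
    text = abstract.lower()
    hits = sum(1 for kw in EMPIRICAL_KEYWORDS if kw in text)
    return hits >= threshold
```

A supervised classifier like ERRC replaces this fixed list with features learned from labeled papers, which is why it can outperform the baseline on papers that describe experiments in nonstandard vocabulary.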
Citations: 3
Multi-core real-time scheduling in multilevel feedback queue with starvation mitigation (MLFQ-RT)
Pub Date : 2018-03-29 DOI: 10.1145/3190645.3190668
K. Hoganson
Process scheduling for real-time processes is a critical function of real-time operating systems, which are required to guarantee soft and hard deadlines for completing real-time processes. The behavior of Multi-Level Feedback Queue (MLFQ) scheduling mechanisms intrinsically supports scheduling that favors short CPU bursts to the complete exclusion of all other processes in the ready queues. This MLFQ feature has been extended to support meeting both hard and soft real-time process deadlines in robotics and automated manufacturing applications. This research explores a new derivative of MLFQ for real-time scheduling called MLFQ-Real-Time (MLFQ-RT), investigated through simulation for multi-core processors. The MLFQ-RT real-time extension for multi-core processors builds upon research that previously solved a known weakness of MLFQ scheduling: a vulnerability to starvation of processes in the lowest priority queue, such that the operating system is unable to guarantee that all processes will make progress. The scheduling algorithm is extended to multi-core processors, with three hypotheses examined and validated through simulation, showing hard and soft real-time process scheduling while maintaining the previously demonstrated mitigation of starvation in low-priority queues.
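The starvation problem and its classic mitigation can be sketched as follows (this is a generic MLFQ with periodic priority boosting, not the paper's MLFQ-RT algorithm): a process that exhausts its quantum drops a level, and a periodic boost moves everything back to the top queue so low-priority processes cannot starve.

```python
from collections import deque

class MLFQ:
    """Illustrative multi-level feedback queue with priority boosting."""

    def __init__(self, levels=3, boost_every=10):
        self.queues = [deque() for _ in range(levels)]  # index 0 = highest
        self.boost_every = boost_every
        self.ticks = 0

    def add(self, pid, level=0):
        self.queues[level].append(pid)

    def schedule(self):
        """Run the next process from the highest non-empty queue, then
        demote it one level (simulating an exhausted quantum)."""
        self.ticks += 1
        if self.ticks % self.boost_every == 0:
            # starvation mitigation: boost everything back to the top queue
            for q in self.queues[1:]:
                while q:
                    self.queues[0].append(q.popleft())
        for level, q in enumerate(self.queues):
            if q:
                pid = q.popleft()
                demoted = min(level + 1, len(self.queues) - 1)
                self.queues[demoted].append(pid)
                return pid
        return None
```

A real-time extension must additionally guarantee deadlines, e.g. by reserving the top queue for admitted real-time processes; the sketch only shows the baseline mechanism the paper builds on.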
Citations: 2
Implementing webIDs + biometrics
Pub Date : 2018-03-29 DOI: 10.1145/3190645.3190711
Taylor Martin, Justin Zhang, William Nick, Cory Sabol, A. Esterline
In this paper, our main focus is the integration of WebIDs and biometrics. Biometrics is the process of utilizing a user's physical characteristics to identify them. There are three types of authentication. Knowledge-based authentication, based on the user's knowledge, is where the user uses a PIN or a password to gain access. Token-based authentication uses some form of physical identification to verify the user. The final form of authentication is biometric-based authentication. Genetic and Evolutionary Feature Extraction (GEFE) is a feature extraction technique that can be used to evolve local binary pattern (LBP) based feature extractors that are disposable for users of biometric-based authentication systems. LBP compares intensity values of a pixel in a group of pixels to form a texture pattern. Each segmented region of an image has its own histogram that stores the frequency of the unique texture patterns occurring in that region. GEFE is an instance of genetic and evolutionary computation (GEC). A WebID is a uniform resource identifier (URI) that represents some agent, such as a person, organization, group, or device. A URI is a sequence of characters that identifies a logical or physical resource. Many services that require any type of authentication rely on centralized systems. This means that users are forced to have a different account and identifier for each service they are using. For every service, a new registration needs to be created, which can be a burden on both the user and the service. A WebID will represent a user's WebID profile. A user's WebID profile contains a set of relations that describe the user.
However, they can still be compromised if an attacker gains direct access to a user's computer, or if the user's unique certificate is stolen. Adding biometrics to the authentication process can help solve this issue since biometric data (e.g., fingerprints, iris scans) is unique and not easily duplicated. If a biometric element can be added to WebID profiles, then users could be verified through both their WebID and biometric authentication. We are implementing a method of user verification that is convenient, widely applicable via the Internet, and protected against intrusion. Traditionally, sites store user log-in information on their own servers.
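The LBP texture descriptor mentioned above can be sketched in a few lines: each of a pixel's 8 neighbors contributes one bit, set when the neighbor's intensity is at least the center's, and the resulting 8-bit codes are histogrammed per region. This is the standard LBP formulation, not the evolved GEFE extractors the paper describes.

```python
def lbp_code(image, r, c):
    """8-neighbor LBP code for pixel (r, c); `image` is a 2-D list."""
    center = image[r][c]
    # clockwise neighbor offsets starting at the top-left
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        if image[r + dr][c + dc] >= center:
            code |= 1 << bit      # neighbor >= center sets this bit
    return code

def lbp_histogram(image):
    """Frequency of each LBP code over the interior pixels of a region."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            hist[lbp_code(image, r, c)] += 1
    return hist
```

GEFE's contribution is evolving which pixels and regions the extractor samples, so that a compromised (disposable) template can be replaced by re-running the evolution.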
Citations: 2
Improving offensive cyber security assessments using varied and novel initialization perspectives
Pub Date : 2018-03-29 DOI: 10.1145/3190645.3190673
Jacob Oakley
Offensive cyber security assessment methods such as red teaming and penetration testing have grown in parallel with evolving threats to evaluate traditional and diverging attack surfaces. This paper provides a taxonomy of ethical-hacker-conducted offensive security assessments, categorized by their initial evaluation perspectives. Included in this taxonomy are the traditional assessment perspectives, which initiate analysis and attack simulation against networks externally, from within a DMZ, or internally. A novel paradigm of the critical perspective as an initial point for offensive security evaluation processes is also presented. Initialization from a critical perspective bolsters the holistic capabilities of offensive cyber security assessment by providing a new assessment option that begins evaluation at the last line of defense between malicious actors and the crown jewels of an organization; the assessment then proceeds outward from this deepest level of trust and security. This method will be shown to improve the ability to mitigate the impact of threats regardless of whether they originate from within or without an organization. As such, assessment initialization at a critical perspective provides a new approach to offensive security assessment, different from what has traditionally been practiced by red teams and penetration testers.
Citations: 3
A software engineering schema for data intensive applications
Pub Date : 2018-03-29 DOI: 10.1145/3190645.3190675
S. Suthaharan
The features developed by a software engineer (system specification) for a software system may significantly differ from the features required by a user (user requirements) for their envisioned system. These discrepancies generally result from the complexity of the system, the vagueness of the user requirements, or the lack of knowledge and experience of the software engineer. The principles of software engineering and the recommendations of the ACM's Software Engineering Education Knowledge (SEEK) document can provide solutions that minimize these discrepancies and, in turn, improve the quality of a software system and increase user satisfaction. In this paper, a software development framework called SETh is presented. The SETh framework consists of a set of visual models that support software engineering education and practices in a systematic manner. It also enables backward tracking/tracing and forward tracking/tracing capabilities, two important concepts that can facilitate both greenfield and evolutionary software engineering projects. The SETh framework connects every step of the development of a software system tightly; hence, learners and experienced software engineers alike can study, understand, and build efficient software systems for emerging data science applications.
Citations: 0
Using locality sensitive hashing to improve the KNN algorithm in the mapreduce framework
Pub Date : 2018-03-29 DOI: 10.1145/3190645.3190700
S. Bagui, A. Mondal, S. Bagui
The K-Nearest Neighbor (KNN) algorithm is one of the most widely used algorithms in data mining for classification and prediction. The algorithm has several applications: in facial detection when used with deep learning, in biometric security applications, etc. The traditional KNN algorithm involves an iterative process of computing the distance between a test data point and every data point in the training dataset, and classifying the object based on the closest training samples. This method first selects the K nearest training data points for classifying a test data point and then predicts the test sample's class based on the majority class among those neighbors. If both the training and test datasets are large, this conventional form can be considered computationally expensive. Reducing the massive calculation required to predict a data vector was our main goal, and with this intention, the training dataset was split into several buckets. The KNN algorithm was then performed inside a bucket, instead of iterating over the whole training dataset. We used the Jaccard coefficient to determine the degree of similarity of a data vector with some arbitrarily defined data points P, and placed similar data points in the same bucket. This was the core functionality of our hash function. The hash function determines the bucket number where similar data vectors will be placed. Unlike a standard hashing algorithm, our approach was to maximize the probability of hash collisions to preserve locality sensitivity. Both the conventional and proposed methods were implemented in Hadoop's MapReduce framework.
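The bucketing idea above can be sketched as follows, under illustrative assumptions: each data vector is treated as a set of features, its Jaccard similarity to a few fixed pivot points stands in for the paper's hash function, and KNN then runs only inside the query's bucket. The pivots and the most-similar-pivot rule are assumptions for illustration, not the authors' exact hash.

```python
def jaccard(a, b):
    """Jaccard similarity of two feature sets: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def bucket_of(item, pivots):
    """Locality-sensitive hash: index of the most-similar pivot.
    Similar items collide into the same bucket by design."""
    sims = [jaccard(item, p) for p in pivots]
    return max(range(len(pivots)), key=lambda i: sims[i])

def bucketed_knn(query, data, labels, pivots, k=3):
    """Classify `query` by majority vote among its k most similar
    (by Jaccard) neighbors drawn only from the query's bucket."""
    b = bucket_of(query, pivots)
    in_bucket = [(jaccard(query, x), y)
                 for x, y in zip(data, labels)
                 if bucket_of(x, pivots) == b]
    in_bucket.sort(key=lambda t: -t[0])
    top = [y for _, y in in_bucket[:k]]
    return max(set(top), key=top.count)
```

In the MapReduce setting, `bucket_of` would be computed in the map phase (emitting bucket number as the key) so each reducer runs KNN over a single bucket.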
{"title":"Using locality sensitive hashing to improve the KNN algorithm in the mapreduce framework","authors":"S. Bagui, A. Mondal, S. Bagui","doi":"10.1145/3190645.3190700","DOIUrl":"https://doi.org/10.1145/3190645.3190700","url":null,"abstract":"The K-Nearest Neighbor! (KNN) algorithm is one of the most widely used algorithms in data mining for classification and prediction. The algorithm has several applications: in facial detection when used with deep learning, in biometric security applications etc. The traditional KNN algorithm involves an iterative process of computing the distance between a test data point and every data point in the training dataset, and classifying the object based on the closest training sample. This method first selects K nearest training data points for classifying a test data point and then predicts the test sample's class based on the majority class among those neighbors. If both the train and test datasets are large, this conventional form can be considered computationally expensive. Reduction of the massive calculation that is required to predict a data vector was our main goal, and with this intention, the training dataset was split into several buckets. The KNN algorithm was then performed inside a bucket, instead of iterating over the whole training dataset. We used the Jaccard Coefficient to determine the degree of similarity of a data vector with some arbitrarily defined data points P and placed similar data points in the same bucket. This was the core functionality of our hash function. The hash function determines the bucket number where the similar data vectors will be placed. Unlike the standard hashing algorithm, our approach of hashing was to maximize the probability of the hash collision to preserve the locality sensitiveness. Both the conventional and proposed methods were implemented in Hadoop's MapReduce framework. 
Hadoop gives us an architecture for handling large datasets on a computer cluster in a distributed manner and gives us massive scalability. The use of the locality sensitive hashing in KNN in Hadoop's MapReduce environment took less time than conventional KNN to classify a new data object.","PeriodicalId":403177,"journal":{"name":"Proceedings of the ACMSE 2018 Conference","volume":"29 11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-03-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116712974","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
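As a rough illustration of the bucketing idea described above — using Jaccard similarity to a few arbitrarily chosen reference points P as a locality-sensitive hash, then running KNN only inside the matching bucket — here is a minimal single-machine sketch. The paper's actual implementation runs in Hadoop MapReduce; the set-valued features, reference points, and majority-vote tie-breaking below are illustrative assumptions, not the authors' code:

```python
from collections import defaultdict

def jaccard(a, b):
    """Jaccard coefficient of two set-valued data vectors."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def bucket_of(vec, refs):
    """Locality-sensitive hash: index of the reference point most similar
    to vec. Similar vectors deliberately collide into the same bucket."""
    return max(range(len(refs)), key=lambda i: jaccard(vec, refs[i]))

def lsh_knn(train, labels, refs, query, k=3):
    # Partition the training set into buckets once.
    buckets = defaultdict(list)
    for x, y in zip(train, labels):
        buckets[bucket_of(x, refs)].append((x, y))
    # Search only the query's bucket instead of the whole training set;
    # fall back to the full set if the bucket happens to be empty.
    cand = buckets[bucket_of(query, refs)] or list(zip(train, labels))
    cand.sort(key=lambda xy: jaccard(query, xy[0]), reverse=True)
    top = [y for _, y in cand[:k]]
    return max(set(top), key=top.count)   # majority class among neighbors
```

In the MapReduce setting, `bucket_of` would be the mapper's key function, so each reducer performs KNN over a single bucket.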
Citations: 0
Detail preservation of morphological operations through image scaling
Pub Date : 2018-03-29 DOI: 10.1145/3190645.3190691
Kaleb E. Smith, Chunhua Dong, M. Naghedolfeizi, Xiangyan Zeng
Morphological techniques probe an image with a structuring element. By varying the size and shape of structuring elements, geometrical information about different parts of an image and their interrelations can be extracted for applications such as boundary demodulation, component identification, or noise removal. While large structuring elements are beneficial for eliminating noise, they may be disadvantageous for preserving details in an image. Taking this into consideration, we propose in this paper an image scaling method that preserves detailed information when morphological operations are applied to remove noise. First, a binary image is obtained, from which a Preservation Ratio Scalar (PRS) is calculated. The PRS is used to upscale the image before the morphological operations, preserving structural fine details that would otherwise be eliminated in the original image. Finally, the processed image is downscaled using the PRS. Target detection experiments demonstrated the effectiveness of the proposed method in preserving structural details such as edges while eliminating noise.
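The scale-filter-rescale pipeline described in the abstract can be sketched as follows. The abstract does not specify how the PRS is derived from the binary image, so a fixed illustrative factor is assumed here, and a 3x3 opening stands in for the morphological operation; the pure-NumPy helpers are a sketch, not the authors' implementation:

```python
import numpy as np

def scale_nn(img, f):
    """Nearest-neighbor rescale of a 2-D boolean array by factor f."""
    h, w = img.shape
    nh, nw = int(round(h * f)), int(round(w * f))
    rows = np.minimum((np.arange(nh) / f).astype(int), h - 1)
    cols = np.minimum((np.arange(nw) / f).astype(int), w - 1)
    return img[np.ix_(rows, cols)]

def erode(img):
    """3x3 binary erosion with zero padding at the border."""
    p = np.pad(img, 1, constant_values=False)
    out = np.ones_like(img, dtype=bool)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            out &= p[1 + dy:1 + dy + img.shape[0], 1 + dx:1 + dx + img.shape[1]]
    return out

def dilate(img):
    """3x3 binary dilation with zero padding at the border."""
    p = np.pad(img, 1, constant_values=False)
    out = np.zeros_like(img, dtype=bool)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            out |= p[1 + dy:1 + dy + img.shape[0], 1 + dx:1 + dx + img.shape[1]]
    return out

def denoise_preserving_detail(gray, threshold=128, prs=2.0):
    """Binarize, upscale by the PRS, open to remove noise, downscale back.
    prs is an assumed fixed factor; the paper computes it from the image."""
    binary = gray > threshold
    up = scale_nn(binary, prs)          # upscale so fine details survive
    opened = dilate(erode(up))          # morphological opening removes noise
    return scale_nn(opened, 1.0 / prs)  # downscale back to original size
```

Because small structures are enlarged before the opening is applied, they are less likely to be erased along with the noise than if the same structuring element were applied at the original resolution.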
Citations: 1
A team software process approach to database course
Pub Date : 2018-03-29 DOI: 10.1145/3190645.3190676
R. Tashakkori, Zachary W. Andrews
In recent years, some programs have created a database track to give students an opportunity to deepen their database skills and expertise. As database management systems are widely used in the real world and have become an integral part of computer science applications, it is critical for students to gain practical experience in this field.
Citations: 2
Preliminary studies of honey queen bee conditions using Cyranose 320 nose technology
Pub Date : 2018-03-29 DOI: 10.1145/3190645.3190696
G. Johnson, Drashti Patel, Adel Alluhayb, Nannan Li, Chi Shen, T. Webster
Over the last ten years, the beekeeping industry has been struggling to understand and stop the sudden, widespread loss or collapse of honey bee colonies, known collectively as Colony Collapse Disorder (CCD), in the U.S. and around the world. While honey bee colonies experience many stressors that could cause a colony to collapse, we focus on the quality, health, and reproductive ability of honey bee queens. The purpose of this line of research is to identify relationships between the pheromone signatures of honey bee queens and their quality. The ultimate goal is to find a reliable, non-invasive tool that does not harm the queen but still allows beekeepers to make informed decisions about purchasing queens and about when to replace a queen before a colony collapses. In this portion of the research, we use an electronic nose (e-nose), a device that digitizes smells. The scope of this paper is to determine whether an e-nose device is viable for our research and, if so, to determine the best way to configure its settings to improve data collection. We also considered gathering data on queen bee pheromone production, since that is an indicator of a queen's reproductive ability. We were able to use the e-nose device to digitize pheromone signatures from 20 queen bees. Using Microsoft Excel and the R programming language, we identified patterns that will be useful in configuring the e-nose device for future research. We also noticed an early indication that the e-nose can distinguish between a healthy bee and a sick bee.
Citations: 0