Increasing dependence on Information and Communication Technologies (ICT), and especially on the Internet, in Industrial Control Systems (ICS) has made these systems a primary target of cyber-attacks. As ICS are extensively used in Critical Infrastructures (CI), this makes CI more vulnerable to cyber-attacks, and their protection becomes an important issue. On the other hand, cyber-attacks can exploit not only software but also physics; that is, they can target the fundamental physical aspects of computation. The recently discovered RowHammer (RH) fault injection attack is a serious vulnerability targeting the reliability and security of DRAM (Dynamic Random Access Memory) hardware. Studies of this vulnerability raise serious security concerns. The purpose of this study was to give an overview of the RH phenomenon in DRAMs and its possible security risks for ICS, and to discuss a few realistic RH attack scenarios for ICS. The results of the study reveal that RH is a serious security threat to any computer-based system containing DRAM, and this also applies to ICS.
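To make the threat concrete, the following is a minimal conceptual sketch, not an exploit: it only models the downstream effect of a single RH-induced bit flip on a hypothetical ICS setpoint held in DRAM. Real attacks require low-level primitives (e.g. cache-flush hammering loops in C and knowledge of the physical memory layout); the variable names and values here are illustrative assumptions.

```python
# Conceptual illustration (not an exploit): the effect of one
# RowHammer-induced bit flip on a process value stored in DRAM.
import struct

def flip_bit(buf: bytearray, bit_index: int) -> None:
    """Flip one bit in a raw memory buffer, as an RH fault might."""
    buf[bit_index // 8] ^= 1 << (bit_index % 8)

# A hypothetical pressure setpoint stored as a 64-bit float.
setpoint = 10.0  # bar (assumed value)
buf = bytearray(struct.pack("<d", setpoint))

flip_bit(buf, 62)  # flip a high exponent bit of the IEEE-754 double

corrupted = struct.unpack("<d", bytes(buf))[0]
print(f"setpoint before: {setpoint} bar, after one bit flip: {corrupted} bar")
```

A single flipped exponent bit changes the value by hundreds of orders of magnitude, which is why an RH fault in control-loop memory is a safety issue and not merely a reliability one.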
{"title":"CYBER SECURITY IN INDUSTRIAL CONTROL SYSTEMS (ICS): A SURVEY OF ROWHAMMER VULNERABILITY","authors":"Hakan Aydin, A. Sertbas","doi":"10.35784/acs-2022-15","DOIUrl":"https://doi.org/10.35784/acs-2022-15","url":null,"abstract":"Increasing dependence on Information and Communication Technologies (ICT) and especially on the Internet in Industrial Control Systems (ICS) has made these systems the primary target of cyber-attacks. As ICS are extensively used in Critical Infrastructures (CI), this makes CI more vulnerable to cyber-attacks and their protection becomes an important issue. On the other hand, cyberattacks can exploit not only software but also physics; that is, they can target the fundamental physical aspects of computation. The newly discovered RowHammer (RH) fault injection attack is a serious vulnerability targeting hardware on reliability and security of DRAM (Dynamic Random Access Memory). Studies on this vulnerability issue raise serious security concerns. The purpose of this study was to overview the RH phenomenon in DRAMs and its possible security risks on ICSs and to discuss a few possible realistic RH attack scenarios for ICSs. The results of the study revealed that RH is a serious security threat to any computer-based system having DRAMs, and this also applies to ICS.","PeriodicalId":36379,"journal":{"name":"Applied Computer Science","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43846177","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Plant diseases are a foremost risk to the safety of food. They have the potential to significantly reduce the quality and quantity of agricultural products. Recognizing plant diseases is the most prominent challenge in the agricultural sector. In computer vision, the Convolutional Neural Network (CNN) produces good results when solving image classification tasks, and many deep learning architectures have been applied to plant disease diagnosis. This paper introduces a transfer-learning-based model for detecting tomato leaf diseases, using DenseNet201 as the transfer-learning backbone with a CNN classifier. A comparison study between four deep learning models (VGG16, Inception V3, ResNet152V2 and DenseNet201) was conducted to determine which achieves the best accuracy when using transfer learning for plant disease detection. The image dataset used contains 22,930 photos of tomato leaves in 10 different classes: 9 disorders and one healthy class. In our experiments, the results show that the proposed model achieves the highest training accuracy of 99.84% and a validation accuracy of 99.30%.
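A minimal transfer-learning sketch in the spirit of the described model, using tf.keras with an ImageNet-pretrained DenseNet201 backbone; the input size, classifier head and hyperparameters are assumptions, not the authors' exact training setup.

```python
# Transfer-learning sketch: frozen DenseNet201 features + a small
# trainable head for the 10 tomato-leaf classes (9 disorders + healthy).
import tensorflow as tf

base = tf.keras.applications.DenseNet201(
    weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # freeze ImageNet features for transfer learning

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(10, activation="softmax"),  # 10 classes
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()
```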
{"title":"TOMATO DISEASE DETECTION MODEL BASED ON DENSENET AND TRANSFER LEARNING","authors":"Mahmoud Bakr, Sayed Abdel-Gaber, M. Nasr, M. Hazman","doi":"10.35784/acs-2022-13","DOIUrl":"https://doi.org/10.35784/acs-2022-13","url":null,"abstract":"Plant diseases are a foremost risk to the safety of food. They have the potential to significantly reduce agricultural products quality and quantity. In agriculture sectors, it is the most prominent challenge to recognize plant diseases. In computer vision, the Convolutional Neural Network (CNN) produces good results when solving image classification tasks. For plant disease diagnosis, many deep learning architectures have been applied. This paper introduces a transfer learning based model for detecting tomato leaf diseases. This study proposes a model of DenseNet201 as a transfer learning-based model and CNN classifier. A comparison study between four deep learning models (VGG16, Inception V3, ResNet152V2 and DenseNet201) done in order to determine the best accuracy in using transfer learning in plant disease detection. The used images dataset contains 22930 photos of tomato leaves in 10 different classes, 9 disorders and one healthy class. In our experimental, the results shows that the proposed model achieves the highest training accuracy of 99.84% and validation accuracy of 99.30%.","PeriodicalId":36379,"journal":{"name":"Applied Computer Science","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46391960","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tandem mass spectrometry is an analytical technique widely used in proteomics for the high-throughput characterization of proteins in biological samples. Modern in-depth proteomic studies require the collection of even millions of mass spectra representing short protein fragments (peptides). In order to identify the peptides, the measured spectra are most often scored against a database of amino acid sequences of known proteins. Due to the volume of input data and the sizes of proteomic databases, this is a resource-intensive task, which requires an efficient and scalable computational strategy. Here, we present SparkMS, an algorithm for peptide and protein identification from mass spectrometry data explicitly designed to work in a distributed computational environment. To achieve the required performance and scalability, we use Apache Spark, a modern framework that is becoming increasingly popular not only in the field of “big data” analysis but also in bioinformatics. This paper describes the algorithm in detail and demonstrates its performance on a large proteomic dataset. Experimental results indicate that SparkMS scales with the number of worker nodes and the increasing complexity of the search task. Furthermore, it exhibits a protein identification efficiency comparable to X!Tandem, a widely-used proteomic search engine.
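SparkMS's algorithm is not reproduced here, but the general Spark pattern such tools follow can be sketched: broadcast the peptide database once to every worker, then score spectra in parallel across partitions. The score() function and the toy data below are placeholders, not the actual SparkMS scoring model.

```python
# Distributed spectrum-vs-database scoring pattern with PySpark.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("spectra-scoring").getOrCreate()
sc = spark.sparkContext

peptides = ["PEPTIDEK", "SAMPLER", "PROTEINK"]   # stand-in database
pep_bcast = sc.broadcast(peptides)               # shipped once per worker

def score(spectrum, peptide):
    # Placeholder similarity; real engines compare fragment ion masses.
    return sum(1 for peak in spectrum if peak % len(peptide) == 0)

def best_match(spectrum):
    # Each worker scores its spectra against the broadcast database.
    return max(((p, score(spectrum, p)) for p in pep_bcast.value),
               key=lambda t: t[1])

spectra = sc.parallelize([[100, 200, 300], [150, 250]])  # toy spectra
print(spectra.map(best_match).collect())
spark.stop()
```

Because the database is broadcast rather than joined, adding worker nodes only partitions the spectra, which is consistent with the scaling behaviour the paper reports.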
{"title":"A DISTRIBUTED ALGORITHM FOR PROTEIN IDENTIFICATION FROM TANDEM MASS SPECTROMETRY DATA","authors":"Katarzyna Orzechowska, T. Rubel, R. Kurjata, Krzysztof Zaremba","doi":"10.35784/acs-2022-10","DOIUrl":"https://doi.org/10.35784/acs-2022-10","url":null,"abstract":"Tandem mass spectrometry is an analytical technique widely used in proteomics for the high-throughput characterization of proteins in biological samples. Modern in-depth proteomic studies require the collection of even millions of mass spectra representing short protein fragments (peptides). In order to identify the peptides, the measured spectra are most often scored against a database of amino acid sequences of known proteins. Due to the volume of input data and the sizes of proteomic databases, this is a resource-intensive task, which requires an efficient and scalable computational strategy. Here, we present SparkMS, an algorithm for peptide and protein identification from mass spectrometry data explicitly designed to work in a distributed computational environment. To achieve the required performance and scalability, we use Apache Spark, a modern framework that is becoming increasingly popular not only in the field of “big data” analysis but also in bioinformatics. This paper describes the algorithm in detail and demonstrates its performance on a large proteomic dataset. Experimental results indicate that SparkMS scales with the number of worker nodes and the increasing complexity of the search task. Furthermore, it exhibits a protein identification efficiency comparable to X!Tandem, a widely-used proteomic search engine.","PeriodicalId":36379,"journal":{"name":"Applied Computer Science","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43968129","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this work, we present a computer simulation model that generates the propagation of sound waves to solve the forward problem in ultrasound transmission tomography. The simulator can be used to create data sets for the supervised learning process. A solution to the "free-space" boundary problem is proposed, and memory consumption was significantly optimized, from O(n²) to O(n). The given method of simulating wave scattering enables control of the noise extinction time within the tomographic probe and of the permeability of the sound wave. The presented version of the script simulates the classic variant of a circular probe with sensors evenly distributed around the circumference.
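A minimal sketch of the underlying finite-difference update, assuming a standard second-order explicit scheme: only three time slices are kept in memory, which is how storing the full time history is avoided. The grid size, wave speed and source below are illustrative, and the periodic boundaries implied by np.roll stand in for the paper's "free-space" boundary treatment, which is not reproduced here.

```python
# 2-D acoustic wave propagation with an explicit finite-difference
# scheme; memory stays proportional to the grid because only the
# previous and current time slices are retained and buffers are reused.
import numpy as np

n, c, dt, dx, steps = 200, 1500.0, 4e-7, 1e-3, 300  # grid, m/s, s, m
r2 = (c * dt / dx) ** 2      # Courant term; must be <= 0.5 in 2-D (here 0.36)

prev = np.zeros((n, n))
curr = np.zeros((n, n))
curr[n // 2, n // 2] = 1.0   # point source at the centre of the probe

for _ in range(steps):
    # Discrete Laplacian (np.roll gives periodic, not free-space, edges).
    lap = (np.roll(curr, 1, 0) + np.roll(curr, -1, 0) +
           np.roll(curr, 1, 1) + np.roll(curr, -1, 1) - 4 * curr)
    nxt = 2 * curr - prev + r2 * lap
    prev, curr = curr, nxt   # rotate the two retained time slices

print("peak amplitude after", steps, "steps:", np.abs(curr).max())
```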
{"title":"APPLICATION OF FINITE DIFFERENCE METHOD FOR MEASUREMENT SIMULATION IN ULTRASOUND TRANSMISSION TOMOGRAPHY","authors":"Konrad Kania, Mariusz Mazurek, T. Rymarczyk","doi":"10.35784/acs-2022-16","DOIUrl":"https://doi.org/10.35784/acs-2022-16","url":null,"abstract":"In this work, we present a computer simulation model that generates the propagation of sound waves to solve a forward problem in ultrasound transmission tomography. The simulator can be used to create data sets used in the supervised learning process. A solution to the \"free-space\" boundary problem was proposed, and the memory consumption was significantly optimized from O(n2) to O(n). The given method of simulating wave scattering enables the control of the noise extinction time within the tomographic probe and the permeability of the sound wave. The presented version of the script simulates the classic variant of a circular probe with evenly distributed sensors around the circumference.","PeriodicalId":36379,"journal":{"name":"Applied Computer Science","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48351771","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This study investigated the use of computer games to detect the symptoms of mild cognitive impairment (MCI), an early stage of dementia, in the elderly. To this end, three serious games were used to measure visuo-perceptual coordination and psychomotor abilities, spatial memory, and short-term digit-span memory. Subsequently, the correlations between the results of the games and the results of the Korean Mini-Mental State Examination (K-MMSE), a dementia screening test, were analyzed. In addition, the game results of normal elderly persons were compared with those of elderly patients who exhibited MCI symptoms. The results indicated that game play time and the frequency of errors had significant correlations with K-MMSE scores. Significant differences were also found in several factors between the control group and the group with MCI. Based on these findings, the advantages and disadvantages of using serious games as tools for screening mild cognitive impairment are discussed.
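As a hedged illustration of the kind of correlation analysis described, the following computes a Pearson correlation between one game metric and K-MMSE scores; all values are synthetic stand-ins, not the study's data.

```python
# Pearson correlation between a game metric and K-MMSE scores.
from scipy.stats import pearsonr

play_time = [42, 55, 61, 38, 70, 49, 66]   # seconds per trial (synthetic)
kmmse = [29, 26, 24, 30, 22, 27, 23]       # K-MMSE scores (synthetic)

r, p = pearsonr(play_time, kmmse)
print(f"Pearson r = {r:.2f}, p = {p:.3f}")  # longer play time ~ lower K-MMSE
```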
{"title":"USE OF SERIOUS GAMES FOR THE ASSESSMENT OF MILD COGNITIVE IMPAIRMENT IN THE ELDERLY","authors":"Moon-gee Choi","doi":"10.35784/acs-2022-9","DOIUrl":"https://doi.org/10.35784/acs-2022-9","url":null,"abstract":"This study investigated the use of computer games to detect the symptoms of mild cognitive impairment (MCI), an early stage of dementia, in the elderly. To this end, three serious games were used to measure the visio-perception coordination and psycho-motor abilities, spatial memory, and short-term digit span memory. Subsequently, the correlations between the results of the games and the results of the Korean Mini-Mental State Examination (K-MMSE), a dementia screening test, were analyzed. In addition, the game results of normal elderly persons were compared with those of elderly patients who exhibited MCI symptoms. The results indicated that the game play time and the frequency of errors had significant correlations with K-MMSE. Significant differences were also found in several factors between the control group and the group with MCI. Based on these findings, the advantages and disadvantages of using serious games as tools for screening mild cognitive impairment were discussed.","PeriodicalId":36379,"journal":{"name":"Applied Computer Science","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-03-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46956379","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The healthcare industry is one of many that could benefit greatly from advances in the technology it utilizes. Artificial intelligence (AI) technologies, and specifically deep learning (DL), a highly useful data-driven technology, are especially integral. DL is applied in a variety of methods, depending mainly on the structure of the available data; with varying applications, it produces data in different contexts with particular connotations. Scan images play a great role in identifying the presence of disease in a patient, and automating their processing with CNN-based models is highly effective in reducing the human errors that otherwise arise from such large volumes of data. Hence, this study presents a hybrid deep learning architecture for classifying histopathology images to identify the presence of cancer in a patient. The proposed models are parallelized using the TensorFlow-GPU framework to accelerate the training of these deep CNN (Convolutional Neural Network) architectures. The study uses transfer learning during training, and early-stopping criteria are applied to avoid overfitting in the training phase. The models impose a parallel LSTM layer on four considered architectures: MobileNet, VGG16, and ResNet with 101 and 152 layers. The experimental results produced by these hybrid models show that the Hybrid ResNet101 and Hybrid ResNet152 architectures are highly suitable, with accuracies of 90% and 92%, respectively. Finally, this study concludes that the proposed Hybrid ResNet152 architecture is highly efficient in classifying histopathology images. This well-focused and detailed experimental study will further help researchers understand which deep CNN architectures to apply in application development.
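The exact wiring of the hybrid models is not specified in the abstract, so the following is a speculative sketch of one plausible "parallel LSTM layer" arrangement: the frozen backbone's feature map feeds both a pooling branch and, reshaped into a sequence, an LSTM branch, and the two branches are concatenated before the classifier.

```python
# Speculative hybrid parallel CNN + LSTM sketch (tf.keras functional API).
import tensorflow as tf
from tensorflow.keras import layers

backbone = tf.keras.applications.ResNet152(
    weights="imagenet", include_top=False, input_shape=(224, 224, 3))
backbone.trainable = False                    # transfer learning, frozen

x = backbone.output                           # (7, 7, 2048) feature map
gap = layers.GlobalAveragePooling2D()(x)      # CNN branch
seq = layers.Reshape((49, 2048))(x)           # feature map as a 49-step sequence
lstm = layers.LSTM(256)(seq)                  # parallel LSTM branch

merged = layers.Concatenate()([gap, lstm])
out = layers.Dense(2, activation="softmax")(merged)  # cancer / no cancer

model = tf.keras.Model(backbone.input, out)
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```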
{"title":"HISTOPATHOLOGY IMAGE CLASSIFICATION USING HYBRID PARALLEL STRUCTURED DEEP-CNN MODELS","authors":"K. Dsouza, Z. Ansari","doi":"10.35784/acs-2022-2","DOIUrl":"https://doi.org/10.35784/acs-2022-2","url":null,"abstract":"The healthcare industry is one of the many out there that could majorly benefit from advancement in the technology it utilizes. Artificial intelligence (AI) technologies are especially integral and specifically deep learning (DL); a highly useful data-driven technology. It is applied in a variety of different methods but it mainly depends on the structure of the available data. However, with varying applications, this technology produces data in different contexts with particular connotations. Reports which are the images of scans play a great role in identifying the existence of the disease in a patient. Further, the automation in processing these images using technology like CNN-based models makes it highly efficient in reducing human errors otherwise resulting in large data. Hence this study presents a hybrid deep learning architecture to classify the histopathology images to identify the presence of cancer in a patient. Further, the proposed models are parallelized using the TensorFlow-GPU framework to accelerate the training of these deep CNN (Convolution Neural Networks) architectures. This study uses the transfer learning technique during training and early stopping criteria are used to avoid overfitting during the training phase. these models use LSTM parallel layer imposed in the model to experiment with four considered architectures such as MobileNet, VGG16, and ResNet with 101 and 152 layers. The experimental results produced by these hybrid models show that the capability of Hybrid ResNet101 and Hybrid ResNet152 architectures are highly suitable with an accuracy of 90% and 92%. Finally, this study concludes that the proposed Hybrid ResNet-152 architecture is highly efficient in classifying the histopathology images. The proposed study has conducted a well-focused and detailed experimental study which will further help researchers to understand the deep CNN architectures to be applied in application development.","PeriodicalId":36379,"journal":{"name":"Applied Computer Science","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41613980","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper investigates the strength of a conceptual main rotor blade dedicated to an unmanned helicopter. The blade is made of smart materials in order to optimize the efficiency of the aircraft by increasing its aerodynamic performance. This purpose was achieved by performing a series of strength calculations for the blade of a prototype main rotor used in an unmanned helicopter. The calculations were done with the Finite Element Method (FEM) using CAE (Computer-Aided Engineering) software, which employs advanced techniques for computer modeling of loads in composite structures. Our analysis included CAD (Computer-Aided Design) modeling of the rotor blade, importing the solid model into the CAE software, defining the simulation boundary conditions, and performing strength calculations of the blade spar for selected materials used in aviation, i.e. fiberglass and carbon fiber laminate. This paper presents the results and analysis of the numerical calculations.
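FEM results of this kind are often sanity-checked against a closed-form estimate. As a hedged companion, the following computes the analytic maximum bending stress, sigma = M*c/I, for a rectangular spar cross-section; all dimensions and loads are illustrative assumptions, not the prototype blade's actual values.

```python
# Analytic bending-stress check for a rectangular spar section.
M = 150.0           # bending moment at the blade root, N*m (assumed)
b, h = 0.04, 0.02   # section width and height, m (assumed)

I = b * h**3 / 12   # second moment of area, m^4
c = h / 2           # distance from neutral axis to outer fibre, m
sigma = M * c / I   # maximum bending stress, Pa

print(f"max bending stress: {sigma / 1e6:.1f} MPa")
```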
{"title":"STRENGTH ANALYSIS OF A PROTOTYPE COMPOSITE HELICOPTER ROTOR BLADE SPAR","authors":"R. Kliza, Karol Ścisłowski, K. Siadkowska, Jacek Padyjasek, M. Wendeker","doi":"10.35784/acs-2022-1","DOIUrl":"https://doi.org/10.35784/acs-2022-1","url":null,"abstract":"This paper investigates the strenght of a conceptual main rotor blade dedicated to an unmanned helicopter. The blade is made of smart materials in order to optimize the efficiency of the aircraft by increasing its aerodynamic performance. This purpose was achieved by performing a series of strength calculations for the blade of a prototype main rotor used in an unmanned helicopter. The calculations were done with the Finite Element Method (FEM) and software like CAE (Computer-Aided Engineering) which uses advanced techniques of computer modeling of load in composite structures. Our analysis included CAD (Computer-Aided Design) modeling the rotor blade, importing the solid model into the CAE software, defining the simulation boundary conditions and performing strength calculations of the blade spar for selected materials used in aviation, i.e. fiberglass and carbon fiber laminate. This paper presents the results and analysis of the numerical calculations.","PeriodicalId":36379,"journal":{"name":"Applied Computer Science","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43308177","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Detection and classification of vegetation is a crucial technical task in the management of natural resources, since vegetation serves as a foundation for all living things and has a significant impact on climate change, for example by affecting terrestrial carbon dioxide (CO2). Traditional approaches to acquiring vegetation cover, such as field surveys, map interpretation, and collateral data analysis, are ineffective, as they are time-consuming and expensive. In this paper, vegetation regions are automatically detected by applying the simple but effective vegetation indices Normalized Difference Vegetation Index (NDVI) and Soil Adjusted Vegetation Index (SAVI) to the red (R) and near-infrared (NIR) bands of a Landsat-8 satellite image. Remote sensing technology makes it possible to analyze vegetation cover across wide areas in a cost-effective manner. Mapping vegetation from remotely sensed images involves a number of factors, techniques, and methodologies, and the rapid improvement of remote sensing technologies broadens the possibilities for image sources, making remotely sensed images more accessible. The dataset used in this paper consists of the R and NIR bands of a Level-1 Tier 1 Landsat-8 optical remote sensing image acquired on 6 September 2013, processed and made available to users on 2 May 2017. Pre-processing, involving a subsetting operation, is performed on the R and NIR bands using the ERDAS Imagine tool. The NDVI and SAVI are used to extract vegetation features automatically in the Python language. Finally, by establishing a threshold, the vegetation cover of the research area is detected and then classified.
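The two indices are simple band-wise formulas, NDVI = (NIR - R)/(NIR + R) and SAVI = (NIR - R)(1 + L)/(NIR + R + L) with soil adjustment factor L = 0.5, so they can be sketched directly in NumPy; the tiny reflectance arrays and the 0.4 threshold below are illustrative, not the study's values.

```python
# NDVI and SAVI from red and near-infrared reflectance bands.
import numpy as np

red = np.array([[0.10, 0.30], [0.25, 0.05]])   # R reflectance (toy values)
nir = np.array([[0.60, 0.35], [0.30, 0.55]])   # NIR reflectance (toy values)

ndvi = (nir - red) / (nir + red + 1e-9)         # NDVI = (NIR - R)/(NIR + R)
L = 0.5                                         # soil adjustment factor
savi = (nir - red) * (1 + L) / (nir + red + L)  # SAVI with L = 0.5

vegetation = ndvi > 0.4                         # threshold-based detection
print(ndvi.round(2), savi.round(2), vegetation, sep="\n")
```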
{"title":"DETECTION AND CLASSIFICATION OF VEGETATION AREAS FROM RED AND NEAR INFRARED BANDS OF LANDSAT-8 OPTICAL SATELLITE IMAGE","authors":"Anusha Nallapareddy","doi":"10.35784/acs-2022-4","DOIUrl":"https://doi.org/10.35784/acs-2022-4","url":null,"abstract":"Detection and classification of vegetation is a crucial technical task in the management of natural resources since vegetation serves as a foundation for all living things and has a significant impact on climate change such as impacting terrestrial carbon dioxide (CO2). Traditional approaches for acquiring vegetation covers such as field surveys, map interpretation, collateral and data analysis are ineffective as they are time consuming and expensive. In this paper vegetation regions are automatically detected by applying simple but effective vegetation indices Normalized Difference Vegetation Index (NDVI) and Soil Adjusted Vegetation Index (SAVI) on red(R) and near infrared (NIR) bands of Landsat-8 satellite image. Remote sensing technology makes it possible to analyze vegetation cover across wide areas in a cost-effective manner. Using remotely sensed images, the mapping of vegetation requires a number of factors, techniques, and methodologies. The rapid improvement of remote sensing technologies broadens possibilities for image sources making remotely sensed images more accessible. The dataset used in this paper is the R and NIR bands of Level-1 Tier 1 Landsat-8 optical remote sensing image acquired on 6th September 2013, is processed and made available to users on 2nd May 2017. The pre-processing involving sub-setting operation is performed using the ERDAS Imagine tool on R and NIR bands of Landsat-8 image. The NDVI and SAVI are utilized to extract vegetation features automatically by using python language. Finally by establishing a threshold, vegetation cover of the research area is detected and then classified.","PeriodicalId":36379,"journal":{"name":"Applied Computer Science","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49162887","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In the paper, the authors present the outcome of web scraping software allowing for the automated classification of source code. The software system was prepared for a discussion forum for software developers, to find fragments of source code that were published without being marked as code snippets. The analyzer uses a machine learning binary classification model to differentiate between programming language source code and highly technical text about software. The analyzer model was prepared using an AutoML subsystem, without human intervention or fine-tuning, and its accuracy on the described problem exceeds 95%. The analyzer based on the automatically generated model has been deployed, and after the first year of continuous operation its False Positive Rate is less than 3%. A similar process may be introduced in document management within the software development process, where automatic tagging and searching for code or pseudo-code may be useful for archiving purposes.
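The AutoML-generated model itself is not published, but a hand-rolled stand-in shows the shape of the task: a character n-gram TF-IDF pipeline with logistic regression separating source code from technical prose. The training snippets below are toy examples, not the forum dataset.

```python
# Code-vs-prose binary text classifier (stand-in for the AutoML model).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "for (int i = 0; i < n; i++) { sum += a[i]; }",
    "def fib(n): return n if n < 2 else fib(n-1) + fib(n-2)",
    "The garbage collector pauses the application threads briefly.",
    "Dependency injection decouples object construction from use.",
]
labels = [1, 1, 0, 0]   # 1 = source code, 0 = technical prose

clf = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),  # chars catch syntax
    LogisticRegression())
clf.fit(texts, labels)
print(clf.predict(["x = [i**2 for i in range(10)]"]))  # expect [1]
```

Character n-grams rather than word tokens are used here because punctuation-heavy syntax (braces, semicolons, operators) is what most distinguishes code from prose.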
{"title":"DETECTION OF SOURCE CODE IN INTERNET TEXTS USING AUTOMATICALLY GENERATED MACHINE LEARNING MODELS","authors":"M. Badurowicz","doi":"10.35784/acs-2022-7","DOIUrl":"https://doi.org/10.35784/acs-2022-7","url":null,"abstract":"In the paper, the authors are presenting the outcome of web scraping software allowing for the automated classification of source code. The software system was prepared for a discussion forum for software developers to find fragments of source code that were published without marking them as code snippets. The analyzer software is using a Machine Learning binary classification model for differentiating between a programming language source code and highly technical text about software. The analyzer model was prepared using the AutoML subsystem without human intervention and fine-tuning and its accuracy in a described problem exceeds 95%. The analyzer based on the automatically generated model has been deployed and after the first year of continuous operation, its False Positive Rate is less than 3%. The similar process may be introduced in document management in software development process, where automatic tagging and search for code or pseudo-code may be useful for archiving purposes.","PeriodicalId":36379,"journal":{"name":"Applied Computer Science","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43145154","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Nowadays, heart disease is the major cause of death globally. According to a survey conducted by the World Health Organization, almost 18 million people die of heart diseases (or cardiovascular diseases) every year. There should therefore be a system for the early detection and prevention of heart disease. Detection of heart disease mostly depends on huge amounts of pathological and clinical data that are quite complex, so researchers and other medical professionals are showing keen interest in the accurate prediction of heart disease. Heart disease is a general term for a large number of medical conditions related to the heart, one of which is coronary heart disease (CHD), caused by the accumulation of plaque on the artery walls. In this paper, various machine learning base classifiers and ensemble classifiers have been applied to a heart disease dataset for efficient prediction of coronary heart disease. The machine learning classifiers employed include k-nearest neighbor, multilayer perceptron, multinomial naïve Bayes, logistic regression, decision tree, random forest and support vector machine classifiers. The ensemble classifiers used include majority voting, weighted average, bagging and boosting classifiers. The dataset used in this study is obtained from the Framingham Heart Study, a long-term, ongoing cardiovascular study of people from the city of Framingham in Massachusetts, USA. To evaluate the performance of the classifiers, various evaluation metrics, including accuracy, precision, recall and F1 score, have been used. According to our results, the best accuracies were achieved by the logistic regression, random forest, majority voting, weighted average and bagging classifiers, with the highest accuracy among these achieved by the weighted average ensemble classifier.
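A minimal sketch of the weighted-average ensemble that performed best, using scikit-learn's VotingClassifier with soft voting and per-model weights; synthetic data stands in for the Framingham features, and the weights and base models are assumptions, not the paper's exact configuration.

```python
# Weighted soft-voting ensemble over three base classifiers.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=500, n_features=15, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

ensemble = VotingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),
                ("rf", RandomForestClassifier(random_state=0)),
                ("knn", KNeighborsClassifier())],
    voting="soft",        # average predicted class probabilities...
    weights=[2, 2, 1])    # ...weighted per base model
ensemble.fit(X_tr, y_tr)
print("test accuracy:", ensemble.score(X_te, y_te))
```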
{"title":"IMPROVING CORONARY HEART DISEASE PREDICTION BY OUTLIER ELIMINATION","authors":"Lubna Riyaz, M. A. Butt, Majid Zaman","doi":"10.35784/acs-2022-6","DOIUrl":"https://doi.org/10.35784/acs-2022-6","url":null,"abstract":"Nowadays, heart disease is the major cause of deaths globally. According to a survey conducted by the World Health Organization, almost 18 million people die of heart diseases (or cardiovascular diseases) every day. So, there should be a system for early detection and prevention of heart disease. Detection of heart disease mostly depends on the huge pathological and clinical data that is quite complex. So, researchers and other medical professionals are showing keen interest in accurate prediction of heart disease. Heart disease is a general term for a large number of medical conditions related to heart and one of them is the coronary heart disease (CHD). Coronary heart disease is caused by the amassing of plaque on the artery walls. In this paper, various machine learning base and ensemble classifiers have been applied on heart disease dataset for efficient prediction of coronary heart disease. Various machine learning classifiers that have been employed include k-nearest neighbor, multilayer perceptron, multinomial naïve bayes, logistic regression, decision tree, random forest and support vector machine classifiers. Ensemble classifiers that have been used include majority voting, weighted average, bagging and boosting classifiers. The dataset used in this study is obtained from the Framingham Heart Study which is a long-term, ongoing cardiovascular study of people from the Framingham city in Massachusetts, USA. To evaluate the performance of the classifiers, various evaluation metrics including accuracy, precision, recall and f1 score have been used. According to our results, the best accuracy was achieved by logistic regression, random forest, majority voting, weighted average and bagging classifiers but the highest accuracy among these was achieved using weighted average ensemble classifier. ","PeriodicalId":36379,"journal":{"name":"Applied Computer Science","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42392134","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}