2020 IEEE 21st International Conference on Information Reuse and Integration for Data Science : IRI 2020 : proceedings : virtual conference, 11-13 August 2020. IEEE International Conference on Information Reuse and Integration (21st : 2... Latest articles
Pub Date: 2020-08-01; DOI: 10.1109/IRI49571.2020.00065
S. Dang, L. Allison
In this work, we report the performance of a deep learning model in automatically assigning joint scores and overall patient scores to X-ray images of Rheumatoid Arthritis patients. The dataset is from the RA2 DREAM Challenge (https://www.synapse.org/#!Synapse:syn20545111/wiki/594083). Overall, we achieve good predictive performance, with an average accuracy of 0.908.
Title: Using Deep Learning To Assign Rheumatoid Arthritis Scores; Pages: 399-402
Pub Date: 2020-08-01; DOI: 10.1109/IRI49571.2020.00029
Alejandro Gabriel Villanueva Zacarias, Rachaa Ghabri, P. Reimann
Machine learning is increasingly adopted in manufacturing use cases, e.g., for fault detection in a production line. Each new use case requires developing its own machine learning (ML) solution. An ML solution integrates different software components to read, process, and analyze all use case data, and finally to generate the output that domain experts need for their decision-making. Designing a system specification for an ML solution is not straightforward. It entails two types of complexity: (1) the technical complexity of selecting combinations of ML algorithms and software components that suit a use case; (2) the organizational complexity of integrating different requirements from a multidisciplinary team of, e.g., domain experts, data scientists, and IT specialists. In this paper, we propose several adaptations to Axiomatic Design in order to design ML solution specifications that handle these complexities. We call this Axiomatic Design for Machine Learning (AD4ML). We apply AD4ML to specify an ML solution for a fault detection use case and discuss to what extent our approach addresses the above-mentioned complexities. We also discuss how AD4ML facilitates the agile design of ML solutions.
Title: AD4ML: Axiomatic Design to Specify Machine Learning Solutions for Manufacturing; Pages: 148-155
Pub Date: 2020-08-01; DOI: 10.1109/IRI49571.2020.00023
Nuha Zamzami, Pantea Koochemeshkian, N. Bouguila
The novel coronavirus disease (COVID-19), which emerged in December 2019 in Wuhan, Hubei Province, China, has become a serious healthcare threat, with over five million confirmed cases in 215 countries around the world as of May 20, 2020. The World Health Organization recommends rapid diagnosis and immediate isolation of suspected cases. Thus, there is an urgent need to develop an automatic real-time detection system as a quick alternative diagnosis option to control the spread of the virus. In this work, we propose a regression model based on a flexible distribution called the shifted-scaled Dirichlet for real-time detection of patients infected with coronavirus pneumonia using chest X-ray radiographs. To estimate the parameters of our proposed model, we adopt the maximum likelihood method, updating the parameters with stochastic gradient descent. The experimental results demonstrate that our approach is highly effective for detecting COVID-19 cases and understanding the infection in real time, with accuracy of up to 97%.
Title: A Distribution-based Regression for Real-time COVID-19 Cases Detection from Chest X-ray and CT Images; Pages: 104-111
Pub Date: 2020-08-01; DOI: 10.1109/IRI49571.2020.00024
Srikanth Amudala, Samr Ali, N. Bouguila
This paper presents a hierarchical Pitman-Yor process mixture of generalized Gaussian distributions for background subtraction. The motivation for choosing the generalized Gaussian distribution is its flexibility compared to the widely used Gaussian. We also integrate the Pitman-Yor process into our proposed model for an infinite extension that leads to better performance in the task of background subtraction. Our model is learned via a variational Bayes approach and is applied to the challenging Change Detection dataset. Experimental results on background subtraction show the effectiveness of the proposed algorithm.
Title: Background Subtraction with a Hierarchical Pitman-Yor Process Mixture Model of Generalized Gaussian Distributions; Pages: 112-119
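The flexibility claim in the abstract above can be made concrete with the commonly used density of the generalized Gaussian. The sketch below is illustrative only and is not taken from the paper: the parameter names `mu`, `alpha` (scale) and `beta` (shape) are assumptions following one standard parameterization, in which `beta = 2` recovers the Gaussian and `beta = 1` the Laplace distribution.

```python
import math


def generalized_gaussian_pdf(x, mu=0.0, alpha=1.0, beta=2.0):
    """Generalized Gaussian density with location mu, scale alpha, shape beta.

    Parameter names are illustrative assumptions, not the paper's notation.
    """
    if alpha <= 0 or beta <= 0:
        raise ValueError("alpha and beta must be positive")
    # Normalizing constant: beta / (2 * alpha * Gamma(1 / beta))
    norm = beta / (2.0 * alpha * math.gamma(1.0 / beta))
    return norm * math.exp(-((abs(x - mu) / alpha) ** beta))


def gaussian_pdf(x, mu=0.0, sigma=1.0):
    """Ordinary Gaussian density, used here only for comparison."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


# With beta = 2 and alpha = sigma * sqrt(2), the two densities coincide.
difference = generalized_gaussian_pdf(0.7, 0.0, math.sqrt(2.0), 2.0) - gaussian_pdf(0.7)
```

Shape values below 2 give heavier tails and a sharper peak than the Gaussian, while values above 2 give a flatter top, which is the extra freedom a mixture component gains from the shape parameter.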
Pub Date: 2020-08-01; DOI: 10.1109/IRI49571.2020.00051
Salvador V. Balkus, Joshua Rumbut, Honggang Wang, Hua Fang
The impact of the COVID-19 global pandemic has required governments across the world to develop effective public health policies using epidemiological models. Unfortunately, as a result of limited testing ability, these models often rely on lagged rather than real-time data, and cannot be adapted to small geographies to provide localized forecasts. This study proposes ADBio, a multi-level adaptive and dynamic biosensor-based model that can be used to predict the risk of infection with COVID-19 from the individual level to the county level, providing more timely and accurate estimates of virus exposure at all levels. The model is evaluated using diagnosis simulation based on current COVID-19 cases as well as GPS movement data for Massachusetts and New York, where COVID-19 hotspots had previously been observed. Results demonstrate that lagged testing data is indeed a major detriment to current modeling efforts, and that unlike the standard SEIR model, ADBio is able to adapt to arbitrarily small geographic regions and provide reasonable forecasts of COVID-19 cases. The features of this model enable greater national pandemic preparedness and provide local town and county governments a valuable tool for decision-making during a pandemic.
Title: An Adaptive and Dynamic Biosensor Epidemic Model for COVID-19; Pages: 306-313
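For readers unfamiliar with the standard SEIR model that the abstract above uses as its baseline, the following is a minimal sketch of that textbook compartmental model, integrated with a simple Euler step. It is not ADBio and does not reproduce the paper's experiments; the function name and all parameter values are hypothetical.

```python
def simulate_seir(population, infected0, beta, sigma, gamma, days, dt=0.1):
    """Simulate a textbook SEIR model and return one (S, E, I, R) tuple per day.

    beta: transmission rate, sigma: rate of leaving the exposed state,
    gamma: recovery rate. All values passed in below are hypothetical.
    """
    s = float(population - infected0)
    e, i, r = 0.0, float(infected0), 0.0
    steps_per_day = int(round(1.0 / dt))
    history = [(s, e, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_exposed = beta * s * i / population * dt   # S -> E
            new_infectious = sigma * e * dt                # E -> I
            new_recovered = gamma * i * dt                 # I -> R
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_recovered
            r += new_recovered
        history.append((s, e, i, r))
    return history


# Hypothetical region of 10,000 people with 10 initial infections.
trajectory = simulate_seir(10_000, 10, beta=0.5, sigma=0.2, gamma=0.1, days=60)
```

The model treats the whole population as one well-mixed pool, which is one reason a plain SEIR model is hard to apply to very small geographic regions.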
Pub Date: 2020-08-01; Epub Date: 2020-09-10; DOI: 10.1109/iri49571.2020.00034
Hajar Emami, Ming Dong, Carri K Glide-Hurst
Recently, interest in MR-only treatment planning using synthetic CTs (synCTs) has grown rapidly in radiation therapy. However, developing class solutions for medical images that contain atypical anatomy remains a major limitation. In this paper, we propose a novel spatial attention-guided generative adversarial network (attention-GAN) model that generates accurate synCTs from T1-weighted MRI images to address atypical anatomy. Experimental results on fifteen brain cancer patients show that attention-GAN outperformed existing synCT models and achieved an average MAE of 85.223±12.08, 232.41±60.86, and 246.38±42.67 Hounsfield units between synCT and CT-SIM across the entire head, bone, and air regions, respectively. Qualitative analysis shows that attention-GAN can use spatially focused areas to better handle outliers, areas with complex anatomy, or post-surgical regions, and thus offers strong potential for supporting near real-time MR-only treatment planning.
Title: Attention-Guided Generative Adversarial Network to Address Atypical Anatomy in Synthetic CT Generation; Pages: 188-193
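The region-wise MAE figures quoted in the abstract above are mean absolute errors in Hounsfield units (HU) restricted to a set of voxels. A minimal sketch of that computation follows; the voxel values are made up, and the HU thresholds used to define the air and bone masks are assumptions for illustration, not the paper's segmentation method.

```python
def masked_mae(synthetic_hu, reference_hu, mask=None):
    """Mean absolute error between two HU lists, restricted to voxels where mask is True."""
    if len(synthetic_hu) != len(reference_hu):
        raise ValueError("inputs must have the same length")
    if mask is None:
        mask = [True] * len(reference_hu)
    errors = [abs(s - r) for s, r, m in zip(synthetic_hu, reference_hu, mask) if m]
    if not errors:
        raise ValueError("mask selects no voxels")
    return sum(errors) / len(errors)


# Made-up voxel intensities (HU) for a reference CT and a synthetic CT.
reference = [-1000.0, -950.0, 40.0, 30.0, 900.0, 1200.0]
synthetic = [-900.0, -1000.0, 50.0, 20.0, 700.0, 1300.0]

# Assumed thresholds on the reference CT, for illustration only.
air_mask = [hu < -200.0 for hu in reference]
bone_mask = [hu > 200.0 for hu in reference]

whole_mae = masked_mae(synthetic, reference)
air_mae = masked_mae(synthetic, reference, air_mask)
bone_mae = masked_mae(synthetic, reference, bone_mask)
```

Reporting the error per region matters because bone and air occupy few voxels, so large errors there can be hidden in a whole-head average.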
Pub Date: 2020-08-01; DOI: 10.1109/IRI49571.2020.00010
Clifford Kemp, Chad L. Calvert, T. Khoshgoftaar
Denial of Service (DoS) attacks on web servers have become extremely popular with cybercriminals and organized crime groups. A successful DoS attack on network resources reduces the availability of a web site and its backend resources, and could easily result in a loss of millions of dollars in revenue, depending on company size. There are many DoS attack methods, each of which is critical to understanding the nature of the DoS attack class. Recent years have seen a rise in application-layer DoS attack methods that target web servers and are challenging to detect. Such an attack may be disguised to look like legitimate traffic, except that it targets specific application packets or functions. The Slow Read DoS attack is one type of slow HTTP attack targeting the application layer. Slow Read attacks are often used to exploit weaknesses in the HTTP protocol, as it is the most widely used protocol on the Internet. In this paper, we use Full Packet Capture (FPC) datasets for detecting Slow Read DoS attacks with machine learning methods. All collected data originates in a live network environment. Our approach produces FPC features taken from network packets at the IP and TCP layers. Our experiment evaluates FPC datasets to determine the accuracy and efficiency of several detection models for Slow Read attacks. Experimental results show that the machine learners were quite successful in identifying Slow Read attacks, with high detection rates and low false alarm rates, using FPC data. The experiment demonstrates that FPC features are discriminative enough to detect such attacks.
Title: Detection Methods of Slow Read DoS Using Full Packet Capture Data; Pages: 9-16
Pub Date: 2020-08-01; DOI: 10.1109/IRI49571.2020.00055
C. Giossi, D. Maier, K. Tufte, Elliot Gall, M. Barnes
Using data sets from multiple domains is a common procedure in scientific research. For example, research on the performance of buildings may require data from multiple sources that lack a single standard for data reporting. The Building Management System might report data at regular 5-minute intervals, whereas an air-quality sensor might capture values only when there has been a significant change from the previous value. Many systems exist to help integrate multiple data sources into a single system or interface. However, such systems do not necessarily make it easy to modify an integration plan, for example, to accommodate data exploration, new and changing data sets, or shifts in the questions of interest. We propose an agile data-integration system to enable quick and adaptive analysis across many data sets, concentrating initially on the data alignment step: combining data values from multiple time-series-based data sets whose time schedules differ. To this end, we adopt a Domain-Specific Language approach in which we construct a domain model for alignment, provide a specification language for describing alignments in the model, and implement an interpreter for specifications in that language. Our implementation exploits a rank-based join in SQL that produces faster alignment times than the commonly suggested method of aligning data sets in a database. We present experiments to demonstrate the advantage of our method and exploit data properties for optimization.
Title: Towards Agile Integration: Specification-based Data Alignment; Pages: 333-340
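The alignment problem described in the abstract above (a regularly sampled series versus a sensor that reports only on change) can be illustrated with a small in-memory sketch. This is not the paper's specification language or its rank-based SQL join; it is a plain "last observation carried forward" alignment using binary search, and the function name and sample readings are hypothetical.

```python
from bisect import bisect_right


def align_to_grid(grid_times, event_times, event_values):
    """For each grid time, return the latest event value at or before it (None if none yet).

    event_times must be sorted in ascending order.
    """
    aligned = []
    for t in grid_times:
        idx = bisect_right(event_times, t)  # number of events with time <= t
        aligned.append(event_values[idx - 1] if idx > 0 else None)
    return aligned


# Regular 5-minute schedule (seconds) and hypothetical change-driven sensor readings.
grid = [0, 300, 600, 900]
sensor_times = [120, 610]
sensor_values = [400, 455]

aligned_values = align_to_grid(grid, sensor_times, sensor_values)
```

Carrying the last value forward suits a sensor that reports only on significant change, since the absence of a new reading implies the previous value still holds approximately.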
Pub Date: 2020-08-01; DOI: 10.1109/IRI49571.2020.00053
Md. Ahsan Ayub, Andrea Continella, Ambareen Siraj
In recent times, there has been a global surge of ransomware attacks targeted at industries of various types and sizes, from retail to critical infrastructure. Ransomware researchers come across new kinds of ransomware samples every day and discover novel ransomware families in the wild. To mitigate this ever-growing menace, security researchers in academia and industry have been using unique ways to defend against this type of cyber-attack. I/O Request Packet (IRP), a low-level file system I/O log, is a recently introduced and frequently explored research paradigm for defense against ransomware. In this study, to gain granular, actionable insights into ransomware behavior, we analyze the IRP logs of 272 ransomware samples belonging to 18 different ransomware families, captured during individual execution. We extend our analysis by building an effective Artificial Neural Network (ANN) structure that detects ransomware by learning the underlying patterns of the IRP logs. We evaluate the ANN model in three different experimental settings to demonstrate the effectiveness of our approach. The model demonstrates outstanding performance in terms of accuracy, precision, recall, and F1 score, all in the range of 99.7%±0.2%.
Title: An I/O Request Packet (IRP) Driven Effective Ransomware Detection Scheme using Artificial Neural Network; Pages: 319-324
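The four measures named in the abstract above (accuracy, precision, recall, F1 score) all derive from the counts of a binary confusion matrix. A minimal sketch of their standard definitions follows; the counts in the example are hypothetical and unrelated to the paper's results.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, recall and F1 from binary confusion-matrix counts."""
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("at least one sample is required")
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts: 90 detected attacks, 10 false alarms, 880 correct negatives, 20 misses.
metrics = classification_metrics(tp=90, fp=10, tn=880, fn=20)
```

Reporting all four together guards against a detector that looks accurate only because benign samples dominate the data.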
Pub Date: 2020-08-01; DOI: 10.1109/IRI49571.2020.00016
A. Bensefia, Chawki Djeddi
Recognizing and identifying people based on their physical and behavioral characteristics has always had a wide range of applications, prompting researchers to propose dedicated human recognition systems for each characteristic. These systems operate in two different modes. In identification mode, the task is to assign one of the identities preregistered in the system to the sample read as input. In verification (authentication) mode, the task is to decide whether the sample read as input really belongs to the claimed identity. Handwriting has emerged as one of the behavioral features that attracted a lot of interest during the last decade. Many more writer identification systems have been developed than writer verification (authentication) systems. In this paper, we propose an original approach that uses shape complexity to authenticate writers’ identities. To this end, a local feature (the grapheme) is considered, where the graphemes are generated automatically by a dedicated segmentation module. The Fourier Elliptic Transform is used to measure the shape complexity of the resulting graphemes. Only the most complex graphemes (K-Graphemes) are used to measure the similarity between a pair of handwritten samples. The approach was evaluated on 3 sets of 50 different writers from the BFL dataset, where we obtained almost 80% correct acceptance at an 8% error rate. These results validate the relevance of shape complexity in writer recognition tasks.
Title: Relevance of Grapheme’s Shape Complexity in Writer Verification Task; Pages: 53-58