Pub Date: 2019-12-01 DOI: 10.1109/CSCI49370.2019.00274
Luis Diego Mora-Jimenez, Oscar Azofeifa-Segura, J. Guevara-Coto
Cancer consists of a set of diseases that result from deregulated cell growth and invasion of adjacent tissues. Due to an increase in research, more information has become available regarding the potential causes of cancer, including non-coding elements such as lncRNAs. This new knowledge can be discovered through machine learning methods that extract information from data such as gene expression profiles and identify new cancer-associated genes. For this work we use two machine learning algorithms: random forests (RF) and support vector machines (SVM). The models were trained, and we tested fine-tuning methods including balancing and feature selection. The predictors with the highest metrics were the balanced RF with Boruta (AUC-ROC: 0.9696) and the balanced SVM with recursive feature elimination (AUC-ROC: 0.9710). These models were used to identify new potential lncRNA driver-like genes from protein-coding expression data. The predicted candidates were then functionally annotated using disease ontologies and molecular function ontologies to determine their enrichment in cancer-related processes. These processes included prostate cancer and glycosaminoglycan binding, a potential tumor therapeutic target.
Title: Functional Annotations of Novel Cancer-Associated lncRNAs Identified Using Machine Learning Algorithms. Venue: 2019 International Conference on Computational Science and Computational Intelligence (CSCI).
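The AUC-ROC figures reported for the lncRNA classifiers can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The following is a minimal sketch of that pairwise definition only; the function name `auc_roc` and the toy labels and scores in the usage note are illustrative assumptions, not the authors' code or data.

```python
def auc_roc(labels, scores):
    """AUC-ROC via the pairwise (rank-statistic) definition.

    labels: iterable of 0/1 class labels (1 = positive class).
    scores: iterable of classifier scores, higher = more likely positive.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))
```

For instance, `auc_roc([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1])` compares four positive-negative pairs, of which three are ranked correctly. The quadratic loop is fine for small evaluation sets; a sort-based rank computation would be the usual choice for larger ones.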
Pub Date: 2019-12-01 DOI: 10.1109/CSCI49370.2019.00109
Babak Namazi, G. Sankaranarayanan, V. Devarajan
A new deep learning-based method is proposed for identifying the boundaries of all surgical phases in a laparoscopic video. The model is based on the sequence-to-sequence architecture with an attention mechanism and maps the extracted visual features to the frame numbers of the beginning and the end of each phase. The main novelty is that the alignment vectors for each phase are taken as the outputs and are trained directly to select the indices. We evaluated our model using a large publicly available dataset of laparoscopic cholecystectomy procedures and obtained a mean absolute error (MAE) of 48 seconds.
Title: Attention-Based Surgical Phase Boundaries Detection in Laparoscopic Videos. Venue: 2019 International Conference on Computational Science and Computational Intelligence (CSCI).
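The idea of using an alignment (attention) vector to select a frame index, and of scoring the result by mean absolute error in seconds, can be sketched as follows. This is an illustrative sketch rather than the authors' model: the helper names, the raw attention scores, and the frame rate of 25 fps are all assumptions introduced here.

```python
import math


def softmax(scores):
    """Normalize raw attention scores into weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_frame(attention_scores):
    """Return the frame index that receives the largest attention weight."""
    weights = softmax(attention_scores)
    return max(range(len(weights)), key=weights.__getitem__)


def boundary_mae_seconds(predicted_frames, true_frames, fps=25.0):
    """Mean absolute error between predicted and true boundaries, in seconds.

    fps is a hypothetical frame rate used to convert frames to seconds.
    """
    if len(predicted_frames) != len(true_frames) or not true_frames:
        raise ValueError("boundary lists must be non-empty and equal in length")
    total = sum(abs(p - t) for p, t in zip(predicted_frames, true_frames))
    return total / len(true_frames) / fps
```

With two boundaries predicted at frames 100 and 500 against true boundaries at 150 and 450, each error is 50 frames, which at the assumed 25 fps corresponds to an MAE of 2 seconds.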
Pub Date: 2019-12-01 DOI: 10.1109/CSCI49370.2019.00241
Stefan Hirschmeier, J. Melsbach, D. Schoder, Sven Stahlmann
More and more businesses need metadata for their documents. However, generating metadata automatically is not easy: supervised document classification requires a significant amount of labelled training data, which is not always available in the desired amount or quality. Often, documents need to be tagged with a predefined set of company-specific keywords that are organized in a taxonomy. We present an unsupervised approach to multi-label document classification for large taxonomies using word embeddings and evaluate it on a dataset from a public broadcaster. We point out strengths of the approach compared to supervised classification and statistical approaches like tf-idf.
Title: Unsupervised Multi-Label Document Classification for Large Taxonomies Using Word Embeddings. Venue: 2019 International Conference on Computational Science and Computational Intelligence (CSCI).
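One common way to tag documents with taxonomy keywords without labelled data is to compare an averaged document embedding with each keyword's embedding and keep every keyword above a similarity threshold. The sketch below illustrates that general idea and is not necessarily the procedure used in the paper; the function names, the two-dimensional toy embeddings, and the threshold values are assumptions made for illustration.

```python
import math


def mean_vector(words, embeddings):
    """Average the embeddings of the words that have a known vector."""
    vectors = [embeddings[w] for w in words if w in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def tag_document(words, keywords, embeddings, threshold=0.7):
    """Return every keyword whose embedding is close to the document vector."""
    doc = mean_vector(words, embeddings)
    if doc is None:
        return []
    return sorted(
        kw for kw in keywords
        if kw in embeddings and cosine(doc, embeddings[kw]) >= threshold
    )


# Hypothetical 2-D embeddings; real word vectors have hundreds of dimensions.
TOY_EMBEDDINGS = {
    "goal": [1.0, 0.0], "match": [0.9, 0.1], "sports": [1.0, 0.05],
    "election": [0.0, 1.0], "vote": [0.1, 0.9], "politics": [0.05, 1.0],
}
```

Because every keyword is scored independently, a document can receive several labels at once: with a looser threshold, `tag_document(["goal", "vote"], ["sports", "politics"], TOY_EMBEDDINGS, threshold=0.6)` returns both keywords, while a document about only one topic keeps a single tag.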
Pub Date: 2019-12-01 DOI: 10.1109/CSCI49370.2019.00029
Gary Cantrell, Joan Runs Through
Digital forensics has become a fundamental piece of many cyber security programs across the US, and data recovery is an integral building block of digital forensics. Data recovery can be a difficult topic to cover without a system for organizing the different methods of recovery. This manuscript offers a structure for introducing data recovery in a digital forensics or information technology course and a method for evaluating the admissibility of recovered files as court evidence based on how the data were recovered. This offers both a framework for teaching data recovery and a way of discussing evidence admissibility. The five levels of destruction paradigm is the result of over a decade of teaching digital forensics in vocational and academic environments in a computer science program. The authors offer this paradigm in the hope that it will be useful to other computer science and digital forensics educators.
Title: The Five Levels of Data Destruction: A Paradigm for Introducing Data Recovery in a Computer Science Course. Venue: 2019 International Conference on Computational Science and Computational Intelligence (CSCI).
Pub Date: 2019-12-01 DOI: 10.1109/CSCI49370.2019.00024
Jules Chenou, G. Hsieh, Tonya Fields
This work is a continuation of an ongoing effort to increase the robustness of deep neural networks and thus mitigate possible adversarial examples. In our previous work, the emphasis was placed on denoising the input dataset by adding colored noise before processing. In that work, the evaluation made with the empirical robustness score resulted in a 1% improvement on average for individual noise and a 3.74% improvement on average for ensemble noise. The aim of this paper is to demonstrate the effective robustness of a well-designed radial basis function neural network in tackling adversarial examples. With empirical robustness as the metric, the results show a 72.5% increase under the Fast Gradient Sign Method (FGSM) attack on the MNIST dataset in comparison to a simple deep network, and a 6.4% increase under FGSM on the CIFAR10 dataset.
Title: Radial Basis Function Network: Its Robustness and Ability to Mitigate Adversarial Examples. Venue: 2019 International Conference on Computational Science and Computational Intelligence (CSCI).
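The Fast Gradient Sign Method named in the abstract perturbs an input by a small step `eps` in the direction of the sign of the loss gradient with respect to that input. The sketch below applies FGSM to a plain logistic-regression classifier, chosen only because its input gradient has a closed form; it is not the radial basis function network or the deep network evaluated in the paper, and the weights, input, and `eps` in the usage note are assumed toy values.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(x, w, b):
    """Probability of class 1 under a logistic-regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def fgsm(x, y, w, b, eps):
    """Return x + eps * sign(dLoss/dx) for the cross-entropy loss.

    For logistic regression the input gradient is (p - y) * w,
    where p is the predicted probability and y is the true label (0 or 1).
    """
    p = predict(x, w, b)
    grad = [(p - y) * wi for wi in w]
    sign = [(g > 0) - (g < 0) for g in grad]  # elementwise sign: -1, 0, or 1
    return [xi + eps * s for xi, s in zip(x, sign)]
```

Taking `w = [2.0, -1.0]`, `b = 0.0`, `x = [0.5, 0.2]`, true label 1, and `eps = 0.1`, the perturbed input moves to roughly `[0.4, 0.3]` and the model's confidence in the true class drops. Image attacks usually also clip the result back to the valid pixel range, which this sketch omits.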
Pub Date: 2019-12-01 DOI: 10.1109/CSCI49370.2019.00083
Xavier Williams, N. Mahapatra
Automatic object identification (auto-ID) involves techniques for automatically identifying objects using visual features or tags with unique identification codes. These auto-ID systems then transfer the collected identification information to computer systems for further data management. In this paper, we analyze the existing auto-ID techniques for physically tagged objects.
Title: Analysis of Recent Trends in Automatic Object Identification. Venue: 2019 International Conference on Computational Science and Computational Intelligence (CSCI).
Pub Date: 2019-12-01 DOI: 10.1109/CSCI49370.2019.00182
B. Smaradottir, R. Fensli
There is an urgent call from health organizations, health professionals and health authorities to re-design care delivery for patients with chronic conditions and multi-morbidities. The research project 3P-Patients and Professionals in Productive Teams aims to study health care services that are run with different patient-centered teamwork models. In this context, a case study was conducted at an E-clinic in Denmark, with a focus on technology use and information flow in a patient-centered clinical care context. Qualitative methods were applied, with observations and interviews with key informants. The results showed that information flow worked well from a patient-centered care perspective, even though the technology was a standalone system for the E-clinic with limited interoperability with other health care providers.
Title: A Case Study of Technology Use and Information Flow at a Danish E-Clinic. Venue: 2019 International Conference on Computational Science and Computational Intelligence (CSCI).
Pub Date: 2019-12-01 DOI: 10.1109/CSCI49370.2019.00239
Yuan-Yuan Lee, Y. Chang
Consumer behavior analytics is at the epicenter of a Big Data revolution. In this paper we propose to analyze intra-regional spatial patterns by mining tourists' behaviors and characteristics based on traveling group size, using data collected from an open Airbnb source and focused on Los Angeles neighborhoods in 2016. The Random Forest (RF) classification technique, an ensemble approach, is applied to identify the key drivers for the relevant traveler groups, and the resulting patterns are presented using Hotspot Analysis in a Geographic Information System (GIS). Our empirical results highlight driving factors within Airbnb listings, providing valuable insights to better plan, monitor and manage tourism activity.
Title: Uncovering Los Angeles Tourists' Patterns Using Geospatial Analysis and Supervised Machine Learning with Random Forest Predictors. Venue: 2019 International Conference on Computational Science and Computational Intelligence (CSCI).
Pub Date: 2019-12-01 DOI: 10.1109/CSCI49370.2019.00243
B. Sy, Jin Chen, Rebecca Horowitz
The goal of this research is to develop a predictive analytics technique based on manifold clustering of mixed data types. In this research, we explore the concept of statistically significant association patterns to induce an initial partition on data for deriving manifolds. Manifolds are hyperplanes embedded in low dimensions. The advantage of this novel technique is a bootstrap on data clusters that reveals statistical associations from the information-theoretic perspective. As an illustration, the proposed technique is applied to a real data set of diabetes patients. An assessment of the proposed technique is performed to investigate the effect of the bootstrap based on association patterns. Results of the preliminary study demonstrate the feasibility of applying the proposed technique to real-world data.
Title: Incorporating Association Patterns into Manifold Clustering for Enabling Predictive Analytics. Venue: 2019 International Conference on Computational Science and Computational Intelligence (CSCI).
Pub Date: 2019-12-01 DOI: 10.1109/CSCI49370.2019.00268
Ali Saadat, E. Masehian
Cloud computing systems play a vital role in the digital age. A critical bottleneck in most cloud computing scenarios is the high degree of unpredictability with respect to resource availability and network bandwidth, which may lead to low Quality of Service (such as slow response times). This can be improved by load balancing, which is concerned with efficiently distributing incoming network traffic across a group of servers. Load balancing ensures that no single server bears too much demand, and thus increases the availability of applications and websites for users. Due to the huge state space of such a problem, implementing task scheduling algorithms in load balancing can be very effective. In this paper, we propose a hybrid intelligent approach to load balancing: a Genetic Algorithm module arranges the jobs randomly, and a fuzzy logic module builds the objective function for determining busy states of servers according to their RAM and CPU task queues. The fuzzy input variables include the satisfaction degree and the start and end times of the service, and the fuzzy output is service availability. Computational experiments showed that the best solution was obtained within half of the planned execution time, which leads to a higher degree of user satisfaction.
Title: Load Balancing in Cloud Computing Using Genetic Algorithm and Fuzzy Logic. Venue: 2019 International Conference on Computational Science and Computational Intelligence (CSCI).
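The combination of a genetic algorithm with a fuzzy busy-state objective can be illustrated with a small sketch. Everything below is an assumption made for illustration rather than the authors' design: the linear membership function, the queue capacity of 10, the use of max as the fuzzy OR, and the GA settings (elitist selection, one-point crossover, 20% mutation rate). The abstract's fuzzy inputs (satisfaction degree, service start and end times) are not modelled here.

```python
import random


def busy_degree(cpu_queue, ram_queue, capacity=10.0):
    """Fuzzy 'busy' membership in [0, 1] from CPU and RAM queue loads.

    Each load ramps linearly up to the (hypothetical) capacity; the two
    memberships are combined with max, a standard fuzzy OR.
    """
    cpu = min(max(cpu_queue / capacity, 0.0), 1.0)
    ram = min(max(ram_queue / capacity, 0.0), 1.0)
    return max(cpu, ram)


def fitness(assignment, jobs, n_servers):
    """Lower is better: busy degree of the busiest server."""
    cpu = [0.0] * n_servers
    ram = [0.0] * n_servers
    for (cpu_need, ram_need), server in zip(jobs, assignment):
        cpu[server] += cpu_need
        ram[server] += ram_need
    return max(busy_degree(c, r) for c, r in zip(cpu, ram))


def evolve(jobs, n_servers, generations=50, pop_size=20, seed=0):
    """Search for a job-to-server assignment (requires at least two jobs)."""
    rng = random.Random(seed)
    pop = [[rng.randrange(n_servers) for _ in jobs] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda a: fitness(a, jobs, n_servers))
        survivors = pop[: pop_size // 2]          # elitist selection
        children = []
        while len(survivors) + len(children) < pop_size:
            mum, dad = rng.sample(survivors, 2)
            cut = rng.randrange(1, len(jobs))     # one-point crossover
            child = mum[:cut] + dad[cut:]
            if rng.random() < 0.2:                # occasional mutation
                child[rng.randrange(len(child))] = rng.randrange(n_servers)
            children.append(child)
        pop = survivors + children
    return min(pop, key=lambda a: fitness(a, jobs, n_servers))
```

Each job is a `(cpu_need, ram_need)` pair and an assignment lists one server index per job, so `evolve([(4, 1), (3, 2), (2, 2), (1, 4)], n_servers=2)` returns a list of four server indices. Because the better half of the population always survives, the best fitness never gets worse from one generation to the next.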