Pub Date: 2016-10-01 | DOI: 10.1109/IWBIS.2016.7872883
Xue Li, Xin Zhao, Mingyang Zhong
With the rapid development of genomics theory and practice, public health genomics, a new research field, is beginning to contribute to people's lives. A large volume of genomic data is available but not yet readily used in clinical services, leaving a gap between genomics research and public health genomics applications. We believe that machine intelligence can play an important role in translating genomic knowledge into practical use. As a vision for our research, this paper presents the usefulness of applying machine intelligence to public health genomics.
Title: Advancing public health genomics | Venue: 2016 International Workshop on Big Data and Information Security (IWBIS)
Pub Date: 2016-10-01 | DOI: 10.1109/IWBIS.2016.7872891
S. C. Purbarani, H. Sanabila, A. Bowolaksono, B. Wiweko
Next-generation DNA sequencing (NGS) projects, which aim to improve our understanding of various genes, appear to be driving innovative breakthroughs in whole-genome research. Working with genomic data requires large-scale data storage and processing, and big data technology may be the most appropriate way to extract useful knowledge from such data comprehensively. This study discusses genome tools and frameworks that use Hadoop's MapReduce component for sequence alignment computation. The aim is to present an overview of whole-genome alignment software tools and their implementation on big data platforms.
Title: A survey of whole genome alignment tools and frameworks based on Hadoop's MapReduce | Venue: 2016 International Workshop on Big Data and Information Security (IWBIS)
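A common pattern for MapReduce-based read alignment (CloudBurst is one example) is seed-and-extend: mappers emit fixed-length k-mer seeds from both the reads and the reference, the shuffle groups identical seeds, and reducers pair read seeds with reference seeds to propose alignment positions. The toy sketch below simulates that data flow in a single Python process, without Hadoop. The seed length, the function names, and the final "most shared seeds wins" vote (which stands in for real extension with mismatch checking) are illustrative assumptions, not taken from any specific surveyed tool.

```python
from collections import defaultdict

K = 4  # seed (k-mer) length; illustrative assumption, real tools use longer seeds


def map_reference(ref_id, seq):
    """Map phase over a reference sequence: emit (k-mer, ("ref", ref_id, position))."""
    for i in range(len(seq) - K + 1):
        yield seq[i:i + K], ("ref", ref_id, i)


def map_read(read_id, seq):
    """Map phase over a read: emit (k-mer, ("read", read_id, offset))."""
    for i in range(len(seq) - K + 1):
        yield seq[i:i + K], ("read", read_id, i)


def shuffle(pairs):
    """Group map outputs by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_seed(values):
    """Reduce phase for one k-mer: pair each read occurrence with each
    reference occurrence and emit (read_id, (ref_id, implied read start))."""
    refs = [v for v in values if v[0] == "ref"]
    reads = [v for v in values if v[0] == "read"]
    for _, read_id, offset in reads:
        for _, ref_id, pos in refs:
            yield read_id, (ref_id, pos - offset)


def align(reference, reads):
    """Run map -> shuffle -> reduce, then keep the candidate start supported
    by the most shared seeds (a stand-in for proper seed extension)."""
    pairs = [kv for rid, s in reference.items() for kv in map_reference(rid, s)]
    pairs += [kv for rid, s in reads.items() for kv in map_read(rid, s)]
    votes = defaultdict(lambda: defaultdict(int))
    for values in shuffle(pairs).values():
        for read_id, candidate in reduce_seed(values):
            votes[read_id][candidate] += 1
    return {read_id: max(c, key=c.get) for read_id, c in votes.items()}


if __name__ == "__main__":
    reference = {"chr1": "ACGTACGTTAGCCGATTACA"}
    reads = {"r1": "TAGCCGAT", "r2": "ATTACA"}
    print(align(reference, reads))
```

With this tiny reference, `r1` is placed at position 8 and `r2` at position 14 of `chr1`. On a real cluster, the map and reduce functions would run as distributed tasks and the shuffle would be performed by Hadoop itself.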
Pub Date: 2016-10-01 | DOI: 10.1109/IWBIS.2016.7872890
Darmatasia, A. M. Arymurthy
A data mining approach can be used to discover knowledge by analyzing patterns or correlations among fields in large databases. Here, such an approach was used to find patterns in data from the Tanzania Ministry of Water in order to predict the current and future status of water pumps in Tanzania. The proposed data mining method is XGBoost (eXtreme Gradient Boosting), which implements gradient tree boosting and is designed to be fast, accurate, efficient, flexible, and portable. In addition, Recursive Feature Elimination (RFE) is used to select the important features of the data in order to obtain an accurate model. The best result, an accuracy of 80.38%, was achieved using 27 input factors selected by RFE, with XGBoost as the learning model. The knowledge discovered in this way can help the government improve inspection planning and maintenance, and identify which factors can damage the water pumps, to ensure the availability of potable water in Tanzania.
Title: Predicting the status of water pumps using data mining approach | Venue: 2016 International Workshop on Big Data and Information Security (IWBIS)
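Recursive Feature Elimination repeatedly fits a model, ranks the remaining features by importance, and discards the weakest until a target number of features remains. The sketch below illustrates that loop on synthetic data. To stay self-contained, it substitutes an ordinary least-squares model on standardized features (importance = absolute coefficient) for XGBoost, and it uses a numeric target, whereas the paper classifies pump status; both are assumptions, not the authors' setup. The names `fit_importances` and `rfe` are hypothetical.

```python
import numpy as np


def fit_importances(X, y):
    """Stand-in estimator (assumption, not XGBoost): least squares on
    standardized features, with importance taken as |coefficient|."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    A = np.column_stack([Xs, np.ones(len(Xs))])  # append an intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return np.abs(coef[:-1])  # ignore the intercept


def rfe(X, y, n_features_to_select, step=1):
    """Recursive Feature Elimination: refit on the surviving features and
    drop the `step` least important ones until the target count remains.
    Returns the original column indices of the selected features."""
    remaining = list(range(X.shape[1]))
    while len(remaining) > n_features_to_select:
        importances = fit_importances(X[:, remaining], y)
        n_drop = min(step, len(remaining) - n_features_to_select)
        # Positions (within `remaining`) of the weakest features this round.
        weakest = set(np.argsort(importances)[:n_drop].tolist())
        remaining = [f for i, f in enumerate(remaining) if i not in weakest]
    return remaining


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 6))
    # Only features 0 and 3 influence the target.
    y = 3 * X[:, 0] - 2 * X[:, 3] + 0.1 * rng.normal(size=200)
    print(rfe(X, y, n_features_to_select=2))
```

Because the model is refit after every elimination, importances are always judged relative to the features still present. scikit-learn's `RFE` class implements the same loop and can wrap any estimator that exposes `coef_` or `feature_importances_`, such as xgboost's `XGBClassifier`.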
Pub Date: 1900-01-01 | DOI: 10.1109/IWBIS.2016.7872885
Jun Zhang, A. Goscinski
This paper proposes a new technique, flow-based traffic retrieval (FBTR), for finding traffic flows that satisfy an information need within large collections of network traffic. It is shown that FBTR can become a powerful tool for network management and security; for example, the retrieved traffic flows can help in analysing new applications and protocols and in detecting unknown attacks. In FBTR, a traffic flow is represented by a vector of flow statistics, such as the average packet size and the average inter-packet time. The user submits one or more traffic flows as a query and asks for “similar” traffic flows to be retrieved from a traffic collection; similarity search is based on comparing flow vectors in a feature space. Preliminary experiments were conducted to evaluate the performance of FBTR. The results show that FBTR has the potential to find network traffic of interest to the user quickly and accurately, even when that traffic is encrypted.
Title: Flow-based traffic retrieval using statistical features | Venue: 2016 International Workshop on Big Data and Information Security (IWBIS)
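The abstract describes each flow as a vector of statistics (average packet size, average inter-packet time) and retrieval as similarity search over those vectors. A minimal sketch of that idea follows. The `Packet` record, the function names, the choice of Euclidean distance, and the absence of feature scaling are all assumptions; the paper's exact feature set and similarity measure may differ.

```python
import math
from dataclasses import dataclass


@dataclass
class Packet:
    timestamp: float  # arrival time in seconds
    size: int         # packet size in bytes


def flow_vector(packets):
    """Summarize a flow as (average packet size, average inter-packet time)."""
    sizes = [p.size for p in packets]
    times = sorted(p.timestamp for p in packets)
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    avg_size = sum(sizes) / len(sizes)
    avg_gap = sum(gaps) / len(gaps) if gaps else 0.0
    return (avg_size, avg_gap)


def retrieve(query, collection, k=3):
    """Return the names of the k flows whose vectors are nearest to `query`
    by Euclidean distance (the distance measure is an assumption)."""
    ranked = sorted(collection, key=lambda name: math.dist(query, collection[name]))
    return ranked[:k]


if __name__ == "__main__":
    collection = {
        "web": flow_vector([Packet(0.0, 1400), Packet(0.01, 1400), Packet(0.02, 1300)]),
        "ssh": flow_vector([Packet(0.0, 90), Packet(0.5, 100), Packet(1.0, 80)]),
        "dns": flow_vector([Packet(0.0, 70), Packet(0.05, 200)]),
    }
    query = flow_vector([Packet(0.0, 1350), Packet(0.02, 1450)])
    print(retrieve(query, collection, k=1))
```

Because the two features have very different units (bytes versus seconds), a practical version would normalize each feature, for example with z-scores, before computing distances; otherwise packet size dominates the ranking, as it does in this toy example. Since only packet sizes and timings are used, the same vectors can be computed for encrypted flows.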