In previous works we presented Cross Motif Search (CMS), an MP/MPI parallel tool for geometrical motif extraction from the secondary structure of proteins. We showed that our algorithm is capable of retrieving previously unknown motifs, thanks to its innovative approach based on the generalized Hough transform. We also presented a GUI for CMS, called MotifVisualizer, introduced to improve software usability and to encourage collaboration with the biology community. In this paper we address the main shortcoming of CMS: with a simple approach based on heuristic data mining, we show how to classify the candidate motifs according to their statistical significance in the data set. We also present two extensions to MotifVisualizer: one to include the new data mining functions in the GUI, and a second to allow for easier retrieval of testing data sets.
{"title":"Extending Cross Motif Search with Heuristic Data Mining","authors":"Teo Argentieri, V. Cantoni, M. Musci","doi":"10.1109/DEXA.2017.28","DOIUrl":"https://doi.org/10.1109/DEXA.2017.28","url":null,"abstract":"In previous works we have presented Cross Motif Search (CMS), a MP/MPI parallel tool for geometrical motif extraction in the secondary structure of proteins. We proved that our algorithm is capable of retrieving previously unknown motifs, thanks to its innovative approach based on the generalized Hough transform. We have also presented a GUI to CMS, called MotifVisualizer, which was introduced to improve software usability and to encourage collaboration with the biology community. In this paper we address the main shortcoming of CMS: with a simple approach based on heuristic data mining we show how we can classify the candidate motifs according to their statistical significance in the data set. We also present two extensions to MotifVisualizer, one to include the new data mining functions in the GUI, and a second one to allow for an easier retrieval of testing data sets.","PeriodicalId":127009,"journal":{"name":"2017 28th International Workshop on Database and Expert Systems Applications (DEXA)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121294381","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
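CMS's voting scheme operates on secondary-structure elements in 3D, but the generalized-Hough idea it builds on can be illustrated with a minimal, translation-only sketch over 2D point sets (a toy stand-in, not the actual CMS algorithm): every template/scene point pair votes for a displacement, and the true alignment accumulates the most votes.

```python
from collections import Counter

def ght_translation(template, scene):
    """Vote for the translation aligning a point-set template to a scene.

    Each (template point, scene point) pair casts one vote for the
    displacement mapping the template point onto the scene point; the
    correct translation collects one vote per matched template point.
    """
    acc = Counter()
    for tx, ty in template:
        for sx, sy in scene:
            acc[(sx - tx, sy - ty)] += 1
    return acc.most_common(1)[0][0]

template = [(0, 0), (1, 0), (1, 2), (3, 1)]
# template shifted by (5, 4), plus one clutter point
scene = [(5, 4), (6, 4), (6, 6), (8, 5), (2, 9)]
print(ght_translation(template, scene))  # -> (5, 4)
```

The full method additionally votes over rotations and works on secondary-structure descriptors rather than raw points, but the accumulate-and-peak structure is the same.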
The analysis of biological data is a challenging problem in the bioinformatics and data mining fields. Given its complexity, several methods have been proposed for analyzing the biological information stored in databases, mostly in the form of genetic sequences and protein structures. Gene expression data are represented by matrices that record the expression levels of thousands of genes under several conditions. Analyzing this huge amount of data consists of extracting groups of genes that behave similarly under certain conditions. The extracted information takes the form of sub-matrices (biclusters) that satisfy a coherence constraint, and the process of extracting them is called biclustering. In this paper, we address biclustering problems applied to the analysis of biological data. First, we review the problem. We then describe the divide-and-conquer approach that our algorithm adopts for extracting biclusters. Additionally, we propose a new evaluation function, the Pattern Correlation Value (PCV), which allows identification of all bicluster types. Experimental results demonstrate that the proposed methods are effective on this problem and extract relevant information from the considered data.
{"title":"Biclustering of Biological Sequences","authors":"F. Mhamdi, Sourour Marai","doi":"10.1109/DEXA.2017.31","DOIUrl":"https://doi.org/10.1109/DEXA.2017.31","url":null,"abstract":"The analysis of biological data is a challenging problem in bioinformatics and data mining field. Given the complexity of the analysis of biological information, several methods have been proposed for analyzing this biological information in databases mostly in the form of genetic sequences and protein structures. Actually, genetic sequences are represented by matrices that indicate the expression levels of thousands of genes under several conditions. The analysis of this huge amount of data consists in extracting genes that behave similarly under certain conditions. In fact, the extracted information are sub-matrices (biclusters) that satisfy a coherence constraint. The process of extracting them is called biclustering. In this paper, we deal with biclustering problems applied to the analysis of biological data. First, a description of the problem is reviewed. Furthermore, we present a description of the divide and conquer approach that we will adopt to our algorithm for extracting biclusters. Additionally, a new evaluation function intitled Pattern Correlation Value (PCV), allowing identification of all biclusters types is proposed. Experimental results, demonstrate that the proposed methods are effective on this problem and are able to extract relevant information from the considered data.","PeriodicalId":127009,"journal":{"name":"2017 28th International Workshop on Database and Expert Systems Applications (DEXA)","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115165862","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
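The paper's own evaluation function (PCV) is not specified here, so as a stand-in the sketch below scores bicluster coherence with the classic Cheng-and-Church mean squared residue; the expression matrix and the row/column selections are hypothetical.

```python
def mean_squared_residue(matrix, rows, cols):
    """Cheng-and-Church mean squared residue of a candidate bicluster.

    A perfectly additively coherent bicluster has residue 0: every entry
    equals its row mean + column mean - overall mean.
    """
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n, m = len(rows), len(cols)
    row_mean = [sum(r) / m for r in sub]
    col_mean = [sum(sub[i][j] for i in range(n)) / n for j in range(m)]
    all_mean = sum(row_mean) / n
    return sum(
        (sub[i][j] - row_mean[i] - col_mean[j] + all_mean) ** 2
        for i in range(n) for j in range(m)
    ) / (n * m)

# rows 0 and 1 differ by a constant shift: additively coherent
expr = [
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 4.0],
    [9.0, 1.0, 5.0],
]
print(mean_squared_residue(expr, [0, 1], [0, 1, 2]))  # -> 0.0
```

A biclustering search (divide-and-conquer or otherwise) would use such a score to accept or reject candidate sub-matrices; PCV extends this idea to further bicluster types.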
To satisfy quality-of-service requirements in a cost-efficient manner, cloud service providers would benefit from a means of quantifying the level of operational uncertainty within their systems. This uncertainty arises from the dynamic nature of the cloud: since tasks requiring various amounts of resources may enter and leave the system at any time, systems plagued by high volatility are challenging for preemptive resource provisioning. In this paper, we present a general method based on Dempster-Shafer theory for quantifying the level of operational uncertainty in an entire cloud system or parts thereof. In addition to the standard quality metrics, we propose monitoring system calls to capture the historical behavior of virtual machines as an input to the general method. Knowing the level of operational uncertainty enables greater accuracy in online resource provisioning by quantifying the volatility of the deployed system.
{"title":"Quantifying Uncertainty for Preemptive Resource Provisioning in the Cloud","authors":"Marin Aranitasi, Benjamin Byholm, Mats Neovius","doi":"10.1109/DEXA.2017.42","DOIUrl":"https://doi.org/10.1109/DEXA.2017.42","url":null,"abstract":"To satisfy quality of service requirements in a cost-efficient manner, cloud service providers would benefit from providing a means for quantifying the level of operational uncertainty within their systems. This uncertainty arises due to the dynamic nature of the cloud. Since tasks requiring various amounts of resources may enter and leave the system at any time, systems plagued by high volatility are challenging in preemptive resource provisioning. In this paper, we present a general method based on Dempster-Shafer theory that enables quantifying the level of operational uncertainty in an entire cloud system or parts thereof. In addition to the standard quality metrics, we propose monitoring of system calls to capture historical behavior of virtual machines as an input to the general method. Knowing the level of operational uncertainty enables greater accuracy in online resource provisioning by quantifying the volatility of the deployed system.","PeriodicalId":127009,"journal":{"name":"2017 28th International Workshop on Database and Expert Systems Applications (DEXA)","volume":"103 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122613854","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
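The core operation in Dempster-Shafer evidence fusion is Dempster's rule of combination, sketched below for two mass functions over a toy frame {stable, volatile}. The labels and mass values are illustrative only, not the paper's actual system-call-derived inputs.

```python
def dempster_combine(m1, m2):
    """Dempster's rule of combination for two basic mass assignments.

    Masses are dicts mapping frozensets (focal elements) to belief mass.
    Mass assigned to conflicting evidence (empty intersections) is
    discarded and the remainder renormalised.
    """
    combined = {}
    conflict = 0.0
    for a, wa in m1.items():
        for b, wb in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + wa * wb
            else:
                conflict += wa * wb
    k = 1.0 - conflict  # total non-conflicting mass
    return {s: w / k for s, w in combined.items()}

stable, volatile = frozenset({"stable"}), frozenset({"volatile"})
either = stable | volatile  # "don't know": mass on the whole frame
m1 = {stable: 0.6, either: 0.4}                 # evidence from one quality metric
m2 = {stable: 0.5, volatile: 0.3, either: 0.2}  # evidence from system-call history
print(dempster_combine(m1, m2))
```

Mass left on the whole frame (`either`) is exactly what makes the formalism suited to expressing *uncertainty*, rather than forcing a probability onto every outcome.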
In recent years, the cloud computing paradigm has grown massively popular in both industry and academia. One of the main reasons for the wide adoption of cloud computing is the ability to add and remove resources "on the fly" to handle load variation, through the concept of elasticity. Efficiently managing an elastic cloud system is a challenging task. This paper proposes a multi-agent system for cloud-of-clouds elasticity management. Concretely, we adopt a formal modelling approach based on Bigraphical Reactive Systems (BRS) for the specification of the multi-agent system's structural and behavioral aspects.
{"title":"Towards a Cloud of Clouds Elasticity Management System","authors":"Rayene Moudjari, Z. Sahnoun","doi":"10.1109/DEXA.2017.47","DOIUrl":"https://doi.org/10.1109/DEXA.2017.47","url":null,"abstract":"In recent years, cloud computing paradigm has grown massively popular in both industry and academic sectors. One of the main reasons for the wide adoption of Cloud Computing is the ability to add and remove resources \"on the fly\" to handle the load variation through the concept of elasticity. The efficient management of cloud elastic system is a challenging task. This paper proposes a multi agent system for cloud of clouds elasticity management. Concretely, we adopt a formal modelling approach based on Bigraphs (BRS) for the specification of the multi-agent system structural and behavioral aspects.","PeriodicalId":127009,"journal":{"name":"2017 28th International Workshop on Database and Expert Systems Applications (DEXA)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116814042","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
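The elasticity concept invoked above, adding and removing resources on the fly as load varies, can be illustrated with a toy threshold rule. This is a hypothetical sketch of the behaviour an elasticity-management agent might enact; the paper itself specifies the management system formally with bigraphs rather than prescribing a concrete scaling policy.

```python
def elasticity_decision(load, capacity, scale_out=0.8, scale_in=0.3):
    """Toy threshold rule for cloud elasticity: acquire a resource when
    utilisation is high, release one when it is low, otherwise hold.
    """
    utilisation = load / capacity
    if utilisation > scale_out:
        return "scale_out"
    if utilisation < scale_in and capacity > 1:
        return "scale_in"
    return "hold"

print(elasticity_decision(load=9, capacity=10))  # -> scale_out
print(elasticity_decision(load=2, capacity=10))  # -> scale_in
print(elasticity_decision(load=5, capacity=10))  # -> hold
```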
Recognizing secondary structures in proteins can be a highly computationally expensive task that does not always yield good results. Using Restricted Boltzmann Machines (RBMs), we were able to train a simple neural network to recognize an alpha-helix with a good degree of accuracy. By modifying the RBM implementation to be much simpler and more efficient than the standard one, we obtain a 14-fold speedup in training with no loss in detection accuracy or in cluster formation. Even with very small training sets (160 members), we are able to recognize, with a high degree of accuracy, not only the alpha-helix structures we train for but also other, similar helix structures that we did not train for. We are also able to cluster these structures together in a meaningful way based on the RBM training results. Both the training and the clustering are completely unsupervised, beyond the training set meeting certain constraints. Interestingly, each cluster shares structural similarities within itself but also has noticeable differences from the other detected clusters. These clusters seem to form regardless of training set size or makeup.
{"title":"Recognizing Protein Secondary Structures with Neural Networks","authors":"R. Harrison, Michael McDermott, Chinua Umoja","doi":"10.1109/DEXA.2017.29","DOIUrl":"https://doi.org/10.1109/DEXA.2017.29","url":null,"abstract":"Recognizing secondary structures in proteins can be a highly computationally expensive task that may not always yield good results. Using Restricted Boltzmann Machines (RBM) we were able to train a simple neural network to recognize an alpha-helix with a good degree of accuracy. Modifying the RBM implementation to be much simpler and more efficient than the standard implementation we are able to see a 14-fold speedup in training with no loss in detection accuracy or in cluster formation. With even very small training sets (160 members) we are able to recognize both the alpha-helix structures we are training for but also other, similar, helix structures that we did not train for. We are also able to recognize these structures with a high degree of accuracy. We are also able to cluster these structures together in a meaningful way based on the RBM training results. Both the training and clustering is completely unsupervised beyond the training set meeting certain constraints. Interestingly, each cluster shares structural similarities within itself but also has noticeable differences from other clusters that are detected. These clusters seem to form regardless of training set size or makeup.","PeriodicalId":127009,"journal":{"name":"2017 28th International Workshop on Database and Expert Systems Applications (DEXA)","volume":"74 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127345130","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
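For reference, one step of standard Bernoulli-RBM training with contrastive divergence (CD-1) can be sketched as below. This is the textbook formulation with hypothetical layer sizes and random data, not the authors' simplified implementation that yields the reported 14-fold speedup.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(W, bv, bh, v0, lr=0.1):
    """One CD-1 update for a Bernoulli RBM.

    v0: batch of binary visible vectors, shape (batch, n_visible).
    Returns updated (W, bv, bh).
    """
    ph0 = sigmoid(v0 @ W + bh)                 # hidden probabilities given data
    h0 = (rng.random(ph0.shape) < ph0) * 1.0   # sampled hidden states
    pv1 = sigmoid(h0 @ W.T + bv)               # visible reconstruction probabilities
    ph1 = sigmoid(pv1 @ W + bh)                # hidden probabilities given reconstruction
    n = v0.shape[0]
    W = W + lr * (v0.T @ ph0 - pv1.T @ ph1) / n
    bv = bv + lr * (v0 - pv1).mean(axis=0)
    bh = bh + lr * (ph0 - ph1).mean(axis=0)
    return W, bv, bh

n_visible, n_hidden = 6, 3                     # toy sizes, not the paper's
W = rng.normal(0, 0.1, (n_visible, n_hidden))
bv = np.zeros(n_visible)
bh = np.zeros(n_hidden)
data = (rng.random((8, n_visible)) < 0.5) * 1.0
for _ in range(50):
    W, bv, bh = cd1_step(W, bv, bh, data)
recon = sigmoid((((data @ W + bh) > 0) * 1.0) @ W.T + bv)  # deterministic reconstruction
print(float(np.mean((data - recon) ** 2)))     # mean squared reconstruction error
```

In a structure-recognition setting the visible units would encode a window of backbone geometry, and the learned hidden activations provide the features that the clustering operates on.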
Thespis is a middleware that innovatively leverages the Actor model to implement causal consistency over an industry-standard database, whilst abstracting complexities for application developers behind a REST open-protocol interface. Our evaluation considers correctness, performance and scalability aspects. We also run empirical experiments using YCSB to show the efficacy of the approach for a variety of workloads.
{"title":"Thespis: Actor-Based Causal Consistency","authors":"C. Camilleri, J. Vella, Vitezslav Nezval","doi":"10.1109/DEXA.2017.25","DOIUrl":"https://doi.org/10.1109/DEXA.2017.25","url":null,"abstract":"Thespis is a middleware that innovatively leverages the Actor model to implement causal consistency over an industry-standard database, whilst abstracting complexities for application developers behind a REST open-protocol interface. Our evaluation considers correctness, performance and scalability aspects. We also run empirical experiments using YCSB to show the efficacy of the approach for a variety of workloads.","PeriodicalId":127009,"journal":{"name":"2017 28th International Workshop on Database and Expert Systems Applications (DEXA)","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129846896","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
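Causal consistency middleware such as Thespis is commonly built on a vector-clock delivery condition: apply a replicated write only when it is the next event from its sender and all its other causal dependencies have been seen locally. The sketch below shows that textbook condition; Thespis's actual actor-based protocol is not reproduced here.

```python
def causally_ready(msg_clock, sender, local_clock):
    """Standard vector-clock delivery condition for causal consistency.

    msg_clock: vector clock attached to the incoming write.
    local_clock: vector clock of writes already applied at this replica.
    """
    for node, t in msg_clock.items():
        if node == sender:
            # must be the very next write from the sender
            if t != local_clock.get(node, 0) + 1:
                return False
        elif t > local_clock.get(node, 0):
            # depends on a write from another node we have not seen yet
            return False
    return True

local = {"A": 2, "B": 1}
print(causally_ready({"A": 3, "B": 1}, "A", local))  # -> True: next write from A
print(causally_ready({"A": 3, "B": 2}, "A", local))  # -> False: unseen dependency on B
```

A replica buffers writes that are not yet ready and retries them as its local clock advances, which is how causal order is preserved without global coordination.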
This paper describes a new approach to building the query-based relevance sets (qrels), or relevance judgments, for a test collection automatically, without any human intervention. The methods we describe use supervised machine learning algorithms, namely the Naïve Bayes classifier and the Support Vector Machine (SVM). We achieve better Kendall's tau and Spearman correlation results between the TREC system ranking using the newly generated qrels and the ranking obtained from the human-built qrels than previous baselines. We also apply a variation of these approaches, using the doc2vec representation of the documents rather than the traditional tf-idf representation.
{"title":"Using Supervised Machine Learning to Automatically Build Relevance Judgments for a Test Collection","authors":"Mireille Makary, M. Oakes, R. Mitkov, Fadi Yamout","doi":"10.1109/DEXA.2017.38","DOIUrl":"https://doi.org/10.1109/DEXA.2017.38","url":null,"abstract":"This paper describes a new approach to building the query based relevance sets (qrels) or relevance judgments for a test collection automatically without using any human intervention. The methods we describe use supervised machine learning algorithms, namely the Naïve Bayes classifier and the Support Vector Machine (SVM). We achieve better Kendall's tau and Spearman correlation results between the TREC system ranking using the newly generated qrels and the ranking obtained from using the human-built qrels than previous baselines. We also apply a variation of these approaches by using the doc2vec representation of the documents rather than using the traditional tf-idf representation.","PeriodicalId":127009,"journal":{"name":"2017 28th International Workshop on Database and Expert Systems Applications (DEXA)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130620020","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
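The evaluation measure named above, Kendall's tau between the system ranking under generated qrels and under human qrels, can be computed directly from paired rank positions. The sketch below uses hypothetical system names and ranks.

```python
from itertools import combinations

def kendalls_tau(rank_a, rank_b):
    """Kendall's tau between two rankings of the same systems.

    rank_a, rank_b: dicts mapping system name -> rank position.
    tau = (concordant - discordant) / total pairs; 1.0 means identical order.
    """
    systems = list(rank_a)
    concordant = discordant = 0
    for x, y in combinations(systems, 2):
        agree = (rank_a[x] - rank_a[y]) * (rank_b[x] - rank_b[y])
        if agree > 0:
            concordant += 1
        elif agree < 0:
            discordant += 1
    total = len(systems) * (len(systems) - 1) / 2
    return (concordant - discordant) / total

official = {"s1": 1, "s2": 2, "s3": 3, "s4": 4}   # ranking from human qrels
generated = {"s1": 1, "s2": 3, "s3": 2, "s4": 4}  # ranking from automatic qrels
print(kendalls_tau(official, generated))  # -> 0.666... (one swapped pair out of six)
```

A tau close to 1.0 means the automatically built qrels rank retrieval systems almost the same way the human judgments do, which is the criterion the paper optimises.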
Social networks (SNs) have become essential communication tools in recent years, generating a large amount of information about their users that can be analysed with data processing algorithms. Recently, a new type of SN user has emerged: jihadists who use SNs as a tool to recruit new militants and share their propaganda. In this paper, we study a set of indicators to assess the risk of radicalisation of a social network user. These radicalisation indicators help law-enforcement agencies, prosecutors and organizations devoted to fighting terrorism to detect vulnerable targets even before the radicalisation process is complete. Moreover, these indicators are the first steps towards a software tool to gather, represent, pre-process and analyse behavioural indicators of radicalisation in terrorism.
{"title":"Extracting Radicalisation Behavioural Patterns from Social Network Data","authors":"R. Lara-Cabrera, A. González-Pardo, M. Barhamgi, David Camacho","doi":"10.1109/DEXA.2017.18","DOIUrl":"https://doi.org/10.1109/DEXA.2017.18","url":null,"abstract":"Social networks (SNs) have become essential communication tools in recent years, generating a large amount of information about its users that can be analysed with data processing algorithms. Recently, a new type of SN user has emerged: jihadists that use SNs as a tool to recruit new militants and share their propaganda. In this paper, we study a set of indicators to assess the risk of radicalisation of a social network user. These radicalisation indicators help law-enforcement agencies, prosecutors and organizations devoted to fight terrorism to detect vulnerable targets even before the radicalisation process is completed. Moreover, these indicators are the first steps towards a software tool to gather, represent, pre-process and analyse behavioural indicators of radicalisation in terrorism.","PeriodicalId":127009,"journal":{"name":"2017 28th International Workshop on Database and Expert Systems Applications (DEXA)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134609691","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Organizations enhance the velocity of simple operations that read and write small amounts of big data by extending a SQL system with a key-value store (KVS). The resulting system is suitable for workloads that issue simple operations and exhibit a high read-to-write ratio, e.g., interactive social networking actions. A popular distributed in-memory KVS is memcached, used by organizations such as Facebook and YouTube. This study presents SQL query to trigger translation (SQLTrig), a novel transparent consistency technique that keeps the key-value pairs in the KVS consistent with the tabular data in the relational database management system (RDBMS). SQLTrig provides physical data independence, hiding the representation of data (either as rows of a table or as key-value pairs) from application developers. Software developers are provided with the SQL query language and observe the performance enhancements of a KVS without authoring additional software. This reduces software complexity and expedites the development life cycle.
{"title":"SQL Query to Trigger Translation: A Novel Transparent Consistency Technique for Cache Augmented SQL Systems","authors":"Shahram Ghandeharizadeh, Jason Yap","doi":"10.1109/DEXA.2017.24","DOIUrl":"https://doi.org/10.1109/DEXA.2017.24","url":null,"abstract":"Organizations enhance the velocity of simple operations that read and write a small amount of data from big data by extending a SQL system with a key-value store (KVS). The resulting system is suitable for workloads that issue simple operations and exhibit a high read to write ratio, e.g., interactive social networking actions. A popular distributed in-memory KVS is memcached in use by organizations such as Facebook and YouTube. This study presents SQL query to trigger translation (SQLTrig) as a novel transparent consistency technique that maintains the key-value pairs of the KVS consistent with the tabular data in the relational database management system (RDBMS). SQLTrig provides physical data independence, hiding the representation of data (either as rows of a table or key-value pairs) from the application developers. Software developers are provided with the SQL query language and observe the performance enhancements of a KVS without authoring additional software. This simplifies software complexity to expedite its development life cycle.","PeriodicalId":127009,"journal":{"name":"2017 28th International Workshop on Database and Expert Systems Applications (DEXA)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128705874","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
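SQLTrig derives invalidation triggers automatically from the SQL queries themselves; the toy sketch below instead hardcodes the query-to-row dependencies, purely to illustrate the trigger-style invalidation that keeps a cache consistent with the RDBMS. Names and keys are hypothetical.

```python
class TriggerCache:
    """Toy trigger-style cache invalidation: each cached query result
    registers the (table, row key) pairs it was computed from, and a
    write to any of those rows evicts the dependent cache entries.
    """
    def __init__(self):
        self.cache = {}  # query key -> cached result
        self.deps = {}   # (table, row key) -> set of dependent query keys

    def put(self, query_key, result, rows_read):
        self.cache[query_key] = result
        for row in rows_read:
            self.deps.setdefault(row, set()).add(query_key)

    def on_write(self, table, row_key):
        """Called by the (simulated) trigger after an RDBMS write."""
        for qk in self.deps.pop((table, row_key), set()):
            self.cache.pop(qk, None)

tc = TriggerCache()
tc.put("friends_of:alice", ["bob", "carol"], [("users", "alice")])
tc.on_write("users", "alice")          # RDBMS update fires the trigger
print("friends_of:alice" in tc.cache)  # -> False: stale entry evicted
```

The point of SQLTrig is that developers never write the `on_write` wiring by hand: the middleware analyses each SQL query and generates the corresponding triggers, which is what gives the claimed transparency.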
The pervasive use of Internet of Things and smart-meter technologies in smart cities increases the complexity of managing the resulting data, due to its size, diversity, and privacy issues. This calls for an innovative solution to process and manage the data effectively. This paper presents an elastic private scientific cloud, SciCloud, to tackle these grand challenges. SciCloud provides on-demand computing resource provisioning, a scalable data management platform and an in-place data analytics environment to support scientific research using smart city data.
{"title":"SciCloud: A Scientific Cloud and Management Platform for Smart City Data","authors":"Xiufeng Liu, P. S. Nielsen, A. Heller, Panagiota Gianniou","doi":"10.1109/DEXA.2017.22","DOIUrl":"https://doi.org/10.1109/DEXA.2017.22","url":null,"abstract":"The pervasive use of Internet of Things and smart meter technologies in smart cities increases the complexity of managing the data, due to their sizes, diversity, and privacy issues. This requires an innovate solution to process and manage the data effectively. This paper presents an elastic private scientific cloud, SciCloud, to tackle these grand challenges. SciCloud provides on-demand computing resource provisions, a scalable data management platform and an in-place data analytics environment to support the scientific research using smart city data.","PeriodicalId":127009,"journal":{"name":"2017 28th International Workshop on Database and Expert Systems Applications (DEXA)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126456052","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}