In previous works we presented Cross Motif Search (CMS), an MP/MPI parallel tool for geometrical motif extraction from the secondary structure of proteins. We showed that our algorithm is capable of retrieving previously unknown motifs, thanks to its innovative approach based on the generalized Hough transform. We also presented a GUI for CMS, called MotifVisualizer, introduced to improve software usability and to encourage collaboration with the biology community. In this paper we address the main shortcoming of CMS: with a simple approach based on heuristic data mining, we show how to classify the candidate motifs according to their statistical significance in the data set. We also present two extensions to MotifVisualizer: one to include the new data mining functions in the GUI, and a second to allow for easier retrieval of testing data sets.
{"title":"Extending Cross Motif Search with Heuristic Data Mining","authors":"Teo Argentieri, V. Cantoni, M. Musci","doi":"10.1109/DEXA.2017.28","DOIUrl":"https://doi.org/10.1109/DEXA.2017.28","url":null,"abstract":"In previous works we have presented Cross Motif Search (CMS), a MP/MPI parallel tool for geometrical motif extraction in the secondary structure of proteins. We proved that our algorithm is capable of retrieving previously unknown motifs, thanks to its innovative approach based on the generalized Hough transform. We have also presented a GUI to CMS, called MotifVisualizer, which was introduced to improve software usability and to encourage collaboration with the biology community. In this paper we address the main shortcoming of CMS: with a simple approach based on heuristic data mining we show how we can classify the candidate motifs according to their statistical significance in the data set. We also present two extensions to MotifVisualizer, one to include the new data mining functions in the GUI, and a second one to allow for an easier retrieval of testing data sets.","PeriodicalId":127009,"journal":{"name":"2017 28th International Workshop on Database and Expert Systems Applications (DEXA)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121294381","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
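CMS's voting scheme operates on secondary-structure elements in 3D, but the generalized-Hough idea it builds on can be illustrated with a minimal, translation-only sketch over 2D point sets (a toy stand-in, not the actual CMS algorithm): every template/scene point pair votes for a displacement, and the true alignment accumulates the most votes.

```python
from collections import Counter

def ght_translation(template, scene):
    """Vote for the translation aligning a point-set template to a scene.

    Each (template point, scene point) pair casts one vote for the
    displacement mapping the template point onto the scene point; the
    correct translation collects one vote per matched template point.
    """
    acc = Counter()
    for tx, ty in template:
        for sx, sy in scene:
            acc[(sx - tx, sy - ty)] += 1
    return acc.most_common(1)[0][0]

template = [(0, 0), (1, 0), (1, 2), (3, 1)]
# template shifted by (5, 4), plus one clutter point
scene = [(5, 4), (6, 4), (6, 6), (8, 5), (2, 9)]
print(ght_translation(template, scene))  # -> (5, 4)
```

The full method additionally votes over rotations and works on secondary-structure descriptors rather than raw points, but the accumulate-and-peak structure is the same.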
The analysis of biological data is a challenging problem in the bioinformatics and data mining fields. Given its complexity, several methods have been proposed for analyzing the biological information stored in databases, mostly in the form of genetic sequences and protein structures. Gene expression data are represented by matrices that record the expression levels of thousands of genes under several conditions. Analyzing this huge amount of data consists of extracting groups of genes that behave similarly under certain conditions. The extracted information takes the form of sub-matrices (biclusters) that satisfy a coherence constraint, and the process of extracting them is called biclustering. In this paper, we address biclustering problems applied to the analysis of biological data. First, we review the problem. We then describe the divide-and-conquer approach that our algorithm adopts for extracting biclusters. Additionally, we propose a new evaluation function, the Pattern Correlation Value (PCV), which allows identification of all bicluster types. Experimental results demonstrate that the proposed methods are effective on this problem and extract relevant information from the considered data.
{"title":"Biclustering of Biological Sequences","authors":"F. Mhamdi, Sourour Marai","doi":"10.1109/DEXA.2017.31","DOIUrl":"https://doi.org/10.1109/DEXA.2017.31","url":null,"abstract":"The analysis of biological data is a challenging problem in bioinformatics and data mining field. Given the complexity of the analysis of biological information, several methods have been proposed for analyzing this biological information in databases mostly in the form of genetic sequences and protein structures. Actually, genetic sequences are represented by matrices that indicate the expression levels of thousands of genes under several conditions. The analysis of this huge amount of data consists in extracting genes that behave similarly under certain conditions. In fact, the extracted information are sub-matrices (biclusters) that satisfy a coherence constraint. The process of extracting them is called biclustering. In this paper, we deal with biclustering problems applied to the analysis of biological data. First, a description of the problem is reviewed. Furthermore, we present a description of the divide and conquer approach that we will adopt to our algorithm for extracting biclusters. Additionally, a new evaluation function intitled Pattern Correlation Value (PCV), allowing identification of all biclusters types is proposed. Experimental results, demonstrate that the proposed methods are effective on this problem and are able to extract relevant information from the considered data.","PeriodicalId":127009,"journal":{"name":"2017 28th International Workshop on Database and Expert Systems Applications (DEXA)","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115165862","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
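The paper's own evaluation function (PCV) is not specified here, so as a stand-in the sketch below scores bicluster coherence with the classic Cheng-and-Church mean squared residue; the expression matrix and the row/column selections are hypothetical.

```python
def mean_squared_residue(matrix, rows, cols):
    """Cheng-and-Church mean squared residue of a candidate bicluster.

    A perfectly additively coherent bicluster has residue 0: every entry
    equals its row mean + column mean - overall mean.
    """
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n, m = len(rows), len(cols)
    row_mean = [sum(r) / m for r in sub]
    col_mean = [sum(sub[i][j] for i in range(n)) / n for j in range(m)]
    all_mean = sum(row_mean) / n
    return sum(
        (sub[i][j] - row_mean[i] - col_mean[j] + all_mean) ** 2
        for i in range(n) for j in range(m)
    ) / (n * m)

# rows 0 and 1 differ by a constant shift: additively coherent
expr = [
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 4.0],
    [9.0, 1.0, 5.0],
]
print(mean_squared_residue(expr, [0, 1], [0, 1, 2]))  # -> 0.0
```

A biclustering search (divide-and-conquer or otherwise) would use such a score to accept or reject candidate sub-matrices; PCV extends this idea to further bicluster types.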
To satisfy quality-of-service requirements in a cost-efficient manner, cloud service providers would benefit from a means of quantifying the level of operational uncertainty within their systems. This uncertainty arises from the dynamic nature of the cloud: since tasks requiring various amounts of resources may enter and leave the system at any time, systems plagued by high volatility are challenging for preemptive resource provisioning. In this paper, we present a general method based on Dempster-Shafer theory for quantifying the level of operational uncertainty in an entire cloud system or parts thereof. In addition to the standard quality metrics, we propose monitoring system calls to capture the historical behavior of virtual machines as an input to the general method. Knowing the level of operational uncertainty enables greater accuracy in online resource provisioning by quantifying the volatility of the deployed system.
{"title":"Quantifying Uncertainty for Preemptive Resource Provisioning in the Cloud","authors":"Marin Aranitasi, Benjamin Byholm, Mats Neovius","doi":"10.1109/DEXA.2017.42","DOIUrl":"https://doi.org/10.1109/DEXA.2017.42","url":null,"abstract":"To satisfy quality of service requirements in a cost-efficient manner, cloud service providers would benefit from providing a means for quantifying the level of operational uncertainty within their systems. This uncertainty arises due to the dynamic nature of the cloud. Since tasks requiring various amounts of resources may enter and leave the system at any time, systems plagued by high volatility are challenging in preemptive resource provisioning. In this paper, we present a general method based on Dempster-Shafer theory that enables quantifying the level of operational uncertainty in an entire cloud system or parts thereof. In addition to the standard quality metrics, we propose monitoring of system calls to capture historical behavior of virtual machines as an input to the general method. Knowing the level of operational uncertainty enables greater accuracy in online resource provisioning by quantifying the volatility of the deployed system.","PeriodicalId":127009,"journal":{"name":"2017 28th International Workshop on Database and Expert Systems Applications (DEXA)","volume":"103 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122613854","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
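The core operation in Dempster-Shafer evidence fusion is Dempster's rule of combination, sketched below for two mass functions over a toy frame {stable, volatile}. The labels and mass values are illustrative only, not the paper's actual system-call-derived inputs.

```python
def dempster_combine(m1, m2):
    """Dempster's rule of combination for two basic mass assignments.

    Masses are dicts mapping frozensets (focal elements) to belief mass.
    Mass assigned to conflicting evidence (empty intersections) is
    discarded and the remainder renormalised.
    """
    combined = {}
    conflict = 0.0
    for a, wa in m1.items():
        for b, wb in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + wa * wb
            else:
                conflict += wa * wb
    k = 1.0 - conflict  # total non-conflicting mass
    return {s: w / k for s, w in combined.items()}

stable, volatile = frozenset({"stable"}), frozenset({"volatile"})
either = stable | volatile  # "don't know": mass on the whole frame
m1 = {stable: 0.6, either: 0.4}                 # evidence from one quality metric
m2 = {stable: 0.5, volatile: 0.3, either: 0.2}  # evidence from system-call history
print(dempster_combine(m1, m2))
```

Mass left on the whole frame (`either`) is exactly what makes the formalism suited to expressing *uncertainty*, rather than forcing a probability onto every outcome.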
In recent years, the cloud computing paradigm has grown massively popular in both industry and academia. One of the main reasons for the wide adoption of cloud computing is the ability to add and remove resources "on the fly" to handle load variation, through the concept of elasticity. Efficiently managing an elastic cloud system is a challenging task. This paper proposes a multi-agent system for cloud-of-clouds elasticity management. Concretely, we adopt a formal modelling approach based on Bigraphical Reactive Systems (BRS) for the specification of the multi-agent system's structural and behavioral aspects.
{"title":"Towards a Cloud of Clouds Elasticity Management System","authors":"Rayene Moudjari, Z. Sahnoun","doi":"10.1109/DEXA.2017.47","DOIUrl":"https://doi.org/10.1109/DEXA.2017.47","url":null,"abstract":"In recent years, cloud computing paradigm has grown massively popular in both industry and academic sectors. One of the main reasons for the wide adoption of Cloud Computing is the ability to add and remove resources \"on the fly\" to handle the load variation through the concept of elasticity. The efficient management of cloud elastic system is a challenging task. This paper proposes a multi agent system for cloud of clouds elasticity management. Concretely, we adopt a formal modelling approach based on Bigraphs (BRS) for the specification of the multi-agent system structural and behavioral aspects.","PeriodicalId":127009,"journal":{"name":"2017 28th International Workshop on Database and Expert Systems Applications (DEXA)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116814042","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
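The elasticity concept invoked above, adding and removing resources on the fly as load varies, can be illustrated with a toy threshold rule. This is a hypothetical sketch of the behaviour an elasticity-management agent might enact; the paper itself specifies the management system formally with bigraphs rather than prescribing a concrete scaling policy.

```python
def elasticity_decision(load, capacity, scale_out=0.8, scale_in=0.3):
    """Toy threshold rule for cloud elasticity: acquire a resource when
    utilisation is high, release one when it is low, otherwise hold.
    """
    utilisation = load / capacity
    if utilisation > scale_out:
        return "scale_out"
    if utilisation < scale_in and capacity > 1:
        return "scale_in"
    return "hold"

print(elasticity_decision(load=9, capacity=10))  # -> scale_out
print(elasticity_decision(load=2, capacity=10))  # -> scale_in
print(elasticity_decision(load=5, capacity=10))  # -> hold
```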
Recognizing secondary structures in proteins can be a highly computationally expensive task that does not always yield good results. Using Restricted Boltzmann Machines (RBMs), we were able to train a simple neural network to recognize an alpha-helix with a good degree of accuracy. By modifying the RBM implementation to be much simpler and more efficient than the standard one, we obtain a 14-fold speedup in training with no loss in detection accuracy or in cluster formation. Even with very small training sets (160 members), we are able to recognize, with a high degree of accuracy, not only the alpha-helix structures we train for but also other, similar helix structures that we did not train for. We are also able to cluster these structures together in a meaningful way based on the RBM training results. Both the training and the clustering are completely unsupervised, beyond the training set meeting certain constraints. Interestingly, each cluster shares structural similarities within itself but also has noticeable differences from the other detected clusters. These clusters seem to form regardless of training set size or makeup.
{"title":"Recognizing Protein Secondary Structures with Neural Networks","authors":"R. Harrison, Michael McDermott, Chinua Umoja","doi":"10.1109/DEXA.2017.29","DOIUrl":"https://doi.org/10.1109/DEXA.2017.29","url":null,"abstract":"Recognizing secondary structures in proteins can be a highly computationally expensive task that may not always yield good results. Using Restricted Boltzmann Machines (RBM) we were able to train a simple neural network to recognize an alpha-helix with a good degree of accuracy. Modifying the RBM implementation to be much simpler and more efficient than the standard implementation we are able to see a 14-fold speedup in training with no loss in detection accuracy or in cluster formation. With even very small training sets (160 members) we are able to recognize both the alpha-helix structures we are training for but also other, similar, helix structures that we did not train for. We are also able to recognize these structures with a high degree of accuracy. We are also able to cluster these structures together in a meaningful way based on the RBM training results. Both the training and clustering is completely unsupervised beyond the training set meeting certain constraints. Interestingly, each cluster shares structural similarities within itself but also has noticeable differences from other clusters that are detected. These clusters seem to form regardless of training set size or makeup.","PeriodicalId":127009,"journal":{"name":"2017 28th International Workshop on Database and Expert Systems Applications (DEXA)","volume":"74 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127345130","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
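For reference, one step of standard Bernoulli-RBM training with contrastive divergence (CD-1) can be sketched as below. This is the textbook formulation with hypothetical layer sizes and random data, not the authors' simplified implementation that yields the reported 14-fold speedup.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(W, bv, bh, v0, lr=0.1):
    """One CD-1 update for a Bernoulli RBM.

    v0: batch of binary visible vectors, shape (batch, n_visible).
    Returns updated (W, bv, bh).
    """
    ph0 = sigmoid(v0 @ W + bh)                 # hidden probabilities given data
    h0 = (rng.random(ph0.shape) < ph0) * 1.0   # sampled hidden states
    pv1 = sigmoid(h0 @ W.T + bv)               # visible reconstruction probabilities
    ph1 = sigmoid(pv1 @ W + bh)                # hidden probabilities given reconstruction
    n = v0.shape[0]
    W = W + lr * (v0.T @ ph0 - pv1.T @ ph1) / n
    bv = bv + lr * (v0 - pv1).mean(axis=0)
    bh = bh + lr * (ph0 - ph1).mean(axis=0)
    return W, bv, bh

n_visible, n_hidden = 6, 3                     # toy sizes, not the paper's
W = rng.normal(0, 0.1, (n_visible, n_hidden))
bv = np.zeros(n_visible)
bh = np.zeros(n_hidden)
data = (rng.random((8, n_visible)) < 0.5) * 1.0
for _ in range(50):
    W, bv, bh = cd1_step(W, bv, bh, data)
recon = sigmoid((((data @ W + bh) > 0) * 1.0) @ W.T + bv)  # deterministic reconstruction
print(float(np.mean((data - recon) ** 2)))     # mean squared reconstruction error
```

In a structure-recognition setting the visible units would encode a window of backbone geometry, and the learned hidden activations provide the features that the clustering operates on.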
Thespis is a middleware that innovatively leverages the Actor model to implement causal consistency over an industry-standard database, whilst abstracting complexities for application developers behind a REST open-protocol interface. Our evaluation considers correctness, performance and scalability aspects. We also run empirical experiments using YCSB to show the efficacy of the approach for a variety of workloads.
{"title":"Thespis: Actor-Based Causal Consistency","authors":"C. Camilleri, J. Vella, Vitezslav Nezval","doi":"10.1109/DEXA.2017.25","DOIUrl":"https://doi.org/10.1109/DEXA.2017.25","url":null,"abstract":"Thespis is a middleware that innovatively leverages the Actor model to implement causal consistency over an industry-standard database, whilst abstracting complexities for application developers behind a REST open-protocol interface. Our evaluation considers correctness, performance and scalability aspects. We also run empirical experiments using YCSB to show the efficacy of the approach for a variety of workloads.","PeriodicalId":127009,"journal":{"name":"2017 28th International Workshop on Database and Expert Systems Applications (DEXA)","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129846896","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
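Causal consistency middleware such as Thespis is commonly built on a vector-clock delivery condition: apply a replicated write only when it is the next event from its sender and all its other causal dependencies have been seen locally. The sketch below shows that textbook condition; Thespis's actual actor-based protocol is not reproduced here.

```python
def causally_ready(msg_clock, sender, local_clock):
    """Standard vector-clock delivery condition for causal consistency.

    msg_clock: vector clock attached to the incoming write.
    local_clock: vector clock of writes already applied at this replica.
    """
    for node, t in msg_clock.items():
        if node == sender:
            # must be the very next write from the sender
            if t != local_clock.get(node, 0) + 1:
                return False
        elif t > local_clock.get(node, 0):
            # depends on a write from another node we have not seen yet
            return False
    return True

local = {"A": 2, "B": 1}
print(causally_ready({"A": 3, "B": 1}, "A", local))  # -> True: next write from A
print(causally_ready({"A": 3, "B": 2}, "A", local))  # -> False: unseen dependency on B
```

A replica buffers writes that are not yet ready and retries them as its local clock advances, which is how causal order is preserved without global coordination.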
This paper describes a new approach to building the query-based relevance sets (qrels), or relevance judgments, for a test collection automatically, without any human intervention. The methods we describe use supervised machine learning algorithms, namely the Naïve Bayes classifier and the Support Vector Machine (SVM). We achieve better Kendall's tau and Spearman correlation results between the TREC system ranking using the newly generated qrels and the ranking obtained from the human-built qrels than previous baselines. We also apply a variation of these approaches, using the doc2vec representation of the documents rather than the traditional tf-idf representation.
{"title":"Using Supervised Machine Learning to Automatically Build Relevance Judgments for a Test Collection","authors":"Mireille Makary, M. Oakes, R. Mitkov, Fadi Yamout","doi":"10.1109/DEXA.2017.38","DOIUrl":"https://doi.org/10.1109/DEXA.2017.38","url":null,"abstract":"This paper describes a new approach to building the query based relevance sets (qrels) or relevance judgments for a test collection automatically without using any human intervention. The methods we describe use supervised machine learning algorithms, namely the Naïve Bayes classifier and the Support Vector Machine (SVM). We achieve better Kendall's tau and Spearman correlation results between the TREC system ranking using the newly generated qrels and the ranking obtained from using the human-built qrels than previous baselines. We also apply a variation of these approaches by using the doc2vec representation of the documents rather than using the traditional tf-idf representation.","PeriodicalId":127009,"journal":{"name":"2017 28th International Workshop on Database and Expert Systems Applications (DEXA)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130620020","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
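The evaluation measure named above, Kendall's tau between the system ranking under generated qrels and under human qrels, can be computed directly from paired rank positions. The sketch below uses hypothetical system names and ranks.

```python
from itertools import combinations

def kendalls_tau(rank_a, rank_b):
    """Kendall's tau between two rankings of the same systems.

    rank_a, rank_b: dicts mapping system name -> rank position.
    tau = (concordant - discordant) / total pairs; 1.0 means identical order.
    """
    systems = list(rank_a)
    concordant = discordant = 0
    for x, y in combinations(systems, 2):
        agree = (rank_a[x] - rank_a[y]) * (rank_b[x] - rank_b[y])
        if agree > 0:
            concordant += 1
        elif agree < 0:
            discordant += 1
    total = len(systems) * (len(systems) - 1) / 2
    return (concordant - discordant) / total

official = {"s1": 1, "s2": 2, "s3": 3, "s4": 4}   # ranking from human qrels
generated = {"s1": 1, "s2": 3, "s3": 2, "s4": 4}  # ranking from automatic qrels
print(kendalls_tau(official, generated))  # -> 0.666... (one swapped pair out of six)
```

A tau close to 1.0 means the automatically built qrels rank retrieval systems almost the same way the human judgments do, which is the criterion the paper optimises.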
Social networks (SNs) have become essential communication tools in recent years, generating a large amount of information about their users that can be analysed with data processing algorithms. Recently, a new type of SN user has emerged: jihadists who use SNs as a tool to recruit new militants and share their propaganda. In this paper, we study a set of indicators to assess the risk of radicalisation of a social network user. These radicalisation indicators help law-enforcement agencies, prosecutors and organizations devoted to fighting terrorism to detect vulnerable targets even before the radicalisation process is complete. Moreover, these indicators are the first steps towards a software tool to gather, represent, pre-process and analyse behavioural indicators of radicalisation in terrorism.
{"title":"Extracting Radicalisation Behavioural Patterns from Social Network Data","authors":"R. Lara-Cabrera, A. González-Pardo, M. Barhamgi, David Camacho","doi":"10.1109/DEXA.2017.18","DOIUrl":"https://doi.org/10.1109/DEXA.2017.18","url":null,"abstract":"Social networks (SNs) have become essential communication tools in recent years, generating a large amount of information about its users that can be analysed with data processing algorithms. Recently, a new type of SN user has emerged: jihadists that use SNs as a tool to recruit new militants and share their propaganda. In this paper, we study a set of indicators to assess the risk of radicalisation of a social network user. These radicalisation indicators help law-enforcement agencies, prosecutors and organizations devoted to fight terrorism to detect vulnerable targets even before the radicalisation process is completed. Moreover, these indicators are the first steps towards a software tool to gather, represent, pre-process and analyse behavioural indicators of radicalisation in terrorism.","PeriodicalId":127009,"journal":{"name":"2017 28th International Workshop on Database and Expert Systems Applications (DEXA)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134609691","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Organizations enhance the velocity of simple operations that read and write small amounts of big data by extending a SQL system with a key-value store (KVS). The resulting system is suitable for workloads that issue simple operations and exhibit a high read-to-write ratio, e.g., interactive social networking actions. A popular distributed in-memory KVS is memcached, used by organizations such as Facebook and YouTube. This study presents SQL query to trigger translation (SQLTrig), a novel transparent consistency technique that keeps the key-value pairs in the KVS consistent with the tabular data in the relational database management system (RDBMS). SQLTrig provides physical data independence, hiding the representation of data (either as rows of a table or as key-value pairs) from application developers. Software developers are provided with the SQL query language and observe the performance enhancements of a KVS without authoring additional software. This reduces software complexity and expedites the development life cycle.
{"title":"SQL Query to Trigger Translation: A Novel Transparent Consistency Technique for Cache Augmented SQL Systems","authors":"Shahram Ghandeharizadeh, Jason Yap","doi":"10.1109/DEXA.2017.24","DOIUrl":"https://doi.org/10.1109/DEXA.2017.24","url":null,"abstract":"Organizations enhance the velocity of simple operations that read and write a small amount of data from big data by extending a SQL system with a key-value store (KVS). The resulting system is suitable for workloads that issue simple operations and exhibit a high read to write ratio, e.g., interactive social networking actions. A popular distributed in-memory KVS is memcached in use by organizations such as Facebook and YouTube. This study presents SQL query to trigger translation (SQLTrig) as a novel transparent consistency technique that maintains the key-value pairs of the KVS consistent with the tabular data in the relational database management system (RDBMS). SQLTrig provides physical data independence, hiding the representation of data (either as rows of a table or key-value pairs) from the application developers. Software developers are provided with the SQL query language and observe the performance enhancements of a KVS without authoring additional software. This simplifies software complexity to expedite its development life cycle.","PeriodicalId":127009,"journal":{"name":"2017 28th International Workshop on Database and Expert Systems Applications (DEXA)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128705874","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
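SQLTrig derives invalidation triggers automatically from the SQL queries themselves; the toy sketch below instead hardcodes the query-to-row dependencies, purely to illustrate the trigger-style invalidation that keeps a cache consistent with the RDBMS. Names and keys are hypothetical.

```python
class TriggerCache:
    """Toy trigger-style cache invalidation: each cached query result
    registers the (table, row key) pairs it was computed from, and a
    write to any of those rows evicts the dependent cache entries.
    """
    def __init__(self):
        self.cache = {}  # query key -> cached result
        self.deps = {}   # (table, row key) -> set of dependent query keys

    def put(self, query_key, result, rows_read):
        self.cache[query_key] = result
        for row in rows_read:
            self.deps.setdefault(row, set()).add(query_key)

    def on_write(self, table, row_key):
        """Called by the (simulated) trigger after an RDBMS write."""
        for qk in self.deps.pop((table, row_key), set()):
            self.cache.pop(qk, None)

tc = TriggerCache()
tc.put("friends_of:alice", ["bob", "carol"], [("users", "alice")])
tc.on_write("users", "alice")          # RDBMS update fires the trigger
print("friends_of:alice" in tc.cache)  # -> False: stale entry evicted
```

The point of SQLTrig is that developers never write the `on_write` wiring by hand: the middleware analyses each SQL query and generates the corresponding triggers, which is what gives the claimed transparency.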
The pervasive use of Internet of Things and smart-meter technologies in smart cities increases the complexity of managing the resulting data, due to its size, diversity, and privacy issues. This calls for an innovative solution to process and manage the data effectively. This paper presents an elastic private scientific cloud, SciCloud, to tackle these grand challenges. SciCloud provides on-demand computing resource provisioning, a scalable data management platform and an in-place data analytics environment to support scientific research using smart city data.
{"title":"SciCloud: A Scientific Cloud and Management Platform for Smart City Data","authors":"Xiufeng Liu, P. S. Nielsen, A. Heller, Panagiota Gianniou","doi":"10.1109/DEXA.2017.22","DOIUrl":"https://doi.org/10.1109/DEXA.2017.22","url":null,"abstract":"The pervasive use of Internet of Things and smart meter technologies in smart cities increases the complexity of managing the data, due to their sizes, diversity, and privacy issues. This requires an innovate solution to process and manage the data effectively. This paper presents an elastic private scientific cloud, SciCloud, to tackle these grand challenges. SciCloud provides on-demand computing resource provisions, a scalable data management platform and an in-place data analytics environment to support the scientific research using smart city data.","PeriodicalId":127009,"journal":{"name":"2017 28th International Workshop on Database and Expert Systems Applications (DEXA)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126456052","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}