
2017 28th International Workshop on Database and Expert Systems Applications (DEXA): Latest Papers

Unfairness Correction in P2P Grids Based on Residue Number System of a Special Form
M. Babenko, N. Chervyakov, Andrei Tchernykh, N. Kucherov, M. N. Shabalina, I. Vashchenko, G. Radchenko, Daniil Murga
This paper applies error correction codes to improve the performance of BOINC under uncertainty about users' behavior. A Redundant Residue Number System (RRNS) moduli set of a special form provides correction of user unfairness, reliability, and reduced redundancy and load on the computing network. The RRNS error correction code is improved by using the error syndrome, which decreases the amount of computation required for data decoding by up to 20/7 times compared with projection methods. The proposed modification of the error syndrome makes it possible to drop the assumption that clients are honest and reliable. The proposed approach decreases the execution time of the client programs asymptotically by a factor of 4. On the other hand, encoding with RRNS places some restrictions on the range of operations that can be performed.
DOI: https://doi.org/10.1109/DEXA.2017.46 | Published: 2017-08-01
Citations: 15
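The RRNS abstract above contrasts syndrome-based decoding with projection methods. The following is a minimal sketch of both ideas, not the paper's algorithm: the moduli set (3, 5, 7 plus redundant 11, 13) is an illustrative assumption rather than the "special form" the authors use, the syndrome is used only to detect an error, and the faulty residue is then located with the projection baseline.

```python
from math import prod

# Illustrative moduli only; the paper's special-form moduli set is not given here.
INFO_MODULI = (3, 5, 7)
REDUNDANT_MODULI = (11, 13)
MODULI = INFO_MODULI + REDUNDANT_MODULI
LEGIT_RANGE = prod(INFO_MODULI)  # valid values are 0..104


def encode(x):
    """Represent x by its residues modulo every modulus."""
    if not 0 <= x < LEGIT_RANGE:
        raise ValueError("value outside the legitimate range")
    return [x % m for m in MODULI]


def crt(residues, moduli):
    """Chinese Remainder Theorem reconstruction for pairwise coprime moduli."""
    total = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        partial = total // m
        x += r * partial * pow(partial, -1, m)  # modular inverse (Python 3.8+)
    return x % total


def syndrome(residues):
    """Difference between received and recomputed redundant residues."""
    k = len(INFO_MODULI)
    x = crt(residues[:k], INFO_MODULI)
    return tuple((r - x % m) % m for r, m in zip(residues[k:], REDUNDANT_MODULI))


def correct_single_error(residues):
    """Return the encoded value, tolerating one corrupted residue."""
    if not any(syndrome(residues)):
        return crt(residues[:len(INFO_MODULI)], INFO_MODULI)
    # Projection baseline: drop one residue at a time and keep the
    # reconstruction that falls back into the legitimate range.
    for skip in range(len(MODULI)):
        rest_r = [r for i, r in enumerate(residues) if i != skip]
        rest_m = [m for i, m in enumerate(MODULI) if i != skip]
        candidate = crt(rest_r, rest_m)
        if candidate < LEGIT_RANGE:
            return candidate
    raise ValueError("more than one residue is corrupted")
```

A zero syndrome costs one small reconstruction, whereas the projection loop may need one reconstruction per modulus. That gap is the kind of saving the abstract refers to, although the exact 20/7 figure depends on the paper's own construction.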
A Cloud System for Machine Learning Exploiting a Parallel Array DBMS
Yiqun Zhang, C. Ordonez, S. Johnsson
Computing machine learning models in the cloud remains a central problem in big data analytics. In this work, we introduce a cloud analytic system exploiting a parallel array DBMS based on a classical shared-nothing architecture. Our approach combines in-DBMS data summarization with mathematical processing in an external program. We study how to summarize a data set in parallel assuming a large number of processing nodes and how to further accelerate it with GPUs. In contrast to most big data analytic systems, we do not use Java, HDFS, MapReduce or Spark: our system is programmed in C++ and C on top of a traditional Unix file system. In our system, models are efficiently computed using a suite of innovative parallel matrix operators, which compute comprehensive statistical summaries of a large input data set (matrix) in one pass, leaving the remaining mathematically complex computations, with matrices that fit in RAM, to R. In order to be competitive with the Hadoop ecosystem (i.e. HDFS and Spark RDDs) we also introduce a parallel load operator for large matrices and an automated, yet flexible, cluster configuration in the cloud. Experiments compare our system with Spark, showing orders of magnitude time improvement. A GPU with many cores widens the gap further. In summary, our system is a competitive solution.
DOI: https://doi.org/10.1109/DEXA.2017.21 | Published: 2017-08-01
Citations: 3
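The one-pass summarization described in the abstract above can be illustrated with additive sufficient statistics. The abstract does not say which summaries the system computes, so this sketch assumes the common choice of a row count, per-column sums, and sums of pairwise products. All function names are hypothetical, and the real system is written in C++ and C, not Python.

```python
def summarize(rows, d):
    """One pass over a partition: count n, linear sums L, product sums Q."""
    n = 0
    L = [0.0] * d
    Q = [[0.0] * d for _ in range(d)]
    for x in rows:
        n += 1
        for i in range(d):
            L[i] += x[i]
            for j in range(d):
                Q[i][j] += x[i] * x[j]
    return n, L, Q


def merge(a, b):
    """Combine two partition summaries; addition makes the pass parallel."""
    n = a[0] + b[0]
    L = [u + v for u, v in zip(a[1], b[1])]
    Q = [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a[2], b[2])]
    return n, L, Q


def mean_and_covariance(summary):
    """Derive a small in-memory model (population covariance) from the summary."""
    n, L, Q = summary
    d = len(L)
    mean = [s / n for s in L]
    cov = [[Q[i][j] / n - mean[i] * mean[j] for j in range(d)] for i in range(d)]
    return mean, cov
```

Because the summaries only add, each node can scan its own partition independently. The d-by-d result is small enough to hand to an external program for the remaining mathematics, which is the division of labor the abstract describes.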
Global and Local Feature Learning for Ego-Network Analysis
Fatemeh Salehi Rizi, M. Granitzer, Konstantin Ziegler
In an ego-network, an individual (ego) organizes its friends (alters) in different groups (social circles). This social network can be efficiently analyzed after learning representations of the ego and its alters in a low-dimensional, real vector space. These representations are then easily exploited via statistical models for tasks such as social circle detection and prediction. Recent advances in language modeling via deep learning have inspired new methods for learning network representations. These methods can capture the global structure of networks. In this paper, we evolve these techniques to also encode the local structure of neighborhoods. Therefore, our local representations capture network features that are hidden in the global representation of large networks. We show that the task of social circle prediction benefits from a combination of global and local features generated by our technique.
DOI: https://doi.org/10.1109/DEXA.2017.36 | Published: 2017-08-01
Citations: 5
Improvement of Sentiment Analysis Based on Clustering of Word2Vec Features
Eissa Alshari, A. Azman, S. Doraisamy, N. Mustapha, Mustafa Alkeshr
Recently, many researchers have shown interest in using Word2Vec as the features for text classification tasks such as sentiment analysis. Its ability to model high quality distributional semantics among words has contributed to its success in many of the tasks. However, due to the high dimensional nature of the Word2Vec features, it increases the complexity for the classifier. In this paper, a method to construct a feature set based on Word2Vec is proposed for sentiment analysis. The method is based on clustering of terms in the vocabulary based on a set of opinion words from a sentiment lexical dictionary. As a result, the feature set for the classification is constructed based on the set of clusters. The effectiveness of the proposed method is evaluated on the Internet Movie Review Dataset with two classifiers, namely the Support Vector Machine and the Logistic Regression. The result is promising, showing that the proposed method can be more effective than the baseline approaches.
DOI: https://doi.org/10.1109/DEXA.2017.41 | Published: 2017-08-01
Citations: 33
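To make the cluster-based feature set in the abstract above concrete, here is a small sketch under stated assumptions. The abstract does not name the clustering procedure, so each vocabulary word is simply assigned to its most similar opinion word by cosine similarity. The two-dimensional vectors are made up and stand in for real Word2Vec embeddings.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))


def build_clusters(embeddings, opinion_words):
    """Map every vocabulary word to its closest opinion word (the cluster seed)."""
    return {
        word: max(opinion_words, key=lambda seed: cosine(vec, embeddings[seed]))
        for word, vec in embeddings.items()
    }


def document_features(tokens, clusters, opinion_words):
    """One feature per cluster: how many tokens of the document fall into it."""
    counts = {seed: 0 for seed in opinion_words}
    for token in tokens:
        if token in clusters:
            counts[clusters[token]] += 1
    return [counts[seed] for seed in opinion_words]


# Toy embeddings (hypothetical values, not trained vectors).
EMBEDDINGS = {
    "good": [1.0, 0.1],
    "great": [0.9, 0.2],
    "bad": [-1.0, 0.1],
    "awful": [-0.9, 0.3],
}
OPINION_WORDS = ["good", "bad"]
```

The resulting vector has one dimension per cluster instead of one per embedding dimension per word. This is how such a scheme can keep the classifier input small, which is the complexity concern the abstract raises.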
Towards Mitigating Uncertainty of Data Security Breaches and Collusion in Cloud Computing
A. Tchernykh, M. Babenko, N. Chervyakov, J. M. Cortés-Mendoza, N. Kucherov, V. Miranda-López, M. Deryabin, I. Dvoryaninova, G. Radchenko
Cloud computing has become a part of people's lives. However, many security problems with this technology remain unresolved. According to the assessment of international security experts, cloud collusion poses risks under uncertain conditions. To mitigate this type of uncertainty, and to minimize the data redundancy of encryption together with the harm caused by cloud collusion, modified threshold Asmuth-Bloom and weighted Mignotte secret sharing schemes are used. We show that if the adversaries do know the secret parts, and/or do not know the secret key, they cannot recover the secret. If the attackers do not know the required number of secret parts but know the secret key, the probability that they obtain the secret depends on the size of the machine word in bits and is less than 1/2^(l-1). We demonstrate that the proposed scheme ensures security under several types of attacks. We propose four approaches to selecting weights for secret sharing schemes to optimize system behavior: three based on data access speed (pessimistic, balanced, and optimistic) and one based on the speed-to-price ratio. We use the approximate method to improve the accuracy of error detection, localization, and correction under uncertainty of cloud parameters.
DOI: https://doi.org/10.1109/DEXA.2017.44 | Published: 2017-08-01
Citations: 17
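For readers unfamiliar with the secret sharing schemes named in the abstract above, the sketch below shows a plain, unweighted (3, 4) Mignotte threshold scheme. It is not the modified, weighted, key-protected variant the paper proposes. The moduli and the secret are made-up toy values, far too small for real use.

```python
from math import prod

# Toy parameters (assumptions for illustration only).
MODULI = (11, 13, 17, 19)   # pairwise coprime, increasing
THRESHOLD = 3               # any 3 of the 4 shares recover the secret


def secret_bounds(moduli=MODULI, k=THRESHOLD):
    """Secrets must lie strictly between these two products."""
    ordered = sorted(moduli)
    lower = prod(ordered[len(ordered) - (k - 1):])  # k-1 largest moduli
    upper = prod(ordered[:k])                       # k smallest moduli
    return lower, upper


def split(secret):
    """Each share is the secret reduced modulo one modulus."""
    lower, upper = secret_bounds()
    if not lower < secret < upper:
        raise ValueError("secret outside the Mignotte range")
    return [(m, secret % m) for m in MODULI]


def recover(shares):
    """Chinese Remainder Theorem over the supplied (modulus, residue) pairs."""
    total = prod(m for m, _ in shares)
    x = 0
    for m, r in shares:
        partial = total // m
        x += r * partial * pow(partial, -1, m)  # modular inverse (Python 3.8+)
    return x % total
```

With these parameters the secret must lie between 323 and 2431. Any three shares determine it uniquely, while two shares only fix it modulo a product of at most 323, which is smaller than every valid secret, so two colluding holders cannot reconstruct it directly.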
A Multi-Recommenders System for Service Provisioning in Multi-Cloud Environment
Haithem Mezni
Cloud service recommendation has become an important technique that helps users decide whether a service satisfies their requirements. However, the few existing recommendation systems are not suitable for real-world environments and only deal with services hosted in a single cloud, which is simply unrealistic. In addition, the same service may be hosted on more than one cloud and, hence, may have different user ratings that depend on the specific conditions of each cloud availability zone. This uncertainty regarding the real quality of the cloud service and users' satisfaction levels raises the question of how to trust the different users' ratings in order to recommend an adequate cloud service. Unlike existing solutions, the goal of this work is to propose a cooperative recommender system that aims to resolve two major issues: recommendation of cloud services in multiple clouds and recommendation under uncertainty of users' ratings. The proposed system will take advantage of a set of powerful techniques and paradigms in order to offer an overlay of cloud recommender entities that cooperate to deliver top-rated services to the user.
DOI: https://doi.org/10.1109/DEXA.2017.45 | Published: 2017-08-01
Citations: 3
MuMs: Energy-Aware VM Selection Scheme for Cloud Data Center
Rahul Yadav, Weizhe Zhang, Huangning Chen, T. Guo
The energy consumption of data centers has been increasing continuously in recent years due to the rising demand for computational power, especially in current grid and cloud computing systems, which directly increases operational costs as well as carbon dioxide (CO2) emissions. Reducing energy consumption within a cloud data center requires energy-aware virtual machine (VM) selection algorithms for VM consolidation when a host is detected as underloaded or overloaded; after all VMs on an underloaded host have been reallocated, that host must be switched to energy-saving mode. In this paper, we propose energy-aware dynamic VM selection algorithms that consolidate VMs from overloaded or underloaded hosts to minimise total energy consumption and maximise Quality of Service (QoS), including a reduction in service level agreement (SLA) violations. To validate our scheme, we implemented it in the CloudSim simulator and ran simulations on real workload traces from 10 different days, provided by PlanetLab.
DOI: https://doi.org/10.1109/DEXA.2017.43 | Published: 2017-08-01
Citations: 49
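The abstract above does not spell out the MuMs selection rule, so the sketch below shows only the general shape of a VM selection step for an overloaded host. It assumes a fixed CPU utilization threshold and a generic "smallest RAM first" heuristic, where smaller memory is taken as a proxy for cheaper migration. The class names, the threshold, and the heuristic are illustrative assumptions, not the authors' algorithm or the CloudSim API.

```python
from dataclasses import dataclass


@dataclass
class VM:
    name: str
    cpu_mips: float   # CPU demand
    ram_mb: float     # memory footprint, used as a migration-cost proxy


@dataclass
class Host:
    name: str
    capacity_mips: float
    vms: list

    def utilization(self):
        """Fraction of the host's CPU capacity requested by its VMs."""
        return sum(vm.cpu_mips for vm in self.vms) / self.capacity_mips


def select_vms_to_migrate(host, upper=0.8):
    """Pick VMs, cheapest migration first, until the host is no longer overloaded."""
    used = sum(vm.cpu_mips for vm in host.vms)
    selected = []
    for vm in sorted(host.vms, key=lambda v: v.ram_mb):
        if used / host.capacity_mips <= upper:
            break
        selected.append(vm)
        used -= vm.cpu_mips
    return selected
```

An energy-aware scheme would replace the sort key with a criterion that accounts for power draw and SLA impact. The surrounding loop of detecting an overloaded host, selecting VMs, and placing them elsewhere stays the same.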
Interactive Chord Visualization for Metaproteomics
Roman Zoun, K. Schallert, David Broneske, R. Heyer, D. Benndorf, G. Saake
Metaproteomics is an analytic approach to research microorganisms that live in complex microbial communities. A key aspect of understanding microbial communities is to link the functions of proteins identified by metaproteomics to their taxonomy. In this paper we demonstrate the interactive chord visualization as a powerful tool to explore such data. To evaluate the tool's efficacy, we use the relation data between functions and taxonomies from a large metaproteomics experiment. We evaluated the workflow in comparison to previous methods of data analysis and showed that interactive exploration of data using the chord diagram is significantly faster in four of five tasks. Therefore, the chord visualization improves the user's ability to discover complex biological relationships.
DOI: https://doi.org/10.1109/DEXA.2017.32 | Published: 2017-08-01
Citations: 8
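A chord diagram such as the one in the abstract above is typically driven by a matrix that counts how often each pair of categories co-occurs. The sketch below builds such a function-by-taxon matrix from (function, taxon) pairs. The record format and the sample labels are assumptions for illustration and are not taken from the paper's data set.

```python
from collections import Counter


def chord_matrix(records):
    """Count protein identifications per (function, taxon) pair.

    Returns the sorted function labels, the sorted taxon labels, and a
    matrix whose entry [i][j] is the ribbon width between function i
    and taxon j.
    """
    counts = Counter(records)
    functions = sorted({function for function, _ in counts})
    taxa = sorted({taxon for _, taxon in counts})
    matrix = [[counts[(f, t)] for t in taxa] for f in functions]
    return functions, taxa, matrix
```

Each nonzero cell becomes one ribbon in the diagram. Interactive filtering then amounts to recomputing the matrix over a subset of the records.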
Evaluation of Contextualization and Diversification Approaches in Aggregated Search
Hermann Ziak, Roman Kern
The combination of different knowledge bases in the field of information retrieval is called federated or aggregated search. It has several benefits over single source retrieval but poses some challenges as well. This work focuses on the challenge of result aggregation, especially in a setting where the final result list should include some level of diversity and serendipity. Both concepts have been shown to have an impact on how users perceive an information retrieval system. In particular, we want to assess if conventional procedures for result list aggregation can be utilised to introduce diversity and serendipity. Furthermore, we study whether blocking or interleaving for result aggregation yields better results. In a cross vertical aggregated search the so-called verticals could be news, multimedia content or text. Block ranking is one approach to combine such heterogeneous results. It relies on the idea that these verticals are combined into a single result list as blocks of several adjacent items. An alternative approach is interleaving. Here the verticals are blended into one result list on an item by item basis, i.e. adjacent items in the result list may come from different verticals. To generate the diverse and serendipitous results we relied on a query reformulation technique which we showed to be beneficial for producing diversified results in previous work. To conduct this evaluation we created a dedicated dataset. This dataset served as a basis for three different evaluation settings on a crowdsourcing platform, with over 300 participants. Our results show that query based diversification can be adapted to generate serendipitous results in a similar manner. Further, we discovered that both methods, interleaving and block ranking, appear to be beneficial for introducing diversity and serendipity, although it seems that a given query benefits from one approach or the other, but not from both.
DOI: 10.1109/DEXA.2017.37 (https://doi.org/10.1109/DEXA.2017.37). Published 2017-08-01 in 2017 28th International Workshop on Database and Expert Systems Applications (DEXA).
Citations: 0
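The two aggregation strategies named in the abstract above, block ranking and interleaving, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the fixed vertical order, and the simple round-robin interleaving rule are all assumptions chosen for illustration, and a real system would typically order blocks or items by relevance scores.

```python
from itertools import zip_longest

# Sentinel marking "this vertical has no more results" (illustrative helper).
_MISSING = object()


def block_rank(verticals, order=None):
    """Merge verticals as contiguous blocks: all items of one vertical,
    then all items of the next. `order` is a hypothetical parameter that
    fixes the block order; by default the dict's insertion order is used."""
    names = list(order) if order is not None else list(verticals)
    merged = []
    for name in names:
        merged.extend((name, item) for item in verticals[name])
    return merged


def interleave(verticals, order=None):
    """Merge verticals item by item (round-robin), so adjacent entries in
    the final list may come from different verticals. Exhausted verticals
    are skipped in later rounds."""
    names = list(order) if order is not None else list(verticals)
    merged = []
    rounds = zip_longest(*(verticals[n] for n in names), fillvalue=_MISSING)
    for row in rounds:
        for name, item in zip(names, row):
            if item is not _MISSING:
                merged.append((name, item))
    return merged


if __name__ == "__main__":
    # Toy result lists for three verticals (made-up identifiers).
    results = {
        "news": ["n1", "n2", "n3"],
        "multimedia": ["m1"],
        "text": ["t1", "t2"],
    }
    print(block_rank(results))
    print(interleave(results))
```

Both functions return the same set of items and differ only in arrangement, which is the property the evaluation in the abstract compares.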
An Initial Study on Radicalization Risk Factors: Towards an Assessment Software Tool
Irene Gilpérez-López, J. Torregrosa, M. Barhamgi, David Camacho
Radicalization has increasingly become a transnational risk as cyber-technologies and social networks have steadily improved in recent years. DAESH has used these new technologies to radicalize and recruit home-grown fighters, inviting them to join its ranks in the Islamic State territories or inciting them to attack in their own Western countries. However, radicalization is not a simple process. Jihadist recruiters take advantage of certain vulnerable individuals who are better targets for radicalization. Academics have subsequently attempted to identify and examine the characteristics of these individuals, which can be useful in identifying people who may be vulnerable to jihadist rhetoric. Violence risk assessment is a method used by psychologists, law enforcement agencies, prosecutors and other relevant actors to assess the risk that an individual will commit violent acts. As terrorist or political violence is not comparable with other types of violence, specialized tools are needed to calculate the risk of radicalization or the threat of attack. This paper attempts to gather the risk factors associated with violent extremism and jihadist radicalization from the literature and from the risk assessment tools already available or under development. It also presents some details of the RiskTrack software tool, which is currently under development by behavioral researchers and computer engineers. RiskTrack aims to automatically assess an individual's risk of becoming radicalized by analyzing social media profiles and testing for specific factors that have been found to be associated with radicalization.
DOI: 10.1109/DEXA.2017.19 (https://doi.org/10.1109/DEXA.2017.19). Published 2017-08-01 in 2017 28th International Workshop on Database and Expert Systems Applications (DEXA).
Citations: 10
Journal: 2017 28th International Workshop on Database and Expert Systems Applications (DEXA)