2015 IEEE Trustcom/BigDataSE/ISPA: Latest Publications

Proofs of Encrypted Data Retrievability with Probabilistic and Homomorphic Message Authenticators

2015 IEEE Trustcom/BigDataSE/ISPA

Pub Date : 2015-08-20 DOI: 10.1109/Trustcom.2015.462

Dongxi Liu, J. Zic

When users store their data in a cloud, they may be concerned about whether the data is stored correctly and can be fully retrieved. Proofs of Retrievability (PoR) is a cryptographic concept that allows users to remotely check the integrity of their data without downloading it. This check is usually done by attaching message authenticators that carry data integrity information to the data. Existing PoR schemes consider only the retrievability of unencrypted data, and their message authenticators are usually deterministic. In this paper, we propose a PoR scheme that is built over homomorphic encryption schemes. Our PoR scheme can prove the retrievability of homomorphically encrypted data by generating probabilistic and homomorphic message authenticators. Moreover, the homomorphically encrypted data can be processed by the cloud directly, and our PoR scheme can verify the integrity of such outsourced computations over ciphertexts. A prototype of our scheme is implemented to evaluate its performance.
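
The audit idea in the abstract above can be illustrated with a much simpler, well-known style of linearly homomorphic authenticator. The sketch below is not the paper's scheme: it tags plaintext blocks deterministically, whereas the paper targets homomorphically encrypted data with probabilistic authenticators. The field modulus `P`, the helper names, and the HMAC-based PRF are all assumptions chosen for illustration.

```python
import hashlib
import hmac
import secrets

P = 2**127 - 1  # a Mersenne prime used as the field modulus (illustrative choice)


def prf(key: bytes, index: int) -> int:
    """Pseudorandom field element for block `index`, derived with HMAC-SHA256."""
    digest = hmac.new(key, index.to_bytes(8, "big"), hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % P


def tag_blocks(key: bytes, alpha: int, blocks: list[int]) -> list[int]:
    """Authenticator for block i: alpha * m_i + PRF(key, i) mod P."""
    return [(alpha * m + prf(key, i)) % P for i, m in enumerate(blocks)]


def make_challenge(n_blocks: int, sample: int) -> list[tuple[int, int]]:
    """Pick `sample` distinct block indices, each with a random nonzero coefficient."""
    indices = secrets.SystemRandom().sample(range(n_blocks), sample)
    return [(i, secrets.randbelow(P - 1) + 1) for i in indices]


def prove(blocks: list[int], tags: list[int], challenge: list[tuple[int, int]]) -> tuple[int, int]:
    """Server side: fold the challenged blocks and tags into two field elements."""
    mu = sum(c * blocks[i] for i, c in challenge) % P
    sigma = sum(c * tags[i] for i, c in challenge) % P
    return mu, sigma


def verify(key: bytes, alpha: int, challenge: list[tuple[int, int]], mu: int, sigma: int) -> bool:
    """Client side: check sigma == alpha * mu + sum(c_i * PRF(key, i)) mod P."""
    expected = (alpha * mu + sum(c * prf(key, i) for i, c in challenge)) % P
    return sigma == expected
```

Because the tags are linear in the blocks, the server can answer a challenge over many blocks with just two field elements, and the client never downloads the data. A block that was altered after tagging makes the check fail whenever that block is among the challenged indices.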

Citations: 5

Parallel H4MSA for Multiple Sequence Alignment

2015 IEEE Trustcom/BigDataSE/ISPA

Pub Date : 2015-08-20 DOI: 10.1109/Trustcom.2015.639

Á. Rubio-Largo, M. A. Vega-Rodríguez, D. L. González-Álvarez

Multiple Sequence Alignment (MSA) is the process of aligning three or more nucleotide or amino-acid sequences at the same time. It is an NP-complete optimization problem in which the time complexity of finding an optimal alignment rises exponentially as the number of sequences to align increases. In the multiobjective version of the MSA problem, we simultaneously optimize the alignment accuracy and conservation. In this work, we present a parallel scheme for a multiobjective version of a memetic metaheuristic: Hybrid Multiobjective Memetic Metaheuristics for Multiple Sequence Alignment (H4MSA). In order to evaluate the parallel performance of H4MSA, we use several datasets with different numbers of sequences (up to 1000 sequences) and compare its parallel performance against other well-known parallel approaches published in the literature, such as MSAProbs, T-Coffee, Clustal O and MAFFT. The results reveal that parallel H4MSA running on 32 cores is around 25 times faster than the sequential version.
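
To make the two objectives concrete, here is a minimal sketch of how an alignment could be scored for accuracy and for conservation. The abstract does not define either objective, so both functions are assumptions: accuracy is approximated by an unweighted sum-of-pairs score with hypothetical match, mismatch and gap values, and conservation by the number of totally conserved columns.

```python
from itertools import combinations


def sum_of_pairs(alignment: list[str], match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Score every pair of rows column by column; gap-gap pairs score 0."""
    total = 0
    for row_a, row_b in combinations(alignment, 2):
        for a, b in zip(row_a, row_b):
            if a == "-" and b == "-":
                continue
            if a == "-" or b == "-":
                total += gap
            elif a == b:
                total += match
            else:
                total += mismatch
    return total


def totally_conserved_columns(alignment: list[str]) -> int:
    """Count columns in which every row holds the same non-gap symbol."""
    return sum(
        1
        for column in zip(*alignment)
        if column[0] != "-" and len(set(column)) == 1
    )


alignment = ["ACG-T", "ACGAT", "A-GAT"]
print(sum_of_pairs(alignment), totally_conserved_columns(alignment))
```

A multiobjective optimizer would treat the two values as a pair and keep the alignments that are not dominated on both, rather than collapsing them into a single number.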

Citations: 4

TrustTokenF: A Generic Security Framework for Mobile Two-Factor Authentication Using TrustZone

2015 IEEE Trustcom/BigDataSE/ISPA

Pub Date : 2015-08-20 DOI: 10.1109/Trustcom.2015.355

Yingjun Zhang, Shijun Zhao, Yu Qin, Bo Yang, D. Feng

We give a detailed analysis of the security issues that arise when mobile devices are used as a substitute for dedicated hardware tokens in two-factor authentication (2FA) schemes, and we propose TrustTokenF, a generic security framework for mobile 2FA schemes that provides security assurance comparable to dedicated hardware tokens and is more flexible for token management. We first illustrate how to leverage the Trusted Execution Environment (TEE) based on ARM TrustZone to provide essential security features for mobile 2FA applications, i.e., runtime isolated execution and trusted user interaction, which resist software attackers even if they compromise the entire mobile OS. We also use SRAM Physical Unclonable Functions (PUFs) to provide persistent secure storage for the authentication secrets, which achieves both high-level security and low cost. Based on these security features, we design a series of secure protocols for token deployment, migration and device key updating. We also introduce the TPM 2.0 policy-based authorization mechanism to enhance the security of the interface from the outside world into the trusted tokens. Finally, we implement the prototype system on real TrustZone-enabled hardware. The experimental results show that TrustTokenF is secure, flexible, economical and efficient for mobile 2FA applications.

Citations: 12

Virtualization and Cyber Security: Arming Future Security Practitioners

2015 IEEE Trustcom/BigDataSE/ISPA

Pub Date : 2015-08-20 DOI: 10.1109/Trustcom.2015.537

Midhun Babu Tharayanil, G. Whitney, Mahdi Aiash, Chafika Benzaid

In the past five years, cybercrime has grown to become one of the most significant threats to the safety of the nation and its economy. The government's call to arms has been eagerly accepted by business enterprises and academia. But training cyber security professionals raises a unique set of challenges. Cost, space, time and scalability are among the issues identified, and possible solutions are proposed. As cyber security professionals, we have realized the importance of practical experience, which can be hard to deliver in a lecture-based environment. The primary aim of this project is to evaluate and recommend a platform for virtual hands-on labs that may be used to provide a secure environment in which cyber security students can evaluate, and receive hands-on experience of, possible threats and countermeasures. Similar labs have been set up in universities across the world, but we have not been able to find any studies evaluating the merits of virtualization platforms for running a virtual lab. Hence, we study three of the most popular virtualization platforms and provide recommendations to guide anyone who wishes to set up such a lab.

Citations: 5

Trusted Tamper-Evident Data Provenance

2015 IEEE Trustcom/BigDataSE/ISPA

Pub Date : 2015-08-20 DOI: 10.1109/Trustcom.2015.430

M. Taha, Sivadon Chaisiri, R. Ko

Data provenance, the origin and derivation history of data, is commonly used for security auditing, forensics and data analysis. While provenance loggers provide evidence of data changes, the integrity of the provenance logs is also critical for the integrity of the forensics process. However, to the best of our knowledge, few solutions are able to fully satisfy this trust requirement. In this paper, we propose a framework to enable tamper-evidence and preserve the confidentiality and integrity of data provenance using the Trusted Platform Module (TPM). Our framework also stores provenance logs in trusted and backup servers to guarantee the availability of data provenance. Tampered provenance logs can be discovered and consequently recovered by retrieving the original logs from the servers. Leveraging the TPM's technical capability, our framework guarantees that the collected data provenance is admissible, complete, and confidential. More importantly, this framework can be applied to capture tampering evidence in large-scale cloud environments at system, network, and application granularities. We applied our framework to provide tamper-evidence for Progger, a cloud-based, kernel-space logger. Our results demonstrate the ability to conduct remote attestation of the integrity of Progger logs and to uphold the completeness, confidentiality and admissibility requirements.
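
One common way to make a log tamper-evident is to fold its entries into a running hash, in the same way a TPM extends a Platform Configuration Register (new value = hash of the old value concatenated with a digest of the new measurement). The sketch below imitates that operation in software only. It is an illustration of the general idea, not the framework described in the abstract: the function names are hypothetical, no real TPM is involved, and the all-zero starting value is an assumption.

```python
import hashlib


def extend(register: bytes, entry: bytes) -> bytes:
    """New register value = SHA-256(old register || SHA-256(entry))."""
    return hashlib.sha256(register + hashlib.sha256(entry).digest()).digest()


def chain_digest(entries: list[bytes]) -> bytes:
    """Fold all log entries, in order, into a single 32-byte value."""
    register = bytes(32)  # assumed initial value: 32 zero bytes
    for entry in entries:
        register = extend(register, entry)
    return register


def is_untampered(entries: list[bytes], trusted_digest: bytes) -> bool:
    """Recompute the chain and compare it with the value kept in trusted storage."""
    return chain_digest(entries) == trusted_digest


log = [b"open /etc/passwd", b"write /tmp/report"]
trusted = chain_digest(log)
print(is_untampered(log, trusted))
print(is_untampered(log[::-1], trusted))
```

Because each step depends on the previous register value, editing, reordering, or dropping any entry changes the final digest. Detection still requires that the trusted digest itself be stored where the attacker cannot rewrite it, which is the role hardware such as a TPM plays.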

Citations: 12

Formal Modeling and Verification of Opportunity-enabled Risk Management

2015 IEEE Trustcom/BigDataSE/ISPA

Pub Date : 2015-08-20 DOI: 10.1109/Trustcom.2015.434

A. Aldini, J. Seigneur, C. B. Lafuente, Xavier Titi, Jonathan Guislain

With the advent of the Bring-Your-Own-Device (BYOD) trend, mobile work is becoming widespread, which challenges the traditional view of security standards and risk management. A recently proposed model, called opportunity-enabled risk management (OPPRIM), aims at balancing the analysis of the major threats that arise in the BYOD setting with the analysis of the potential increased opportunities emerging in such an environment, by combining mechanisms of risk estimation with trust and threat metrics. Firstly, this paper provides a logic-based formalization of the policy and metric specification paradigm of OPPRIM. Secondly, we verify the OPPRIM model with respect to the socio-economic perspective. More precisely, this is validated formally by employing tool-supported quantitative model checking techniques.

Citations: 4

M-PCA Binary Embedding for Approximate Nearest Neighbor Search

2015 IEEE Trustcom/BigDataSE/ISPA

Pub Date : 2015-08-20 DOI: 10.1109/Trustcom.2015.554

Ezgi C. Ozan, S. Kiranyaz, M. Gabbouj

Principal Component Analysis (PCA) is widely used within binary embedding methods for approximate nearest neighbor search and has proven to have a significant effect on performance. Current methods aim to represent the whole data using a single PCA; however, considering the Gaussian distribution requirements of PCA, this representation is not appropriate. In this study, we propose using Multiple PCA (M-PCA) transformations to represent the whole data and show that this increases performance significantly compared to methods using a single PCA.

Citations: 6

Achieving Lightweight and Secure Access Control in Multi-authority Cloud

2015 IEEE Trustcom/BigDataSE/ISPA

Pub Date : 2015-08-20 DOI: 10.1109/Trustcom.2015.407

Yanchao Wang, Fenghua Li, Jinbo Xiong, Ben Niu, Fangfang Shan

Cloud computing has become a vital part of our daily life. While enjoying the convenience it provides, users may lose control over their personal data, since ownership of the data is separated from its administration. This concern becomes more serious in the multi-authority cloud environment. In this paper, we propose a novel access control scheme, termed easy-ACCESS, which achieves lightweight and secure data access control for resource-limited devices in a multi-authority cloud. Our easy-ACCESS simultaneously enjoys the following properties: i) shorter decryption keys, whose size increases linearly with the number of authorities rather than the number of attributes; ii) lower computation cost, because almost all of the decryption cost is delegated to the decryption service provider; and iii) provable security under the selective security model. Thorough theoretical analysis and performance evaluation indicate the effectiveness and efficiency of the proposed easy-ACCESS.

Citations: 9

A Sybil Attack Detection Scheme for a Centralized Clustering-Based Hierarchical Network

2015 IEEE Trustcom/BigDataSE/ISPA

Pub Date : 2015-08-20 DOI: 10.1109/Trustcom.2015.390

M. Jan, P. Nanda, Xiangjian He, R. Liu

Wireless Sensor Networks (WSNs) have experienced phenomenal growth over the past decade. They are typically deployed in remote and hostile environments for monitoring applications and data collection. Miniature sensor nodes collaborate with each other to provide information on an unprecedented temporal and spatial scale. The resource-constrained nature of sensor nodes, along with human-inaccessible terrains, poses various security challenges to these networks at different layers. In this paper, we propose a novel detection scheme for Sybil attacks in a centralized clustering-based hierarchical network. Sybil nodes are detected prior to cluster formation to prevent their forged identities from participating in cluster head selection. Only legitimate nodes are elected as cluster heads to enhance utilization of the resources. The proposed scheme requires the collaboration of any two high-energy nodes to analyze the received signal strengths of neighboring nodes. The simulation results show that our proposed scheme significantly improves network lifetime in comparison with existing clustering-based hierarchical routing protocols.
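
The abstract does not say how the two high-energy nodes analyze signal strengths, so the following is only a sketch of one plausible approach: identities whose received signal strength (RSSI, in dBm) is nearly identical at both observers are assumed to come from the same physical radio and are flagged together. The function name, the data layout, and the 1 dB tolerance are all hypothetical.

```python
def find_sybil_groups(
    readings: dict[str, tuple[float, float]], tol_db: float = 1.0
) -> list[set[str]]:
    """Group identities whose RSSI at both observers differs by at most tol_db.

    `readings` maps a claimed node identity to (RSSI at observer A, RSSI at observer B).
    Only groups with more than one identity are returned as suspected Sybil groups.
    """
    groups: list[set[str]] = []
    for node_id, (rssi_a, rssi_b) in sorted(readings.items()):
        for group in groups:
            # Compare against the first identity that was placed in this group.
            ref_a, ref_b = readings[min(group)]
            if abs(rssi_a - ref_a) <= tol_db and abs(rssi_b - ref_b) <= tol_db:
                group.add(node_id)
                break
        else:
            groups.append({node_id})
    return [group for group in groups if len(group) > 1]


readings = {
    "n1": (-60.0, -72.0),
    "n2": (-60.4, -71.7),  # almost the same as n1 at both observers
    "n3": (-80.0, -55.0),
    "n4": (-60.2, -50.0),  # matches n1 at observer A only
}
print(find_sybil_groups(readings))
```

Using two observers matters in this sketch: `n4` looks like `n1` from observer A alone, but observer B sees a clearly different signal strength, so it is not grouped with the suspected identities.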

Citations: 104

Compounds Activity Prediction in Large Imbalanced Datasets with Substructural Relations Fingerprint and EEM

2015 IEEE Trustcom/BigDataSE/ISPA

Pub Date : 2015-08-20 DOI: 10.1109/Trustcom.2015.581

Wojciech M. Czarnecki, Krzysztof Rataj

Modern drug design procedures involve virtual screening, a highly efficient filtering step used to maximize the efficiency of preselecting compounds that are valuable drug candidates. Recent advances in introducing machine learning models to this process can lead to a significant increase in the overall quality of the drug design pipeline. Unfortunately, for many proteins it is still extremely hard to come up with a valid statistical model. This is a consequence of huge class disproportion (even 1000:1), large datasets (over 100,000 samples) and restricted data representation (mostly high-dimensional, sparse, binary vectors). In this paper, we try to tackle this problem through three important innovations. First, we represent compounds with a 2-dimensional graph representation. Second, we show how one can provide an extremely fast method for measuring the similarity of such data. Finally, we use the Extreme Entropy Machine, which shows an increase in balanced accuracy over Extreme Learning Machines, Support Vector Machines, one-class Support Vector Machines and Random Forest. The proposed pipeline brings significantly better results than all considered alternative, state-of-the-art approaches. We introduce some important novel elements and show why they lead to a better model. Despite this, it should still be considered a proof of concept, and further investigations in the field are needed.
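
Two quantities mentioned above are easy to show in code. Balanced accuracy is the mean of the true-positive rate and the true-negative rate, which is why it is preferred over plain accuracy at class ratios such as 1000:1. For the similarity of sparse binary fingerprints, the sketch uses the standard Tanimoto coefficient as a stand-in; the abstract does not specify the paper's own similarity measure, so that choice is an assumption, as are the function names.

```python
def balanced_accuracy(y_true: list[int], y_pred: list[int]) -> float:
    """Mean of the true-positive and true-negative rates (labels are 0 or 1).

    Assumes both classes occur at least once in y_true.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    positives = sum(y_true)
    negatives = len(y_true) - positives
    return 0.5 * (tp / positives + tn / negatives)


def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto similarity of two fingerprints stored as sets of on-bit indices."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# One active compound among 1000 inactive ones; the model predicts "inactive" for all.
y_true = [1] + [0] * 1000
y_pred = [0] * 1001
plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(round(plain_accuracy, 3), balanced_accuracy(y_true, y_pred))
```

The trivial classifier scores close to 1 on plain accuracy but only 0.5 on balanced accuracy, which exposes that it never finds the active compound. Storing fingerprints as sets of on-bit indices keeps the similarity computation proportional to the number of set bits rather than the full vector length.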

Citations: 9
