Pub Date : 2015-08-20 DOI: 10.1109/Trustcom.2015.462
Dongxi Liu, J. Zic
When users store their data in a cloud, they may be concerned about whether the data is stored correctly and can be fully retrieved. Proofs of Retrievability (PoR) is a cryptographic concept that allows users to remotely check the integrity of their data without downloading it. This check is usually done by attaching message authenticators, which contain data integrity information, to the data. Existing PoR schemes consider only the retrievability of unencrypted data, and their message authenticators are usually deterministic. In this paper, we propose a PoR scheme that is built over homomorphic encryption schemes. Our PoR scheme can prove the retrievability of homomorphically encrypted data by generating probabilistic and homomorphic message authenticators. Moreover, the homomorphically encrypted data can be processed by the cloud directly, and our PoR scheme can verify the integrity of such outsourced computations over ciphertexts. We implement a prototype of our scheme to evaluate its performance.
Title : Proofs of Encrypted Data Retrievability with Probabilistic and Homomorphic Message Authenticators
Venue : 2015 IEEE Trustcom/BigDataSE/ISPA
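To make the idea of a homomorphic message authenticator concrete, the following is a minimal sketch of a classic deterministic, privately verifiable linear authenticator over plaintext blocks. It is not the probabilistic scheme over ciphertexts that the abstract proposes. The modulus `P`, the HMAC-based `prf`, and all function names are assumptions chosen for illustration.

```python
import hashlib
import hmac
import secrets

# Field modulus: the Mersenne prime 2^127 - 1 (an illustrative choice).
P = 2**127 - 1


def prf(key: bytes, index: int) -> int:
    """Pseudo-random field element for block `index`, derived with HMAC-SHA256."""
    digest = hmac.new(key, index.to_bytes(8, "big"), hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % P


def tag_blocks(alpha: int, key: bytes, blocks: list) -> list:
    """Owner side: authenticator for block i is alpha * m_i + prf(key, i) mod P."""
    return [(alpha * m + prf(key, i)) % P for i, m in enumerate(blocks)]


def challenge(n_blocks: int, sample_size: int) -> list:
    """Verifier side: pick random block indices, each with a nonzero coefficient."""
    indices = secrets.SystemRandom().sample(range(n_blocks), sample_size)
    return [(i, secrets.randbelow(P - 1) + 1) for i in indices]


def prove(blocks: list, tags: list, chal: list) -> tuple:
    """Server side: aggregate the challenged blocks and tags linearly."""
    mu = sum(c * blocks[i] for i, c in chal) % P
    sigma = sum(c * tags[i] for i, c in chal) % P
    return mu, sigma


def verify(alpha: int, key: bytes, chal: list, mu: int, sigma: int) -> bool:
    """Verifier side: check the aggregate without holding any data block."""
    expected = (alpha * mu + sum(c * prf(key, i) for i, c in chal)) % P
    return sigma == expected
```

Because the tags are linear in the blocks, the server can answer a challenge with two field elements regardless of how many blocks are sampled, and the verifier needs only the secret pair `(alpha, key)`.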
Pub Date : 2015-08-20 DOI: 10.1109/Trustcom.2015.639
Á. Rubio-Largo, M. A. Vega-Rodríguez, D. L. González-Álvarez
Multiple Sequence Alignment (MSA) is the process of aligning three or more nucleotide or amino-acid sequences at the same time. It is an NP-complete optimization problem in which the time needed to find an optimal alignment rises exponentially as the number of sequences to align increases. In the multiobjective version of the MSA problem, we simultaneously optimize the alignment accuracy and conservation. In this work, we present a parallel scheme for a multiobjective version of a memetic metaheuristic: Hybrid Multiobjective Memetic Metaheuristics for Multiple Sequence Alignment (H4MSA). To evaluate the parallel performance of H4MSA, we use several datasets with different numbers of sequences (up to 1000 sequences) and compare its parallel performance against other well-known parallel approaches published in the literature, such as MSAProbs, T-Coffee, Clustal O and MAFFT. The results reveal that parallel H4MSA is around 25 times faster than the sequential version when running on 32 cores.
Title : Parallel H4MSA for Multiple Sequence Alignment
Venue : 2015 IEEE Trustcom/BigDataSE/ISPA
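The reported speedup can be put in context with a short calculation. The sketch below computes the parallel efficiency implied by a 25x speedup on 32 cores. Under the additional assumption that Amdahl's law describes the workload (the abstract does not claim this), it also computes the implied serial fraction. The function names are illustrative.

```python
def parallel_efficiency(speedup: float, cores: int) -> float:
    """Fraction of ideal linear speedup actually achieved."""
    return speedup / cores


def amdahl_serial_fraction(speedup: float, cores: int) -> float:
    """Serial fraction f solving S = 1 / (f + (1 - f) / p) for given S and p."""
    return (cores / speedup - 1) / (cores - 1)


efficiency = parallel_efficiency(25, 32)          # 25 / 32
serial_fraction = amdahl_serial_fraction(25, 32)  # (32/25 - 1) / 31
print(f"efficiency = {efficiency:.3f}, serial fraction = {serial_fraction:.4f}")
```

An efficiency of about 78% means that, under this simple model, less than 1% of the sequential run time would be inherently serial.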
Pub Date : 2015-08-20 DOI: 10.1109/Trustcom.2015.355
Yingjun Zhang, Shijun Zhao, Yu Qin, Bo Yang, D. Feng
We give a detailed analysis of the security issues that arise when mobile devices are used as a substitute for dedicated hardware tokens in two-factor authentication (2FA) schemes, and propose TrustTokenF, a generic security framework for mobile 2FA schemes that provides security assurance comparable to dedicated hardware tokens and is more flexible for token management. We first illustrate how to leverage the Trusted Execution Environment (TEE) based on ARM TrustZone to provide essential security features for mobile 2FA applications, i.e., runtime isolated execution and trusted user interaction, which resist software attackers, even those who compromise the entire mobile OS. We also use SRAM Physical Unclonable Functions (PUFs) to provide persistent secure storage for the authentication secrets, which achieves both high-level security and low cost. Based on these security features, we design a series of secure protocols for token deployment, migration and device key updating. We also introduce the TPM 2.0 policy-based authorization mechanism to enhance the security of the interface from the outside world into the trusted tokens. Finally, we implement the prototype system on real TrustZone-enabled hardware. The experimental results show that TrustTokenF is secure, flexible, economical and efficient for mobile 2FA applications.
Title : TrustTokenF: A Generic Security Framework for Mobile Two-Factor Authentication Using TrustZone
Venue : 2015 IEEE Trustcom/BigDataSE/ISPA
Pub Date : 2015-08-20 DOI: 10.1109/Trustcom.2015.537
Midhun Babu Tharayanil, G. Whitney, Mahdi Aiash, Chafika Benzaid
In the past five years, cybercrime has grown to become one of the most significant threats to the safety of the nation and its economy. The government's call to arms has been eagerly accepted by business enterprises and academia. But training cyber security professionals raises a unique set of challenges. Cost, space, time and scalability are among the issues identified, and possible solutions are proposed. As cyber security professionals, we have realized the importance of practical experience, which can be hard to deliver in a lecture-based environment. The primary aim of this project is to evaluate and recommend a platform for virtual hands-on labs that may be used to provide a secure environment in which cyber security students can evaluate, and gain hands-on experience of, possible threats and countermeasures. Similar labs have been set up in universities across the world, but we have not been able to find any studies evaluating the merits of virtualization platforms for running a virtual lab. Hence, we study three of the most popular virtualization platforms and provide recommendations to guide anyone who wishes to set up such a lab.
Title : Virtualization and Cyber Security: Arming Future Security Practitioners
Venue : 2015 IEEE Trustcom/BigDataSE/ISPA
Pub Date : 2015-08-20 DOI: 10.1109/Trustcom.2015.430
M. Taha, Sivadon Chaisiri, R. Ko
Data provenance, the origin and derivation history of data, is commonly used for security auditing, forensics and data analysis. While provenance loggers provide evidence of data changes, the integrity of the provenance logs is also critical for the integrity of the forensics process. However, to the best of our knowledge, few solutions are able to fully satisfy this trust requirement. In this paper, we propose a framework that enables tamper-evidence and preserves the confidentiality and integrity of data provenance using the Trusted Platform Module (TPM). Our framework also stores provenance logs on trusted and backup servers to guarantee the availability of data provenance. Tampered provenance logs can be discovered and subsequently recovered by retrieving the original logs from the servers. Leveraging the TPM's technical capabilities, our framework guarantees that the collected data provenance is admissible, complete, and confidential. More importantly, this framework can be applied to capture tampering evidence in large-scale cloud environments at system, network, and application granularities. We applied our framework to provide tamper-evidence for Progger, a cloud-based, kernel-space logger. Our results demonstrate the ability to conduct remote attestation of the integrity of Progger logs and to uphold the completeness, confidentiality and admissibility requirements.
Title : Trusted Tamper-Evident Data Provenance
Venue : 2015 IEEE Trustcom/BigDataSE/ISPA
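Tamper-evidence of the kind described here is commonly built on a hash chain. A TPM Platform Configuration Register, for example, can only be updated by an extend operation of the form new = H(old || digest). The sketch below mimics that operation in software with SHA-256 to show why any modified, reordered or dropped log entry changes the final digest. It is not the framework from the paper, no TPM is involved, and the log entries and function names are hypothetical.

```python
import hashlib
import hmac


def extend(register: bytes, entry: bytes) -> bytes:
    """PCR-style extend: new = SHA-256(old || SHA-256(entry))."""
    return hashlib.sha256(register + hashlib.sha256(entry).digest()).digest()


def chain_digest(entries: list) -> bytes:
    """Fold all log entries, in order, into a single 32-byte digest."""
    register = bytes(32)  # start from an all-zero register
    for entry in entries:
        register = extend(register, entry)
    return register


def is_untampered(entries: list, trusted_digest: bytes) -> bool:
    """Recompute the chain and compare it with a digest kept in trusted storage."""
    return hmac.compare_digest(chain_digest(entries), trusted_digest)


# Hypothetical provenance log entries, for illustration only.
log = [b"open /data/report.csv", b"write /data/report.csv", b"close /data/report.csv"]
trusted = chain_digest(log)
```

In a real deployment the trusted digest would have to live somewhere the attacker cannot rewrite, which is the role the abstract assigns to the TPM and the trusted servers.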
Pub Date : 2015-08-20 DOI: 10.1109/Trustcom.2015.434
A. Aldini, J. Seigneur, C. B. Lafuente, Xavier Titi, Jonathan Guislain
With the advent of the Bring-Your-Own-Device (BYOD) trend, mobile work is becoming so widespread that it challenges the traditional view of security standards and risk management. A recently proposed model, called opportunity-enabled risk management (OPPRIM), aims to balance the analysis of the major threats that arise in the BYOD setting against the analysis of the potential opportunities emerging in such an environment, by combining risk estimation mechanisms with trust and threat metrics. First, this paper provides a logic-based formalization of the policy and metric specification paradigm of OPPRIM. Second, we verify the OPPRIM model with respect to the socio-economic perspective; more precisely, the model is validated formally using tool-supported quantitative model checking techniques.
Title : Formal Modeling and Verification of Opportunity-enabled Risk Management
Venue : 2015 IEEE Trustcom/BigDataSE/ISPA
Pub Date : 2015-08-20 DOI: 10.1109/Trustcom.2015.554
Ezgi C. Ozan, S. Kiranyaz, M. Gabbouj
Principal Component Analysis (PCA) is widely used within binary embedding methods for approximate nearest neighbor search and has proven to have a significant effect on performance. Current methods aim to represent the whole data using a single PCA; however, considering the Gaussian distribution requirements of PCA, this representation is not appropriate. In this study, we propose using Multiple PCA (M-PCA) transformations to represent the whole data and show that this increases performance significantly compared to methods using a single PCA.
Title : M-PCA Binary Embedding for Approximate Nearest Neighbor Search
Venue : 2015 IEEE Trustcom/BigDataSE/ISPA
Pub Date : 2015-08-20 DOI: 10.1109/Trustcom.2015.407
Yanchao Wang, Fenghua Li, Jinbo Xiong, Ben Niu, Fangfang Shan
Cloud computing has become a vital part of our daily life. While enjoying the convenience it provides, users may lose control over their personal data, since ownership of the data is separated from its administration; this concern becomes more serious in the multi-authority cloud environment. In this paper, we propose a novel access control scheme, termed easy-ACCESS, which achieves lightweight and secure data access control for resource-limited devices in a multi-authority cloud. easy-ACCESS simultaneously enjoys the following properties: i) shorter decryption keys, whose size increases linearly with the number of authorities rather than the number of attributes; ii) lower computation cost, because almost all of the decryption cost is delegated to the decryption service provider; and iii) provable security under the selective security model. Thorough theoretical analysis and performance evaluation indicate the effectiveness and efficiency of the proposed easy-ACCESS.
Title : Achieving Lightweight and Secure Access Control in Multi-authority Cloud
Venue : 2015 IEEE Trustcom/BigDataSE/ISPA
Pub Date : 2015-08-20 DOI: 10.1109/Trustcom.2015.390
M. Jan, P. Nanda, Xiangjian He, R. Liu
Wireless Sensor Networks (WSNs) have experienced phenomenal growth over the past decade. They are typically deployed in remote and hostile environments for monitoring applications and data collection. Miniature sensor nodes collaborate with each other to provide information on an unprecedented temporal and spatial scale. The resource-constrained nature of sensor nodes, along with human-inaccessible terrain, poses various security challenges to these networks at different layers. In this paper, we propose a novel scheme for detecting Sybil attacks in a centralized clustering-based hierarchical network. Sybil nodes are detected prior to cluster formation to prevent their forged identities from participating in cluster head selection. Only legitimate nodes are elected as cluster heads, which enhances utilization of the resources. The proposed scheme requires the collaboration of any two high-energy nodes to analyze the received signal strengths of neighboring nodes. The simulation results show that our proposed scheme significantly improves network lifetime in comparison with existing clustering-based hierarchical routing protocols.
Title : A Sybil Attack Detection Scheme for a Centralized Clustering-Based Hierarchical Network
Venue : 2015 IEEE Trustcom/BigDataSE/ISPA
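One plausible reading of the signal-strength analysis is sketched below. Identities forged by a single radio arrive with nearly the same received signal strength at every observer, so two monitoring nodes can flag pairs of identities whose readings match at both of them. The abstract does not give the actual decision rule, so the tolerance, the data layout and the function name are assumptions for illustration.

```python
from itertools import combinations


def suspected_sybil_pairs(rssi_a: dict, rssi_b: dict, tolerance_db: float = 1.0) -> list:
    """Return identity pairs whose RSSI (in dBm) matches at both monitors.

    rssi_a and rssi_b map a claimed node identity to the signal strength
    measured by monitor A and monitor B respectively.
    """
    common_ids = sorted(rssi_a.keys() & rssi_b.keys())
    pairs = []
    for id1, id2 in combinations(common_ids, 2):
        close_at_a = abs(rssi_a[id1] - rssi_a[id2]) <= tolerance_db
        close_at_b = abs(rssi_b[id1] - rssi_b[id2]) <= tolerance_db
        if close_at_a and close_at_b:
            pairs.append((id1, id2))
    return pairs


# Hypothetical readings: "s1" and "s2" are identities forged by node "n1".
monitor_a = {"n1": -60.0, "n2": -72.0, "s1": -60.4, "s2": -59.8}
monitor_b = {"n1": -70.0, "n2": -55.0, "s1": -70.3, "s2": -69.9}
flagged = suspected_sybil_pairs(monitor_a, monitor_b)
```

Requiring agreement at two monitors rather than one reduces false positives: two honest nodes can be equidistant from one monitor by chance, but are less likely to be equidistant from both.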
Pub Date : 2015-08-20 DOI: 10.1109/Trustcom.2015.581
Wojciech M. Czarnecki, Krzysztof Rataj
Modern drug design procedures involve virtual screening, a highly efficient filtering step used to preselect compounds that are valuable drug candidates. Recent advances in introducing machine learning models to this process can lead to a significant increase in the overall quality of the drug design pipeline. Unfortunately, for many proteins it is still extremely hard to come up with a valid statistical model. This is a consequence of a huge class imbalance (even 1000:1), large datasets (over 100,000 samples) and restricted data representation (mostly high-dimensional, sparse, binary vectors). In this paper, we try to tackle this problem through three important innovations. First, we represent compounds with a 2-dimensional graph representation. Second, we show how one can provide an extremely fast method for measuring the similarity of such data. Finally, we use the Extreme Entropy Machine, which shows an increase in balanced accuracy over Extreme Learning Machines, Support Vector Machines, one-class Support Vector Machines and Random Forest. The proposed pipeline brings significantly better results than all considered alternative, state-of-the-art approaches. We introduce some important novel elements and show why they lead to a better model. Despite this, it should still be considered a proof of concept, and further investigations in the field are needed.
Title : Compounds Activity Prediction in Large Imbalanced Datasets with Substructural Relations Fingerprint and EEM
Venue : 2015 IEEE Trustcom/BigDataSE/ISPA
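The choice of balanced accuracy as the metric matters at a 1000:1 class ratio. The sketch below uses the standard definition, the mean of sensitivity and specificity, and shows that a trivial model predicting "inactive" for every compound scores almost perfectly on plain accuracy but only 0.5 on balanced accuracy. The toy labels are made up for illustration and are not data from the paper.

```python
def accuracy(y_true: list, y_pred: list) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true: list, y_pred: list) -> float:
    """Mean of the true-positive rate and true-negative rate (labels 1 and 0)."""
    positives = sum(1 for t in y_true if t == 1)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present in y_true")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return 0.5 * (tp / positives + tn / negatives)


# Toy screening set: 1 active compound among 1000 inactive ones.
labels = [1] + [0] * 1000
always_inactive = [0] * 1001  # a model that never predicts "active"

print(accuracy(labels, always_inactive))           # close to 1.0
print(balanced_accuracy(labels, always_inactive))  # 0.5
```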