It is generally believed that data mining results do not violate the anonymity of the individuals recorded in the source database. In fact, data mining models and patterns, in order to ensure a required statistical significance, represent a large number of individuals and thus conceal individual identities: this is the case of the minimum support threshold in association rule mining. We have recently shown [3] that the above belief is ill-founded: by shifting the concept of k-anonymity [8] from data to patterns, we have formally characterized the notion of a threat to anonymity in the context of frequent itemset mining, and provided a methodology to efficiently and effectively identify such threats that might arise from the disclosure of a set of frequent itemsets. In our previous paper [2] we introduced a first, naïve strategy (named suppressive) to sanitize such threats. In this paper we develop a novel sanitization strategy, named additive, which outperforms the previous one in terms of the introduced distortion and has the interesting feature of maintaining the original set of frequent itemsets unchanged, while modifying only the corresponding support values.
{"title":"Towards low-perturbation anonymity preserving pattern discovery","authors":"M. Atzori, F. Bonchi, F. Giannotti, D. Pedreschi","doi":"10.1145/1141277.1141412","DOIUrl":"https://doi.org/10.1145/1141277.1141412","url":null,"abstract":"It is generally believed that data mining results do not violate the anonymity of the individuals recorded in the source database. In fact, data mining models and patterns, in order to ensure a required statistical significance, represent a large number of individuals and thus conceal individual identities: this is the case of the minimum support threshold in association rule mining. We have recently shown [3], that the above belief is ill-founded: by shifting the concept of k-anonymity [8] from data to patterns, we have formally characterized the notion of a threat to anonymity in the context of frequent itemsets mining, and provided a methodology to efficiently and effectively identify such threats that might arise from the disclosure of a set of frequent itemsets. In our previous paper [2] we have introduced a first, naïve strategy (named suppressive) to sanitize such threats. 
In this paper we develop a novel sanitization strategy, named additive, which outperforms the previous one in terms of the introduced distortion and has the interesting feature of maintaining the original set of frequent itemsets unchanged, while modifying only the corresponding support values.","PeriodicalId":269830,"journal":{"name":"Proceedings of the 2006 ACM symposium on Applied computing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2006-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114450084","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
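To make the kind of threat described in the abstract above concrete, here is a minimal sketch. It assumes that a threat arises when the support of a pattern such as "contains a but not b" can be derived from the published supports and falls strictly between 0 and k. The function names, the toy support values, and the restriction to a single negated item are illustrative assumptions, not the authors' detection methodology.

```python
def derived_support(supports, present, absent_item):
    """Support of 'all items in `present`, and not `absent_item`'.

    Derived by inclusion-exclusion from two published itemset supports.
    """
    base = frozenset(present)
    with_item = base | {absent_item}
    return supports[base] - supports[with_item]


def is_threat(supports, present, absent_item, k):
    """True if the derived pattern describes between 1 and k-1 individuals."""
    s = derived_support(supports, present, absent_item)
    return 0 < s < k


# Hypothetical published supports: both itemsets are well above any
# reasonable minimum support threshold.
supports = {
    frozenset({"a"}): 100,
    frozenset({"a", "b"}): 98,
}

print(derived_support(supports, {"a"}, "b"))
print(is_threat(supports, {"a"}, "b", k=5))
```

In this toy case both published itemsets are frequent, yet their difference isolates a group smaller than k, which is the intuition behind moving k-anonymity from data to patterns.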
Numerical optimization of given objective functions is a crucial task in many real-life problems. This paper introduces a new immunological algorithm for continuous global optimization problems, called opt-IMMALG; it is an improved version of a previously proposed clonal selection algorithm, using a real-coded representation and a new Inversely Proportional Hypermutation operator. We evaluate the performance of opt-IMMALG and several other algorithms, namely opt-IA, PSO, arPSO, DE, and SEA, with respect to their general applicability as numerical optimization algorithms. The experiments were performed on 23 widely used benchmark problems. The experimental results show that opt-IMMALG is a suitable numerical optimization technique that, in terms of accuracy, outperforms the other algorithms analyzed in this comparative study. In addition, opt-IMMALG is shown to be suitable for solving large-scale problems.
{"title":"Real coded clonal selection algorithm for unconstrained global optimization using a hybrid inversely proportional hypermutation operator","authors":"V. Cutello, Giuseppe Nicosia, M. Pavone","doi":"10.1145/1141277.1141501","DOIUrl":"https://doi.org/10.1145/1141277.1141501","url":null,"abstract":"Numerical optimization of given objective functions is a crucial task in many real-life problems. This paper introduces a new immunological algorithm for continuous global optimization problems, called opt-IMMALG; it is an improved version of a previously proposed clonal selection algorithm, using a real-coded representation and a new Inversely Proportional Hypermutation operator. We evaluate the performance of opt-IMMALG and several other algorithms, namely opt-IA, PSO, arPSO, DE, and SEA, with respect to their general applicability as numerical optimization algorithms. The experiments were performed on 23 widely used benchmark problems. The experimental results show that opt-IMMALG is a suitable numerical optimization technique that, in terms of accuracy, outperforms the other algorithms analyzed in this comparative study. In addition, opt-IMMALG is shown to be suitable for solving large-scale problems.","PeriodicalId":269830,"journal":{"name":"Proceedings of the 2006 ACM symposium on Applied computing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2006-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114478374","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
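The idea of an inversely proportional hypermutation operator can be sketched as follows: the better a candidate's fitness, the fewer mutations it receives. The exponential decay, the parameter name `rho`, its default value, and the function name are assumptions made for illustration; the abstract above does not state the operator's exact form.

```python
import math


def mutation_count(normalized_fitness, length, rho=3.5):
    """Number of mutations for a candidate of `length` variables.

    `normalized_fitness` is assumed to lie in [0, 1], with 1 the best.
    The count shrinks as fitness grows, and is never below 1.
    """
    if not 0.0 <= normalized_fitness <= 1.0:
        raise ValueError("normalized_fitness must be in [0, 1]")
    alpha = math.exp(-rho * normalized_fitness)  # assumed decay law
    return math.floor(alpha * length) + 1


# Poor candidates are mutated heavily, good candidates only lightly.
for fitness in (0.0, 0.5, 1.0):
    print(fitness, mutation_count(fitness, length=10))
```

Keeping the count at least 1 means that even the best candidate is still perturbed, so the search does not stall on it.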
Based on an experiment using three languages under .NET, this paper argues that the semantic differences between these languages regarding method overloading and overriding give rise to significant complexity and break encapsulation. We first recall the various interpretations of overriding and overloading in object-oriented languages through what we call language signatures. Then, we run an experiment with .NET components coded in different programming languages in order to observe the global behavior. From this, we show that overriding and overloading are not compatible with a key property of components: encapsulation. We conclude that, in the current state of the art, in order to build predictable assemblies, components must expose their internal structure! We propose a solution to this problem.
{"title":"Method overloading and overriding cause encapsulation flaw: an experiment on assembly of heterogeneous components","authors":"A. Beugnard","doi":"10.1145/1141277.1141608","DOIUrl":"https://doi.org/10.1145/1141277.1141608","url":null,"abstract":"Based on an experiment using three languages under .NET, this paper argues that the semantic differences between these languages regarding method overloading and overriding give rise to significant complexity and break encapsulation. We first recall the various interpretations of overriding and overloading in object-oriented languages through what we call language signatures. Then, we run an experiment with .NET components coded in different programming languages in order to observe the global behavior. From this, we show that overriding and overloading are not compatible with a key property of components: encapsulation. We conclude that, in the current state of the art, in order to build predictable assemblies, components must expose their internal structure! We propose a solution to this problem.","PeriodicalId":269830,"journal":{"name":"Proceedings of the 2006 ACM symposium on Applied computing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2006-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117136726","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A composite object represented as a directed graph (digraph for short) is an important data structure that requires efficient support in CAD/CAM, CASE, office systems, software management, web databases, and document databases. It is cumbersome to handle such objects in relational database systems when they involve ancestor-descendant relationships (that is, recursive relationships). In this paper, we present a new encoding method to label a digraph, which has a smaller footprint than all previous strategies. This method is based on a tree labeling method and the concept of branchings that are used in graph theory for finding the shortest connection networks. A branching is a subgraph of a given digraph that is in fact a forest, but covers all the nodes of the graph. On the one hand, the proposed encoding scheme achieves the smallest space requirements among all previously published strategies for recognizing recursive relationships. On the other hand, it leads to a new algorithm for computing transitive closures for DAGs (directed acyclic graphs) in O(e·b) time and O(n·b) space, where n represents the number of nodes of a DAG, e the number of edges, and b the DAG's breadth. The method can also be extended to graphs containing cycles. In particular, based on this encoding method, a multi-level compression is developed, by means of which the space for the representation of a transitive closure can be reduced to O((b/d^k)·n), where k is the number of compression levels and d is the average outdegree of the nodes.
{"title":"On the transitive closure representation and adjustable compression","authors":"Yangjun Chen, D. Cooke","doi":"10.1145/1141277.1141385","DOIUrl":"https://doi.org/10.1145/1141277.1141385","url":null,"abstract":"A composite object represented as a directed graph (digraph for short) is an important data structure that requires efficient support in CAD/CAM, CASE, office systems, software management, web databases, and document databases. It is cumbersome to handle such objects in relational database systems when they involve ancestor-descendant relationships (that is, recursive relationships). In this paper, we present a new encoding method to label a digraph, which has a smaller footprint than all previous strategies. This method is based on a tree labeling method and the concept of branchings that are used in graph theory for finding the shortest connection networks. A branching is a subgraph of a given digraph that is in fact a forest, but covers all the nodes of the graph. On the one hand, the proposed encoding scheme achieves the smallest space requirements among all previously published strategies for recognizing recursive relationships. On the other hand, it leads to a new algorithm for computing transitive closures for DAGs (directed acyclic graphs) in O(e·b) time and O(n·b) space, where n represents the number of nodes of a DAG, e the number of edges, and b the DAG's breadth. The method can also be extended to graphs containing cycles. In particular, based on this encoding method, a multi-level compression is developed, by means of which the space for the representation of a transitive closure can be reduced to O((b/d^k)·n), where k is the number of compression levels and d is the average outdegree of the nodes.","PeriodicalId":269830,"journal":{"name":"Proceedings of the 2006 ACM symposium on Applied computing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2006-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122047349","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
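The tree labeling that such encodings build on can be sketched with the common interval scheme: each node receives a (start, end) pair from a depth-first traversal, and one node is an ancestor of another exactly when its interval encloses the other's. This is a generic illustration for a single tree; the node names and function names are hypothetical, and the sketch does not reproduce the paper's branching-based encoding for digraphs.

```python
def label_tree(children, root):
    """Assign each node a (start, end) interval by iterative depth-first search.

    `children` maps a node to the list of its child nodes.
    """
    labels = {}
    start = {}
    counter = 0
    stack = [(root, False)]
    while stack:
        node, finished = stack.pop()
        if finished:
            # All descendants have been numbered; close the interval.
            labels[node] = (start[node], counter)
            continue
        start[node] = counter
        counter += 1
        stack.append((node, True))
        for child in reversed(children.get(node, [])):
            stack.append((child, False))
    return labels


def is_ancestor(labels, u, v):
    """True if u is a proper ancestor of v, decided from the labels alone."""
    start_u, end_u = labels[u]
    start_v, end_v = labels[v]
    return start_u < start_v and end_v <= end_u


# Hypothetical tree: a has children b and c; b has child d.
tree = {"a": ["b", "c"], "b": ["d"]}
labels = label_tree(tree, "a")
print(labels)
print(is_ancestor(labels, "a", "d"), is_ancestor(labels, "c", "d"))
```

With labels stored alongside each node, an ancestor-descendant check becomes two integer comparisons instead of a recursive traversal, which is what makes such encodings attractive in a relational setting.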
Fabrício Benevenuto, C. Costa, Marisa A. Vasconcelos, Virgílio A. F. Almeida, J. Almeida, M. Mowbray
Recent studies have reported a new form of malicious behavior in file-sharing Peer-to-Peer systems: content pollution. The dissemination of polluted content in a P2P system has the detrimental effect of reducing content availability, and ultimately, decreasing the confidence of users in such systems. Two potential strategies for polluting P2P content are decoy insertion, which consists of injecting corrupted copies of a file into the system, and hash corruption, which consists of injecting a corrupted file with the same hash code as a non-corrupted one. Polluted content disseminates through P2P networks because users typically do not delete the corrupted files that they download. This paper investigates the effectiveness of peer incentives to delete corrupted files in reducing the dissemination of polluted content, considering the two aforementioned pollution mechanisms. Our simulation results show that the effectiveness of incentives is highly dependent on the pollution mechanism. We show that, for the pollution dissemination technique called hash corruption, only effective incentive mechanisms are able to prevent the spread of polluted content.
{"title":"Impact of peer incentives on the dissemination of polluted content","authors":"Fabrício Benevenuto, C. Costa, Marisa A. Vasconcelos, Virgílio A. F. Almeida, J. Almeida, M. Mowbray","doi":"10.1145/1141277.1141720","DOIUrl":"https://doi.org/10.1145/1141277.1141720","url":null,"abstract":"Recent studies have reported a new form of malicious behavior in file-sharing Peer-to-Peer systems: content pollution. The dissemination of polluted content in a P2P system has the detrimental effect of reducing content availability, and ultimately, decreasing the confidence of users in such systems. Two potential strategies for polluting P2P content are decoy insertion, which consists of injecting corrupted copies of a file into the system, and hash corruption, which consists of injecting a corrupted file with the same hash code as a non-corrupted one. Polluted content disseminates through P2P networks because users typically do not delete the corrupted files that they download. This paper investigates the effectiveness of peer incentives to delete corrupted files in reducing the dissemination of polluted content, considering the two aforementioned pollution mechanisms. Our simulation results show that the effectiveness of incentives is highly dependent on the pollution mechanism. We show that, for the pollution dissemination technique called hash corruption, only effective incentive mechanisms are able to prevent the spread of polluted content.","PeriodicalId":269830,"journal":{"name":"Proceedings of the 2006 ACM symposium on Applied computing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2006-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129422838","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In [6], Ian Foster and Carl Kesselman explain that grids need "a rethinking of existing programming models and, most likely, new thinking about novel models". In this work, we investigate a "novel programming model" for grids based on the chemical metaphor.
{"title":"Towards chemical coordination for grids","authors":"J. Banâtre, Pascal Fradet, Yann Radenac","doi":"10.1145/1141277.1141381","DOIUrl":"https://doi.org/10.1145/1141277.1141381","url":null,"abstract":"In [6], Ian Foster and Carl Kesselman explain that grids need \"a rethinking of existing programming models and, most likely, new thinking about novel models\". In this work, we investigate a \"novel programming model\" for grids based on the chemical metaphor.","PeriodicalId":269830,"journal":{"name":"Proceedings of the 2006 ACM symposium on Applied computing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2006-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128689733","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Marco M. Carvalho, M. Rebeschini, J. Horsley, Niranjan Suri
In this paper we introduce a group-messaging interface that allows humans to efficiently interact with a group of agents through a hierarchical and customizable text protocol. Our approach is presented in the context of the MAST mobile agent-based framework for security and administration of large scale computer networks. The MAST framework is primarily human-centric and directly supports human-agent interaction that enables customized agents to notify administrators and react to abnormal environmental conditions. The proposed IRC-like interface was developed and tested in the context of MAST. In this paper we present the group-manager interface in contrast with other agent interfaces currently available in the MAST framework.
{"title":"A chat interface for human-agent interaction in MAST","authors":"Marco M. Carvalho, M. Rebeschini, J. Horsley, Niranjan Suri","doi":"10.1145/1141277.1141302","DOIUrl":"https://doi.org/10.1145/1141277.1141302","url":null,"abstract":"In this paper we introduce a group-messaging interface that allows humans to efficiently interact with a group of agents through a hierarchical and customizable text protocol. Our approach is presented in the context of the MAST mobile agent-based framework for security and administration of large scale computer networks. The MAST framework is primarily human-centric and directly supports human-agent interaction that enables customized agents to notify administrators and react to abnormal environmental conditions. The proposed IRC-like interface was developed and tested in the context of MAST. In this paper we present the group-manager interface in contrast with other agent interfaces currently available in the MAST framework.","PeriodicalId":269830,"journal":{"name":"Proceedings of the 2006 ACM symposium on Applied computing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2006-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129593570","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this work, we propose a new approach for protein classification based on Bayesian classifiers. Our goal is to predict the functional family of novel protein sequences based on their motif composition. For this purpose, datasets extracted from Prosite, a curated protein family database, are used as training datasets. In the conducted experiments, the performance of our classifier is compared to other known data mining approaches. The computational results have shown that the proposed method outperforms the other ones and looks very promising for problems with characteristics similar to the problem addressed here.
{"title":"A Bayesian approach for protein classification","authors":"L. Merschmann, A. Plastino","doi":"10.1145/1141277.1141322","DOIUrl":"https://doi.org/10.1145/1141277.1141322","url":null,"abstract":"In this work, we propose a new approach for protein classification based on Bayesian classifiers. Our goal is to predict the functional family of novel protein sequences based on their motif composition. For this purpose, datasets extracted from Prosite, a curated protein family database, are used as training datasets. In the conducted experiments, the performance of our classifier is compared to other known data mining approaches. The computational results have shown that the proposed method outperforms the other ones and looks very promising for problems with characteristics similar to the problem addressed here.","PeriodicalId":269830,"journal":{"name":"Proceedings of the 2006 ACM symposium on Applied computing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2006-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127088339","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
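As a rough illustration of classifying a sequence by its motif composition, the sketch below uses a naive Bayes classifier over motif presence and absence with Laplace smoothing. The choice of naive Bayes, the motif identifiers (`M1` to `M4`), the family names, and the function names are all assumptions for illustration; the abstract above says only that the approach is based on Bayesian classifiers.

```python
import math
from collections import Counter, defaultdict


def train(samples):
    """Collect counts from (set_of_motifs, family) training pairs."""
    family_counts = Counter()
    motif_counts = defaultdict(Counter)
    vocabulary = set()
    for motifs, family in samples:
        family_counts[family] += 1
        for motif in motifs:
            motif_counts[family][motif] += 1
        vocabulary |= set(motifs)
    return family_counts, motif_counts, vocabulary


def predict(model, motifs):
    """Return the family with the highest log-posterior for a motif set."""
    family_counts, motif_counts, vocabulary = model
    total = sum(family_counts.values())
    best_family, best_score = None, -math.inf
    for family, n in family_counts.items():
        score = math.log(n / total)  # class prior
        for motif in vocabulary:
            # Laplace-smoothed probability that the motif occurs in this family.
            p = (motif_counts[family][motif] + 1) / (n + 2)
            score += math.log(p if motif in motifs else 1 - p)
        if score > best_score:
            best_family, best_score = family, score
    return best_family


# Hypothetical training data: motif sets labeled with a functional family.
samples = [
    ({"M1", "M2"}, "kinase"),
    ({"M1"}, "kinase"),
    ({"M3"}, "protease"),
    ({"M3", "M4"}, "protease"),
]
model = train(samples)
print(predict(model, {"M1"}))
print(predict(model, {"M3"}))
```

Smoothing keeps a motif that was never seen in a family from forcing that family's probability to zero, which matters when training sets are small.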
The implementation of a correctly rounded or interval elementary function needs to be proven carefully down to the last detail. The proof requires a tight bound on the overall error of the implementation with respect to the mathematical function. Such work is function-specific, concerns tens of lines of code for each function, and will usually be broken by the smallest change to the code (e.g. for maintenance or optimization purposes). Therefore, it is very tedious and error-prone if done by hand. This article discusses the use of the Gappa proof assistant in this context. Gappa has two main advantages over previous approaches: its input format is very close to the actual C code to validate, and it automates error evaluation and propagation using interval arithmetic. Besides, it can be used to incrementally prove complex mathematical properties pertaining to the C code. Yet it does not require any specific knowledge about automatic theorem proving, and thus is accessible to a wider community. Moreover, Gappa may generate a formal proof of the results that can be checked independently by a lower-level proof assistant like Coq, hence providing an even higher confidence in the certification of the numerical code.
{"title":"Assisted verification of elementary functions using Gappa","authors":"F. D. Dinechin, C. Lauter, G. Melquiond","doi":"10.1145/1141277.1141584","DOIUrl":"https://doi.org/10.1145/1141277.1141584","url":null,"abstract":"The implementation of a correctly rounded or interval elementary function needs to be proven carefully in the very last details. The proof requires a tight bound on the overall error of the implementation with respect to the mathematical function. Such work is function specific, concerns tens of lines of code for each function, and will usually be broken by the smallest change to the code (e.g. for maintenance or optimization purpose). Therefore, it is very tedious and error-prone if done by hand. This article discusses the use of the Gappa proof assistant in this context. Gappa has two main advantages over previous approaches: Its input format is very close to the actual C code to validate, and it automates error evaluation and propagation using interval arithmetic. Besides, it can be used to incrementally prove complex mathematical properties pertaining to the C code. Yet it does not require any specific knowledge about automatic theorem proving, and thus is accessible to a wider community. 
Moreover, Gappa may generate a formal proof of the results that can be checked independently by a lower-level proof assistant like Coq, hence providing an even higher confidence in the certification of the numerical code.","PeriodicalId":269830,"journal":{"name":"Proceedings of the 2006 ACM symposium on Applied computing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2006-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127285675","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
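The interval arithmetic that underlies this kind of automated bound propagation can be illustrated in a few lines. The sketch below is not Gappa and does not use Gappa's input syntax; the `Interval` class, the polynomial, and the input range are assumptions chosen for illustration. Exact rational arithmetic is used so that the computed bounds are not themselves affected by rounding.

```python
from fractions import Fraction


class Interval:
    """Closed interval [lo, hi] with exact rational endpoints."""

    def __init__(self, lo, hi):
        self.lo, self.hi = Fraction(lo), Fraction(hi)
        if self.lo > self.hi:
            raise ValueError("lower bound exceeds upper bound")

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __mul__(self, other):
        # The extremes of a product lie among the four endpoint products.
        products = [
            self.lo * other.lo, self.lo * other.hi,
            self.hi * other.lo, self.hi * other.hi,
        ]
        return Interval(min(products), max(products))

    def __repr__(self):
        return f"[{self.lo}, {self.hi}]"


# Enclose p(x) = 1 + x + x^2/2 for every x in [0, 1/4].
x = Interval(0, Fraction(1, 4))
one = Interval(1, 1)
half = Interval(Fraction(1, 2), Fraction(1, 2))
p = one + x + half * (x * x)
print(p)
```

Every operation returns an interval guaranteed to contain the true result, so the final enclosure holds for all inputs in the stated range. A tool that applies the same principle to rounding errors can bound the overall error of a code sequence automatically.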
In this paper we present the results of research into the semantics of modeling constructs for the process-oriented perspective in the conceptual modeling of enterprise subject areas. The set of modeling constructs defined in this paper is fully 'compatible' with the models in the data-oriented perspective of the fact-oriented school of conceptual modeling. We derive the 'semantic' bridges for the conceptual modeling methodology for enterprises in the process-oriented perspective.
{"title":"Conceptual process configurations in enterprise knowledge management systems","authors":"Peter Bollen","doi":"10.1145/1141277.1141631","DOIUrl":"https://doi.org/10.1145/1141277.1141631","url":null,"abstract":"In this paper we will present the results of research into the semantics of modeling constructs for the process-oriented perspective for the conceptual modeling of enterprise subject areas. The set of modeling constructs that are defined in this paper are fully 'compatible' with the models in the data-oriented perspective in the fact oriented school of conceptual modeling. We will derive the 'semantic' bridges for the conceptual modeling methodology for enterprises in the process-oriented perspective.","PeriodicalId":269830,"journal":{"name":"Proceedings of the 2006 ACM symposium on Applied computing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2006-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127539800","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}