The service-oriented architecture (SOA) proposes a new software development paradigm based on loosely coupled software components deployed and located across the web. This key property of SOA allows building workflows that cross organizational boundaries. One of the major challenges in this field of research is the automatic discovery and composition of services. In this paper we propose a novel ontology-based discovery and composition approach. In a first step, standard web service descriptions based on WSDL are semantically enriched by pre-processing steps that draw on general-purpose as well as domain-specific ontologies. The resulting semantic-aware service profiles are stored in a registry component. In a second step, we describe the matchmaking algorithms underlying the IRIS discovery component. Finally, we evaluate the performance of our algorithms analytically and measure their quality in an experimental setup.
"Automatic Discovery and Composition of Services with IRIS", U. Radetzki and A. Cremers. 22nd International Conference on Data Engineering Workshops (ICDEW'06), April 2006. doi:10.1109/ICDEW.2006.34
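Matchmaking between a service request and semantically annotated service profiles is commonly ranked by degrees of match over a concept hierarchy. The following is a minimal sketch of that idea; the tiny taxonomy and the exact/plug-in/subsumes scheme are illustrative assumptions, not the IRIS algorithms themselves.

```python
# Illustrative degree-of-match ranking over a tiny concept hierarchy.
# Taxonomy and concept names are invented for this sketch.

TAXONOMY = {  # child -> parent
    "Sedan": "Car",
    "Car": "Vehicle",
    "Vehicle": "Thing",
}

def ancestors(concept):
    """Return the chain of ancestors of a concept, nearest first."""
    chain = []
    while concept in TAXONOMY:
        concept = TAXONOMY[concept]
        chain.append(concept)
    return chain

def degree_of_match(requested, offered):
    """Rank how well an offered output concept matches a requested one."""
    if requested == offered:
        return "exact"
    if requested in ancestors(offered):
        return "plug-in"    # offer is more specific than the request
    if offered in ancestors(requested):
        return "subsumes"   # offer is more general than the request
    return "fail"

print(degree_of_match("Car", "Car"))        # exact
print(degree_of_match("Vehicle", "Sedan"))  # plug-in
print(degree_of_match("Sedan", "Vehicle"))  # subsumes
```

A discovery component can then sort candidate services by this degree, preferring exact over plug-in over subsumes matches.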
In this paper, we introduce a new privacy protection property called p-sensitive k-anonymity. The existing k-anonymity property protects against identity disclosure, but it fails to protect against attribute disclosure. The newly introduced privacy model avoids this shortcoming. Two necessary conditions for achieving the p-sensitive k-anonymity property are presented and used in developing algorithms that create masked microdata satisfying p-sensitive k-anonymity through generalization and suppression.
"Privacy Protection: p-Sensitive k-Anonymity Property", T. Truta and B. Vinay. 22nd International Conference on Data Engineering Workshops (ICDEW'06), April 2006. doi:10.1109/ICDEW.2006.116
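The property can be stated operationally: every group of records sharing the same quasi-identifier values must contain at least k records and at least p distinct sensitive values. A minimal check of that condition, with invented sample data:

```python
# Minimal check of p-sensitive k-anonymity: each quasi-identifier group
# needs >= k records and >= p distinct sensitive values. Sample data invented.

from collections import defaultdict

def is_p_sensitive_k_anonymous(records, qi_attrs, sensitive_attr, k, p):
    groups = defaultdict(list)
    for r in records:
        key = tuple(r[a] for a in qi_attrs)
        groups[key].append(r[sensitive_attr])
    return all(len(g) >= k and len(set(g)) >= p for g in groups.values())

data = [
    {"zip": "479**", "age": "2*", "disease": "flu"},
    {"zip": "479**", "age": "2*", "disease": "cold"},
    {"zip": "479**", "age": "2*", "disease": "flu"},
]

print(is_p_sensitive_k_anonymous(data, ["zip", "age"], "disease", k=3, p=2))  # True
print(is_p_sensitive_k_anonymous(data, ["zip", "age"], "disease", k=3, p=3))  # False
```

The second call fails because the single group offers only two distinct disease values, illustrating how p strengthens plain k-anonymity against attribute disclosure.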
Document clustering has been used as a core technique for managing vast amounts of data and providing needed information. In on-line environments, new information generally attracts more interest than old information. Traditional clustering focuses on grouping similar documents into clusters, treating each document with equal weight. We propose a novelty-based incremental clustering method for on-line documents that is biased toward recent documents. The notion of 'novelty' is incorporated into a similarity function, and a clustering method, a variant of the K-means method, is proposed. We examine the efficiency and behavior of the method through experiments.
"Novelty-based Incremental Document Clustering for On-line Documents", Sophoin Khy, Y. Ishikawa, and H. Kitagawa. 22nd International Conference on Data Engineering Workshops (ICDEW'06), April 2006. doi:10.1109/ICDEW.2006.100
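One way to bias a similarity function toward recent documents is to multiply an ordinary content similarity by a time-decay factor. The sketch below uses an exponential decay; the decay form and rate are illustrative assumptions, not the paper's exact novelty function.

```python
# Sketch of a novelty-weighted similarity: a document's score decays
# exponentially with its age, so recent documents dominate cluster assignment.

import math

def cosine(u, v):
    """Cosine similarity between two sparse term-weight dictionaries."""
    terms = set(u) | set(v)
    dot = sum(u.get(t, 0.0) * v.get(t, 0.0) for t in terms)
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def novelty_similarity(doc_vec, doc_time, centroid_vec, now, decay=0.1):
    """Down-weight content similarity by the document's age (now - doc_time)."""
    return math.exp(-decay * (now - doc_time)) * cosine(doc_vec, centroid_vec)

fresh = novelty_similarity({"stream": 1.0}, doc_time=9, centroid_vec={"stream": 1.0}, now=10)
stale = novelty_similarity({"stream": 1.0}, doc_time=0, centroid_vec={"stream": 1.0}, now=10)
print(fresh > stale)  # True: the newer of two identical documents scores higher
```

An incremental K-means variant can then assign each arriving document to the centroid maximizing this decayed similarity.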
Current applications are often forced to filter the richness of data sources in order to reduce the information noise the user is exposed to. We consider this a critical concern of applications that should be factored out at the data management level. The Context-ADDICT system, leveraging ontology-based context and domain models, personalizes the data made available to the user through "context-aware tailoring". In this paper we present a formal approach to defining the relationship between context (represented by an appropriate context model) and application domain (modeled by a domain ontology). Once this relationship has been defined, we can work out the boundary of the portion of the domain relevant to a user in a certain context. We also sketch the implementation of a visual tool supporting the application designer in this modeling task.
"Ontology-Based Information Tailoring", C. Curino, E. Quintarelli, and L. Tanca. 22nd International Conference on Data Engineering Workshops (ICDEW'06), April 2006. doi:10.1109/ICDEW.2006.104
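The idea of cutting out the context-relevant portion of a domain ontology can be sketched as a neighborhood search: the context selects seed concepts, and the visible portion is everything within a few hops. Concepts, edges, and radius below are invented for illustration; Context-ADDICT's actual context and domain models are richer.

```python
# Sketch of context-aware tailoring: take the sub-ontology within `radius`
# hops of the concepts a context selects. Ontology graph is invented.

from collections import deque

EDGES = {  # undirected domain-ontology adjacency
    "restaurant": ["menu", "location"],
    "menu": ["restaurant", "dish"],
    "dish": ["menu"],
    "location": ["restaurant", "city"],
    "city": ["location"],
}

def relevant_portion(seed_concepts, radius=1):
    """Concepts within `radius` hops of any concept the context selects."""
    seen = set(seed_concepts)
    frontier = deque((c, 0) for c in seed_concepts)
    while frontier:
        concept, d = frontier.popleft()
        if d == radius:
            continue
        for nxt in EDGES.get(concept, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, d + 1))
    return seen

tourist_context = {"restaurant"}
print(sorted(relevant_portion(tourist_context, radius=1)))
# ['location', 'menu', 'restaurant']
```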
The window query model is widely used in data stream management systems, where the focus of a continuous query is limited to a set of the most recent tuples. In this dissertation, we show that an interesting and important class of continuous queries cannot be answered by the existing sliding-window query models. Thus, we introduce a new model for continuous queries, termed the predicate-window query model, which limits the focus of a continuous query to the stream tuples that qualify a certain predicate. Predicate windows are characterized by the following: (1) the window predicate can be defined over any attribute in the stream tuple (ordered or unordered); (2) stream tuples qualify and disqualify the window predicate in an out-of-order manner. The goal of this dissertation is to develop an efficient framework to realize predicate windows inside data stream management systems. The predicate-window query framework enables the system to efficiently support a wide variety of streaming applications through an expressive query language and efficient query evaluation mechanisms (i.e., query execution and query optimization). As a test bed for our research, the predicate-window framework is being developed inside Nile, a prototype data stream management system developed at Purdue University.
"Supporting Predicate-Window Queries in Data Stream Management Systems", T. Ghanem. 22nd International Conference on Data Engineering Workshops (ICDEW'06), April 2006. doi:10.1109/ICDEW.2006.140
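The behavioral difference from a sliding window can be sketched in a few lines: a predicate window holds exactly the tuples that currently satisfy the predicate, so a later update to a tuple may evict it at any time. The tuple shape and predicate below are invented for illustration.

```python
# Sketch of predicate-window maintenance: membership is driven by the
# predicate, not by recency, so tuples enter and leave out of order.

def apply_update(window, tup, predicate):
    """Insert, retain, or evict `tup` (keyed by tup['id']) based on the predicate."""
    if predicate(tup):
        window[tup["id"]] = tup      # tuple qualifies: enters/updates the window
    else:
        window.pop(tup["id"], None)  # tuple disqualifies: leaves the window

window = {}
hot = lambda t: t["temp"] > 30       # predicate over an unordered attribute

apply_update(window, {"id": 1, "temp": 35}, hot)
apply_update(window, {"id": 2, "temp": 20}, hot)
apply_update(window, {"id": 1, "temp": 25}, hot)  # same tuple later disqualifies

print(sorted(window))  # [] -- tuple 1 was evicted when its update failed the predicate
```

A sliding window, by contrast, would still retain tuple 1 until enough newer tuples arrived to push it out.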
Schema matching attempts to discover semantic mappings between the elements of two schemas. Elements are cross-compared using various heuristics (e.g., name, data-type, and structure similarity). Seen from a broader perspective, schema matching is a combinatorial problem with exponential complexity, which makes naive matching algorithms for large schemas prohibitively inefficient. In this paper we propose a clustering-based technique for improving the efficiency of large-scale schema matching. The technique inserts clustering as an intermediate step into existing schema matching algorithms. Clustering partitions the schemas, reduces the overall matching load, and creates the possibility of trading efficiency against effectiveness. The technique can be used alongside other optimization techniques. In the paper we describe the technique, validate the performance of one implementation of it, and open directions for future research.
"Using Element Clustering to Increase the Efficiency of XML Schema Matching", M. Smiljanic, M. V. Keulen, and W. Jonker. 22nd International Conference on Data Engineering Workshops (ICDEW'06), April 2006. doi:10.1109/ICDEW.2006.159
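The efficiency gain of the intermediate clustering step comes from comparing only elements that land in corresponding clusters, shrinking the candidate-pair count from |S1| x |S2|. A toy sketch, where the cheap clustering key (first letter of the element name) and the schemas are invented assumptions standing in for a real clustering criterion:

```python
# Sketch of clustering as a pre-step to schema matching: partition elements
# by a cheap key, then run the expensive matcher only within matching clusters.

from collections import defaultdict

def cluster(elements, key=lambda name: name[0].lower()):
    buckets = defaultdict(list)
    for e in elements:
        buckets[key(e)].append(e)
    return buckets

def candidate_pairs(schema1, schema2):
    c1, c2 = cluster(schema1), cluster(schema2)
    return [(a, b) for k in c1 if k in c2 for a in c1[k] for b in c2[k]]

s1 = ["author", "address", "title"]
s2 = ["authorName", "addr", "price"]
pairs = candidate_pairs(s1, s2)
print(len(pairs), "of", len(s1) * len(s2), "comparisons")  # 4 of 9 comparisons
```

The trade-off the paper mentions is visible here: "title"/"price" is never compared, so a coarse key can miss true matches in exchange for fewer comparisons.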
Schema matching has historically been difficult to automate. Most previous studies have tried to find matches by exploiting information about schemas and data instances. However, schemas and data instances cannot fully capture the semantics of the databases, so some attributes may be matched to improper counterparts. To address this problem, we propose a schema matching framework that supports identification of correct matches by extracting semantics from ontologies. In an ontology, two concepts share similar semantics through their common parent, and that parent can further be used to quantify the similarity between them. By combining this idea with effective contemporary mapping algorithms, we perform ontology-driven semantic matching across multiple data sources. Experimental results indicate that the proposed method identifies more accurate matches than previous approaches.
"Ontology-Driven Semantic Matches between Database Schemas", Sangsoo Sung and D. McLeod. 22nd International Conference on Data Engineering Workshops (ICDEW'06), April 2006. doi:10.1109/ICDEW.2006.105
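Quantifying similarity through a common parent can be done in the spirit of the classic Wu-Palmer measure: concepts whose lowest shared ancestor sits deeper in the ontology score higher. The small taxonomy below is invented for illustration and is not the paper's ontology.

```python
# Sketch of common-parent similarity: deeper shared ancestors yield higher
# scores (Wu-Palmer style). Taxonomy is an invented example.

TAXONOMY = {"car": "vehicle", "truck": "vehicle", "vehicle": "thing",
            "apple": "fruit", "fruit": "thing"}

def path_to_root(c):
    path = [c]
    while c in TAXONOMY:
        c = TAXONOMY[c]
        path.append(c)
    return path

def depth(c):
    return len(path_to_root(c)) - 1

def lca(a, b):
    """Lowest common ancestor of two concepts."""
    ancestors_a = set(path_to_root(a))
    for c in path_to_root(b):
        if c in ancestors_a:
            return c
    return None

def wu_palmer(a, b):
    parent = lca(a, b)
    if parent is None:
        return 0.0
    return 2.0 * depth(parent) / (depth(a) + depth(b))

print(wu_palmer("car", "truck"))  # 0.5: shared parent "vehicle"
print(wu_palmer("car", "apple"))  # 0.0: shared ancestor is only the root
```

Attributes whose concepts score above a threshold under such a measure would be proposed as matches.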
Web spamming refers to actions intended to mislead search engines into ranking certain pages higher than they deserve. Recently, the amount of web spam has increased dramatically, leading to a degradation of search results. One of the most effective spamming techniques is link spamming. This is done by setting up an interconnected structure of pages for deceiving link-based ranking methods, such as PageRank. In this paper, we analyze distributions of link spam in our archive of Japanese web pages using link analysis techniques.
"Identifying Web Spam by Densely Connected Sites and its Statistics in a Japanese Web Snapshot", Hiroshi Ono, Masashi Toyoda, and M. Kitsuregawa. 22nd International Conference on Data Engineering Workshops (ICDEW'06), April 2006. doi:10.1109/ICDEW.2006.64
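A simple heuristic behind spotting such interconnected structures is to measure how self-referential a group of sites is: link farms built to inflate PageRank tend to point almost exclusively at each other. The graph, group, and threshold below are invented examples, not the paper's actual detection method.

```python
# Sketch of a link-spam heuristic: a group whose outgoing links mostly stay
# inside the group looks like a densely connected (spam-like) structure.

def internal_link_ratio(group, links):
    """Fraction of the group's outgoing links that stay inside the group."""
    out = [(s, t) for (s, t) in links if s in group]
    if not out:
        return 0.0
    return sum(1 for (s, t) in out if t in group) / len(out)

links = [("a", "b"), ("b", "c"), ("c", "a"),   # a tight spam-like triangle
         ("a", "news"), ("news", "blog")]

farm = {"a", "b", "c"}
print(internal_link_ratio(farm, links) > 0.7)  # True: mostly self-referential
```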
Despite all the efforts to build a Semantic Web, in which each machine can understand and interpret the data it processes, information is usually still stored in ordinary relational databases. Semantic Web applications that need access to such semantically unexploited data have to create their own manual mappings from relational databases to the Semantic Web. In this paper we analyze whether the combination of Relational.OWL, as a Semantic Web representation of relational databases, and a semantic query language like SPARQL could be an alternative. The benefits of such an approach are clear, since it enables Semantic Web applications to access and query data actually stored in relational databases using their own built-in functionality.
"Bringing Relational Data into the Semantic Web using SPARQL and Relational.OWL", C. Laborda and Stefan Conrad. 22nd International Conference on Data Engineering Workshops (ICDEW'06), April 2006. doi:10.1109/ICDEW.2006.37
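The core move is to expose relational rows as RDF-style triples and then answer triple-pattern queries over them. The sketch below illustrates that shape in plain Python; the predicate names are invented stand-ins, and a real deployment would use the Relational.OWL vocabulary with an actual SPARQL engine rather than this toy matcher.

```python
# Sketch: rows become (subject, predicate, object) triples; a single
# triple pattern with None as a wildcard stands in for a SPARQL query.

def rows_to_triples(table, rows, key):
    triples = []
    for row in rows:
        subject = f"{table}/{row[key]}"
        for column, value in row.items():
            triples.append((subject, f"{table}#{column}", value))
    return triples

def match(triples, s=None, p=None, o=None):
    """Answer a single triple pattern; None acts like a SPARQL variable."""
    return [(ts, tp, to) for (ts, tp, to) in triples
            if s in (None, ts) and p in (None, tp) and o in (None, to)]

rows = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
triples = rows_to_triples("person", rows, key="id")
hits = match(triples, p="person#name", o="Alice")
print(hits)  # [('person/1', 'person#name', 'Alice')]
```

The corresponding SPARQL would be roughly `SELECT ?s WHERE { ?s person:name "Alice" }` against the triple view of the table.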
Recent research on data streaming algorithms has provided powerful tools to efficiently monitor various characteristics of traffic passing through a single network link or node. However, it is often desirable to perform data streaming analysis on traffic aggregated over hundreds or even thousands of links/nodes, which provides network operators with a holistic view of network operation. Shipping raw traffic data to a centralized location (i.e., "raw aggregation") for streaming analysis is clearly not feasible for a large network. In this paper, we propose a set of novel distributed data streaming algorithms that allow scalable and efficient monitoring of aggregated traffic without the need for raw aggregation. Our algorithms target the specific network monitoring problem of finding common content in the Internet traffic traversing several nodes/links, which has applications in network-wide intrusion detection, early warning for fast-propagating worms, and detection of hot objects and spam traffic. We evaluate our algorithms through extensive simulations and experiments on traffic traces collected from a tier-1 ISP. The experimental results demonstrate that our algorithms can effectively detect common content in traffic traversing a large network.
"Scalable and Efficient Data Streaming Algorithms for Detecting Common Content in Internet Traffic", Minho Sung, Abhishek Kumar, Erran L. Li, Jia Wang, and Jun Xu. 22nd International Conference on Data Engineering Workshops (ICDEW'06), April 2006. doi:10.1109/ICDEW.2006.130
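The distributed setting can be sketched simply: each monitor fingerprints payload chunks locally and ships only compact digests, and a coordinator counts how many nodes saw each digest. The hashing scheme, chunk size, and threshold below are illustrative; the paper's algorithms use more refined streaming sketches than this exact-counting toy.

```python
# Sketch of distributed common-content detection: per-node chunk digests,
# intersected at a coordinator, surface payloads seen at many nodes.

import hashlib
from collections import Counter

def fingerprints(payloads, chunk=8):
    """Digest fixed-size chunks of each payload into a compact set."""
    digests = set()
    for p in payloads:
        for i in range(0, len(p) - chunk + 1, chunk):
            digests.add(hashlib.sha1(p[i:i + chunk].encode()).hexdigest()[:8])
    return digests

def common_content(per_node_digests, min_nodes=2):
    """Digests observed at >= min_nodes monitors."""
    counts = Counter(d for node in per_node_digests for d in node)
    return {d for d, n in counts.items() if n >= min_nodes}

worm = "GET /exploit.bin"        # the same payload crossing several links
node1 = fingerprints([worm, "normal mail traffic"])
node2 = fingerprints([worm, "unrelated web page"])
node3 = fingerprints(["plain old browsing"])

suspicious = common_content([node1, node2, node3])
print(len(suspicious) > 0)  # True: the repeated payload surfaces
```

Only the digest sets cross the network, which is what makes the scheme feasible where raw aggregation is not.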