{"title":"Automating Change Detection and Notification of Web Pages (Invited Paper)","authors":"Sharma Chakravarthy, Subramanian C. Hari Hara","doi":"10.1109/DEXA.2006.34","DOIUrl":"https://doi.org/10.1109/DEXA.2006.34","url":null,"abstract":"Search engines provide an invaluable capability to search the Web which is growing at a fast pace. In addition to growth, the content of the pages on the Web is also changing continuously. Periodical retrieval of pages for understanding changes is both inefficient and time consuming. Search engines do not help in this aspect of information retrieval at all. In this paper, we present an overview of WebVigiL, a system that automates the change detection and timely notification of HTML/XML pages based on user-specified changes of interest. User interest, specified as a sentinel/profile, is automatically monitored by the system using a combination of learning-based and event-driven techniques. The system is currently available for use","PeriodicalId":282986,"journal":{"name":"17th International Workshop on Database and Expert Systems Applications (DEXA'06)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130978890","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning Term Spaces Based on Visual Feedback","authors":"M. Granitzer, T. Neidhart, M. Lux","doi":"10.1109/DEXA.2006.82","DOIUrl":"https://doi.org/10.1109/DEXA.2006.82","url":null,"abstract":"Extracting and visualizing concepts and relationship between text documents strongly depends on the used similarity measure. In order to provide meaningful visualizations and to extract useful knowledge from document collections, user needs must be captured by the internal representation of documents, and the used similarity measure. In most applications the vector space model and the cosine similarity are used therefore and serve as good approximations. Nevertheless, influencing similarities between documents is rather hard, since parameter tuning relies heavily on expert knowledge of the underlying algorithms, and the influence of different weighting schemes and similarity measures is not known before. In this paper we present an approach on how to adapt the vector space representation of documents by giving visual feedback to the system. Our approach starts by clustering a corpus of text documents and visualizing the results using multi dimensional scaling techniques. Afterwards, a 2D landscape visualization is shown which can be manipulated by the user. Based on these manipulations the high dimensional representation of the documents is adapted to fit the users need more precisely. Our experiments show that iterating these steps results in an adapted representation of documents and similarities, generating layouts as intended by the user and furthermore increases clustering accuracy. 
While this paper only investigates the influence on clustering and visualization, the method itself may also be used for increasing classification and retrieval performance since it adapts to the users need of similarity","PeriodicalId":282986,"journal":{"name":"17th International Workshop on Database and Expert Systems Applications (DEXA'06)","volume":"101 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133626558","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The PENG System: Practice and Experience","authors":"G. Pasi, Gloria Bordogna, R. Villa","doi":"10.1109/DEXA.2006.136","DOIUrl":"https://doi.org/10.1109/DEXA.2006.136","url":null,"abstract":"The PENG system is intended to provide an integrated and personalized environment for news professionals, providing functionalities for filtering, distributed retrieval, and a flexible interface environment for the display and manipulation of news materials. In this paper, we review the progress and results of the PENG system to date, and describe in detail the document filtering part of the system, which is designed to gather and filter documents to user profiles. The current architecture was described, along with some of the main issues which have so far been found in its development","PeriodicalId":282986,"journal":{"name":"17th International Workshop on Database and Expert Systems Applications (DEXA'06)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132748742","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bipolar Queries and Queries with Preferences (Invited Paper)","authors":"S. Zadrożny, J. Kacprzyk","doi":"10.1109/DEXA.2006.36","DOIUrl":"https://doi.org/10.1109/DEXA.2006.36","url":null,"abstract":"The concepts of bipolar queries and queries with preferences are studied. Various interpretations of the former, recently defined by Dubois and Prade, are discussed. The latter was defined by Chomicki together with a new relational algebra operator winnow. The fuzzy version of the winnow operator is proposed. It is shown how it may be used to express a selected interpretation of the bipolar queries","PeriodicalId":282986,"journal":{"name":"17th International Workshop on Database and Expert Systems Applications (DEXA'06)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127802785","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"String-Matching and Update through Algebraic Signatures in Scalable Distributed Data Structures","authors":"R. Mokadem, W. Litwin","doi":"10.1109/DEXA.2006.132","DOIUrl":"https://doi.org/10.1109/DEXA.2006.132","url":null,"abstract":"Scalable distributed data structures (SDDSs) store large scalable files over a distributed RAM of nodes in a grid or a P2P network. The files scale transparently for the applications. The prototype system was designed by CERIA, experiments with this technology for Wintel multicomputers. The application may manipulate data much faster than on local disks. We present the functions we have put into the prototype we now call SDDS-2004. We improve the searches and updates of records in our SDDS files. An original property of these functions is the use of the algebraic signatures. This technique serves the distributed non-key record search. The search may concern the entire field or a (sub)string. The algebraic properties of the signatures act similarly to hash schemes in the work of R.M. Karp and M.O. Rabin (1987). In particular, sending a few-byte signature of the searched string alone, suffices for the search. This makes the communication between the SDDS client and server more efficient. It is also more confidential, since the signature in an intercepted message does not disclose the searched string. On the other hand, we use the signatures for the update management. The clients do not need to then to send updates which in fact do not change the stored records. Finally, our signatures help managing the concurrency control. We present our architecture and design choices. Performance measures validate our implementation. 
It is now available for download in site of CERIA","PeriodicalId":282986,"journal":{"name":"17th International Workshop on Database and Expert Systems Applications (DEXA'06)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131400984","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Efficient ZHLS Routing Protocol for Mobile Ad Hoc Networks","authors":"Takashi Hamma, T. Katoh, B. B. Bista, T. Takata","doi":"10.1109/DEXA.2006.24","DOIUrl":"https://doi.org/10.1109/DEXA.2006.24","url":null,"abstract":"In this paper, in order to reduce communication overhead in mobile ad hoc networks, we present a zone-based hierarchical link state routing protocol with gateway flooding (ZHLS-GF) in which a new flooding scheme, called gateway flooding is proposed. ZHLS-GF is based on ZHLS, a zone-based hierarchical link state routing protocol. ZHLS is a hierarchical routing protocol for mobile ad hoc networks in which a network is divided into non-overlapping zones. All network nodes in ZHLS construct two routing tables, an intrazone routing table and an inter-zone routing table, by flooding NodeLSPs within the zone and ZoneLSPs throughout the network. However, this incurs a large communication overhead in the network. Our proposed flooding scheme floods ZoneLSPs only to the gateway nodes of zones thus reduces the communication overhead significantly. Furthermore in ZHLS-GF, only the gateway nodes store ZoneLSPs and construct interzone routing tables therefore the total storage capacity required in the network is less than ZHLS","PeriodicalId":282986,"journal":{"name":"17th International Workshop on Database and Expert Systems Applications (DEXA'06)","volume":"222 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117096994","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimising Query Answering in the Presence of Access Limitations (Position Paper)","authors":"A. Calí, Diego Calvanese","doi":"10.1109/DEXA.2006.109","DOIUrl":"https://doi.org/10.1109/DEXA.2006.109","url":null,"abstract":"Relational data may have access limitations, i.e., relations may require certain attributes to be selected when they are accessed; this happens, for instance, while querying Web data sources (wrapped in relational form) or legacy databases. It is known that the evaluation of a conjunctive query under access limitations requires a recursive algorithm that is encoded into a Datalog program. In this paper we consider the problem of optimising query answering in this setting, where the query language is that of conjunctive queries. We review some optimisation techniques for this problem, that aim to reduce the number of accesses to the data in the query plan. Then we argue that checking query containment is necessary in this case for achieving effective query optimisation. Checking containment in the presence of access limitations would amount to check containment of recursive DATALOG programs, which is undecidable in general. We show however that, due to the specific form of the DATALOG programs resulting from encoding access limitations, the containment problem is indeed decidable. 
We present a decision procedure, first presented in a paper by the authors and based on chase techniques, and we analyse its computational complexity","PeriodicalId":282986,"journal":{"name":"17th International Workshop on Database and Expert Systems Applications (DEXA'06)","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124763376","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MetaOn - Ontology Driven Metadata Construction and Management for Intelligent Search in Text and Image Collections","authors":"H. Karanikas, N. Pelekis, Dimitrios K. Iakovidis, Ioannis Kopanakis, T. Mavroudakis, Y. Theodoridis","doi":"10.1109/DEXA.2006.95","DOIUrl":"https://doi.org/10.1109/DEXA.2006.95","url":null,"abstract":"State of the art in multimedia technology focuses in managing data collected from various sources, including documents, images, video, and speech. Therefore the effective management, analysis and mining of such heterogeneous data require the combination of various techniques. In this paper, we present an overview of the funded MetaOn project. The core objective of MetaOn is to construct and integrate semantically rich metadata collections extracted from documents, images and linguistic resources, to facilitate intelligent search and analysis. The proposed MetaOn framework involves ontology-based information extraction and data mining, semi-automatic construction of domain specific ontologies, content-based image indexing and retrieval, and metadata management. The Hellenic history has been chosen as a challenging application case study","PeriodicalId":282986,"journal":{"name":"17th International Workshop on Database and Expert Systems Applications (DEXA'06)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129317803","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Agent-Based Petroleum Offshore Monitoring Using Sensor Networks","authors":"S. Hussain, Md. Rafiqul Islam, E. Shakshuki, M. Zaman","doi":"10.1109/DEXA.2006.22","DOIUrl":"https://doi.org/10.1109/DEXA.2006.22","url":null,"abstract":"This paper investigates the architecture and design of agent-based sensor networks for petroleum offshore monitoring. A few challenges to monitor the reservoir, wellbore and wellhead are identified. Moreover, the necessary components for a reliable, precise, and accurate monitoring are suggested. The paper describes the architecture of the routing agent and discusses the cross layer optimization issues for query processing. The paper also provides the software design and components for a Web-based continuous monitoring application","PeriodicalId":282986,"journal":{"name":"17th International Workshop on Database and Expert Systems Applications (DEXA'06)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132047008","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ClustDB: A High-Performance Tool for Large Scale Sequence Matching","authors":"J. Kleffe, Friedrich Möller, B. Wittig","doi":"10.1109/DEXA.2006.40","DOIUrl":"https://doi.org/10.1109/DEXA.2006.40","url":null,"abstract":"High throughput sampling of expressed sequence tags (ESTs) has generated huge collections of transcripts that are difficult to compare with each other using existing tools for sequence matching. The major problem is lack of computer memory. We therefore present a new exact and memory efficient algorithm for the simultaneous identification of matching substrings in large sets of sequences. Its application to more than six million human ESTs in Genbank of date 2005-04-06, counting more than 3.3 billion base pairs, takes less than four hours to find all more than seven million clusters of multiple substrings of at least 50 nucleotides in length, say, by using a standard PC with 2 GB of RAM, 2.8 GHz processor speed. The corresponding program ClustDB is able to handle at least eight times more data than VMATCH, the most memory efficient exact software known today. Our program is freely available for academic use","PeriodicalId":282986,"journal":{"name":"17th International Workshop on Database and Expert Systems Applications (DEXA'06)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133546095","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}