Ken'ichi Ishikawa, Atsuyuki Morishima, S. Sugimoto
Today, more and more people in knowledge communities, such as research laboratories, use shared file servers to store and share their information. People in such communities often work together, and the files they store on a file server are related to each other. Information about these relationships is usually exchanged offline and used implicitly to facilitate the management and sharing of the files. This paper proposes new functions that manage and use the relationships to create various views of the file servers. The functions provide high-level support and are compatible with the operational framework of existing file systems.
{"title":"New Functions of File Systems to Manage Information Shared by Communities","authors":"Ken'ichi Ishikawa, Atsuyuki Morishima, S. Sugimoto","doi":"10.1109/ICDEW.2006.98","DOIUrl":"https://doi.org/10.1109/ICDEW.2006.98","url":null,"abstract":"Today, more and more people in knowledge communities, like research laboratories, use shared file servers to store and share their information. People in such communities often work together and their files stored in a file server have relationships with each other. Information on the relationships is usually exchanged offline and used implicitly to facilitate the management and sharing of the files. This paper proposes new functions to manage and use the relationships to make various views on the file servers. The functions provide a high-level support and are compatible with the operational framework of existing file systems.","PeriodicalId":331953,"journal":{"name":"22nd International Conference on Data Engineering Workshops (ICDEW'06)","volume":"150 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115897431","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The resource lookup requirements of applications such as web caching, web content search, content distribution, resource sharing, network monitoring and management, and e-commerce have caught the attention of peer-to-peer (P2P) distributed systems researchers. Over the past few years, several decentralized P2P lookup system designs have been proposed to address these requirements. Most of these early designs target specific applications. Unfortunately, the variations in operating environments and lookup characteristics across applications restrict the applicability of such specialized designs. In this paper, we present an architecture for P2P systems that identifies the functions necessary for designing resource lookup systems with wide applicability. We demonstrate the usefulness of the functions included in the architecture by illustrating their use in developing diverse lookup techniques.
{"title":"A Peer-to-Peer Architecture to Enable Versatile Lookup System Design","authors":"Vivek Sawant, J. Kaur","doi":"10.1109/ICDEW.2006.17","DOIUrl":"https://doi.org/10.1109/ICDEW.2006.17","url":null,"abstract":"The resource lookup requirements in applications such as web caching, web content search, content distribution, resource sharing, network monitoring and management, and e-commerce have caught the attention of peer-to-peer (P2P) distributed systems researchers. Over the past few years, several decentralized P2P lookup system designs have been proposed for addressing these requirements. Most of these early designs are targeted at specific applications. Unfortunately, the variations in the operating environments and lookup characteristics across applications restricts the applicability of such specialized designs. In this paper, we present an architecture for P2P systems that identifies the functions necessary for designing resource lookup systems with wide applicability. We demonstrate the usefulness of the functions included in the architecture by illustrating their use in developing diverse lookup techniques.","PeriodicalId":331953,"journal":{"name":"22nd International Conference on Data Engineering Workshops (ICDEW'06)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115208294","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Overload management has been an important problem for large-scale dynamic systems. In this paper, we study this problem in the context of our Borealis distributed stream processing system. We show that server nodes must coordinate their load shedding decisions to achieve global control over output quality. We describe a distributed load shedding approach that provides this coordination through upstream metadata aggregation and propagation. The metadata enables an upstream node to make fast local load shedding decisions that influence its descendant nodes in the best possible way.
{"title":"Dealing with Overload in Distributed Stream Processing Systems","authors":"Nesime Tatbul, S. Zdonik","doi":"10.1109/ICDEW.2006.45","DOIUrl":"https://doi.org/10.1109/ICDEW.2006.45","url":null,"abstract":"Overload management has been an important problem for large-scale dynamic systems. In this paper, we study this problem in the context of our Borealis distributed stream processing system. We show that server nodes must coordinate in their load shedding decisions to achieve global control on output quality. We describe a distributed load shedding approach which provides this coordination by upstream metadata aggregation and propagation. Metadata enables an upstream node to make fast local load shedding decisions which will influence its descendant nodes in the best possible way.","PeriodicalId":331953,"journal":{"name":"22nd International Conference on Data Engineering Workshops (ICDEW'06)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114784442","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Steven P. Callahan, J. Freire, E. Santos, C. Scheidegger, Cláudio T. Silva, H. Vo
Scientists are now faced with an incredible volume of data to analyze. To successfully analyze and validate various hypotheses, it is necessary to pose several queries, correlate disparate data, and create insightful visualizations of both the simulated processes and observed phenomena. Data exploration through visualization requires scientists to go through several steps. In essence, they need to assemble complex workflows that consist of dataset selection, specification of a series of operations to be applied to the data, and the creation of appropriate visual representations, before they can finally view and analyze the results. Often, insight comes from comparing the results of multiple visualizations created during the data exploration process.
{"title":"Managing the Evolution of Dataflows with VisTrails","authors":"Steven P. Callahan, J. Freire, E. Santos, C. Scheidegger, Cláudio T. Silva, H. Vo","doi":"10.1109/ICDEW.2006.75","DOIUrl":"https://doi.org/10.1109/ICDEW.2006.75","url":null,"abstract":"Scientists are now faced with an incredible volume of data to analyze. To successfully analyze and validate various hypotheses, it is necessary to pose several queries, correlate disparate data, and create insightful visualizations of both the simulated processes and observed phenomena. Data exploration through visualization requires scientists to go through several steps. In essence, they need to assemble complex workflows that consist of dataset selection, specification of series of operations that need to be applied to the data, and the creation of appropriate visual representations, before they can finally view and analyze the results. Often, insight comes from comparing the results of multiple visualizations that are created during the data exploration process.","PeriodicalId":331953,"journal":{"name":"22nd International Conference on Data Engineering Workshops (ICDEW'06)","volume":"89 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116254380","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Just as the link structure of the web is a critical component in today's web search, complex relationships (i.e., the different ways the dots are connected) will be an important component in tomorrow's web search technologies. In this paper, I summarize my research on answering the question: how can we exploit semantic relationships of named entities to improve relevance in the search and ranking of documents? The intuition behind my approach is, first, to analyze the relationships of named entities with respect to a query. Second, relevance weights, which are assigned by human experts, can then be used to guarantee results within a relevance threshold. These relevance measures can be applied to both searching and ranking of documents.
{"title":"Searching and Ranking Documents based on Semantic Relationships","authors":"Boanerges Aleman-Meza","doi":"10.1109/ICDEW.2006.131","DOIUrl":"https://doi.org/10.1109/ICDEW.2006.131","url":null,"abstract":"Just as the link structure of the web is a critical component in today's web search, complex relationships (i.e., the different ways the dots are connected) will be an important component in tomorrow's web search technologies. In this paper, I summarize my research on answering the question of: How we can exploit semantic relationships of named-entities to improve relevance in search and ranking of documents? The intuition of my approach is to first analyze the relationships of namedentities with respect to a query. Second, relevance weights, which are assigned by human experts, can then be used to guarantee results within a relevance threshold. These relevance measures can be applied both for searching and ranking of documents.","PeriodicalId":331953,"journal":{"name":"22nd International Conference on Data Engineering Workshops (ICDEW'06)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116293418","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fraud detection is of great importance to financial institutions. This paper is concerned with the problem of finding outliers in time series financial data using Peer Group Analysis (PGA), an unsupervised technique for fraud detection. The objective of PGA is to characterize the expected pattern of behavior around the target sequence in terms of the behavior of similar objects, and then to detect any difference in evolution between the expected pattern and the target. The tool has been applied to stock market data collected from the Bangladesh Stock Exchange to assess its performance in stock fraud detection. We observed that PGA can detect brokers who suddenly start selling stock differently from other brokers to whom they were previously similar. We also applied t-statistics to find the deviations effectively.
{"title":"Unsupervised Outlier Detection in Time Series Data","authors":"Z. Ferdousi, Akira Maeda","doi":"10.1109/ICDEW.2006.157","DOIUrl":"https://doi.org/10.1109/ICDEW.2006.157","url":null,"abstract":"Fraud detection is of great importance to financial institutions. This paper is concerned with the problem of finding outliers in time series financial data using Peer Group Analysis (PGA), which is an unsupervised technique for fraud detection. The objective of PGA is to characterize the expected pattern of behavior around the target sequence in terms of the behavior of similar objects, and then to detect any difference in evolution between the expected pattern and the target. The tool has been applied to the stock market data, which has been collected from Bangladesh Stock Exchange to assess its performance in stock fraud detection. We observed PGA can detect those brokers who suddenly start selling the stock in a different way to other brokers to whom they were previously similar. We also applied t-statistics to find the deviations effectively.","PeriodicalId":331953,"journal":{"name":"22nd International Conference on Data Engineering Workshops (ICDEW'06)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117337467","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
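The Peer Group Analysis idea summarized in the abstract above can be illustrated with a minimal sketch. The abstract does not give the similarity measure, the peer group size, or the exact form of the t-statistic, so everything here is an assumption for illustration: peers are chosen by squared Euclidean distance over a history window, and the deviation is the target's distance from the peer mean in units of the peer standard deviation. The names `peer_group` and `deviation_score` and the sample broker data are hypothetical.

```python
from statistics import mean, stdev


def peer_group(series, target, k, history):
    """Return the k series most similar to `target` over the first `history` points.

    Similarity is squared Euclidean distance (an assumption, not the paper's choice).
    """
    def dist(name):
        pairs = zip(series[target][:history], series[name][:history])
        return sum((a - b) ** 2 for a, b in pairs)

    others = [name for name in series if name != target]
    return sorted(others, key=dist)[:k]


def deviation_score(series, target, k, history, t):
    """Standardized distance of the target from its peer group at time index t."""
    peers = peer_group(series, target, k, history)
    peer_values = [series[p][t] for p in peers]
    spread = stdev(peer_values)
    if spread == 0:
        return 0.0  # peers are identical at t; no scale to compare against
    return (series[target][t] - mean(peer_values)) / spread


# Hypothetical weekly sell volumes per broker; broker A changes behavior in week 4.
volumes = {
    "A": [10, 11, 10, 40],
    "B": [10, 12, 10, 11],
    "C": [11, 11, 9, 12],
    "D": [50, 52, 49, 51],
    "E": [9, 11, 11, 10],
}
score_a = deviation_score(volumes, "A", k=3, history=3, t=3)
score_b = deviation_score(volumes, "B", k=3, history=3, t=3)
```

In this toy data, broker A tracks B, C, and E for three weeks and then departs sharply, so its score is large, while B stays close to its own peers. A real application would need a threshold on the score and a more careful treatment of peers that are themselves outliers.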
Applying conventional sequential pattern mining methods to text data extracts many uninteresting patterns, which increases the time to interpret the extracted patterns. To solve this problem, we propose a new sequential pattern mining algorithm by adopting the following two constraints. One is to select sequences with regard to item intervals--the number of items between any two adjacent items in a sequence--and the other is to select sequences with regard to item attributes. Using Amazon customer reviews in the book category, we have confirmed that our method is able to extract patterns faster than the conventional method, and is better able to exclude uninteresting patterns while retaining the patterns of interest.
{"title":"Text Mining using PrefixSpan constrained by Item Interval and Item Attribute","authors":"Issei Sato, Yu Hirate, H. Yamana","doi":"10.1109/ICDEW.2006.142","DOIUrl":"https://doi.org/10.1109/ICDEW.2006.142","url":null,"abstract":"Applying conventional sequential pattern mining methods to text data extracts many uninteresting patterns, which increases the time to interpret the extracted patterns. To solve this problem, we propose a new sequential pattern mining algorithm by adopting the following two constraints. One is to select sequences with regard to item intervals--the number of items between any two adjacent items in a sequence--and the other is to select sequences with regard to item attributes. Using Amazon customer reviews in the book category, we have confirmed that our method is able to extract patterns faster than the conventional method, and is better able to exclude uninteresting patterns while retaining the patterns of interest.","PeriodicalId":331953,"journal":{"name":"22nd International Conference on Data Engineering Workshops (ICDEW'06)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122877670","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
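The item-interval constraint described in the abstract above can be sketched as a small pattern-growth miner. This is not the authors' algorithm: it is a simplified illustration that assumes single-item sequence elements, tracks every occurrence of a pattern (needed because a gap limit can rule out the first occurrence but admit a later one), and omits the item-attribute constraint entirely. The function name `mine` and the parameters `min_support` and `max_gap` are hypothetical.

```python
from collections import defaultdict


def mine(sequences, min_support, max_gap):
    """Return {pattern: support} for patterns whose adjacent items are separated
    by at most `max_gap` other items in each supporting sequence."""
    results = {}

    def grow(pattern, occurrences):
        # occurrences maps a sequence index to the end positions of `pattern` in it.
        extensions = defaultdict(lambda: defaultdict(set))
        for sid, ends in occurrences.items():
            seq = sequences[sid]
            for end in ends:
                # Only positions within the allowed interval after `end` qualify.
                last = min(len(seq), end + 2 + max_gap)
                for pos in range(end + 1, last):
                    extensions[seq[pos]][sid].add(pos)
        for item in sorted(extensions):
            occ = extensions[item]
            if len(occ) >= min_support:
                longer = pattern + (item,)
                results[longer] = len(occ)
                grow(longer, occ)

    # Seed the search with all frequent single items.
    singles = defaultdict(lambda: defaultdict(set))
    for sid, seq in enumerate(sequences):
        for pos, item in enumerate(seq):
            singles[item][sid].add(pos)
    for item in sorted(singles):
        occ = singles[item]
        if len(occ) >= min_support:
            results[(item,)] = len(occ)
            grow((item,), occ)
    return results


# Hypothetical tokenized review snippets.
reviews = [
    ["this", "book", "is", "great"],
    ["book", "was", "really", "great"],
    ["great", "book"],
]
tight = mine(reviews, min_support=2, max_gap=1)
loose = mine(reviews, min_support=2, max_gap=2)
```

With `max_gap=1` the pattern ("book", "great") is supported only by the first review, so it is dropped; relaxing the limit to 2 admits the second review as well. Tightening the interval is what prunes loosely connected word pairs in this sketch.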
Jyotishman Pathak, Samik Basu, R. Lutz, Vasant G Honavar
Development of sound approaches and software tools for specification, assembly, and deployment of composite Web services from independently developed components promises to enhance collaborative software design and reuse. In this context, the proposed research introduces a new incremental approach to service composition, MoSCoE (Modeling Web Service Composition and Execution), based on the three steps of abstraction, composition and refinement. Abstraction refers to the high-level description of the service desired (goal) by the user, which drives the identification of an appropriate composition strategy. In the event that such a composition is not realizable, MoSCoE guides the user through successive refinements of the specification towards a realizable goal service that meets the user requirements.
{"title":"MoSCoE: A Framework for Modeling Web Service Composition and Execution","authors":"Jyotishman Pathak, Samik Basu, R. Lutz, Vasant G Honavar","doi":"10.1109/ICDEW.2006.96","DOIUrl":"https://doi.org/10.1109/ICDEW.2006.96","url":null,"abstract":"Development of sound approaches and software tools for specification, assembly, and deployment of composite Web services from independently developed components promises to enhance collaborative software design and reuse. In this context, the proposed research introduces a new incremental approach to service composition, MoSCoE (Modeling Web Service Composition and Execution), based on the three steps of abstraction, composition and refinement. Abstraction refers to the high-level description of the service desired (goal) by the user, which drives the identification of an appropriate composition strategy. In the event that such a composition is not realizable, MoSCoE guides the user through successive refinements of the specification towards a realizable goal service that meets the user requirements.","PeriodicalId":331953,"journal":{"name":"22nd International Conference on Data Engineering Workshops (ICDEW'06)","volume":"417 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117322849","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The widespread use of location-based services has created a strong market for location-detection devices (e.g., GPS-like devices, RFIDs, handheld devices, and cellular phones). Examples of location-based services include location-aware emergency services, location-based advertisement, live traffic reports, and location-based store finders. However, location-detection devices pose a major privacy threat to their users, because they transmit private information (i.e., the location) to a server that may be untrustworthy. The existing model of location-based applications trades service for privacy: a user who wants to keep her location private has to turn off her location-detection device, i.e., unsubscribe from the service. This paper tackles this model in a way that protects user privacy while keeping the functionality of location-based services. The main idea is to employ a trusted third party, the Location Anonymizer, that expands the user location into a spatial region such that: (1) the exact user location can lie anywhere in the spatial region, and (2) there are k other users within the expanded spatial region, so that each user is k-anonymous. The location-based database server is equipped with additional functionality that supports spatio-temporal queries based on the spatial region received from the Location Anonymizer rather than the exact point location received from the user.
{"title":"Towards Privacy-Aware Location-Based Database Servers","authors":"M. Mokbel","doi":"10.1109/ICDEW.2006.152","DOIUrl":"https://doi.org/10.1109/ICDEW.2006.152","url":null,"abstract":"The wide spread of location-based services results in a strong market for location-detection devices (e.g., GPS-like devices, RFIDs, handheld devices, and cellular phones). Examples of location-based services include location-aware emergency service, location-based advertisement, live traffic reports, and location-based store finder. However, location-detection devices pose a major privacy threat on its users where it transmits private information (i.e., the location) to the server who may be untrustworthy. The existing model of location-based applications trades service with privacy where if a user wants to keep her private location information, she has to turn off her location-detection device, i.e., unsubscribe from the service. This paper tackles this model in a way that protects the user privacy while keeping the functionality of location-based services. The main idea is to employ a trusted third party, the Location Anonymizer, that expands the user location into a spatial region such that: (1) The exact user location can lie anywhere in the spatial region, and (2) There are k other users within the expanded spatial region so that each user is k-anonymous. The location-based database server is equipped with additional functionalities that support spatio-temporal queries based on the spatial region received from the location anonymizer rather than the exact point location received from the user.","PeriodicalId":331953,"journal":{"name":"22nd International Conference on Data Engineering Workshops (ICDEW'06)","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125082480","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
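The two conditions on the cloaked region stated in the abstract above can be illustrated with a small sketch. The abstract does not say how the Location Anonymizer builds the region, so the strategy here is an assumption: start from the smallest grid cell containing the user and keep doubling the cell size until the cell also holds at least k other users. Because cells are aligned to a fixed grid rather than centered on the user, the user's exact position can be anywhere inside the returned rectangle. The function `cloak` and its parameters `space` and `min_cell` are hypothetical.

```python
def cloak(user, others, k, space=1024.0, min_cell=1.0):
    """Return (x_min, y_min, x_max, y_max) of a grid-aligned square that contains
    `user` and at least k points from `others`; fall back to the whole space."""
    size = min_cell
    while size < space:
        # Snap to the grid cell of the current size that contains the user.
        x0 = (user[0] // size) * size
        y0 = (user[1] // size) * size
        inside = sum(
            1 for (x, y) in others
            if x0 <= x < x0 + size and y0 <= y < y0 + size
        )
        if inside >= k:
            return (x0, y0, x0 + size, y0 + size)
        size *= 2
    return (0.0, 0.0, space, space)


# Hypothetical positions in a 1024 x 1024 coordinate space.
me = (5.5, 5.5)
neighbours = [(5.2, 5.9), (6.5, 5.5), (7.9, 4.1), (100.0, 100.0)]
region = cloak(me, neighbours, k=2)
```

The server would then answer queries against `region` instead of the exact point, which typically means returning a candidate set that the client filters locally. A larger k yields a larger region and therefore more privacy at the cost of a larger candidate set.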
A database language characteristic that led to the success of declarative query processing (and, in turn, to the rise of relational DBMSs in general) is that there is always more than one way of evaluating a query. This holds for structurally different but logically equivalent query evaluation plans (QEPs) as well as for different implementations of the same logical operator. The principle also holds for the novel XML database management systems (XDBMSs): recently proposed operators for XML query processing can be grouped into the logical operators Structural Join [1, 22] and Holistic Twig Join [3, 6, 16]. Depending on the available internal system mechanisms, there are many ways to implement these operators, two of which are presented in this paper.
{"title":"Twig Query Processing Under Concurrent Updates","authors":"Christian Mathis, T. Härder","doi":"10.1109/ICDEW.2006.156","DOIUrl":"https://doi.org/10.1109/ICDEW.2006.156","url":null,"abstract":"An appropriate database language characteristics leading to the success of declarative query processing - and, in turn, to the rise of relational DBMSs in general - always provides more than one way of evaluating a query. This counts for structurally different but logically equivalent query evaluation plans (QEPs) as well as for different implementations of the same logical operator. This principle surely holds for the novel XML database management systems (XDBMSs): Recently proposed operators for XML query processing can be grouped into the logical operators Structural Join [1, 22] and Holistic Twig Join [3, 6, 16]. Depending on available internal system mechanisms, a lot of opportunities exist how to implement these operators (two of which are presented in this paper.","PeriodicalId":331953,"journal":{"name":"22nd International Conference on Data Engineering Workshops (ICDEW'06)","volume":"82 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127241066","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}