Fraud detection is of great importance to financial institutions. This paper addresses the problem of finding outliers in financial time series data using Peer Group Analysis (PGA), an unsupervised technique for fraud detection. The objective of PGA is to characterize the expected pattern of behavior around a target sequence in terms of the behavior of similar objects, and then to detect any difference in evolution between the expected pattern and the target. To assess its performance in stock fraud detection, the tool was applied to stock market data collected from the Bangladesh Stock Exchange. We observed that PGA can detect brokers who suddenly start selling stock differently from other brokers to whom they were previously similar. We also applied t-statistics to find the deviations effectively.
Z. Ferdousi, Akira Maeda. "Unsupervised Outlier Detection in Time Series Data." 22nd International Conference on Data Engineering Workshops (ICDEW'06), 2006-04-03. DOI: 10.1109/ICDEW.2006.157, https://doi.org/10.1109/ICDEW.2006.157
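The peer-group idea in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the Euclidean distance over a training window, the fixed peer-group size `k`, the function names, and the specific standardized score (target minus peer mean, divided by peer standard deviation) are all assumptions made for illustration.

```python
from statistics import mean, stdev


def peer_group(series, target, k, train_len):
    """Pick the k objects whose first train_len values are closest to the target's.

    series: dict mapping an object name (e.g. a broker id) to a list of numbers.
    Euclidean distance is an assumption; the abstract does not name a measure.
    """
    base = series[target][:train_len]

    def dist(name):
        other = series[name][:train_len]
        return sum((a - b) ** 2 for a, b in zip(base, other)) ** 0.5

    others = [name for name in series if name != target]
    return sorted(others, key=dist)[:k]


def t_scores(series, target, peers, train_len):
    """Standardize the target against its peers at every step after the training window."""
    scores = []
    for t in range(train_len, len(series[target])):
        peer_vals = [series[p][t] for p in peers]
        spread = stdev(peer_vals)
        diff = series[target][t] - mean(peer_vals)
        if spread == 0:
            # All peers agree exactly; any deviation is treated as unbounded.
            scores.append(0.0 if diff == 0 else float("inf"))
        else:
            scores.append(diff / spread)
    return scores
```

A large absolute score at some time step suggests that the target has started to behave differently from the objects it used to resemble; the threshold for flagging it is left to the user.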
Jyotishman Pathak, Samik Basu, R. Lutz, Vasant G Honavar
The development of sound approaches and software tools for the specification, assembly, and deployment of composite Web services from independently developed components promises to enhance collaborative software design and reuse. In this context, the proposed research introduces a new incremental approach to service composition, MoSCoE (Modeling Web Service Composition and Execution), based on the three steps of abstraction, composition, and refinement. Abstraction refers to the high-level description of the service desired by the user (the goal), which drives the identification of an appropriate composition strategy. If such a composition is not realizable, MoSCoE guides the user through successive refinements of the specification towards a realizable goal service that meets the user requirements.
Jyotishman Pathak, Samik Basu, R. Lutz, Vasant G Honavar. "MoSCoE: A Framework for Modeling Web Service Composition and Execution." 22nd International Conference on Data Engineering Workshops (ICDEW'06), 2006-04-03. DOI: 10.1109/ICDEW.2006.96, https://doi.org/10.1109/ICDEW.2006.96
As various models and languages have been proposed to handle information in the Semantic Web, it is important to be able to translate data from one to another. Referring to two specific models, namely RDF and Topic Maps, we propose a meta-modelling approach based on previous experience in handling heterogeneity in the database world.
P. Atzeni, P. D. Nostro. "Management of Heterogeneity in the Semantic Web." 22nd International Conference on Data Engineering Workshops (ICDEW'06), 2006-04-03. DOI: 10.1109/ICDEW.2006.74, https://doi.org/10.1109/ICDEW.2006.74
Widespread interest in time-series similarity search has increased the need for efficient techniques that reduce the dimensionality of the data and then index it easily using a multidimensional structure. In this paper, we introduce a new technique, which we call grid representation, based on a grid approximation of the data. We propose a lower-bounding distance measure that enables a bitmap approach for fast computation and searching. We also show how grid representation can be indexed with a multidimensional index structure, and demonstrate its superiority.
Guifang Duan, Yu Suzuki, K. Kawagoe. "Grid Representation for Efficient Similarity Search in Time Series Databases." 22nd International Conference on Data Engineering Workshops (ICDEW'06), 2006-04-03. DOI: 10.1109/ICDEW.2006.63, https://doi.org/10.1109/ICDEW.2006.63
When wrapping web interfaces, ontological knowledge is important for supporting an automated interpretation of information. Developing ontologies is time-consuming and not realistic in global contexts. On the other hand, the web provides a huge amount of knowledge that can be used instead of ontologies. Three common classes of web knowledge sources are web thesauri, search engines, and web encyclopedias. The paper investigates how web knowledge can be used to solve three semantic problems: parameter finding for query interfaces, labeling of values, and relabeling after interface evolution. To solve the parameter finding problem, an algorithm has been implemented that uses the web encyclopedia Wikipedia for the initial identification of parameter value candidates and the search engine Google to validate label-value relationships. The approach has been integrated into a wrapper definition framework.
Thomas Kabisch, Ronald Padur, D. Rother. "Using Web Knowledge to Improve the Wrapping of Web Sources." 22nd International Conference on Data Engineering Workshops (ICDEW'06), 2006-04-03. DOI: 10.1109/ICDEW.2006.160, https://doi.org/10.1109/ICDEW.2006.160
In this paper, we describe NKRL (Narrative Knowledge Representation Language), a conceptual modeling formalism that takes into account the semantic characteristics of an important component of eChronicle information: ‘narrative’ documents. In these documents, the main part of the information consists of descriptions of ‘events’ that relate the real or intended behavior of some ‘actors’. Narrative documents of industrial and economic interest include news stories, corporate documents, normative and legal texts, intelligence messages, medical records, etc. NKRL employs several representational principles and some high-level inference tools.
G. P. Zarri. "Modeling and Advanced Exploitation of eChronicle ‘Narrative’ Information." 22nd International Conference on Data Engineering Workshops (ICDEW'06), 2006-04-03. DOI: 10.1109/ICDEW.2006.95, https://doi.org/10.1109/ICDEW.2006.95
Large data integration projects must often cope with undocumented data sources. Schema discovery aims at automatically finding structures in such cases. An important class of relationships between attributes that can be detected automatically is inclusion dependencies (INDs), which provide an excellent basis for guessing foreign key constraints. INDs can be discovered by comparing the sets of distinct values of pairs of attributes. In this paper we present efficient algorithms for finding unary INDs. We first show that (and why) SQL is not suitable for this task. We then develop two algorithms that compute inclusion dependencies outside of the database. Both are much faster than the SQL-based methods; in fact, for larger schemas they are the only feasible solution. Our experiments show that we can compute all unary INDs in a schema of 1,680 attributes with a total database size of 3.2 GB in approximately 2.5 hours.
Jana Bauckmann, U. Leser, Felix Naumann. "Efficiently Computing Inclusion Dependencies for Schema Discovery." 22nd International Conference on Data Engineering Workshops (ICDEW'06), 2006-04-03. DOI: 10.1109/ICDEW.2006.54, https://doi.org/10.1109/ICDEW.2006.54
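The definition used in the abstract above (a unary IND holds when the distinct values of one attribute are contained in those of another) can be shown with a small brute-force Python sketch. This is only the naive pairwise baseline, not either of the two algorithms the paper develops; the function name, the in-memory dictionary input, and the choice to ignore `None` values and empty columns are assumptions for illustration.

```python
from itertools import permutations


def unary_inds(columns):
    """Return sorted (dependent, referenced) pairs with distinct(dependent) <= distinct(referenced).

    columns: dict mapping an attribute name to a list of its values.
    None values are ignored and empty attributes are skipped (assumed conventions).
    """
    distinct = {
        name: {v for v in values if v is not None}
        for name, values in columns.items()
    }
    return sorted(
        (dep, ref)
        for dep, ref in permutations(distinct, 2)
        # A set comparison with <= is a subset test in Python.
        if distinct[dep] and distinct[dep] <= distinct[ref]
    )
```

The sketch tests every ordered pair of attributes, so its cost grows quadratically with the number of attributes, which hints at why a schema with 1,680 attributes calls for something more efficient.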
Web pages are collected and stored in Web archives, and several methods for constructing Web archives have been developed. We propose a method to retrieve time series of Web pages from Web archives by using the pages’ temporal characteristics. We present two processes for searching Web archives based on the temporal relation of query keywords. One is a method for determining the relation. The other is a method for querying Web pages based on the relation. In this paper, we discuss the two processes and an experimental result of the method.
T. Kage, K. Sumiya. "A Temporal Clustering Method for Web Archives." 22nd International Conference on Data Engineering Workshops (ICDEW'06), 2006-04-03. DOI: 10.1109/ICDEW.2006.23, https://doi.org/10.1109/ICDEW.2006.23
Yiyao Lu, W. Meng, Wanjing Zhang, King-Lup Liu, Clement T. Yu
The publication time of a page can have a big impact on its relevance to a query, especially for time-sensitive pages such as news items. For news search engines, the publication time of news items can usually be found in the returned search result records. In this paper, we introduce a method that automatically extracts the publication time of each news story returned from news search engines, based on several important observations we made. We also introduce a wrapper implementation of the extraction method. The experimental results using data collected from 50 news search engines show that our method is effective and that the wrapper implementation improves not only the extraction accuracy but also the extraction efficiency.
Yiyao Lu, W. Meng, Wanjing Zhang, King-Lup Liu, Clement T. Yu. "Automatic Extraction of Publication Time from News Search Results." 22nd International Conference on Data Engineering Workshops (ICDEW'06), 2006-04-03. DOI: 10.1109/ICDEW.2006.35, https://doi.org/10.1109/ICDEW.2006.35
The adoption of XML to represent any kind of data and documents, even complex and huge ones, is becoming a matter of fact. However, interfacing algorithms and applications with XML parsers requires adapting them: event-based SAX parsers need algorithms that react to events generated by the parser. Moreover, parsing and loading XML documents performs poorly compared to reading flat files; several research efforts therefore try to address this problem by improving the parsing phase, e.g., by adopting condensed or binary representations of XML documents. This paper deals with the other side of the coin, i.e., the problem of coupling algorithms with XML parsers in a way that does not require changing the active (polling-based) nature of many algorithms and that provides acceptable performance during execution. This problem becomes even more important for Java algorithms, which are usually less efficient than C or C++ algorithms. This paper presents a study of the problem of loosely coupling Java algorithms with XML parsers. The coupling is loose because the algorithm should be unaware of the particular interface provided by parsers. We consider several coupling techniques and compare them by analyzing their performance. The evaluation identifies the coupling techniques that perform better, depending on the specific algorithm’s needs and application scenario.
G. Psaila. "Loosely Coupling Java Algorithms and XML Parsers: a Performance-Oriented Study." 22nd International Conference on Data Engineering Workshops (ICDEW'06), 2006-04-03. DOI: 10.1109/ICDEW.2006.73, https://doi.org/10.1109/ICDEW.2006.73
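The mismatch described in the abstract above, between a polling-based algorithm and an event-based SAX parser, can be illustrated with one possible coupling: the parser runs in its own thread and pushes events into a bounded queue, while the algorithm pulls them through an iterator. The sketch is in Python rather than Java and is not one of the techniques evaluated in the paper; the names `pull_events` and `_QueueingHandler`, the event tuple layout, and the queue size are assumptions for illustration.

```python
import queue
import threading
import xml.sax

_END = object()  # sentinel marking the end of the event stream


class _QueueingHandler(xml.sax.ContentHandler):
    """Receives push-style SAX callbacks and stores them as plain tuples."""

    def __init__(self, events):
        super().__init__()
        self._events = events

    def startElement(self, name, attrs):
        self._events.put(("start", name, dict(attrs.items())))

    def endElement(self, name):
        self._events.put(("end", name, None))

    def characters(self, content):
        # SAX may deliver one text node in several chunks; whitespace-only chunks are dropped.
        if content.strip():
            self._events.put(("text", content.strip(), None))


def pull_events(xml_bytes, maxsize=64):
    """Yield SAX events one at a time so the caller can poll instead of reacting."""
    events = queue.Queue(maxsize=maxsize)

    def run():
        try:
            xml.sax.parseString(xml_bytes, _QueueingHandler(events))
            events.put(_END)
        except Exception as exc:  # forward parse errors to the consumer
            events.put(exc)

    threading.Thread(target=run, daemon=True).start()
    while True:
        item = events.get()
        if item is _END:
            return
        if isinstance(item, Exception):
            raise item
        yield item
```

The consuming algorithm only sees an iterator of tuples and never touches the parser interface, which is the sense in which the coupling is loose. The thread handoff has a cost of its own, which is the kind of trade-off a performance comparison of coupling techniques would need to measure.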