Widespread interest in time-series similarity search has increased the need for efficient techniques that reduce the dimensionality of the data so that it can be indexed easily with a multidimensional structure. In this paper, we introduce a new technique, called grid representation, based on a grid approximation of the data. We propose a lower-bounding distance measure that enables a bitmap approach for fast computation and searching. We also show how the grid representation can be indexed with a multidimensional index structure, and demonstrate its superiority.
{"title":"Grid Representation for Efficient Similarity Search in Time Series Databases","authors":"Guifang Duan, Yu Suzuki, K. Kawagoe","doi":"10.1109/ICDEW.2006.63","DOIUrl":"https://doi.org/10.1109/ICDEW.2006.63","url":null,"abstract":"Widespread interest in time-series similarity search has made more in need of efficient technique, which can reduce dimensionality of the data and then to index it easily using a multidimensional structure. In this paper, we introduce a new technique, which we called grid representation, based on a grid approximation of the data. We propose a lower bounding distance measure that enables a bitmap approach for fast computation and searching. We also show how grid representation can be indexed with a multidimensional index structure, and demonstrate its superiority.","PeriodicalId":331953,"journal":{"name":"22nd International Conference on Data Engineering Workshops (ICDEW'06)","volume":"56 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128003232","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A key aspect of any data integration endeavor is establishing a transformation that translates instances of one or more source schemata into instances of a target schema. This schema integration task must be tackled regardless of the integration architecture or mapping formalism. In this paper we provide a task model for schema integration. We use this breakdown to motivate a workbench for schema integration in which multiple tools share a common knowledge repository. In particular, the workbench facilitates the interoperation of research prototypes for schema matching (which automatically identify likely semantic correspondences) with commercial schema mapping tools (which help produce instance-level transformations). Currently, each of these tools provides its own ad hoc representation of schemata and mappings; combining these tools requires aligning these representations. The workbench provides a common representation so that these tools can more rapidly be combined.
{"title":"Integration Workbench: Integrating Schema Integration Tools","authors":"P. Mork, A. Rosenthal, Leonard J. Seligman, Joel Korb, Ken Samuel","doi":"10.1109/ICDEW.2006.69","DOIUrl":"https://doi.org/10.1109/ICDEW.2006.69","url":null,"abstract":"A key aspect of any data integration endeavor is establishing a transformation that translates instances of one or more source schemata into instances of a target schema. This schema integration task must be tackled regardless of the integration architecture or mapping formalism. In this paper we provide a task model for schema integration. We use this breakdown to motivate a workbench for schema integration in which multiple tools share a common knowledge repository. In particular, the workbench facilitates the interoperation of research prototypes for schema matching (which automatically identify likely semantic correspondences) with commercial schema mapping tools (which help produce instance-level transformations). Currently, each of these tools provides its own ad hoc representation of schemata and mappings; combining these tools requires aligning these representations. The workbench provides a common representation so that these tools can more rapidly be combined.","PeriodicalId":331953,"journal":{"name":"22nd International Conference on Data Engineering Workshops (ICDEW'06)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129312795","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Data mining is the process of discovering hidden and meaningful knowledge in a data set. It has been successfully applied to many real-life problems, for instance, web personalization, network intrusion detection, and customized marketing. Recent advances in computational sciences have led to the application of data mining to various scientific domains, such as astronomy and bioinformatics, to facilitate the understanding of different scientific processes in the underlying domain. In this thesis work, we focus on designing and applying data mining techniques to analyze spatial and spatio-temporal data originating in scientific domains. Examples of spatial and spatio-temporal data in scientific domains include data describing protein structures and data produced by protein folding simulations, respectively. Specifically, we have proposed a generalized framework to effectively discover different types of spatial and spatio-temporal patterns in scientific data sets. Such patterns can be used to capture a variety of interactions among objects of interest and the evolutionary behavior of such interactions. We have applied the framework to analyze data originating in the following three application domains: bioinformatics, computational molecular dynamics, and computational fluid dynamics. Empirical results demonstrate that the discovered patterns are meaningful in the underlying domain and can provide important insights into various scientific phenomena.
{"title":"Mining Spatial and Spatio-Temporal Patterns in Scientific Data","authors":"Hui Yang, S. Parthasarathy","doi":"10.1109/ICDEW.2006.92","DOIUrl":"https://doi.org/10.1109/ICDEW.2006.92","url":null,"abstract":"Data mining is the process of discovering hidden and meaningful knowledge in a data set. It has been successfully applied to many real-life problems, for instance, web personalization, network intrusion detection, and customized marketing. Recent advances in computational sciences have led to the application of data mining to various scientific domains, such as astronomy and bioinformatics, to facilitate the understanding of different scientific processes in the underlying domain. In this thesis work, we focus on designing and applying data mining techniques to analyze spatial and spatiotemporal data originated in scientific domains. Examples of spatial and spatio-temporal data in scientific domains include data describing protein structures and data produced from protein folding simulations, respectively. Specifically, we have proposed a generalized framework to effectively discover different types of spatial and spatio-temporal patterns in scientific data sets. Such patterns can be used to capture a variety of interactions among objects of interest and the evolutionary behavior of such interactions. We have applied the framework to analyze data originated in the following three application domains: bioinformatics, computational molecular dynamics, and computational fluid dynamics. Empirical results demonstrate that the discovered patterns are meaningful in the underlying domain and can provide important insights into various scientific phenomena.","PeriodicalId":331953,"journal":{"name":"22nd International Conference on Data Engineering Workshops (ICDEW'06)","volume":"78 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128624882","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
When wrapping web interfaces, ontological knowledge is important to support an automated interpretation of information. Developing ontologies, however, is time-consuming and not realistic in global contexts. On the other hand, the web itself provides a huge amount of knowledge that can be used instead of ontologies. Three common classes of web knowledge sources are Web thesauri, search engines, and Web encyclopedias. This paper investigates how Web knowledge can be utilized to solve three semantic problems: Parameter Finding for Query Interfaces, Labeling of Values, and Relabeling after Interface Evolution. To solve the parameter-finding problem, an algorithm has been implemented that uses the web encyclopedia Wikipedia for the initial identification of parameter value candidates and the search engine Google to validate label-value relationships. The approach has been integrated into a wrapper definition framework.
{"title":"UsingWeb Knowledge to Improve the Wrapping of Web Sources","authors":"Thomas Kabisch, Ronald Padur, D. Rother","doi":"10.1109/ICDEW.2006.160","DOIUrl":"https://doi.org/10.1109/ICDEW.2006.160","url":null,"abstract":"During the wrapping of web interfaces ontological know-ledge is important in order to support an automated interpretation of information. The development of ontologies is a time consuming issue and not realistic in global contexts. On the other hand, the web provides a huge amount of knowledge, which can be used instead of ontologies. Three common classes of web knowledge sources are: Web Thesauri, search engines and Web encyclopedias. The paper investigates how Web knowledge can be utilized to solve the three semantic problems Parameter Finding for Query Interfaces, Labeling of Values and Relabeling after interface evolution. For the solution of the parameter finding problem an algorithm has been implemented using the web encyclopedia WikiPedia for the initial identification of parameter value candidates and the search engine Google for a validation of label-value relationships. The approach has been integrated into a wrapper definition framework.","PeriodicalId":331953,"journal":{"name":"22nd International Conference on Data Engineering Workshops (ICDEW'06)","volume":"113 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133753296","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper, we describe NKRL (Narrative Knowledge Representation Language), a conceptual modeling formalism for capturing the semantic characteristics of an important component of eChronicle information: the ‘narrative’ documents. In these documents, the main part of the information consists of descriptions of the ‘events’ that relate the real or intended behavior of some ‘actors’. Narrative documents of industrial and economic interest include news stories, corporate documents, normative and legal texts, intelligence messages, medical records, etc. NKRL employs several representational principles and a number of high-level inference tools.
{"title":"Modeling and Advanced Exploitation of eChronicle ‘Narrative’ Information","authors":"G. P. Zarri","doi":"10.1109/ICDEW.2006.95","DOIUrl":"https://doi.org/10.1109/ICDEW.2006.95","url":null,"abstract":"In this paper, we describe NKRL (Narrative Knowledge Representation Language), a conceptual modeling formalism for taking into account the semantic characteristics of this important component of eChronicle information represented by the ‘narrative’ documents. In these documents, the main part of the information consists in the description of the ‘events’ that relate the real or intended behavior of some ‘actors’. Narrative documents of an industrial and economic interest correspond to news stories, corporate documents, normative and legal texts, intelligence messages, medical records, etc. NKRL employs several representational principles and some high-level inference tools.","PeriodicalId":331953,"journal":{"name":"22nd International Conference on Data Engineering Workshops (ICDEW'06)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130188007","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
As various models and languages have been proposed to handle information in the Semantic Web, it is important to be able to translate data from one to another. Referring to two specific models, namely RDF and Topic Maps, we propose a meta-modelling approach based on previous experience in handling heterogeneity in the database world.
{"title":"Management of Heterogeneity in the SemanticWeb","authors":"P. Atzeni, P. D. Nostro","doi":"10.1109/ICDEW.2006.74","DOIUrl":"https://doi.org/10.1109/ICDEW.2006.74","url":null,"abstract":"As various models and languages have been proposed to handle information in the Semantic Web, it is important to be able to translate data from one to another. By referring to two specific models, namely RDF and Topic Maps, we propose a meta-modelling approach, based on previous experiences on handling heterogeneity in the database world.","PeriodicalId":331953,"journal":{"name":"22nd International Conference on Data Engineering Workshops (ICDEW'06)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127892614","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Large data integration projects must often cope with undocumented data sources. Schema discovery aims at automatically finding structures in such cases. An important class of relationships between attributes that can be detected automatically is inclusion dependencies (INDs), which provide an excellent basis for guessing foreign key constraints. INDs can be discovered by comparing the sets of distinct values of pairs of attributes. In this paper we present efficient algorithms for finding unary INDs. We first show that (and why) SQL is not suitable for this task. We then develop two algorithms that compute inclusion dependencies outside of the database. Both are much faster than the SQL-based methods; in fact, for larger schemas they are the only feasible solution. Our experiments show that we can compute all unary INDs in a schema of 1,680 attributes, with a total database size of 3.2 GB, in approximately 2.5 hours.
{"title":"Efficiently Computing Inclusion Dependencies for Schema Discovery","authors":"Jana Bauckmann, U. Leser, Felix Naumann","doi":"10.1109/ICDEW.2006.54","DOIUrl":"https://doi.org/10.1109/ICDEW.2006.54","url":null,"abstract":"Large data integration projects must often cope with undocumented data sources. Schema discovery aims at automatically finding structures in such cases. An important class of relationships between attributes that can be detected automatically are inclusion dependencies (IND), which provide an excellent basis for guessing foreign key constraints. INDs can be discovered by comparing the sets of distinct values of pairs of attributes. In this paper we present efficient algorithms for finding unary INDs. We first show that (and why) SQL is not suitable for this task. We then develop two algorithms that compute inclusion dependencies outside of the database. Both are much faster than the SQL-based methods; in fact, for larger schemas they are the only feasible solution. Our experiments show that we can compute all unary INDs in a schema of 1, 680 attributes with a total database size of 3.2 GB in approximately 2.5 hours.","PeriodicalId":331953,"journal":{"name":"22nd International Conference on Data Engineering Workshops (ICDEW'06)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129814442","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Web pages are collected and stored in Web archives, and several methods to construct Web archives have been developed. We propose a method to retrieve time series of Web pages from Web archives by using the pages’ temporal characteristics. We present two processes for searching Web archives based on the temporal relation of query keywords: one is a method for determining the relation, and the other is a method for querying Web pages based on the relation. In this paper, we discuss the two processes and present experimental results for the method.
{"title":"A Temporal Clustering Method forWeb Archives","authors":"T. Kage, K. Sumiya","doi":"10.1109/ICDEW.2006.23","DOIUrl":"https://doi.org/10.1109/ICDEW.2006.23","url":null,"abstract":"Web pages are collected and stored in Web archives, and several methods to construct Web archives have been developed. We propose a method to retrieve time series of Web pages from Web archives by using the pages’ temporal characteristics. We present two processes for searching Web archives based on the temporal relation of query keywords. One is a method for determining the relation. The other is a method of inquiring Web pages based on the relation. In this paper, we discuss the two processes and an experimental result of the method.","PeriodicalId":331953,"journal":{"name":"22nd International Conference on Data Engineering Workshops (ICDEW'06)","volume":"600 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132226434","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The adoption of XML to represent all kinds of data and documents, even complex and huge ones, has become a matter of fact. However, interfacing algorithms and applications with XML parsers requires adapting them: event-based SAX parsers need algorithms that react to events generated by the parser. Moreover, parsing and loading XML documents is slow compared to reading flat files; several research efforts therefore address this problem by improving the parsing phase, e.g., by adopting condensed or binary representations of XML documents. This paper deals with the other side of the coin, i.e., the problem of coupling algorithms with XML parsers in a way that does not require changing the active (polling-based) nature of many algorithms and still provides acceptable performance during execution; this problem becomes even more important for Java algorithms, which are usually less efficient than C or C++ algorithms. This paper presents a study of the problem of loosely coupling Java algorithms with XML parsers. The coupling is loose because the algorithm should be unaware of the particular interface provided by the parser. We consider several coupling techniques and compare them by analyzing their performance. The evaluation leads us to identify the coupling techniques that perform better, depending on the specific algorithm’s needs and the application scenario.
{"title":"Loosely Coupling Java Algorithms and XML Parsers: a Performance-Oriented Study","authors":"G. Psaila","doi":"10.1109/ICDEW.2006.73","DOIUrl":"https://doi.org/10.1109/ICDEW.2006.73","url":null,"abstract":"The adoption of XML to represent any kind of data and documents, even complex and huge, is becoming a matter of fact. However, interfacing algorithms and applications with XML Parsers requires to adapt algorithms and applications: event-based SAX Parsers need algorithms that react to events generated by the parser. But parsing/loading XML documents provides poor performance (if compared to reading flat files): therefore, several researches are trying to address this problem by improving the parsing phase, e.g., by adopting condensed or binary representations of XML documents. This paper deals with the other side of the coin, i.e., the problem of coupling algorithms with XML Parsers, in a way that does not require to change the active (polling-based) nature of many algorithms and provides acceptable performance during execution; this problem becomes even more important when we consider Java algorithms, that usually are less efficient than C or C++ algorithms. This paper presents a study about the problem of loosely coupling Java algorithms with XML Parsers. The coupling is loose because the algorithm should be unaware of the particular interface provided by parsers. We consider several coupling techniques, and we compare them by analyzing their performance. The evaluation leads us to identify the coupling techniques that perform better, depending on the specific algorithm’s needs and application scenario.","PeriodicalId":331953,"journal":{"name":"22nd International Conference on Data Engineering Workshops (ICDEW'06)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127498965","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The CORIE forecast factory consists of a set of data product generation runs that are executed daily on dedicated local resources. The goal is to maximize productivity and resource utilization while still ensuring timely completion of all forecasts. Many existing workflow management systems address low-level workflow specification and execution challenges, but do not directly address the high-level challenges posed by large-scale data product factories. In this paper we discuss several specific challenges to managing the CORIE forecast factory, including planning and scheduling, improving data flow, and analyzing log data, and point out their analogs in the "physical" manufacturing world. We present solutions we have implemented to address these challenges, along with experimental results that show their benefits.
{"title":"Managing the Forecast Factory","authors":"Laura Bright, D. Maier, Bill Howe","doi":"10.1109/ICDEW.2006.76","DOIUrl":"https://doi.org/10.1109/ICDEW.2006.76","url":null,"abstract":"The CORIE forecast factory consists of a set of data product generation runs that are executed daily on dedicated local resources. The goal is to maximize productivity and resource utilization while still ensuring timely completion of all forecasts. Many existing workflow management systems address low-level workflow specification and execution challenges, but do not directly address the high-level challenges posed by large-scale data product factories. In this paper we discuss several specific challenges to managing the CORIE forecast factory including planning and scheduling, improving data flow, and analyzing log data, and point out their analogs in the \"physical\" manufacturing world. We present solutions we have implemented to address these challenges, and present experimental results that show the benefits of these solutions.","PeriodicalId":331953,"journal":{"name":"22nd International Conference on Data Engineering Workshops (ICDEW'06)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128097930","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}