In this paper, we introduce a new class of data mining problems called learning from aggregate views. In contrast to the traditional problem of learning from a single table of training examples, the new goal is to learn from multiple aggregate views of the underlying data, without access to the un-aggregated data. We motivate this new problem, present a general problem framework, develop learning methods for RFA (Restriction-Free Aggregate) views defined using COUNT, SUM, AVG and STDEV, and offer theoretical and experimental results that characterize the proposed methods.
{"title":"Learning from Aggregate Views","authors":"Bee-Chung Chen, Lei Chen, R. Ramakrishnan, D. Musicant","doi":"10.1109/ICDE.2006.86","DOIUrl":"https://doi.org/10.1109/ICDE.2006.86","url":null,"abstract":"In this paper, we introduce a new class of data mining problems called learning from aggregate views. In contrast to the traditional problem of learning from a single table of training examples, the new goal is to learn from multiple aggregate views of the underlying data, without access to the un-aggregated data. We motivate this new problem, present a general problem framework, develop learning methods for RFA (Restriction-Free Aggregate) views defined using COUNT, SUM, AVG and STDEV, and offer theoretical and experimental results that characterize the proposed methods.","PeriodicalId":6819,"journal":{"name":"22nd International Conference on Data Engineering (ICDE'06)","volume":"492 1","pages":"3-3"},"PeriodicalIF":0.0,"publicationDate":"2006-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76724219","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In multimedia retrieval, a query is typically refined interactively towards the ‘optimal’ answers by exploiting user feedback. In existing work, however, the refined query is re-evaluated in each iteration. This is not only inefficient but also fails to exploit the answers that may be common between iterations. In this paper, we introduce a new approach called SaveRF (Save random accesses in Relevance Feedback) for iterative relevance feedback search. SaveRF predicts the potential candidates for the next iteration and maintains this small set for efficient sequential scan. By doing so, repeated candidate accesses are saved, reducing the number of random accesses. In addition, an efficient scan of the overlap before the search starts tightens the search space with a smaller pruning radius. We implemented SaveRF, and our experimental study on real-life data sets shows that it can reduce the I/O cost significantly.
{"title":"SaveRF: Towards Efficient Relevance Feedback Search","authors":"Heng Tao Shen, B. Ooi, K. Tan","doi":"10.1109/ICDE.2006.132","DOIUrl":"https://doi.org/10.1109/ICDE.2006.132","url":null,"abstract":"In multimedia retrieval, a query is typically interactively refined towards the ‘optimal’ answers by exploiting user feedback. However, in existing work, in each iteration, the refined query is re-evaluated. This is not only inefficient but fails to exploit the answers that may be common between iterations. In this paper, we introduce a new approach called SaveRF (Save random accesses in Relevance Feedback) for iterative relevance feedback search. SaveRF predicts the potential candidates for the next iteration and maintains this small set for efficient sequential scan. By doing so, repeated candidate accesses can be saved, hence reducing the number of random accesses. In addition, efficient scan on the overlap before the search starts also tightens the search space with smaller pruning radius. We implemented SaveRF and our experimental study on real life data sets show that it can reduce the I/O cost significantly.","PeriodicalId":6819,"journal":{"name":"22nd International Conference on Data Engineering (ICDE'06)","volume":"151 1","pages":"110-110"},"PeriodicalIF":0.0,"publicationDate":"2006-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76739217","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Many of the data sources used in stream query processing are known to exhibit bursty behavior. We focus here on passive network monitoring, an application in which the data rates typically exhibit a large peak-to-average ratio. Provisioning a stream query processor to handle peak rates in such a setting can be prohibitively expensive. In this paper, we propose to solve this problem by provisioning the query processor for typical data rates instead of much higher peak data rates. To enable this strategy, we present mechanisms and policies for managing the tradeoffs between the latency and accuracy of query results when bursts exceed the steady-state capacity of the query processor. We describe the current status of our implementation and present experimental results on a testbed network monitoring application to demonstrate the utility of our approach.
{"title":"Declarative Network Monitoring with an Underprovisioned Query Processor","authors":"Frederick Reiss, J. Hellerstein","doi":"10.1109/ICDE.2006.46","DOIUrl":"https://doi.org/10.1109/ICDE.2006.46","url":null,"abstract":"Many of the data sources used in stream query processing are known to exhibit bursty behavior. We focus here on passive network monitoring, an application in which the data rates typically exhibit a large peak-to-average ratio. Provisioning a stream query processor to handle peak rates in such a setting can be prohibitively expensive. In this paper, we propose to solve this problem by provisioning the query processor for typical data rates instead of much higher peak data rates. To enable this strategy, we present mechanisms and policies for managing the tradeoffs between the latency and accuracy of query results when bursts exceed the steady-state capacity of the query processor. We describe the current status of our implementation and present experimental results on a testbed network monitoring application to demonstrate the utility of our approach","PeriodicalId":6819,"journal":{"name":"22nd International Conference on Data Engineering (ICDE'06)","volume":"744 1","pages":"56-56"},"PeriodicalIF":0.0,"publicationDate":"2006-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76879152","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Currently, clinical information is stored in all kinds of proprietary formats through a multitude of medical information systems available on the market. This results in a severe interoperability problem in sharing electronic healthcare records. To address this problem, an industry initiative called "Integrating Healthcare Enterprise (IHE)" has specified the "Cross Enterprise Document Sharing (XDS)" Profile to store healthcare documents in an ebXML registry/repository to facilitate their sharing. Through a separate effort, IHE has also defined interdepartmental Workflow Profiles to identify the transactions required to integrate information flow among several information systems. Although the clinical documents stored in XDS registries are obtained as a result of executing these workflows, IHE has not yet specified collaborative healthcare processes for XDS. Hence, there is no way to track the workflows in XDS, and the clinical documents produced through the workflows are manually inserted into the registry/repository. Given that IHE XDS uses the ebXML architecture, the most natural way to integrate IHE Workflow Profiles with IHE XDS is through ebXML Business Processes (ebBP). In this paper, we describe the implementation of an enhanced IHE architecture demonstrating how ebXML Business Processes, IHE Workflow Profiles, and the IHE XDS architecture can all be integrated to provide collaborative business process support in the healthcare domain.
{"title":"Collaborative Business Process Support in IHE XDS through ebXML Business Processes","authors":"A. Dogac, V. Bicer, Alper Okcan","doi":"10.1109/ICDE.2006.39","DOIUrl":"https://doi.org/10.1109/ICDE.2006.39","url":null,"abstract":"Currently, clinical information is stored in all kinds of proprietary formats through a multitude of medical information systems available on the market. This results in a severe interoperability problem in sharing electronic healthcare records. To address this problem, an industry initiative, called \"Integrating Healthcare Enterprise (IHE)\" has specified the \"Cross Enterprise Document Sharing (XDS)\" Profile to store healthcare documents in an ebXML registry/ repository to facilitate their sharing. Through a separate effort, IHE has also defined interdepartmental Workflow Profiles to identify the transactions required to integrate information flow among several information systems. Although the clinical documents stored in XDS registries are obtained as a result of executing these workflows, IHE has not yet specified collaborative healthcare processes for the XDS. Hence, there is no way to track the workflows in XDS and the clinical documents produced through the workflows are manually inserted into the registry/ repository. Given that IHE XDS is using the ebXML architecture, the most natural way to integrate IHE Workflow Profiles to IHE XDS is using ebXML Business Processes (ebBP). In this paper, we describe the implementation of an enhanced IHE architecture demonstrating how ebXML Business Processes, IHE Workflow Profiles and the IHE XDS architecture can all be integrated to provide collaborative business process support in the healthcare domain.","PeriodicalId":6819,"journal":{"name":"22nd International Conference on Data Engineering (ICDE'06)","volume":"6 1","pages":"91-91"},"PeriodicalIF":0.0,"publicationDate":"2006-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75104754","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In biometric database applications, the typical task is to identify individuals according to features that are not exactly known. Reasons for this inexactness are varying measuring techniques and environmental circumstances. Since these circumstances are not necessarily the same when determining the features of different individuals, the exactness may vary strongly between individuals as well as between features. To identify individuals, similarity search on feature vectors is applicable, but even adaptable distance measures cannot handle objects having an individual level of exactness. Therefore, we develop a comprehensive probabilistic theory in which uncertain observations are modeled by probabilistic feature vectors (pfv), i.e., feature vectors where the conventional feature values are replaced by Gaussian probability distribution functions. Each feature value of each object is complemented by a variance value indicating its uncertainty. We define two types of identification queries: k-most-likely identification and threshold identification. For efficient query processing, we propose a novel index structure, the Gauss-tree. Our experimental evaluation demonstrates that pfv stored in a Gauss-tree significantly improve result quality compared to traditional feature vectors. Additionally, we show that the Gauss-tree significantly speeds up query times compared to competitive methods.
{"title":"The Gauss-Tree: Efficient Object Identification in Databases of Probabilistic Feature Vectors","authors":"C. Böhm, A. Pryakhin, Matthias Schubert","doi":"10.1109/ICDE.2006.159","DOIUrl":"https://doi.org/10.1109/ICDE.2006.159","url":null,"abstract":"In applications of biometric databases the typical task is to identify individuals according to features which are not exactly known. Reasons for this inexactness are varying measuring techniques or environmental circumstances. Since these circumstances are not necessarily the same when determining the features for different individuals, the exactness might strongly vary between the individuals as well as between the features. To identify individuals, similarity search on feature vectors is applicable, but even the use of adaptable distance measures is not capable to handle objects having an individual level of exactness. Therefore, we develop a comprehensive probabilistic theory in which uncertain observations are modeled by probabilistic feature vectors (pfv), i.e. feature vectors where the conventional feature values are replaced by Gaussian probability distribution functions. Each feature value of each object is complemented by a variance value indicating its uncertainty. We define two types of identification queries, k-mostlikely identification and threshold identification. For efficient query processing, we propose a novel index structure, the Gauss-tree. Our experimental evaluation demonstrates that pfv stored in a Gauss-tree significantly improve the result quality compared to traditional feature vectors. Additionally, we show that the Gauss-tree significantly speeds up query times compared to competitive methods.","PeriodicalId":6819,"journal":{"name":"22nd International Conference on Data Engineering (ICDE'06)","volume":"11 1","pages":"9-9"},"PeriodicalIF":0.0,"publicationDate":"2006-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73792085","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Our society is more dependent on information systems than ever before. However, managing the information systems infrastructure in a cost-effective manner is a growing challenge. The total cost of ownership (TCO) of information technology is increasingly dominated by people costs. In fact, mistakes in the operation and administration of information systems are the single most common cause of system outages and unacceptable performance. For information systems to provide value to their customers, we must reduce the complexity associated with their deployment and usage.
{"title":"Foundations of Automated Database Tuning","authors":"Surajit Chaudhuri, G. Weikum","doi":"10.1109/ICDE.2006.72","DOIUrl":"https://doi.org/10.1109/ICDE.2006.72","url":null,"abstract":"Our society is more dependent on information systems than ever before. However, managing the information systems infrastructure in a cost-effective manner is a growing challenge. The total cost of ownership (TCO) of information technology is increasingly dominated by people costs. In fact, mistakes in operations and administration of information systems are the single most reasons for system outage and unacceptable performance. For information systems to provide value to their customers, we must reduce the complexity associated with their deployment and usage.","PeriodicalId":6819,"journal":{"name":"22nd International Conference on Data Engineering (ICDE'06)","volume":"1 1","pages":"104-104"},"PeriodicalIF":0.0,"publicationDate":"2006-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91297915","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
To use their pool of resources efficiently, distributed stream-processing systems push query operators to nodes within the network. Currently, these operators, ranging from simple filters to custom business logic, are placed manually at intermediate nodes along the transmission path to meet application-specific performance goals. Determining placement locations is challenging because network and node conditions change over time and because streams may interact with each other, opening avenues for reuse and repositioning of operators. This paper describes a stream-based overlay network (SBON), a layer between a stream-processing system and the physical network that manages operator placement for stream-processing systems. Our design is based on a cost space, an abstract representation of the network and ongoing streams, which permits decentralized, large-scale multi-query optimization decisions. We present an evaluation of the SBON approach through simulation, experiments on PlanetLab, and an integration with Borealis, an existing stream-processing engine. Our results show that an SBON consistently improves network utilization, provides low stream latency, and enables dynamic optimization at low engineering cost.
{"title":"Network-Aware Operator Placement for Stream-Processing Systems","authors":"P. Pietzuch, J. Ledlie, Jeffrey Shneidman, M. Roussopoulos, M. Welsh, M. Seltzer","doi":"10.1109/ICDE.2006.105","DOIUrl":"https://doi.org/10.1109/ICDE.2006.105","url":null,"abstract":"To use their pool of resources efficiently, distributed stream-processing systems push query operators to nodes within the network. Currently, these operators, ranging from simple filters to custom business logic, are placed manually at intermediate nodes along the transmission path to meet application-specific performance goals. Determining placement locations is challenging because network and node conditions change over time and because streams may interact with each other, opening venues for reuse and repositioning of operators. This paper describes a stream-based overlay network (SBON), a layer between a stream-processing system and the physical network that manages operator placement for stream-processing systems. Our design is based on a cost space, an abstract representation of the network and on-going streams, which permits decentralized, large-scale multi-query optimization decisions. We present an evaluation of the SBON approach through simulation, experiments on PlanetLab, and an integration with Borealis, an existing stream-processing engine. Our results show that an SBON consistently improves network utilization, provides low stream latency, and enables dynamic optimization at low engineering cost.","PeriodicalId":6819,"journal":{"name":"22nd International Conference on Data Engineering (ICDE'06)","volume":"122 1","pages":"49-49"},"PeriodicalIF":0.0,"publicationDate":"2006-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83649866","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Today’s abundance of storage, coupled with digital technologies, means that virtually any scientific or commercial application, such as medical and biological imaging or music archives, deals with tremendous quantities of images, videos, or audio files stored in large multimedia databases. For content-based data mining and retrieval purposes, suitable similarity models are crucial. The Earth Mover’s Distance was introduced in Computer Vision to better approach human perceptual similarities. Its computation, however, is too complex for usage in interactive multimedia database scenarios. In order to enable efficient query processing in large databases, we propose an index-supported multistep algorithm. We therefore develop new lower-bounding approximation techniques for the Earth Mover’s Distance which satisfy high quality criteria, including completeness (no false drops), index-suitability, and fast computation. We demonstrate the efficiency of our approach in extensive experiments on large image databases.
{"title":"Approximation Techniques for Indexing the Earth Mover’s Distance in Multimedia Databases","authors":"I. Assent, Andrea Wenning, T. Seidl","doi":"10.1109/ICDE.2006.25","DOIUrl":"https://doi.org/10.1109/ICDE.2006.25","url":null,"abstract":"Todays abundance of storage coupled with digital technologies in virtually any scientific or commercial application such as medical and biological imaging or music archives deal with tremendous quantities of images, videos or audio files stored in large multimedia databases. For content-based data mining and retrieval purposes suitable similarity models are crucial. The Earth Mover’s Distance was introduced in Computer Vision to better approach human perceptual similarities. Its computation, however, is too complex for usage in interactive multimedia database scenarios. In order to enable efficient query processing in large databases, we propose an index-supported multistep algorithm. We therefore develop new lower bounding approximation techniques for the Earth Mover’s Distance which satisfy high quality criteria including completeness (no false drops), index-suitability and fast computation. We demonstrate the efficiency of our approach in extensive experiments on large image databases","PeriodicalId":6819,"journal":{"name":"22nd International Conference on Data Engineering (ICDE'06)","volume":"91 1","pages":"11-11"},"PeriodicalIF":0.0,"publicationDate":"2006-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83790023","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Data mining promises to discover valid and potentially useful patterns in data. Often, however, discovered patterns are not useful to the user. "Actionability" addresses this problem in that a pattern is deemed actionable if the user can act upon it in her favor. We introduce the notion of "action" as a domain-independent way to model domain knowledge. Given a data set about actionable features and a utility measure, a pattern is actionable if it summarizes a population that can be acted upon to yield a more promising population with a higher observed utility. We present several pruning strategies that take the actionability requirement into account to reduce the search space, as well as algorithms for mining all actionable patterns and for mining the top-k actionable patterns. We evaluate the usefulness of the patterns and the focus of the search on a real-world application domain.
{"title":"Mining Actionable Patterns by Role Models","authors":"Ke Wang, Yuelong Jiang, A. Tuzhilin","doi":"10.1109/ICDE.2006.96","DOIUrl":"https://doi.org/10.1109/ICDE.2006.96","url":null,"abstract":"Data mining promises to discover valid and potentially useful patterns in data. Often, discovered patterns are not useful to the user.\"Actionability\" addresses this problem in that a pattern is deemed actionable if the user can act upon it in her favor. We introduce the notion of \"action\" as a domain-independent way to model the domain knowledge. Given a data set about actionable features and an utility measure, a pattern is actionable if it summarizes a population that can be acted upon towards a more promising population observed with a higher utility. We present several pruning strategies taking into account the actionability requirement to reduce the search space, and algorithms for mining all actionable patterns as well as mining the top k actionable patterns. We evaluate the usefulness of patterns and the focus of search on a real-world application domain.","PeriodicalId":6819,"journal":{"name":"22nd International Conference on Data Engineering (ICDE'06)","volume":"2 1","pages":"16-16"},"PeriodicalIF":0.0,"publicationDate":"2006-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84582011","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Speed to market is critical to companies that are driven by sales in a competitive market. The earlier a potential customer can be approached in the decision-making process of a purchase, the higher the chances of converting that prospect into a customer. Traditional methods of identifying sales leads, such as company surveys and direct marketing, are manual, expensive, and not scalable. Over the past decade, the World Wide Web has grown into an information mesh, with most important facts being reported through Web sites. Many newspapers, press releases, trade journals, business magazines, and other related sources are online. These sources could be used to identify prospective buyers automatically. In this paper, we present a system called ETAP (Electronic Trigger Alert Program) that extracts trigger events from Web data to help identify prospective buyers. Trigger events are events of corporate relevance that are indicative of a company's propensity to purchase new products associated with these events. Examples of trigger events are changes in management, revenue growth, and mergers & acquisitions. The unstructured nature of the information makes the extraction of trigger events difficult. We pose trigger event extraction as a classification problem and develop methods for learning trigger event classifiers using existing classification methods. We present methods to automatically generate the training data required to learn the classifiers. We also propose a method of feature abstraction that uses named entity recognition to address the problem of data sparsity. We score and rank the trigger events extracted by ETAP for easy browsing. Our experiments show the effectiveness of the method and thus establish the feasibility of automatic sales lead generation from Web data.
{"title":"Automatic Sales Lead Generation from Web Data","authors":"Ganesh Ramakrishnan, Sachindra Joshi, Sumit Negi, R. Krishnapuram, S. Balakrishnan","doi":"10.1109/ICDE.2006.28","DOIUrl":"https://doi.org/10.1109/ICDE.2006.28","url":null,"abstract":"Speed to market is critical to companies that are driven by sales in a competitive market. The earlier a potential customer can be approached in the decision making process of a purchase, the higher are the chances of converting that prospect into a customer. Traditional methods to identify sales leads such as company surveys and direct marketing are manual, expensive and not scalable. Over the past decade the World Wide Web has grown into an information-mesh, with most important facts being reported through Web sites. Several news papers, press releases, trade journals, business magazines and other related sources are on-line. These sources could be used to identify prospective buyers automatically. In this paper, we present a system called ETAP (Electronic Trigger Alert Program) that extracts trigger events from Web data that help in identifying prospective buyers. Trigger events are events of corporate relevance and indicative of the propensity of companies to purchase new products associated with these events. Examples of trigger events are change in management, revenue growth and mergers & acquisitions. The unstructured nature of information makes the extraction task of trigger events difficult. We pose the problem of trigger events extraction as a classification problem and develop methods for learning trigger event classifiers using existing classification methods. We present methods to automatically generate the training data required to learn the classifiers. We also propose a method of feature abstraction that uses named entity recognition to solve the problem of data sparsity. We score and rank the trigger events extracted from ETAP for easy browsing. Our experiments show the effectiveness of the method and thus establish the feasibility of automatic sales lead generation using the Web data.","PeriodicalId":6819,"journal":{"name":"22nd International Conference on Data Engineering (ICDE'06)","volume":"67 1","pages":"101-101"},"PeriodicalIF":0.0,"publicationDate":"2006-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90276794","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}