The 2005 IEEE/WIC/ACM International Conference on Web Intelligence (WI'05): latest papers

Automatically generating labeled examples for Web wrapper maintenance

The 2005 IEEE/WIC/ACM International Conference on Web Intelligence (WI'05)

Pub Date : 2005-09-19 DOI: 10.1109/WI.2005.40

J. Raposo, A. Pan, M. Álvarez, Justo Hidalgo

To let software programs benefit fully from semi-structured Web sources, wrapper programs must be built to provide a "machine readable" view over them. A significant problem with this approach is that, since Web sources are autonomous, they may change in ways that invalidate the current wrapper. In this paper, we address this problem by introducing novel heuristics and algorithms for automatically maintaining wrappers. In our approach, the system collects some query results during normal wrapper operation and, when the source changes, uses them as input to generate a set of labeled examples for the source, which can then be used to induce a new wrapper. Our experiments show that the proposed techniques achieve high accuracy for a wide range of real-world Web data extraction problems.
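The idea of reusing stored query results to label a changed page can be illustrated with a minimal sketch. This is not the authors' algorithm: the function name `label_examples`, the record layout, and the plain substring search are all assumptions made for illustration, whereas the paper relies on its own heuristics.

```python
def label_examples(stored_results, new_page_text):
    """Return (field, value, offset) triples for stored values found in the new page.

    stored_results: list of dicts collected while the old wrapper still worked
                    (hypothetical layout: field name -> extracted string).
    new_page_text:  raw text of the page after the source changed.
    """
    labeled = []
    for record in stored_results:
        for field, value in record.items():
            offset = new_page_text.find(value)  # naive exact-match search
            if offset != -1:
                labeled.append((field, value, offset))
    return labeled


# Toy data: one stored record still appears in the changed page, one does not.
stored = [{"title": "Dune", "price": "9.99"}, {"title": "Emma", "price": "4.50"}]
page = "<li><b>Dune</b> <i>9.99</i></li><li><b>Ulysses</b> <i>7.25</i></li>"
examples = label_examples(stored, page)
```

Each returned triple marks where a known field value sits in the new layout, which is the kind of labeled example a wrapper induction step could consume.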

Citations: 8

Schema matching using neural network

The 2005 IEEE/WIC/ACM International Conference on Web Intelligence (WI'05)

Pub Date : 2005-09-19 DOI: 10.1109/WI.2005.129

You Li, Dongbo Liu, Weiming Zhang

Schema matching plays a key role in data integration, data warehousing and e-business. This paper introduces SMDD, a schema matching method based on a neural network. By analyzing the characteristics of data distribution, it performs schema matching automatically. It can be used independently or as a supplement to other schema matching methods. SMDD can improve the accuracy of schema matching from the point of view of data contents.

Citations: 17

Network-based intrusion detection using Adaboost algorithm

The 2005 IEEE/WIC/ACM International Conference on Web Intelligence (WI'05)

Pub Date : 2005-09-19 DOI: 10.1109/WI.2005.107

Wei Hu, Weiming Hu

Intrusion detection on the Internet is an active research field in computer science, where much work has been done during the past two decades. In this paper, we build a network-based intrusion detection system using Adaboost, a prevailing machine learning algorithm. The experiments demonstrate that our system can achieve an especially low false positive rate while keeping a favorable detection rate, and its computational complexity is extremely low, which is a very attractive property in practice.

Citations: 40

Learning from ontologies for common meaningful structures

The 2005 IEEE/WIC/ACM International Conference on Web Intelligence (WI'05)

Pub Date : 2005-09-19 DOI: 10.1109/WI.2005.90

Liu Yang, Guojie Li, Zhongzhi Shi

We put forward the hypothesis that common meaningful structures exist among ontologies whose domains are analogous to each other. The initial motivation of our hypothesis is to make full use of the structural information in existing ontologies, in order to benefit the domain of ontology. To verify the hypothesis, we give a precise definition of the candidate for the common meaningful structure, called MICISO (maximum isomorphic common induced sub-ontology). Based on the hypothesis and the definition, we present a novel data mining problem called MICISO mining, whose aim is to learn from ontologies in order to find MICISOs and then recommend the common meaningful structures. We also provide an algorithm for MICISO mining, based on which we have developed a practical tool for mining and checking such structures. With the tool, the algorithm is applied to quite a few pairs of existing ontologies, and the interesting, meaningful results support our hypothesis. Thus we consider the hypothesis preliminarily verified. We believe that our work suggests a promising new line of thinking for the domain of ontology: studying existing ontologies to find useful things.

Citations: 2

Resource optimization in heterogeneous Web environments

The 2005 IEEE/WIC/ACM International Conference on Web Intelligence (WI'05)

Pub Date : 2005-09-19 DOI: 10.1109/WI.2005.128

Xiaolong Jin, Jiming Liu

This paper addresses the distributed resource optimization issue in heterogeneous Web environments, where both resource nodes and service requests may be heterogeneous. Specifically, this paper presents an agent-based mechanism, where agents are employed to carry service requests. Agents are equipped with three behaviors, namely, least-loaded move, less-loaded move, and random move, to search for appropriate resource nodes. At each step, agents probabilistically choose a behavior to perform. As a whole, the multiagent system can accomplish the objective of load balancing and resource optimization. Through experiments on a computing platform, called SSADRO, we validate the effectiveness of the proposed mechanism. Compared to our previously proposed load balancing mechanism in Liu, Jin and Wang (2005), the one in this paper can address dynamic load balancing in heterogeneous environments.
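The probabilistic choice among the three behaviors can be sketched as follows. The abstract does not define the behaviors precisely, so the interpretations here are assumptions: "least-loaded" moves to the neighbor with the smallest load, "less-loaded" moves to a random neighbor lighter than the current node, and "random" moves to any neighbor. The function name `choose_move` and the data layout are hypothetical.

```python
import random


def choose_move(current, loads, neighbors, probs, rng):
    """Pick the next node for an agent using one of three behaviors.

    probs is a (least, less, random) weight triple; all semantics are assumed.
    """
    behavior = rng.choices(["least", "less", "random"], weights=probs)[0]
    candidates = neighbors[current]
    if behavior == "least":
        # Move to the neighbor carrying the smallest load.
        return min(candidates, key=lambda n: loads[n])
    if behavior == "less":
        # Move to any neighbor lighter than the current node, else stay put.
        lighter = [n for n in candidates if loads[n] < loads[current]]
        return rng.choice(lighter) if lighter else current
    # Random move: any neighbor, regardless of load.
    return rng.choice(candidates)


loads = {"a": 5, "b": 2, "c": 7}
neighbors = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
rng = random.Random(0)
next_node = choose_move("a", loads, neighbors, (1.0, 0.0, 0.0), rng)
```

With all weight on the least-loaded behavior, the agent at node `a` moves to node `b`, the lightest neighbor. Mixing the three weights trades off greedy balancing against exploration.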

Citations: 1

Mining interesting topics for Web information gathering and Web personalization

The 2005 IEEE/WIC/ACM International Conference on Web Intelligence (WI'05)

Pub Date : 2005-09-19 DOI: 10.1109/WI.2005.98

Yuefeng Li, Ben Murphy, N. Zhong

The quality of discovered patterns is crucial for building satisfactory Web text mining systems. There is no doubt that we can find numerous frequent patterns in Web documents. However, many of these frequent patterns are meaningless. This paper presents a novel method to improve the quality of discovered patterns. It generalizes discovered patterns into interesting topics in order to acquire the necessary useful information. The experimental results also indicate that the proposed method is promising.

Citations: 2

Biological ontology enhancement with fuzzy relations: a text-mining framework

The 2005 IEEE/WIC/ACM International Conference on Web Intelligence (WI'05)

Pub Date : 2005-09-19 DOI: 10.1109/WI.2005.43

M. Abulaish, Lipika Dey

Domain ontology can help in information retrieval from documents. But an ontology is a pre-defined structure with crisp concept descriptions and inter-concept relations. Because document repositories are dynamic, an ontology should be upgradeable with information extracted through text mining of documents in the domain. This also requires that concepts, their descriptions and inter-concept relations be associated with a degree of fuzziness that indicates the support for the extracted knowledge according to the currently available resources. Supports may be revised as more knowledge arrives in the future. This approach preserves the basic structured format for storing domain knowledge, but at the same time allows information to be updated. In this paper, we propose a mechanism which initiates text mining with a set of ontological concepts, and thereafter extracts fuzzy relations through text mining. Membership values of relations are functions of the frequency of co-occurrence of concepts and relations. We have worked on the GENIA corpus and shown how fuzzy relations can be further used for guided information extraction from MEDLINE documents.
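As a rough illustration of a membership value derived from co-occurrence frequency, the sketch below scores each (concept, relation, concept) triple by the share of that concept pair's co-occurrences in which the relation appears. The abstract does not give the actual membership function, so this ratio, the function name `relation_memberships`, and the placeholder concept names are assumptions.

```python
from collections import Counter


def relation_memberships(triples):
    """Map each (concept, relation, concept) triple to a value in (0, 1].

    Assumed formula: count(triple) / count(all triples with the same concept pair).
    """
    pair_counts = Counter((a, b) for a, _, b in triples)
    triple_counts = Counter(triples)
    return {t: n / pair_counts[(t[0], t[2])] for t, n in triple_counts.items()}


# Placeholder observations, not taken from GENIA or MEDLINE.
observed = [
    ("concept_A", "activates", "concept_B"),
    ("concept_A", "activates", "concept_B"),
    ("concept_A", "inhibits", "concept_B"),
    ("concept_C", "binds", "concept_D"),
]
memberships = relation_memberships(observed)
```

Because the values are recomputed from counts, they can be revised as more text is mined, which matches the abstract's point that supports change as knowledge accumulates.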

Citations: 22

Adding the temporal dimension to search - a case study in publication search

The 2005 IEEE/WIC/ACM International Conference on Web Intelligence (WI'05)

Pub Date : 2005-09-19 DOI: 10.1109/WI.2005.21

Philip S. Yu, Xin Li, B. Liu

The best-known search techniques are perhaps the PageRank and HITS algorithms. In this paper, we argue that these algorithms miss an important dimension, the temporal dimension. Quality pages in the past may not be quality pages now or in the future. These techniques favor older pages because these pages have many in-links accumulated over time. New pages, which may be of high quality, have few or no in-links and are left behind. Research publication search has the same problem. If we use the PageRank or HITS algorithm, older or classic papers are ranked high due to the large number of citations that they received in the past. This paper studies the temporal dimension of search in the context of research publication. A number of methods are proposed to deal with the problem based on analyzing the behavior history and the source of each publication. These methods are evaluated empirically. Our results show that they are highly effective.
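The bias toward older papers can be made concrete with a small sketch that discounts each citation by its age. This is only one simple way to add a temporal dimension and is not claimed to be any of the methods the paper proposes; the function name `time_weighted_citations` and the per-year `decay` parameter are assumptions.

```python
def time_weighted_citations(citation_years, current_year, decay=0.5):
    """Sum citations, discounting each by decay ** (age in years)."""
    return sum(decay ** (current_year - year) for year in citation_years)


# Toy data: a classic paper cited long ago versus a recent paper cited last year.
classic = [1995] * 10
recent = [2004] * 3

raw_classic, raw_recent = len(classic), len(recent)
weighted_classic = time_weighted_citations(classic, 2005)
weighted_recent = time_weighted_citations(recent, 2005)
```

Under raw counts the classic paper ranks first, while under the decayed score the recent paper does, which is the effect the abstract argues plain link counting misses.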

Citations: 51

Guidance performance indicator - Web metrics for information driven Web sites

The 2005 IEEE/WIC/ACM International Conference on Web Intelligence (WI'05)

Pub Date : 2005-09-19 DOI: 10.1109/WI.2005.69

C. Stolz, Maximilian Viermetz, Michal Skubacz, R. Neuneier

For the evaluation of Web sites, a multitude of metrics are available. Apart from general statistical measures, success metrics reflect the degree to which a Web site achieves its defined objectives. In particular, metrics for e-commerce sites based on transaction analysis are commonly available and well understood. In contrast to transaction-based sites, the success of Web sites geared toward information delivery is harder to quantify, since there is no direct feedback of user intent. User feedback is only directly available on transactional Web sites. We introduce a metric to measure the success of an information-driven Web site in meeting its objective to deliver the desired information in a timely and usable fashion. We propose to assign a value to each click based on the type of transition, duration and semantic distance. These values are then combined into a scoring model describing the success of a Web site in meeting its objectives. The resulting metric is introduced as the GPI, and its applicability is shown on a large corporate Web site.
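A per-click value built from transition type, duration and semantic distance, then averaged into a session score, might look like the sketch below. The transition labels, their numeric values, the duration cap and the averaging rule are all hypothetical; the abstract does not specify how the GPI combines these factors.

```python
from dataclasses import dataclass


@dataclass
class Click:
    transition: str           # hypothetical label, e.g. "nav->content"
    duration_s: float         # time spent before the next click
    semantic_distance: float  # 0.0 = same topic, 1.0 = unrelated


# Assumed values: reaching content is good, bouncing back to navigation is bad.
TRANSITION_VALUE = {"nav->content": 1.0, "content->content": 0.5, "content->nav": -0.5}


def click_value(click, max_duration_s=120.0):
    """Scale the transition value by capped duration and topical closeness."""
    base = TRANSITION_VALUE.get(click.transition, 0.0)
    time_factor = min(click.duration_s, max_duration_s) / max_duration_s
    return base * time_factor * (1.0 - click.semantic_distance)


def session_score(clicks):
    """Average click value over a session; 0.0 for an empty session."""
    return sum(click_value(c) for c in clicks) / len(clicks) if clicks else 0.0


session = [Click("nav->content", 60.0, 0.0), Click("content->nav", 120.0, 0.5)]
score = session_score(session)
```

Here the first click contributes 0.5 and the second contributes -0.25, so the session averages 0.125.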

Citations: 23

Binary prediction based on weighted sequential mining method

The 2005 IEEE/WIC/ACM International Conference on Web Intelligence (WI'05)

Pub Date : 2005-09-19 DOI: 10.1109/WI.2005.42

Shuchuan Lo

This paper presents a weighted-binary-sequential method to predict the status of customer patronage for the next day. Most research that uses association rules to mine sequential data focuses on the algorithms and computing efficiency of pattern or rule generation, but little of it considers the time value of the sequential data. In the analysis of time-series data, it is desirable to weight recent observations more heavily than remote observations. In this paper, we apply a time-weighting concept to an association algorithm to mine binary time-series data. The weighted binary sequence algorithm gives more weight to recent data when finding the longest frequent patterns in binary time-series data. There are two weighting methods: dynamic-length weighting and fixed-length weighting. Both algorithms are compared to the un-weighted algorithm to show how time value influences the prediction accuracy. Performance results from a real-life Web site application show that the time-weighted sequential algorithms are generally superior to the un-weighted sequential algorithm.
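The effect of weighting recent observations more heavily can be shown with a simplified sketch. It is not the paper's dynamic-length or fixed-length algorithm: it matches only the last `pattern_len` bits against the history and lets each past match vote for the bit that followed it, with votes discounted by age. The function name `predict_next` and the `decay` parameter are assumptions.

```python
def predict_next(history, pattern_len=2, decay=0.9):
    """Predict the next bit from time-weighted votes of past pattern matches.

    Returns 0 or 1, or None when there is no evidence or the votes tie.
    """
    if len(history) <= pattern_len:
        return None
    pattern = history[-pattern_len:]
    votes = {0: 0.0, 1: 0.0}
    n = len(history)
    for i in range(n - pattern_len):
        if history[i:i + pattern_len] == pattern:
            follower_index = i + pattern_len
            age = n - follower_index  # days since the follower was observed
            votes[history[follower_index]] += decay ** age
    if votes[0] == votes[1]:
        return None
    return max(votes, key=votes.get)


# 1 = customer visited that day. Early on, "1, 1" was followed by 0; lately by 1.
history = [1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1]
unweighted = predict_next(history, decay=1.0)
weighted = predict_next(history, decay=0.9)
```

With `decay=1.0` every match counts equally and the three older matches outvote the two recent ones, so the prediction is 0. With `decay=0.9` the recent matches dominate and the prediction flips to 1.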

Citations: 17
