The 2005 IEEE/WIC/ACM International Conference on Web Intelligence (WI'05): Latest Papers

A combined approach of formal concept analysis and text mining for concept based document clustering

The 2005 IEEE/WIC/ACM International Conference on Web Intelligence (WI'05)

Pub Date : 2005-09-19 DOI: 10.1109/WI.2005.1

Nyeint Nyeint Myat, K. Hla

Nowadays, the demand for conceptual document clustering is increasing as a way to manage the vast and varied information published on the World Wide Web. In this paper, we use the formal concept analysis (FCA) method to cluster documents according to their formal contexts. A concept hierarchy of the documents is built from the formal concepts of the documents in the document corpus. We use the tf·idf (term frequency × inverse document frequency) term-weighting model to prune less useful concepts from these formal concepts, and association and correlation mining techniques to analyze the relationships among terms in the document corpus.

Citations: 10
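The abstract combines three steps: tf·idf weighting, a binary document-term formal context, and the formal concepts (closed document/term pairs) that make up the concept hierarchy. A minimal Python sketch of those steps follows. The weighting variant (term count divided by document length, natural-log idf), the pruning threshold, and the brute-force concept enumeration are assumptions for illustration, not the authors' implementation, and the association/correlation mining step is not shown.

```python
import math
from itertools import combinations

def tfidf(docs):
    """docs: dict doc_id -> list of tokens. Returns dict doc_id -> {term: tf-idf weight}."""
    n = len(docs)
    df = {}
    for tokens in docs.values():
        for t in set(tokens):
            df[t] = df.get(t, 0) + 1
    weights = {}
    for d, tokens in docs.items():
        counts = {}
        for t in tokens:
            counts[t] = counts.get(t, 0) + 1
        # tf = count / document length; idf = log(N / document frequency)
        weights[d] = {t: (c / len(tokens)) * math.log(n / df[t]) for t, c in counts.items()}
    return weights

def formal_context(weights, threshold):
    """Binary formal context: each document keeps only terms whose weight exceeds threshold."""
    return {d: {t for t, w in ws.items() if w > threshold} for d, ws in weights.items()}

def formal_concepts(context):
    """Enumerate all (extent, intent) formal concepts by closing every subset of documents.
    Exponential in the number of documents, so only suitable for toy corpora."""
    docs = sorted(context)
    all_terms = set().union(*context.values())
    concepts = set()
    for r in range(len(docs) + 1):
        for subset in combinations(docs, r):
            # intent = terms shared by every document in the subset
            intent = set(all_terms) if not subset else set.intersection(*(context[d] for d in subset))
            # extent = every document that has all of those terms (the closure)
            extent = frozenset(d for d in docs if intent <= context[d])
            concepts.add((extent, frozenset(intent)))
    return concepts
```

With a threshold of 0.0, a term that occurs in every document gets an idf of zero and drops out of the context, which is one simple way such pruning can shrink the concept lattice.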

A middleware system for Web-based digital music libraries

The 2005 IEEE/WIC/ACM International Conference on Web Intelligence (WI'05)

Pub Date : 2005-09-19 DOI: 10.1109/WI.2005.8

A. S. Lampropoulos, P. S. Lampropoulou, G. Tsihrintzis

We present a middleware system that facilitates Internet users' access to Web-based digital music libraries and allows them to manipulate audio meta-information, taking into account the content and semantic information of the music data. Useful relations in the data are extracted automatically through semantic networks that are constructed and maintained in the library. Our system is complemented by a query-by-example retrieval subsystem, user relevance feedback facilities, and a new approach to musical genre classification based on features extracted from signals that correspond to distinct musical instrument sources, where these sources are identified by a source separation process. The system's operation is illustrated in detail.

Citations: 6
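The abstract does not give the retrieval or feedback formulas. As a hedged illustration only, the sketch below treats each library track as a numeric feature vector (hypothetical, e.g. summary audio features), ranks tracks by cosine similarity to an example track (query by example), and swaps in the classic Rocchio update for relevance feedback. The weights `alpha`, `beta` and `gamma` are conventional defaults, not values from the paper.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def query_by_example(query, library, k=3):
    """Rank library items (name -> feature vector) by similarity to the example's vector."""
    ranked = sorted(library, key=lambda name: cosine(query, library[name]), reverse=True)
    return ranked[:k]

def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio feedback (a stand-in, not the paper's method): move the query toward the
    centroid of vectors marked relevant and away from those marked non-relevant."""
    def centroid(vecs):
        if not vecs:
            return [0.0] * len(query)
        return [sum(col) / len(vecs) for col in zip(*vecs)]
    r, nr = centroid(relevant), centroid(nonrelevant)
    return [alpha * q + beta * x - gamma * y for q, x, y in zip(query, r, nr)]
```

In use, the loop alternates: rank with `query_by_example`, collect the user's relevant and non-relevant picks, call `rocchio`, and rank again with the updated query.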

Webpage importance analysis using conditional Markov random walk

The 2005 IEEE/WIC/ACM International Conference on Web Intelligence (WI'05)

Pub Date : 2005-09-19 DOI: 10.1109/WI.2005.161

Tie-Yan Liu, Wei-Ying Ma

In this paper, we propose a novel method to calculate Web page importance based on a conditional Markov random walk model. The main assumption in this model is that, given the hyperlinks in a Web page, users do not click one of them uniformly at random. Instead, many factors may bias their behavior, for example the anchor text, the content relevance, and their previous experiences when visiting the Web site that a destination page belongs to. As a result, users may tend to visit pages on high-quality Web sites with higher probability. To implement this idea, we reformulate the Web graph as a two-layer structure, and Web page importance is calculated by a conditional random walk on this new Web graph. Experiments on the topic distillation task of the TREC 2003 Web track show that our new method achieves about an 18% improvement in mean average precision (MAP) and 16% in precision at 10 (P@10) over the PageRank algorithm.

Citations: 15
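The two-layer conditional walk is not described in enough detail here to reproduce. The sketch below shows only the underlying idea, assuming that each click is biased toward destinations with a higher hypothetical weight (for example an estimated site quality) instead of being uniform. With equal weights it reduces to ordinary PageRank computed by power iteration.

```python
def biased_pagerank(links, weight, damping=0.85, iters=100):
    """links: page -> list of destination pages.
    weight: page -> non-negative bias for landing on that page (hypothetical, e.g. site quality).
    A click from p goes to destination q with probability weight[q] / sum of p's destination weights."""
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        new = {p: (1 - damping) / n for p in pages}  # random-jump (teleport) share
        for p in pages:
            outs = links.get(p, [])
            total = sum(weight[q] for q in outs)
            if total == 0:
                # Dangling page (or all-zero weights): spread its rank uniformly.
                for q in pages:
                    new[q] += damping * rank[p] / n
            else:
                for q in outs:
                    new[q] += damping * rank[p] * weight[q] / total
        rank = new
    return rank
```

For example, if a hub page links to one page with weight 3 and another with weight 1, the first receives three quarters of the hub's outgoing rank instead of half.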

Improving Web clustering by cluster selection

The 2005 IEEE/WIC/ACM International Conference on Web Intelligence (WI'05)

Pub Date : 2005-09-19 DOI: 10.1109/WI.2005.75

Daniel Crabtree, Xiaoying Gao, Peter M. Andreae

Web page clustering is a technology that puts semantically related Web pages into groups and is useful for categorizing, organizing, and refining search results. When clustering using only textual information, suffix tree clustering (STC) outperforms other clustering algorithms by making use of phrases and allowing clusters to overlap. One problem with STC and other similar algorithms is how to select, from a very large set of generated clusters, a small set to display to the user. The cluster selection method used in STC is flawed in that it does not handle overlapping clusters appropriately. This paper introduces a new cluster scoring function and a new cluster selection algorithm that overcome the problems with overlapping clusters; combined with STC, they form a new clustering algorithm, ESTC. This paper's experiments show that ESTC significantly outperforms STC and that, even with less data, ESTC performs similarly to a commercial clustering search engine.

Citations: 68
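The new ESTC scoring function is not given in the abstract. To illustrate why overlap matters during selection, the sketch below uses a generic greedy max-coverage heuristic (an assumption, not ESTC's function): each step picks the cluster that adds the most documents not already shown, so a near-duplicate of an already selected cluster scores zero. Cluster labels and document ids are hypothetical.

```python
def select_clusters(clusters, k):
    """clusters: label -> set of document ids (clusters may overlap).
    Greedily pick up to k clusters, each time taking the one that covers the most
    documents not covered by earlier picks (ties broken by total size)."""
    selected, covered = [], set()
    remaining = dict(clusters)
    while remaining and len(selected) < k:
        label = max(remaining, key=lambda c: (len(remaining[c] - covered), len(remaining[c])))
        if not remaining[label] - covered:
            break  # every remaining cluster is fully overlapped by what is already shown
        selected.append(label)
        covered |= remaining.pop(label)
    return selected
```

For example, given `jaguar car` = {1, 2, 3, 4}, `jaguar cars` = {1, 2, 3} and `jaguar animal` = {5, 6} with `k` = 2, a size-only ranking would show the two car clusters, while the greedy selection returns `jaguar car` and `jaguar animal`.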

Web structure mining for usability analysis

The 2005 IEEE/WIC/ACM International Conference on Web Intelligence (WI'05)

Pub Date : 2005-09-19 DOI: 10.1109/WI.2005.160

Chun-hung Li, C. Chui

The interaction between usability and how a Web site is structured is a complicated issue. In this paper, we discuss a Web structure mining algorithm that allows the automatic extraction of navigational structures in a Web site without performing hypertext analysis. We perform several usability experiments to correlate the usability of Web sites with their structural design. Experimental results show that the structure mining algorithm gives reasonable predictions about several design issues in Web structure. The analysis serves as a building block for the complex issues of Web usability and structure mining.

Citations: 24
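The abstract does not describe the mining algorithm itself. One simple structural indicator that can be computed from the link graph alone, without analyzing page text, is click depth: the minimum number of clicks from the home page. The sketch below computes it by breadth-first search and flags pages beyond a depth threshold; both the metric and the threshold of 3 are assumptions chosen for illustration, not the paper's method.

```python
from collections import deque

def click_depths(links, home):
    """Breadth-first search over the site's link graph (page -> linked pages).
    Returns the minimum number of clicks needed to reach each reachable page from home."""
    depth = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for nxt in links.get(page, []):
            if nxt not in depth:
                depth[nxt] = depth[page] + 1
                queue.append(nxt)
    return depth

def deep_pages(links, home, max_depth=3):
    """Pages more than max_depth clicks from home (hypothetical usability threshold),
    including pages that cannot be reached from home at all."""
    all_pages = set(links) | {q for outs in links.values() for q in outs}
    depth = click_depths(links, home)
    return sorted(p for p in all_pages if depth.get(p, float("inf")) > max_depth)
```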

Automated metadata and instance extraction from news Web sites

The 2005 IEEE/WIC/ACM International Conference on Web Intelligence (WI'05)

Pub Date : 2005-09-19 DOI: 10.1109/WI.2005.38

Srinivas Vadrevu, S. Nagarajan, Fatih Gelgi, H. Davulcu

Over the past few years, the World Wide Web has established itself as a vital resource for news. With the continuous growth in the number of available news Web sites and the diversity in their presentation of content, there is an increasing need to organize news-related information on the Web and keep track of it. In this paper, we present automated techniques for extracting metadata and instance information by organizing and mining a set of news Web sites. We develop algorithms that detect and utilize HTML regularities in Web documents to turn them into hierarchical semantic structures encoded as XML. The tree-mining algorithms that we present identify key domain concepts and their taxonomic relationships. We also extract semi-structured concept instances, annotated with their labels whenever these are available. We report an experimental evaluation in the news domain to demonstrate the efficacy of our algorithms.

Citations: 16
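As a much-simplified illustration of turning HTML regularities into hierarchical XML, the sketch below assumes the only regularity is heading nesting (h1 above h2 above h3) and uses Python's standard `html.parser` and `xml.etree.ElementTree` modules to nest each heading under the nearest preceding higher-level heading. Real news pages need the more general regularity detection and tree mining the abstract describes; the `site` and `concept` element names are hypothetical.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

class HeadingTree(HTMLParser):
    """Builds a nested <concept> hierarchy from h1-h3 headings; text outside headings is ignored."""
    LEVELS = {"h1": 1, "h2": 2, "h3": 3}

    def __init__(self):
        super().__init__()
        self.root = ET.Element("site")
        self.stack = [(0, self.root)]  # (heading level, XML element) along the current path
        self.current = None            # element whose label text is being read

    def handle_starttag(self, tag, attrs):
        level = self.LEVELS.get(tag)
        if level is None:
            return
        # Pop back to the nearest open heading of a strictly higher rank (smaller level number).
        while self.stack[-1][0] >= level:
            self.stack.pop()
        node = ET.SubElement(self.stack[-1][1], "concept", level=str(level))
        self.stack.append((level, node))
        self.current = node

    def handle_endtag(self, tag):
        if tag in self.LEVELS:
            self.current = None

    def handle_data(self, data):
        if self.current is not None and data.strip():
            label = self.current.get("label", "")
            self.current.set("label", (label + " " + data.strip()).strip())

def to_xml(html):
    parser = HeadingTree()
    parser.feed(html)
    parser.close()
    return ET.tostring(parser.root, encoding="unicode")
```

For example, `<h1>World</h1><h2>Europe</h2><h2>Asia</h2><h1>Sports</h1>` yields a `World` concept with `Europe` and `Asia` as children, followed by a sibling `Sports` concept.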

Clickstream log acquisition with Web farming

The 2005 IEEE/WIC/ACM International Conference on Web Intelligence (WI'05)

Pub Date : 2005-09-19 DOI: 10.1109/WI.2005.47

Jia Hu, N. Zhong

Collecting customer interaction data on e-business Web sites and portals helps to understand customer behavior, build customer profiles, and then provide personalized services. Traditional Web server logs are hard to associate with specific customers and cannot record the complete actions and movements of customers across Web sites. Collecting clickstream logs at the application layer with Web farming technology helps to integrate Web usage data seamlessly with other customer-related data. This model can be developed as a common plugin for most existing e-business Web sites and portals.

Citations: 11
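Web farming itself is not shown here. The sketch below only illustrates what collecting clickstream events at the application layer can look like in Python: a WSGI middleware that records one event per request, tied to a customer identifier, before passing the request on. The `X-Customer-Id` header, the event fields, and the `sink` callable are assumptions; a real site would take the customer from its own session or login state.

```python
import time

class ClickstreamMiddleware:
    """WSGI middleware that records one clickstream event per request at the application layer."""

    def __init__(self, app, sink):
        self.app = app
        self.sink = sink  # any callable that stores an event dict (list.append, a DB writer, ...)

    def __call__(self, environ, start_response):
        event = {
            # Hypothetical header carrying the customer id; "anonymous" if absent.
            "customer": environ.get("HTTP_X_CUSTOMER_ID", "anonymous"),
            "method": environ.get("REQUEST_METHOD", "GET"),
            "path": environ.get("PATH_INFO", "/"),
            "query": environ.get("QUERY_STRING", ""),
            "referer": environ.get("HTTP_REFERER", ""),
            "timestamp": time.time(),
        }
        self.sink(event)
        # Hand the request to the wrapped application unchanged.
        return self.app(environ, start_response)
```

Because the middleware wraps any WSGI application, it can sit in front of an existing site without changes to the site's handlers, which loosely mirrors the plugin-style deployment the abstract mentions.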

Page-reRank: using trusted links to re-rank authority

The 2005 IEEE/WIC/ACM International Conference on Web Intelligence (WI'05)

Pub Date : 2005-09-19 DOI: 10.1109/WI.2005.112

P. Massa, Conor Hayes

Search engines like Google.com use the link structure of the Web to determine whether Web pages are authoritative sources of information. However, the linking mechanism provided by HTML does not allow the Web author to express different types of links, such as positive or negative endorsements of page content. As a consequence, search engine algorithms cannot discriminate between sites that are highly linked and sites that are highly trusted. We demonstrate our claim by running PageRank on a real-world data set containing positive and negative links. We conclude that simple semantic extensions to the link mechanism would provide a richer semantic network from which to mine more precise Web intelligence.

Citations: 64
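The claim can be illustrated on a toy graph. The sketch below runs standard PageRank twice over the same signed links: once ignoring the signs, as plain HTML links force a search engine to do, and once keeping only positive (trust) links. The edge list is invented, and keeping positive links only is one simple use of trust signals, not necessarily the re-ranking the paper proposes.

```python
def pagerank(edges, pages, damping=0.85, iters=100):
    """Plain PageRank over (source, target) edges; a page with no out-links spreads its rank uniformly."""
    n = len(pages)
    out = {p: [t for s, t in edges if s == p] for p in pages}
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        new = {p: (1 - damping) / n for p in pages}
        for p in pages:
            targets = out[p] or pages  # dangling page: treat it as linking to every page
            for t in targets:
                new[t] += damping * rank[p] / len(targets)
        rank = new
    return rank

def compare_rankings(signed_edges):
    """signed_edges: (source, target, sign), sign +1 for an endorsement and -1 for distrust.
    Returns (PageRank with signs ignored, PageRank over positive links only)."""
    pages = sorted({s for s, _, _ in signed_edges} | {t for _, t, _ in signed_edges})
    all_links = [(s, t) for s, t, _ in signed_edges]
    trusted = [(s, t) for s, t, sign in signed_edges if sign > 0]
    return pagerank(all_links, pages), pagerank(trusted, pages)
```

With three pages linking to `troll` negatively and two linking to `expert` positively, `troll` outranks `expert` when signs are ignored, and the order flips when only positive links count.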

Aligning class hierarchies with grass-roots class alignment

The 2005 IEEE/WIC/ACM International Conference on Web Intelligence (WI'05)

Pub Date : 2005-09-19 DOI: 10.1109/WI.2005.23

B. Yan

The performance of an ontology alignment technique largely depends on the amount of information that can be leveraged for the alignment task. On the Semantic Web, end users may explicitly or implicitly generate ontology alignments during their use of semantic data. This kind of end-user-generated ontology alignment, which we call grass-roots ontology alignment, is an important source of information that current ontology alignment techniques have yet to take into account. Grass-roots ontology alignment, often generated as a side effect of other data manipulations, can be user-specific, task-specific, approximate, or even contradictory. This paper reports our work on reusing grass-roots class alignment to align class hierarchies. A grass-roots class alignment, though approximate, still reveals some facts about the relationships between different classes. We formalize the facts about class relationships that can be inferred from an alignment in different cases. We then apply forward-chaining inference to the fact knowledge base (KB) to infer more facts. The fact KB is then leveraged for ontology alignment. To deal with uncertainty and inconsistency, each fact is associated with evidence that records how the fact was obtained. This evidence is used to select better-supported facts in case of inconsistency.

Citations: 1
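The abstract does not list the inference rules. The sketch below assumes three standard ones (equivalence implies mutual subclassing, subclassing is transitive, mutual subclassing implies equivalence), runs them by forward chaining until no new fact appears, and records for each derived fact the evidence labels of the premises it was first derived from. Class names and evidence labels are hypothetical, and the paper's evidence-based resolution of inconsistent facts is not shown.

```python
def forward_chain(facts):
    """facts: dict mapping (relation, a, b) -> set of evidence labels, where relation is
    'sub' (a is a subclass of b) or 'equiv' (a and b are equivalent classes).
    Applies three assumed rules until no new fact can be derived:
      equiv(a, b)           => sub(a, b), sub(b, a)
      sub(a, b), sub(b, c)  => sub(a, c)     (a != c)
      sub(a, b), sub(b, a)  => equiv(a, b)
    A newly derived fact keeps the union of its premises' evidence."""
    kb = {fact: set(ev) for fact, ev in facts.items()}
    while True:
        derived = []
        for (rel, a, b), ev in kb.items():
            if rel == "equiv":
                derived += [(("sub", a, b), ev), (("sub", b, a), ev)]
        subs = [(a, b, ev) for (rel, a, b), ev in kb.items() if rel == "sub"]
        for a, b, ev1 in subs:
            for b2, c, ev2 in subs:
                if b != b2:
                    continue
                fact = ("equiv", a, b) if a == c else ("sub", a, c)
                derived.append((fact, ev1 | ev2))
        # Keep only facts not already in the KB; the first derivation found supplies the evidence.
        new = {}
        for fact, ev in derived:
            if fact not in kb and fact not in new:
                new[fact] = set(ev)
        if not new:
            return kb  # fixpoint reached
        kb.update(new)
```

For example, from `sub(Laptop, Computer)` (evidence `user1`), `sub(Computer, Device)` (`user2`) and `equiv(Notebook, Laptop)` (`user3`), the knowledge base derives `sub(Notebook, Device)` with evidence {`user1`, `user2`, `user3`}.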

Toward the automatic compilation of multimedia encyclopedias: associating images with term descriptions on the Web

The 2005 IEEE/WIC/ACM International Conference on Web Intelligence (WI'05)

Pub Date : 2005-09-19 DOI: 10.1109/WI.2005.148

Atsushi Fujii, Tetsuya Ishikawa

To generate content for multimedia encyclopedias, we propose a method for searching the Web, seeking images associated with a specific word sense. We use text in an HTML file that links to an image as a pseudo-caption for the image, enabling text-based indexing and retrieval. We use term descriptions in a Web search site called "Cyclone" as queries and match images and texts based on word senses. We show the effectiveness of our method experimentally.

Citations: 12
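A minimal sketch of the pseudo-caption idea: collect the anchor text of every link whose target is an image file, treat it as that image's caption, then rank images by how many words of a term description they share with it. Plain word overlap is a stand-in for the paper's word-sense matching, and the list of image extensions is an assumption.

```python
from html.parser import HTMLParser

IMAGE_EXTS = (".jpg", ".jpeg", ".png", ".gif")  # assumed set of image file types

class PseudoCaptions(HTMLParser):
    """Collects the anchor text of every link whose href points to an image file."""

    def __init__(self):
        super().__init__()
        self.captions = {}  # image URL -> pseudo-caption text
        self._href = None   # image URL of the anchor currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href") or ""
            self._href = href if href.lower().endswith(IMAGE_EXTS) else None

    def handle_endtag(self, tag):
        if tag == "a":
            self._href = None

    def handle_data(self, data):
        if self._href and data.strip():
            prev = self.captions.get(self._href, "")
            self.captions[self._href] = (prev + " " + data.strip()).strip()

def rank_images(html, description):
    """Return image URLs whose pseudo-caption shares words with the description, best first."""
    parser = PseudoCaptions()
    parser.feed(html)
    parser.close()
    query = set(description.lower().split())
    scores = {url: len(query & set(text.lower().split())) for url, text in parser.captions.items()}
    return sorted((u for u in scores if scores[u] > 0), key=lambda u: -scores[u])
```

With anchors `largemouth bass fish` and `electric bass guitar` pointing to two images, the description `bass a freshwater fish` ranks the fish image first, since its caption shares two words with the description against one for the guitar image.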
