SEMSEARCH '10: latest papers

Distributed indexing for semantic search

SEMSEARCH '10

Pub Date : 2010-04-26 DOI: 10.1145/1863879.1863882

P. Mika

In this paper we describe the process of building indices for semantic search using MapReduce. We compare the two most straightforward representations of RDF data: the horizontal index structure, which uses parallel indices, and the vertical index structure, which uses fields. We measure the cost of building the indices and compare retrieval performance on keyword queries and on queries restricted to particular properties.
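A minimal sketch of the two layouts, written as map and reduce functions driven by a tiny in-memory loop rather than a real MapReduce cluster. The toy triples, the field names `token` and `property`, and all helper names are hypothetical; the paper's actual index format may differ. In the horizontal layout each subject document gets two position-aligned fields (token *i* was produced by property *i*); in the vertical layout each property becomes its own field.

```python
from collections import defaultdict

# Hypothetical toy data: (subject, property, literal value) triples.
TRIPLES = [
    ("ex:alice", "foaf:name", "Alice Smith"),
    ("ex:alice", "ex:city", "Paris"),
    ("ex:bob", "foaf:name", "Bob Paris"),
]


def map_triple(triple):
    """Map step: emit (subject, (property, token)) for every token of the literal."""
    subj, prop, value = triple
    for tok in value.lower().split():
        yield subj, (prop, tok)


def reduce_horizontal(pairs):
    """Horizontal layout: two parallel fields aligned by position."""
    return {"token": [t for _, t in pairs], "property": [p for p, _ in pairs]}


def reduce_vertical(pairs):
    """Vertical layout: one field per property."""
    fields = defaultdict(list)
    for prop, tok in pairs:
        fields[prop].append(tok)
    return dict(fields)


def build_index(triples, reducer):
    """In-memory stand-in for a MapReduce job: map, shuffle by subject, reduce."""
    groups = defaultdict(list)
    for triple in triples:
        for key, value in map_triple(triple):
            groups[key].append(value)
    return {subj: reducer(pairs) for subj, pairs in groups.items()}


def search_horizontal(index, token, prop=None):
    """Keyword query (prop=None) or query restricted to one property."""
    return {
        subj
        for subj, doc in index.items()
        if any(t == token and (prop is None or p == prop)
               for t, p in zip(doc["token"], doc["property"]))
    }


def search_vertical(index, token, prop=None):
    """Same query semantics, answered by looking inside per-property fields."""
    return {
        subj
        for subj, fields in index.items()
        if any(token in toks for p, toks in fields.items()
               if prop is None or p == prop)
    }


if __name__ == "__main__":
    h = build_index(TRIPLES, reduce_horizontal)
    v = build_index(TRIPLES, reduce_vertical)
    print(sorted(search_horizontal(h, "paris")))           # ['ex:alice', 'ex:bob']
    print(sorted(search_vertical(v, "paris", "ex:city")))  # ['ex:alice']
```

In this sketch a property-restricted query is a positional check in the horizontal layout but a field lookup in the vertical one, while the vertical layout needs as many fields as there are distinct properties.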

Citations: 21

Semantically enabled exploratory video search

SEMSEARCH '10

Pub Date : 2010-04-26 DOI: 10.1145/1863879.1863887

J. Waitelonis, Harald Sack, Johannes Hercher, Zalan Kramer

With the exponential growth of video data on the World Wide Web comes the need for efficient methods of video content management, content-based video search, filtering and browsing. However, video data often lacks sufficient metadata to open up the video content and enable pinpoint content-based search. With the advent of the 'web of data' as an extension of the current WWW, new data sources can be exploited by semantically interconnecting video metadata with the web of data. This enables better access to video repositories through semantic search technologies and improves the user's search experience by supporting exploratory search strategies. We have developed 'yovisto', a prototype semantic video search engine that demonstrates the advantages of semantically enhanced exploratory video search and enables investigative navigation and browsing in large video repositories.

Citations: 13

Paraphrasing invariance coefficient: measuring para-query invariance of search engines

SEMSEARCH '10

Pub Date : 2010-04-26 DOI: 10.1145/1863879.1863880

T. Imielinski, Jinyun Yan, Yihan Fang, Kurt Eldridge, Huiwen Yu, Peter Kelly

Paraphrasing is the restatement (or reuse) of text in another form that preserves its meaning. A para-query is a paraphrase of a search query. Humans recognize para-queries easily, but search engines are still far from doing so. We claim that for a search engine to be called semantic, it must recognize para-queries by returning the same search results for all para-queries of a given query. Recognizing para-queries is an important and desirable ability for a search engine: it relieves users of the burden of rephrasing queries to improve the relevance of results.

In this paper, we cover two main threads: monolingual para-query generation (PG) and para-query recognition measurement (PRM). Para-query generation aims to automatically generate as many English para-queries as possible for a given query. We propose a novel game, "Rephraser", to tackle this problem. Hundreds of para-query templates are extracted from the game's output and used to compose tens of thousands of para-queries.

The goal of para-query recognition measurement is to examine to what extent search engines recognize para-queries. We propose the concept of the paraphrasing invariance coefficient (PIC), defined as the probability that search results are the same for a pair of para-queries. Using para-queries generated by the game, we design experiments to measure search engines' PIC. The results show that today's leading search engines are still inferior to humans at recognizing para-queries. Search still has a long way to go before it is truly semantic.
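Since PIC is defined as the probability that a pair of para-queries returns the same results, it can be estimated empirically as the fraction of para-query pairs whose results match. The sketch below assumes that "the same" means identical ordered top-k lists; that choice, the default `k`, and the stub search engine are illustrative assumptions, not details taken from the paper.

```python
from itertools import combinations
from typing import Callable, List, Sequence


def pic(groups: Sequence[Sequence[str]],
        search: Callable[[str], List[str]],
        k: int = 10) -> float:
    """Estimate the paraphrasing invariance coefficient.

    groups: each inner sequence holds the para-queries of one underlying query.
    search: returns a ranked list of result identifiers for a query string.
    A pair counts as invariant when both top-k lists are identical (assumption).
    """
    same = total = 0
    for group in groups:
        top = {q: search(q)[:k] for q in group}
        for a, b in combinations(group, 2):
            total += 1
            if top[a] == top[b]:
                same += 1
    return same / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical stub engine: a fixed query -> ranked URL table.
    results = {
        "cheap flights to rome": ["u1", "u2", "u3"],
        "low cost flights to rome": ["u1", "u2", "u3"],
        "rome cheap airfare": ["u4", "u1", "u2"],
    }
    # One of the three pairs matches, so this prints 0.3333333333333333.
    print(pic([list(results)], results.__getitem__, k=3))
```

A looser notion of "same" (for example, set overlap above a threshold) would only change the comparison inside the inner loop.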

Citations: 0

A large-scale system for annotating and querying quotations in news feeds

SEMSEARCH '10

Pub Date : 2010-04-26 DOI: 10.1145/1863879.1863886

Jisheng Liang, Navdeep Dhillon, K. Koperski

In this paper, we describe a system that automatically extracts quotations from news feeds, and allows efficient retrieval of the semantically annotated quotes. APIs for real-time querying of over 10 million quotes extracted from recent news feeds are publicly available. In addition, each day we add around 60 thousand new quotes extracted from around 50 thousand news articles or blogs. We apply computational linguistic techniques such as coreference resolution, entity recognition and disambiguation to improve both precision and recall of the quote detection. We support faceted search on both speakers and entities mentioned in the quotes.
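Faceted search over speakers and mentioned entities can be pictured as filtering the annotated quotes and then counting facet values over the matches. The `Quote` record, the function name, and the sample speakers and entities below are hypothetical; they do not reflect the system's public API.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Quote:
    text: str
    speaker: str
    entities: List[str] = field(default_factory=list)  # entities mentioned in the quote


def faceted_search(quotes: List[Quote],
                   speaker: Optional[str] = None,
                   entity: Optional[str] = None) -> Tuple[List[Quote], Counter, Counter]:
    """Filter quotes by speaker and/or mentioned entity, then count facet values
    over the matching quotes so a client can offer further refinements."""
    hits = [
        q for q in quotes
        if (speaker is None or q.speaker == speaker)
        and (entity is None or entity in q.entities)
    ]
    speaker_facets = Counter(q.speaker for q in hits)
    entity_facets = Counter(e for q in hits for e in q.entities)
    return hits, speaker_facets, entity_facets


if __name__ == "__main__":
    quotes = [
        Quote("q1", "Speaker A", ["Entity X"]),
        Quote("q2", "Speaker B", ["Entity X", "Entity Y"]),
        Quote("q3", "Speaker A", ["Entity Y"]),
    ]
    hits, speakers, entities = faceted_search(quotes, entity="Entity X")
    print([q.text for q in hits], dict(speakers), dict(entities))
```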

Citations: 14

Entity search: building bridges between two worlds

SEMSEARCH '10

Pub Date : 2010-04-26 DOI: 10.1145/1863879.1863888

K. Balog, E. Meij, M. de Rijke

We consider the task of entity search and examine to what extent state-of-the-art information retrieval (IR) and semantic web (SW) technologies are capable of answering information needs that focus on entities. We also explore the potential of combining IR with SW technologies to improve end-to-end performance on a specific entity search task. We arrive at, and motivate, a proposal to combine text-based entity models with semantic information from the Linked Open Data cloud.
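One simple way to picture combining a text-based entity score with Linked Open Data information is a linear interpolation between the text score and an overlap between the entity's LOD types and the types the query targets. This mixture, the Jaccard overlap, the weight `lam`, and the example identifiers are all hypothetical choices for illustration, not the model proposed in the paper.

```python
from typing import Dict, List, Set, Tuple


def type_overlap(entity_types: Set[str], target_types: Set[str]) -> float:
    """Jaccard overlap between an entity's LOD types and the types the query asks for."""
    union = entity_types | target_types
    return len(entity_types & target_types) / len(union) if union else 0.0


def rank_entities(candidates: Dict[str, Tuple[float, Set[str]]],
                  target_types: Set[str],
                  lam: float = 0.5) -> List[Tuple[str, float]]:
    """candidates maps an entity id to (text score in [0, 1], LOD types).

    Final score = lam * text score + (1 - lam) * type overlap (hypothetical mixture).
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    scored = [
        (eid, lam * text + (1 - lam) * type_overlap(types, target_types))
        for eid, (text, types) in candidates.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    candidates = {
        "ex:city_entity": (0.9, {"dbo:City"}),
        "ex:person_entity": (0.6, {"dbo:Person"}),
    }
    # The query is assumed to target people, so type evidence outweighs
    # the city's higher text score: person 0.8, city 0.45.
    print(rank_entities(candidates, {"dbo:Person"}))
```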

Citations: 59

Dear search engine: what's your opinion about...?: sentiment analysis for semantic enrichment of web search results

SEMSEARCH '10

Pub Date : 2010-04-26 DOI: 10.1145/1863879.1863883

Gianluca Demartini, Stefan Siersdorfer

Search engines have become the main entry point to Web content, and a large part of the "visible" Web consists of what they present as top retrieved results. It would therefore be desirable for the first few results to be a representative sample of the entire result set. This paper provides a preliminary study of the opinions contained in search engine results for controversial queries such as "cloning" or "immigration". To this end, we extract sentiment metadata from web pages and compare search engine results for several queries. Furthermore, we compare the opinions expressed in the top results with those in the other retrieved results to examine whether, from an opinion perspective, the top-ranked pages are a good sample of all results. In a preliminary empirical analysis, we compare up to 50 results from 3 commercial search engines on 14 controversial queries to study the relation between sentiments, topics, and rankings.
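The top-versus-rest comparison described above can be made concrete by treating each result's sentiment as a label and measuring how far the label mix of the top-k results departs from that of the remaining results. Total variation distance is a stand-in measure chosen for this sketch, not necessarily the one the authors used, and the labels are assumed to come from a sentiment classifier that is not shown.

```python
from collections import Counter
from typing import Dict, Sequence

LABELS = ("positive", "negative", "neutral")  # labels outside this tuple are ignored


def label_distribution(labels: Sequence[str]) -> Dict[str, float]:
    """Share of each sentiment label among the given results."""
    counts = Counter(labels)
    return {lab: counts[lab] / len(labels) for lab in LABELS}


def top_vs_rest_divergence(labels: Sequence[str], k: int) -> float:
    """Total variation distance between the sentiment mix of the top-k results
    and that of the remaining results (0 = identical mix, 1 = disjoint)."""
    if not 0 < k < len(labels):
        raise ValueError("k must leave at least one result on each side")
    top = label_distribution(labels[:k])
    rest = label_distribution(labels[k:])
    return 0.5 * sum(abs(top[lab] - rest[lab]) for lab in LABELS)


if __name__ == "__main__":
    ranked = ["positive", "positive", "negative", "neutral"]  # hypothetical labels
    print(top_vs_rest_divergence(ranked, k=2))  # 1.0: the top 2 share no label mix with the rest
```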

Citations: 28

Methodology and campaign design for the evaluation of semantic search tools

SEMSEARCH '10

Pub Date : 2010-04-26 DOI: 10.1145/1863879.1863889

S. Wrigley, D. Reinhard, Khadija Elbedweihy, A. Bernstein, F. Ciravegna

The main problem with the state of the art in the semantic search domain is the lack of comprehensive evaluations. Only a few efforts exist to evaluate semantic search tools and to compare the results with other evaluations of their kind. In this paper, we present a systematic approach for testing and benchmarking semantic search tools that was developed within the SEALS project. Unlike other semantic web evaluations, our methodology tests search tools both automatically and interactively, with a human user in the loop. This allows us to test not only functional performance measures, such as precision and recall, but also usability issues, such as ease of use and the comprehensibility of the query language. The paper describes the evaluation goals and assumptions, the criteria and metrics, and the types of experiments we will conduct, as well as the datasets required to conduct the evaluation in the context of the SEALS initiative. To our knowledge, this is the first effort to present a comprehensive evaluation methodology for Semantic Web search tools.
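The functional measures named above, precision and recall, can be computed per query by comparing a tool's answers with a gold-standard answer set. This is a minimal set-based sketch; how the SEALS campaign aggregates scores across queries is not covered here, and the function name is illustrative.

```python
from typing import Iterable, Tuple


def precision_recall(retrieved: Iterable[str], relevant: Iterable[str]) -> Tuple[float, float]:
    """Set-based precision and recall for a single query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical answers from a tool vs. a gold standard: 2 of 4 answers are
    # correct (precision 0.5) and 2 of 3 gold answers are found (recall 2/3).
    print(precision_recall(["a", "b", "c", "d"], ["b", "d", "e"]))
```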

Citations: 10

Using BM25F for semantic search

SEMSEARCH '10

Pub Date : 2010-04-26 DOI: 10.1145/1863879.1863881

José R. Pérez-Agüera, Javier Arroyo, J. Greenberg, Joaquín Pérez-Iglesias, Víctor Fresno-Fernández

Information Retrieval (IR) approaches for Semantic Web search engines have become very popular in recent years. The popularization of IR libraries such as Lucene, which allow IR to be implemented almost out of the box, has made it easier to integrate IR into Semantic Web search engines. However, one of the most important features of Semantic Web documents is their structure, since this structure allows semantics to be represented in a machine-readable format. In this paper we analyze the specific problems of structured IR and how to adapt weighting schemes for semantic document retrieval.
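The weighting scheme named in the title, BM25F, extends BM25 to structured documents: term frequencies are length-normalized per field and combined with field weights before the usual saturation is applied. The sketch follows the common "simple BM25F" form; the field names, weights, `b` values, `k1 = 1.2`, and the smoothed IDF variant are assumptions, and the paper's specific adaptation to Semantic Web documents may differ.

```python
import math
from typing import Dict, List, Sequence

Doc = Dict[str, List[str]]  # field name -> list of tokens


def bm25f(query: Sequence[str], doc: Doc, corpus: Sequence[Doc],
          weights: Dict[str, float], b: Dict[str, float], k1: float = 1.2) -> float:
    """Score `doc` against `query` with a simple BM25F variant.

    For each term: tf~ = sum_f weights[f] * tf_f / (1 - b[f] + b[f] * len_f / avglen_f),
    then score += idf * tf~ / (k1 + tf~).
    """
    n_docs = len(corpus)
    avglen = {f: sum(len(d.get(f, [])) for d in corpus) / n_docs for f in weights}
    score = 0.0
    for term in set(query):
        # Document frequency over the union of the weighted fields.
        df = sum(1 for d in corpus if any(term in d.get(f, []) for f in weights))
        if df == 0:
            continue
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))  # smoothed, always > 0
        tf_tilde = 0.0
        for f, w in weights.items():
            tokens = doc.get(f, [])
            count = tokens.count(term)
            if count == 0 or avglen[f] == 0:
                continue
            norm = 1 - b[f] + b[f] * len(tokens) / avglen[f]  # per-field length normalization
            tf_tilde += w * count / norm
        score += idf * tf_tilde / (k1 + tf_tilde)  # saturation applied once, after weighting
    return score


if __name__ == "__main__":
    d1 = {"title": ["semantic", "search"], "body": ["rdf", "index"]}
    d2 = {"title": ["video"], "body": ["semantic", "video", "search"]}
    weights, b = {"title": 2.0, "body": 1.0}, {"title": 0.75, "body": 0.75}
    for name, d in (("d1", d1), ("d2", d2)):
        print(name, round(bm25f(["semantic"], d, [d1, d2], weights, b), 4))
```

Because field weights are applied before saturation, a term repeated across several fields cannot inflate the score the way summing separate per-field BM25 scores would; that is the usual argument for BM25F over a linear combination of field scores.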

Citations: 80

The wisdom in tweetonomies: acquiring latent conceptual structures from social awareness streams

SEMSEARCH '10

Pub Date : 2010-04-26 DOI: 10.1145/1863879.1863885

Claudia Wagner, M. Strohmaier

Although one might argue that little wisdom can be conveyed in messages of 140 characters or less, this paper sets out to explore whether the aggregation of messages in social awareness streams, such as Twitter, conveys meaningful information about a given domain. As a research community, we know little about the structural and semantic properties of such streams, and how they can be analyzed, characterized and used. This paper introduces a network-theoretic model of social awareness streams, a so-called "tweetonomy", together with a set of stream-based measures that allow researchers to systematically define and compare different stream aggregations. We apply the model and measures to a dataset acquired from Twitter to study emerging semantics in selected streams. The network-theoretic model and the corresponding measures introduced in this paper are relevant for researchers interested in information retrieval and ontology learning from social awareness streams. Our empirical findings demonstrate that different social awareness stream aggregations exhibit interesting differences, making them amenable to different applications.

Citations: 51

Automatic modeling of user's real world activities from the web for semantic IR

SEMSEARCH '10

Pub Date : 2010-04-26 DOI: 10.1145/1863879.1863884

Yusuke Fukazawa, J. Ota

We have been developing a task-based service navigation system that offers the user services relevant to the task he or she wants to perform. The system allows users to concretize their requests within a task model developed by human experts. In this study, to reduce the cost of collecting a wide variety of activities, we investigate the automatic modeling of users' real-world activities from the web. To extract the widest possible variety of activities with high precision and recall, we investigate how much content, and which resources, should be used for extraction. Our results show that there is no need to examine the entire web, which would be too time-consuming; only a limited number of search results from blog content is needed (e.g., 900 out of 21,000,000 search results). In addition, to estimate the hierarchical relationships in the activity model with the lowest possible error rate, we propose a method that divides the representation of each activity into a noun part and a verb part and calculates the mutual information between them. The results show that the proposed method captures almost 80% of the hierarchical relationships.
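The abstract does not say which form of mutual information is used or how it is turned into hierarchical links. As one hedged reading, the sketch computes pointwise mutual information (PMI) between nouns and verbs from (noun, verb) pairs extracted from activity phrases such as "buy ticket"; the extraction step and the example pairs are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Tuple


def noun_verb_pmi(pairs: Iterable[Tuple[str, str]]) -> Dict[Tuple[str, str], float]:
    """PMI(n, v) = log2( P(n, v) / (P(n) * P(v)) ) from observed (noun, verb) pairs."""
    pairs = list(pairs)
    total = len(pairs)
    joint = Counter(pairs)
    nouns = Counter(n for n, _ in pairs)
    verbs = Counter(v for _, v in pairs)
    return {
        (n, v): math.log2((c / total) / ((nouns[n] / total) * (verbs[v] / total)))
        for (n, v), c in joint.items()
    }


if __name__ == "__main__":
    # Hypothetical (noun, verb) pairs from activity phrases found on the web.
    observed = [("ticket", "buy"), ("ticket", "buy"), ("movie", "watch"), ("ticket", "book")]
    for pair, score in sorted(noun_verb_pmi(observed).items()):
        print(pair, round(score, 3))
```

Higher PMI means the noun and verb co-occur more often than chance would predict; how such scores feed into the activity hierarchy is left open here.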

Citations: 14
