Proceedings Eighth Symposium on String Processing and Information Retrieval: Latest Articles

Speeding-up Hirschberg and Hunt-Szymanski LCS algorithms

Proceedings Eighth Symposium on String Processing and Information Retrieval

Pub Date : 2001-11-13 DOI: 10.1109/SPIRE.2001.989737

M. Crochemore, C. Iliopoulos, Y. Pinzón

Two algorithms are presented that solve the problem of recovering the longest common subsequence of two strings. The first algorithm is an improvement of Hirschberg's divide-and-conquer algorithm. The second algorithm is an improvement of the Hunt-Szymanski algorithm based on an efficient computation of all dominant match points. Both algorithms use bit-vector operations and are shown to work very efficiently in practice.
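
To illustrate what "bit-vector operations" can look like for LCS, here is a minimal sketch of a well-known bit-parallel recurrence that computes only the LCS length. It is not the algorithm of the paper above, which recovers the subsequence itself; the function name and the use of Python's arbitrary-precision integers as bit vectors are assumptions made for illustration.

```python
def lcs_length_bitparallel(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b.

    Illustrative sketch only: one bit per character of `a`,
    one constant-size update per character of `b`.
    """
    m = len(a)
    mask = (1 << m) - 1  # keeps the vector at exactly m bits

    # match[c] has bit i set iff a[i] == c
    match = {}
    for i, ch in enumerate(a):
        match[ch] = match.get(ch, 0) | (1 << i)

    v = mask  # all ones: no LCS progress recorded yet
    for ch in b:
        mc = match.get(ch, 0)
        u = v & mc
        # The addition propagates a carry through runs of ones,
        # clearing one bit for each position where the LCS length grows.
        v = ((v + u) | (v & ~mc)) & mask

    # Every zero bit in v marks one unit of LCS length.
    return m - bin(v).count("1")


if __name__ == "__main__":
    print(lcs_length_bitparallel("ABCBDAB", "BDCABA"))
```

Each character of `b` costs a handful of word operations per machine word of `a`, which is the source of the practical speed-up over filling a full dynamic-programming table cell by cell.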

Citations: 16

A model for the representation and focussed retrieval of structured documents based on fuzzy aggregation

Proceedings Eighth Symposium on String Processing and Information Retrieval

Pub Date : 2001-11-13 DOI: 10.1109/SPIRE.2001.989746

G. Kazai, M. Lalmas, T. Roelleke

Effective retrieval of structured documents should exploit the content and structural knowledge associated with the documents. This knowledge can be used to focus retrieval to the best entry points: document components that contain relevant information, and from which users can browse to retrieve further relevant components. To enable this, suitable representation methods must be developed. This paper presents a model for representing structured documents to allow for their focussed retrieval. The model is founded on fuzzy aggregation, an approach based on the fuzzy representation of linguistic quantifiers and ordered weighted averaging operators. By defining the representation of a document component as the fuzzy aggregation of its related components, we arrive at a document representation that supports the selection of best entry points.
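
The ordered weighted averaging (OWA) operator mentioned in the abstract can be illustrated with a minimal sketch. The power-function quantifier Q(r) = r^alpha, the function names, and the sample scores are assumptions chosen for illustration; the quantifiers and aggregation structure actually used in the paper are not given here.

```python
def quantifier_weights(n: int, alpha: float) -> list[float]:
    """OWA weights derived from a monotone quantifier Q(r) = r ** alpha.

    Assumed quantifier for illustration: w_i = Q(i/n) - Q((i-1)/n),
    so the weights are non-negative and sum to 1.
    """
    def q(r: float) -> float:
        return r ** alpha

    return [q(i / n) - q((i - 1) / n) for i in range(1, n + 1)]


def owa(values: list[float], weights: list[float]) -> float:
    """Ordered weighted average: weights apply to ranks, not to sources."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    ordered = sorted(values, reverse=True)  # largest value gets weights[0]
    return sum(w * v for w, v in zip(weights, ordered))


if __name__ == "__main__":
    # Hypothetical relevance scores of three sub-components of a section.
    child_scores = [0.2, 0.8, 0.5]
    print(owa(child_scores, quantifier_weights(3, 2.0)))
```

Under this assumed quantifier, alpha = 1 reduces to the plain mean, alpha > 1 shifts weight toward the lowest scores (closer to "all components must be relevant"), and alpha < 1 shifts weight toward the highest scores (closer to "at least one").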

Citations: 29

Design of a graphical user interface for focussed retrieval of structured documents

Proceedings Eighth Symposium on String Processing and Information Retrieval

Pub Date : 2001-11-13 DOI: 10.1109/SPIRE.2001.989775

F. Crestani, P. de la Fuente, J. Vegas

Many document collections contain documents that have significant structure. Structured document retrieval requires different models and interfaces from standard Information Retrieval. An Information Retrieval system dealing with structured documents has to enable a user to query, browse retrieved documents, and provide query refinement and relevance feedback based not only on full documents but also on specific parts of them, according to their structure. Currently, very few IR systems enable such a level of flexibility and interaction, because of limitations in indexing and retrieval models and in interfaces. In this paper, we present the design of a new graphical user interface for structured document retrieval. This interface provides the user with an intuitive yet powerful set of tools for structured document searching, retrieved list navigation, and search refinement.

Citations: 4

Using semantics for paragraph selection in question answering systems

Proceedings Eighth Symposium on String Processing and Information Retrieval

Pub Date : 2001-11-13 DOI: 10.1109/SPIRE.2001.989765

J. Vicedo

Term-based Question Answering systems are effective only for questions whose answers are expressed in documents using mainly the same terms that appear in the questions. The system presented in this paper overcomes this limitation by performing open-domain Question Answering (QA) from a semantic perspective. For this purpose, we define a general semantic model that represents the concepts referenced in the questions, as well as a relevance measure that allows locating and ranking fragments of documents from whose content it is possible to infer the answer to specific questions. For the purpose of evaluation, this model has been embedded into a full QA system. A comparison of performance between our model and term-based approaches shows that QA measures improve significantly when this model is applied to the paragraph selection process.

Citations: 1

Musical sequence comparison for melodic and rhythmic similarities

Proceedings Eighth Symposium on String Processing and Information Retrieval

Pub Date : 2001-11-13 DOI: 10.1109/SPIRE.2001.989744

T. Kadota, Masahiro Hirao, A. Ishino, M. Takeda, A. Shinohara, F. Matsuo

We address the problem of musical sequence comparison for melodic similarity. Starting with a very simple similarity measure, we improve it step by step to finally obtain an acceptable measure. Although the measure is still simple and has only two tuning parameters, it outperforms the measure proposed by Mongeau and Sankoff (1990) in the sense that it more successfully distinguishes variations on a particular theme from a mixed collection of variations on multiple themes by Mozart. We also present a measure for quantifying rhythmic similarity and evaluate its performance on popular Japanese songs.

Citations: 10

Fast categorisation of large document collections

Proceedings Eighth Symposium on String Processing and Information Retrieval

Pub Date : 2001-11-13 DOI: 10.1109/SPIRE.2001.989757

Vaughan R. Shanks, H. Williams

As the volume of data stored online increases, careful management of large document collections becomes increasingly important. Categorisation is one important document management technique. It has been effectively employed on the Web, where links to documents are maintained in topic or interest areas in, for example, the manually categorised Yahoo! hierarchy. The drawbacks of manual categorisation are that it is practical only on small numbers of documents, it is not scalable, and it relies on the subjective judgement of human assessors. Automatic categorisation has been shown to be an accurate alternative to manual categorisation. In automatic categorisation, documents are processed and automatically assigned to pre-defined categories that represent an interest or topic area. We propose and investigate heuristics for fast categorisation of large collections of documents that are focused on selecting a minimal set of representative features from uncategorised documents. We show that these new heuristics are accurate (in some cases more accurate than the baseline techniques) and also permit more than three-fold reductions in processing time for categorising large collections.

Citations: 14

Compaction techniques for nextword indexes

Proceedings Eighth Symposium on String Processing and Information Retrieval

Pub Date : 2001-11-13 DOI: 10.1109/SPIRE.2001.989735

D. Bahle, H. Williams, J. Zobel

Most queries to text search engines are ranked or Boolean. Phrase querying is a powerful technique for refining searches, but is expensive to implement on conventional indexes. In previous work we introduced the nextword index, a structure specifically designed for phrase queries, which is, however, relatively large. In this paper we introduce new compaction techniques for nextword indexes. In contrast to most index compression schemes, these techniques are lossy, yet, as we show, allow full resolution of phrase queries without false match checking. We show experimentally that our novel techniques lead to significant savings in index size.
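
As a rough illustration of why a word-pair structure suits phrase queries, the sketch below builds an uncompressed index from each word to the words that follow it, with the positions of each pair, and resolves a phrase by chaining pair lookups. The single-document setting, the function names, and the in-memory dictionaries are assumptions for illustration; the sketch does not implement the compaction techniques of the paper.

```python
from collections import defaultdict


def build_nextword_index(tokens: list[str]) -> dict:
    """Map firstword -> nextword -> positions of the firstword.

    Illustrative, uncompressed, single-document version.
    """
    index = defaultdict(lambda: defaultdict(list))
    for pos in range(len(tokens) - 1):
        index[tokens[pos]][tokens[pos + 1]].append(pos)
    return index


def phrase_positions(index: dict, phrase: list[str]) -> list[int]:
    """Start positions of `phrase` (two or more words) in the indexed text."""
    if len(phrase) < 2:
        raise ValueError("a phrase needs at least two words")

    # Candidates come from the first word pair of the phrase.
    candidates = set(index.get(phrase[0], {}).get(phrase[1], []))

    # Each later pair must occur at the matching offset from the start.
    for offset in range(1, len(phrase) - 1):
        pair = set(index.get(phrase[offset], {}).get(phrase[offset + 1], []))
        candidates = {p for p in candidates if p + offset in pair}
    return sorted(candidates)


if __name__ == "__main__":
    text = "the quick brown fox jumps over the quick dog".split()
    idx = build_nextword_index(text)
    print(phrase_positions(idx, ["the", "quick", "brown"]))
```

Because every lookup already pins down two adjacent words, a phrase of k words needs k - 1 short pair lists rather than k long single-word lists, at the cost of a larger index.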

Citations: 19

Re-store: a system for compressing, browsing, and searching large documents

Proceedings Eighth Symposium on String Processing and Information Retrieval

Pub Date : 2001-11-13 DOI: 10.1109/SPIRE.2001.989752

Alistair Moffat, R. Wan

A constant temperature box comprises a body and a lid therefor which are of adiabatic construction, and is incorporated with a container used as a cooling or heating source, the container being made flat and arranged opposite to each other at the side walls of the box body, so that the container may cool or warm foodstuffs and beverages kept within the constant temperature box.

Citations: 18

An efficient bottom-up distance between trees

Proceedings Eighth Symposium on String Processing and Information Retrieval

Pub Date : 2001-11-13 DOI: 10.1109/SPIRE.2001.989761

G. Valiente

This paper introduces a new bottom-up distance measure for labeled trees and relates it to other distance measures published in the literature. The measure is based on the largest common forest of the trees and has a threefold advantage: independence of particular edit costs, low complexity, and coverage of both ordered and unordered trees. Algorithms for computing the bottom-up distance in time linear in the number of nodes are given in full detail.

Citations: 127

A documental database query language

Proceedings Eighth Symposium on String Processing and Information Retrieval

Pub Date : 2001-11-13 DOI: 10.1109/SPIRE.2001.989772

N. Brisaboa, Miguel R. Penabad, Á. Places, F. J. Rodríguez

This work presents a natural language based technique to build user interfaces to query document databases through the web. We call this technique Bounded Natural Language (BNL). Interfaces based on BNL are useful to query document databases containing only structured data, containing only text, or containing both. That is, the underlying formalism of BNL can integrate restrictions over structured and non-structured data (such as text). Interfaces using BNL can be programmed ad hoc for any document database, but in this paper we present a system with an ontology-based architecture in which the user interface is automatically generated by a software module (User Interface Generator) capable of reading and following the ontology. This ontology is a conceptualization of the database model, which uses a label in natural language for any concept in the ontology. Each label represents the usual name for a concept in the real world. The ontology includes general concepts useful when the user is interested in documents in any corpus in the database, and specific concepts useful when the user is interested in a specific corpus. That is, databases can store one or more corpora of documents, and queries can be issued either over the whole database or over a specific corpus. The ontology guides the execution of the User Interface Generator and other software modules in such a way that a change in the database does not require changes in the program code, because the whole system runs following the ontology. That is, if the database schema is modified, only the ontology must be changed, and the User Interface Generator will produce a new and different user interface adapted to the new database.

Citations: 8
