
International Workshop On Research Issues in Digital Libraries: Latest Publications

Advances in XML retrieval: the INEX initiative
Pub Date: 2006-12-12 | DOI: 10.1145/1364742.1364763
N. Fuhr, M. Lalmas
We survey the INEX initiative, which focuses on evaluating content-based access to XML documents. First, we describe the test setting and the various tracks of INEX. Then we present a new framework for the different views on XML retrieval, which distinguishes between the structural and the content dimension; within this space, we locate current activities and point out new areas of research. Finally, we discuss the combination of semantic web technologies and XML retrieval, pointing out potential benefits as well as the need for further research in this area.
Citations: 11
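The following minimal Python sketch gives a rough illustration of content-based access to XML *elements* rather than whole documents. It is not the INEX evaluation setup or any participant's system. It scores every element of a small hypothetical XML document by the fraction of its tokens that match the query, and it identifies each element by an XPath-like path. The scoring formula, the function names `walk` and `score_elements`, and the sample document are all illustrative assumptions.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

def walk(elem, path):
    """Yield (XPath-like path, element) for elem and every descendant."""
    yield path, elem
    seen = Counter()  # position counter per child tag, e.g. sec[1], sec[2]
    for child in elem:
        seen[child.tag] += 1
        yield from walk(child, f"{path}/{child.tag}[{seen[child.tag]}]")

def score_elements(xml_string, query):
    """Return (path, score) for every element containing a query term,
    best first. Toy score = matching tokens / all tokens in the element."""
    root = ET.fromstring(xml_string)
    terms = [t.lower() for t in query.split()]
    results = []
    for path, elem in walk(root, f"/{root.tag}[1]"):
        # itertext() yields the element's own text plus all descendant text
        tokens = re.findall(r"\w+", " ".join(elem.itertext()).lower())
        if not tokens:
            continue
        counts = Counter(tokens)
        hits = sum(counts[t] for t in terms)
        if hits:
            results.append((path, hits / len(tokens)))
    results.sort(key=lambda item: item[1], reverse=True)
    return results

# Hypothetical sample document
SAMPLE = """<article>
  <title>XML retrieval</title>
  <sec><p>Structured documents contain nested elements.</p></sec>
  <sec><p>XML retrieval returns elements, not whole documents.</p><p>Evaluation uses relevance judgments.</p></sec>
</article>"""

if __name__ == "__main__":
    for path, score in score_elements(SAMPLE, "XML retrieval"):
        print(f"{score:.3f}  {path}")
```

The result list contains nested, overlapping elements: the title, a paragraph, and their enclosing `sec` and `article` elements all match. A practical XML retrieval system would still have to decide which of these overlapping elements to show the user.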
How the dragons work: searching in a web
Pub Date: 2006-12-12 | DOI: 10.1145/1364742.1364747
I. Witten
Search engines ("web dragons") are the portals through which we access society's treasure trove of information. They do not publish the algorithms they use to sort and filter information, yet how they work is one of the most important questions of our time. Google's PageRank measures the prestige of each web page in terms of which pages link to it: it reflects the experience of a surfer condemned to click randomly around the web forever. The HITS technique distinguishes "hubs," pages that point to reputable sources, from "authorities," the sources themselves. This helps differentiate communities on the web, which in turn can tease out alternative interpretations of ambiguous query terms. RankNet uses machine learning to rank documents by predicting relevance judgments learned from training data. This article explains in non-technical terms how the dragons work.
Citations: 1
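The two link-analysis ideas summarized above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions. It is not Google's production ranking or the article's own code. `pagerank` runs power iteration on the random-surfer model with a damping factor of 0.85. That value is commonly used and is assumed here. Surfers on pages without out-links jump to a uniformly random page. `hits` alternates the authority and hub updates with L2 normalization. The four-page link graph is hypothetical.

```python
import numpy as np

def pagerank(adj, damping=0.85, tol=1e-10, max_iter=1000):
    """Random-surfer PageRank by power iteration.
    adj[i][j] = 1 if page i links to page j."""
    A = np.asarray(adj, dtype=float)
    n = A.shape[0]
    out_deg = A.sum(axis=1)
    # Row-stochastic transition matrix; dangling pages jump uniformly
    P = np.where(out_deg[:, None] > 0, A / np.maximum(out_deg, 1)[:, None], 1.0 / n)
    r = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        # With prob. (1 - damping) the surfer teleports to a random page
        r_new = (1 - damping) / n + damping * (r @ P)
        if np.abs(r_new - r).sum() < tol:
            return r_new
        r = r_new
    return r

def hits(adj, tol=1e-10, max_iter=1000):
    """Hub and authority scores (assumes the graph has at least one link)."""
    A = np.asarray(adj, dtype=float)
    n = A.shape[0]
    hubs, auths = np.ones(n), np.ones(n)
    for _ in range(max_iter):
        new_auths = A.T @ hubs           # authority: sum of hubs linking in
        new_auths /= np.linalg.norm(new_auths)
        new_hubs = A @ new_auths         # hub: sum of authorities linked to
        new_hubs /= np.linalg.norm(new_hubs)
        if np.abs(new_hubs - hubs).sum() + np.abs(new_auths - auths).sum() < tol:
            return new_hubs, new_auths
        hubs, auths = new_hubs, new_auths
    return hubs, auths

if __name__ == "__main__":
    # Hypothetical graph: 0->1, 0->2, 1->2, 2->0, 3->2
    links = [[0, 1, 1, 0], [0, 0, 1, 0], [1, 0, 0, 0], [0, 0, 1, 0]]
    print("PageRank:", pagerank(links))
    h, a = hits(links)
    print("hubs:", h, "authorities:", a)
```

Page 2 is linked from three pages. On this toy graph it gets both the highest PageRank and the highest authority score. Page 0 links to both pages 1 and 2, which makes it the strongest hub.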
Document image analysis for digital libraries
Pub Date: 2006-12-12 | DOI: 10.1145/1364742.1364758
Prateek Sarkar
Digital libraries take many forms: institutional libraries for information dissemination, document repositories for record-keeping, and personal digital libraries for organizing personal thoughts, knowledge, and courses of action. Digital image content (scanned or otherwise) is a substantial component of all of these libraries. Processing and analyzing these images involves tasks such as document layout understanding, character recognition, functional role labeling, image enhancement, indexing, organizing, restructuring, summarizing, cross-linking, redaction, privacy management, and distribution. At the Palo Alto Research Center, we conduct research on several aspects of document analysis for digital libraries, ranging from raw image transformations to linguistic analysis to interactive sensemaking tools. I shall describe a few recent research activities in document image analysis and their use in digital libraries.
Citations: 8
Vagueness and uncertainty in information retrieval: how can fuzzy sets help?
Pub Date: 2006-12-12 | DOI: 10.1145/1364742.1364746
D. Kraft, G. Pasi, Gloria Bordogna
The field of fuzzy information systems has grown and is maturing. In this paper, we describe some applications of fuzzy set theory to information retrieval, as well as more recent research outcomes in this field. The main aim of applying fuzzy set theory to information retrieval is to define flexible systems, i.e., systems that can represent and manage the vagueness and subjectivity that characterize information representation and retrieval; this is one of the main objectives of artificial intelligence.
Citations: 20
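The following is a minimal sketch of the classic fuzzy Boolean retrieval model, which makes the idea of a flexible system more concrete. In this model each document is a fuzzy set of index terms, with membership degrees in [0, 1]. AND, OR and NOT are evaluated with Zadeh's min, max and complement operators. This is an illustration only. The paper surveys fuzzy approaches in general, and other aggregation operators could be substituted for min and max. The nested-tuple query encoding and the example documents are assumptions.

```python
def evaluate(query, doc):
    """Degree (0..1) to which a fuzzy document satisfies a query.

    query is a nested tuple: ("term", t), ("and", q1, q2, ...),
    ("or", q1, q2, ...) or ("not", q).
    doc maps index terms to membership degrees; absent terms have degree 0.
    """
    op = query[0]
    if op == "term":
        return doc.get(query[1], 0.0)
    if op == "and":   # Zadeh intersection: minimum
        return min(evaluate(q, doc) for q in query[1:])
    if op == "or":    # Zadeh union: maximum
        return max(evaluate(q, doc) for q in query[1:])
    if op == "not":   # Zadeh complement
        return 1.0 - evaluate(query[1], doc)
    raise ValueError(f"unknown operator: {op!r}")

def rank(query, docs):
    """Rank documents by their degree of query satisfaction, best first."""
    scores = {name: evaluate(query, doc) for name, doc in docs.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical fuzzy index: term -> membership degree, per document
DOCS = {
    "d1": {"fuzzy": 0.9, "retrieval": 0.7},
    "d2": {"fuzzy": 0.4, "retrieval": 0.95, "boolean": 0.8},
    "d3": {"retrieval": 0.6, "boolean": 0.3},
}

if __name__ == "__main__":
    query = ("and", ("term", "fuzzy"), ("term", "retrieval"))
    print(rank(query, DOCS))
```

Each document receives a degree of query satisfaction instead of a binary match, and these degrees yield a ranking. For `fuzzy AND retrieval`, `d1` scores 0.7, `d2` scores 0.4 and `d3` scores 0.0.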
Diffusion maps-based image clustering
Pub Date: 2006-12-12 | DOI: 10.1145/1364742.1364754
R. Agrawal, C.-H. Wu, W. Grosky, F. Fotouhi
When clustering a large number of images using low-level features, one problem encountered is the high dimensionality of the feature space. This dimensionality incurs unnecessary cost both in feature selection and in distance computation during clustering. In this paper, we propose an approach to reduce the dimensionality of the feature space based on diffusion maps. In the proposed approach, each image is represented by a set of tiles. A visual keyword-image matrix is derived by classifying these tiles into a set of clusters and counting the occurrences of each cluster in each image of our database. The visual keyword-image matrix is similar to the term-document matrix in information retrieval. We use diffusion maps to reduce the dimensionality of the visual keyword-image matrix. Reducing the dimensionality of the image representation saves significant computation cost. We compare the performance of the proposed approach with that of an approach using global MPEG-7 color descriptors, and the results demonstrate the improvements.
Citations: 4
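The pipeline described in the abstract runs from tiles to visual keywords, then to a keyword-image count matrix, and finally to diffusion-map reduction. It can be sketched as follows. This is an illustration built on several assumptions and is not the authors' implementation.

- Tile clustering is assumed to have been done already (for example with k-means), so each image is given as a list of visual-keyword ids.
- The Gaussian kernel, the median-distance choice of `epsilon` and the diffusion time `t` are hypothetical defaults.

`diffusion_map` works with the Markov matrix P = D^-1 W through its symmetric conjugate, so that `numpy.linalg.eigh` can be used. It drops the trivial eigenvector, whose eigenvalue is 1.

```python
import numpy as np

def keyword_image_matrix(tile_labels, n_keywords):
    """Count visual keywords per image.

    tile_labels[j] lists the visual-keyword (tile-cluster) id of each tile in
    image j. Returns an (n_keywords, n_images) matrix, the image analogue of a
    term-document matrix.
    """
    M = np.zeros((n_keywords, len(tile_labels)))
    for j, labels in enumerate(tile_labels):
        M[:, j] = np.bincount(labels, minlength=n_keywords)
    return M

def diffusion_map(X, n_components=2, epsilon=None, t=1):
    """Embed each row of X (one row per image) in n_components dimensions."""
    X = np.asarray(X, dtype=float)
    # Pairwise squared Euclidean distances between rows
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    if epsilon is None:
        # Kernel-width heuristic (an assumption): median non-zero squared distance
        epsilon = np.median(d2[d2 > 0])
    W = np.exp(-d2 / epsilon)                # Gaussian affinities
    deg = W.sum(axis=1)                      # degrees; P = D^-1 W is a Markov matrix
    S = W / np.sqrt(np.outer(deg, deg))      # D^-1/2 W D^-1/2 has the same eigenvalues as P
    vals, vecs = np.linalg.eigh(S)           # eigh returns ascending eigenvalues
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    psi = vecs / np.sqrt(deg)[:, None]       # right eigenvectors of P
    # Skip the trivial constant eigenvector (eigenvalue 1); scale by lambda^t
    k = slice(1, n_components + 1)
    return psi[:, k] * vals[k] ** t

if __name__ == "__main__":
    # Hypothetical tile labels: images 0-2 use keywords {0, 1}, images 3-5 use {2, 3}
    tiles = [[0, 0, 1, 1], [0, 0, 0, 1], [0, 1, 1, 1],
             [2, 2, 3, 3], [2, 2, 2, 3], [2, 3, 3, 3]]
    M = keyword_image_matrix(tiles, n_keywords=4)
    print(diffusion_map(M.T, n_components=2))
```

In this toy example the first diffusion coordinate has one sign for the first three images and the opposite sign for the last three. Each 4-dimensional keyword histogram is thus reduced to a couple of coordinates that still separate the two groups. A clustering step could then operate in that smaller space.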
Digitizing, coding, annotating, disseminating, and preserving documents
Pub Date: 2006-12-12 | DOI: 10.1145/1364742.1364757
G. Nagy
We examine some research issues in pattern recognition and image processing that have been spurred by the needs of digital libraries. Character recognition on low-contrast, tightly set documents must draw on broader context, not only linguistic context. The reason is that the conversion of documents to coded (searchable) form lags far behind their conversion to image formats. At the same time, the prevalence of imaged documents over coded documents raises interesting research problems in the interactive annotation of document images. At the level of circulation, reformatting document images to accommodate diverse user needs remains a challenge.
Citations: 1