
International Workshop On Research Issues in Digital Libraries: Latest Publications

On the science of search: statistical approaches, evaluation, optimisation
Pub Date : 2006-12-12 DOI: 10.1145/1364742.1364745
S. Robertson
This paper, based on a talk, presents an overview of evaluation experiments in information retrieval, and also of statistical approaches to search. A strong connection exists between them: the notion that the objective of search can be expressed in terms of the measures used for evaluation informs the statistical theory in several ways. The latest manifestation of this connection is the work on optimization of ranking algorithms, using machine learning techniques.
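The connection the abstract draws, that the objective of search can be expressed in the measures used for evaluation, can be sketched minimally: tune a ranker's parameter by directly maximizing average precision on judged data. Everything below (features, relevance labels) is invented for illustration; real learning-to-rank work uses richer features and smoother surrogate objectives.

```python
# Minimal sketch: choose a ranker's mixing weight by directly maximizing
# an evaluation measure (average precision). All data here is invented.

def average_precision(ranked_relevance):
    """AP of a ranked list of 0/1 relevance labels."""
    hits, precisions = 0, []
    for i, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / i)
    return sum(precisions) / max(hits, 1)

# Each doc: (text-match score, usage score, relevant?)  -- made-up numbers.
docs = [(0.2, 0.9, 1), (0.8, 0.1, 0), (0.5, 0.6, 1), (0.9, 0.2, 0), (0.1, 0.8, 1)]

def ap_for_weight(w):
    """AP of the ranking induced by the linear score w*f1 + (1-w)*f2."""
    ranked = sorted(docs, key=lambda d: w * d[0] + (1 - w) * d[1], reverse=True)
    return average_precision([d[2] for d in ranked])

# Grid-search the weight against the evaluation measure itself.
best_w = max((w / 100 for w in range(101)), key=ap_for_weight)
```

On this toy data the search finds a weight under which all relevant documents rank first; the point is only that the training objective is the evaluation measure itself.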
Citations: 1
How to compose a complex document recognition system
Pub Date : 2006-12-12 DOI: 10.1145/1364742.1364759
H. Fujisawa
The technical challenges in document analysis and recognition have been to solve the problems of uncertainty and variability. From our experiences in developing OCRs, business form readers, and postal address recognition engines, we would like to present design principles to cope with these problems of uncertainty and variability. When the targets of document recognition are complex and diversified, the recognition engine needs to solve many different kinds of pattern recognition problems, which are a reflection of uncertainty and variability. Inevitably, the engine becomes complex, raising a question of how to combine its subcomponents, which are not perfect in their accuracies. The design principles will be explained with examples in postal address recognition.
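The composition problem the abstract raises, combining subcomponents that are not perfect in their accuracies, can be illustrated with a minimal sketch: each stage emits candidates with confidences, the composed system multiplies stage confidences, and low-confidence results are rejected. This is a generic confidence-combination sketch, not Fujisawa's actual design; stage names and numbers are invented.

```python
# Minimal sketch of composing imperfect recognition subcomponents:
# multiply per-stage confidences and reject below a threshold.

def combine(stages, threshold=0.5):
    """stages: list of dicts mapping candidate -> confidence (one per stage).
    Keep candidates proposed by every stage; score = product of confidences."""
    candidates = set.intersection(*(set(s) for s in stages))
    scored = {}
    for c in candidates:
        p = 1.0
        for s in stages:
            p *= s[c]
        scored[c] = p
    best = max(scored, key=scored.get) if scored else None
    if best is None or scored[best] < threshold:
        return None  # reject: a downstream process (e.g. manual keying) handles it
    return best

# Toy postal example: a city recognizer and a zip-code consistency checker.
city_ocr = {"Tokyo": 0.9, "Kyoto": 0.4}
zip_check = {"Tokyo": 0.8, "Osaka": 0.7}
best = combine([city_ocr, zip_check])  # both stages agree on "Tokyo" (0.72)
```

The rejection branch is the practical point: a composed engine that knows when it is unsure can hand the hard cases to a fallback instead of guessing.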
Citations: 0
Finding an answer to a question
Pub Date : 2006-12-12 DOI: 10.1145/1364742.1364751
Brigitte Grau
The huge quantity of available electronic information leads to a growing need for users to have tools able to be precise and selective. These kinds of tools have to provide answers to requests quite rapidly without requiring the user to explore each document, to reformulate her request, or to seek the answer inside documents. From that viewpoint, finding an answer consists not only in finding relevant documents but also in extracting relevant parts. This leads us to express the question-answering problem in terms of an information retrieval problem that can be solved using natural language processing (NLP) approaches. In my talk, I will focus on defining what a "good" answer is, and how a system can find it.

A good answer has to give the required piece of information. However, that is not sufficient; it also has to be presented within its context of interpretation and to be justified, in order to give the user the means to evaluate whether the answer fits her needs and is appropriate.

One can view searching for an answer to a question as a reformulation problem: according to what is asked, find one of the different linguistic expressions of the answer among all candidate sentences. Within this framework, interlingual question-answering can also be seen as another kind of linguistic variation. The answer phrasing can be considered as an affirmative reformulation of the question, partly or totally, which entails the definition of models that match sentences containing the answer. According to the different approaches, the kinds of model and the matching criteria differ greatly. One approach consists in building a structured representation that makes explicit the semantic relations between the concepts of the question, and comparing it to a similar representation of sentences.

As this approach requires a syntactic parser and a semantic knowledge base, which are not always available for all languages, systems often apply a less formal approach based on a similarity measure between a passage and the question; answers are then extracted from the highest-scored passages. Similarity involves different criteria: question terms and their linguistic variations in passages, syntactic proximity, answer type. We will see that, in such an approach, justifications can be envisioned by using texts themselves, considered as depositories of semantic knowledge. I will focus on the approach the LIR group of LIMSI has taken for its monolingual and bilingual systems.
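The less formal, similarity-based approach described above can be sketched with toy data: score each passage by question-term overlap plus a bonus when the expected answer type appears (here, a number for a "how many" question). The stop list, question, and passages are invented for illustration.

```python
# Minimal sketch of similarity-based passage scoring for question answering:
# question-term overlap plus an answer-type bonus. Toy data throughout.
import re

STOP = {"how", "many", "the", "a", "of", "in", "did", "does", "is"}

def score(question, passage):
    q_terms = {w for w in re.findall(r"[a-z0-9]+", question.lower())
               if w not in STOP}
    p_terms = set(re.findall(r"[a-z0-9]+", passage.lower()))
    overlap = len(q_terms & p_terms)
    # Crude answer-type criterion: "how many" questions expect a number.
    type_bonus = 1 if (question.lower().startswith("how many")
                       and re.search(r"\d", passage)) else 0
    return overlap + type_bonus

question = "How many moons does Mars have?"
passages = [
    "Mars is the fourth planet from the Sun.",
    "Mars has 2 moons, Phobos and Deimos.",
]
best = max(passages, key=lambda p: score(question, p))
```

A real system would add linguistic variations of the question terms and syntactic proximity, as the abstract notes; the shape of the computation stays the same.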
Citations: 2
Information retrieval and digital libraries: lessons of research
Pub Date : 2006-12-12 DOI: 10.1145/1364742.1364743
Karen Spärck Jones
This paper reviews lessons from the history of information retrieval research, with particular emphasis on recent developments. These have demonstrated the value of statistical techniques for retrieval, and have also shown that they have an important, though not exclusive, part to play in other information processing tasks, like question answering and summarising. The heterogeneous materials that digital libraries are expected to cover, their scale, and their changing composition, imply that statistical methods, which are general-purpose and very flexible, have significant potential value for the digital libraries of the future.
Citations: 5
Open source search and research
Pub Date : 2006-12-12 DOI: 10.1145/1364742.1364748
M. Beigbeder, Wray L. Buntine, Wai Gen Yee
In this paper, we present a review of criteria for the evaluation of open source information retrieval tools and provide an overview of some of those that are more popular. The question of interaction between research and availability of open source search tools is addressed.
Citations: 1
Digital audiovisual repositories: an introduction
Pub Date : 2006-12-12 DOI: 10.1145/1364742.1364753
Richard Wright
This paper briefly describes the essential aspects of the digital world that audiovisual archives are entering - or being swallowed-up in. The crucial issue is whether archives will sink or swim in this all-digital environment. The core issue is defining - and meeting - the requirements for a secure, sustainable digital repository.
Citations: 2
From CLIR to CLIE: some lessons in NTCIR evaluation
Pub Date : 2006-12-12 DOI: 10.1145/1364742.1364762
Hsin-Hsi Chen
Cross-language information retrieval (CLIR) facilitates the use of one language to access documents in other languages. Cross-language information extraction (CLIE) extracts relevant information in finer granularity from multilingual documents for some specific applications like summarization, question answering, opinion extraction, etc. This paper reviews CLIR, CLQA, and opinion analysis tasks in NTCIR evaluation. The design methodologies and some key technologies are reported.
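The CLIR setting the abstract describes, using one language to access documents in another, is commonly approached by query translation. A minimal dictionary-based sketch follows; the bilingual lexicon and the French documents are invented toy data, not NTCIR materials.

```python
# Minimal sketch of dictionary-based cross-language retrieval:
# translate query terms with a bilingual lexicon, then rank
# target-language documents by translated-term overlap. Toy data.

lexicon = {"library": ["bibliothèque"], "digital": ["numérique"]}

def translate(query):
    terms = []
    for w in query.lower().split():
        terms.extend(lexicon.get(w, [w]))  # keep untranslatable terms as-is
    return terms

def retrieve(query, documents):
    q = set(translate(query))
    scored = [(len(q & set(d.lower().split())), d) for d in documents]
    return [d for n, d in sorted(scored, reverse=True) if n > 0]

docs_fr = ["la bibliothèque numérique", "le moteur de recherche"]
results = retrieve("digital library", docs_fr)
```

Translation ambiguity (a source term with several target translations) is the hard part real CLIR systems must weigh; this sketch simply keeps every translation.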
Citations: 0
Shallow syntax analysis in Sanskrit guided by semantic nets constraints
Pub Date : 2006-12-12 DOI: 10.1145/1364742.1364750
G. Huet
We present the state of the art of a computational platform for the analysis of classical Sanskrit. The platform comprises modules for phonology, morphology, segmentation and shallow syntax analysis, organized around a structured lexical database. It relies on the Zen toolkit for finite state automata and transducers, which provides data structures and algorithms for the modular construction and execution of finite state machines, in a functional framework.

Some of the layers proceed in bottom-up synthesis mode - for instance, noun and verb morphological modules generate all inflected forms from stems and roots listed in the lexicon. Morphemes are assembled through internal sandhi, and the inflected forms are stored with morphological tags in dictionaries usable for lemmatizing. These dictionaries are then compiled into transducers, implementing the analysis of external sandhi, the phonological process which merges words together by euphony. This provides a tagging segmenter, which analyses a sentence presented as a stream of phonemes and produces a stream of tagged lexical entries, hyperlinked to the lexicon.

The next layer is a syntax analyser, guided by semantic nets constraints expressing dependencies between the word forms. Finite verb forms demand semantic roles, according to valency patterns depending on the voice (active, passive) of the form and the governance (transitive, etc) of the root. Conversely, noun/adjective forms provide actors which may fill those roles, provided agreement constraints are satisfied. Tool words are mapped to transducers operating on tagged streams, allowing the modeling of linguistic phenomena such as coordination by abstract interpretation of actor streams. The parser ranks the various interpretations (matching actors with roles) with penalties, and returns to the user the minimum penalty analyses, for final validation of ambiguities.

The whole platform is organized as a Web service, allowing the piecewise tagging of a Sanskrit text.
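The segmentation-with-sandhi idea described above can be illustrated with a toy segmenter: recursively split a phoneme stream against a lexicon, optionally undoing a junction rule at each word boundary. The lexicon and the single rule below are invented for illustration and are not actual Sanskrit sandhi; the real platform compiles such rules into finite state transducers.

```python
# Toy sketch in the spirit of a sandhi-aware dictionary segmenter.
# LEXICON and SANDHI are invented; real sandhi rules are far richer.

LEXICON = {"deva", "asti", "iha"}
# Invented junction rule: word-final "a" + word-initial "a" surface as "e".
SANDHI = [("e", ("a", "a"))]

def segment(s):
    """Return all analyses of phoneme stream s as lists of lexicon words."""
    if not s:
        return [[]]
    analyses = []
    for i in range(1, len(s) + 1):
        head, tail = s[:i], s[i:]
        if head in LEXICON:  # plain word boundary, no sandhi applied
            analyses += [[head] + rest for rest in segment(tail)]
        for surface, (final, initial) in SANDHI:
            if head.endswith(surface):
                undone = head[: -len(surface)] + final
                if undone in LEXICON:  # undo the rule, replay the initial sound
                    analyses += [[undone] + rest
                                 for rest in segment(initial + tail)]
    return analyses

analyses = segment("devesti")  # "deva" + "asti" fused at the junction
```

Running the segmenter on `"devesti"` recovers `[["deva", "asti"]]`: the surface "e" is undone into the two underlying "a" sounds on either side of the boundary.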
Citations: 38
Toward a common semantics between media and languages
Pub Date : 2006-12-12 DOI: 10.1145/1364742.1364755
C. Fluhr, G. Grefenstette, Adrian Daniel Popescu
For a computer to recognize objects, persons, situations or actions in multimedia, it needs to have learned models of each thing beforehand. For the moment, no large, general collection of training examples exists for the wide variety of things that we would want to automatically recognize in multimedia, video and still images. We believe that the WWW and current technology can allow us to automatically build such a resource. This paper describes a methodology for the construction of a grounded, general purpose, multimedia ontology that is instantiated through web processing. In this hierarchically organized ontology, concepts corresponding to concrete objects, persons, situations and actions are linked with still images, videos and sounds that represent exemplars of each concept. These examples are necessary resources for computing discriminating signatures for the recognition of the concepts in still images or videos. Since images retrieved using existing image search engines contain much noise and are not always representative, we also present here our methodology for finding good representatives for each concept.
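One simple way to pick a "good representative" from noisy retrieved images, sketched below with invented feature vectors, is to select the medoid: the item with the smallest total distance to all others, which off-topic outliers cannot be. This is a generic illustration, not the authors' actual selection method.

```python
# Minimal sketch: pick a representative from noisy image search results
# by choosing the medoid of their feature vectors. Vectors are invented.

def medoid(vectors):
    """Return the vector with minimal total Euclidean distance to the rest."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return min(vectors, key=lambda v: sum(dist(v, w) for w in vectors))

# Three mutually similar on-topic vectors and one off-topic outlier.
features = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.1), (8.0, 8.0)]
rep = medoid(features)
```

The outlier contributes a large distance to every other item, so it can never win; the medoid lands in the dense on-topic cluster.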
Citations: 2
Multilingual information access: the contribution of evaluation
Pub Date : 2006-12-12 DOI: 10.1145/1364742.1364761
C. Peters
Since evaluation of cross-language information retrieval systems began at TREC in 1997 and NTCIR in 1998 and, in particular, with the launch of the Cross-Language Evaluation Forum (CLEF) in 2000, considerable progress has been made in this particular sector of IR. Advances can be considered in two stages. The first stage regarded in particular the development of text retrieval systems from simple so-called "bilingual" systems, in which a query in one language is used to search a document collection in another, to truly "multilingual" retrieval systems, where a query in one language can find relevant results from a collection of documents in multiple languages. In the second stage, the focus was no longer just on multilingual document retrieval but was diversified to include different kinds of text retrieval across languages (e.g. multilingual question answering) and retrieval on different kinds of media (e.g. collections containing images or speech). However, although the results from the research perspective have been interesting, there has been little real take-up by the applications communities.

In the paper we describe the results achieved by CLEF over the years and propose a third stage for multilingual system evaluation which gives far more attention to questions regarding usability and user satisfaction, but also provides ways for the results achieved to be transferred to the operational context.
Citations: 2