TEXUS: A Task-based Approach for Table Extraction and Understanding. Roya Rastan, Hye-young Paik, J. Shepherd. DOI: 10.1145/2682571.2797069
In this paper, we propose a precise, comprehensive model of table processing which aims to remedy some of the problems in the discussion of table processing in the literature. The model targets application-independent, end-to-end table processing, and thus encompasses a large subset of the work in the area. The model can be used to aid the design of table processing systems (we provide an example of such a system), can serve as a reference framework for evaluating the performance of table processing systems, and can assist in clarifying terminological differences in the table processing literature.
Interlinking English and Chinese RDF Data Using BabelNet. Tatiana Lesnikova, Jérôme David, J. Euzenat. DOI: 10.1145/2682571.2797089
Linked data technologies make it possible to publish and link structured data on the Web. Although RDF is not about text, many RDF data providers publish their data in their own language. Cross-lingual interlinking aims at discovering links between identical resources across knowledge bases in different languages. In this paper, we present a method for interlinking RDF resources described in English and Chinese using the BabelNet multilingual lexicon. Resources are represented as vectors of identifiers and then similarity between these resources is computed. The method achieves an F-measure of 88%. The results are also compared to a translation-based method.
Creating eBooks with Accessible Graphics Content. Cagatay Goncu, K. Marriott. DOI: 10.1145/2682571.2797076
We present a new model for presenting graphics in eBooks to blind readers. It is based on the GraViewer app, which allows an accessible graphic embedded in an iBook to be explored on an iPad using speech and non-speech audio feedback. We also introduce a web-based tool, GraAuthor, for creating such accessible graphics, and describe the workflow for including these in an iBook. Unlike previous approaches, our model provides an integrated digital presentation of both text and graphics and allows the general public to create accessible graphics.
Detecting XSLT Rules Affected by Schema Evolution. Yang Wu, Nobutaka Suzuki. DOI: 10.1145/2682571.2797086
In general, schemas of XML documents are continuously updated according to changes in the real world. If a schema is updated, then XSLT stylesheets are also affected by the schema update. To maintain the consistency of XSLT stylesheets with updated schemas, we have to detect the XSLT rules affected by schema updates. However, detecting such XSLT rules manually is a difficult and time-consuming task, since recent DTDs and XSLT stylesheets are becoming more complex and users do not always fully understand the dependencies between XSLT stylesheets and DTDs. In this paper, we consider three subclasses of unranked tree transducers and present an algorithm for detecting XSLT rules affected by a DTD update for these classes.
{"title":"Session details: Information Summarized","authors":"D. Brailsford","doi":"10.1145/3256803","DOIUrl":"https://doi.org/10.1145/3256803","url":null,"abstract":"","PeriodicalId":106339,"journal":{"name":"Proceedings of the 2015 ACM Symposium on Document Engineering","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116958419","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Logical Structures","authors":"E. Munson","doi":"10.1145/3256807","DOIUrl":"https://doi.org/10.1145/3256807","url":null,"abstract":"","PeriodicalId":106339,"journal":{"name":"Proceedings of the 2015 ACM Symposium on Document Engineering","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121919904","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
What Is This Thing Called Linked Data? Manuel Atencia, Jérôme David, P. Genoud. DOI: 10.1145/2682571.2801035
The Linked Data initiative has made it possible for the web to evolve from being a global information space in which only documents are linked to one in which both documents and data are linked: a web of documents and data. This tutorial aims to give an overview of the principles, models and technologies underlying Linked Data.
Efficient Computation of Co-occurrence Based Word Relatedness. Jie Mei, Xinxin Kou, Zhimin Yao, A. Rau-Chaplin, Aminul Islam, A. Mohammad, E. Milios. DOI: 10.1145/2682571.2797088
Measuring document relatedness using unsupervised co-occurrence-based word relatedness methods consumes considerable processing time and memory. This paper introduces the application of compact data structures for efficient computation of word relatedness based on corpus statistics. The data structure is used to efficiently look up: (1) the corpus statistics for the Common Word Relatedness Approach, and (2) the pairwise word relatedness for the Algorithm Specific Word Relatedness Approach. These two approaches significantly accelerate the processing time of word relatedness methods and reduce the space cost of storing co-occurrence statistics in memory, making text mining tasks like classification and clustering based on word relatedness practical.
BBookX: An Automatic Book Creation Framework. Chen Liang, Shuting Wang, Zhaohui Wu, Kyle Williams, B. Pursel, Benjamin Bräutigam, Sherwyn Saul, Hannah Williams, Kyle Bowen, C. Lee Giles. DOI: 10.1145/2682571.2797094
As more educational resources become available online, it is possible to acquire more up-to-date knowledge and information. We propose BBookX, a novel computer-facilitated system that automatically and collaboratively builds free open online books using publicly available educational resources such as Wikipedia. BBookX has two separate components: one creates an open version of existing books by linking different book chapters to Wikipedia articles, while the other, with an interactive user interface, supports real-time book creation in which users can modify a generated book through explicit feedback.
Knuth-Plass Revisited: Flexible Line-Breaking for Automatic Document Layout. Tamir Hassan, Andrew Hunter. DOI: 10.1145/2682571.2797091
There is an inherent flexibility in typesetting a block of text. Traditionally, line breaks would be manually chosen at strategic points in such a way as to minimize the amount of whitespace in each line. Hyphenation would only be used as a last resort. Knuth and Plass automated this optimization procedure, which has been used in various typesetting systems and DTP applications ever since. However, an optimal solution for the line-breaking problem does not necessarily lead us to an optimal document layout on the whole. The flexibility of choosing line breaks enables us, in many cases, to adjust the height of a paragraph by changing the number of lines, without having to make adjustments to font size, leading, etc. In many cases, the word spacing remains within the usual tolerances and visual quality does not noticeably suffer. This paper presents a modification to the Knuth-Plass algorithm to return several results for a given column of text, each corresponding to a different height, and describes steps to quantify the amount of expected flexibility in a given paragraph. We conclude with a discussion on how such "sub-optimal" results can lead to a better overall document layout, particularly in the context of mobile layouts, where flexibility is of key importance.