Proceedings of the ACM Symposium on Document Engineering. ACM Symposium on Document Engineering: latest articles

Typesetting multiple interacting streams
Blanca Mancilla, Jarryd P. Beck, J. Plaice
We present a new means for specifying multiple interacting streams, as is needed for documents with multiple systems of notes, side-by-side translations, and critical editions. Each stream is treated as a sequence of components, and anchors are used in the concrete syntax to define reference points used by other streams. When these streams are loaded into memory, the anchors simply become iterators in a container. We present a set of algorithms for the typesetting of multiple streams of text, each with multiple streams of floats and footnotes.
DOI: 10.1145/2361354.2361389 (pp. 149-152, 2012-09-04)
Citations: 1
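The paper turns anchors into iterators into an in-memory container. The sketch below illustrates that idea under a deliberately simplified model. The main stream is a list of equal-height lines. A hypothetical `anchors` mapping ties a line position to the ids of the notes anchored there. Each page has a fixed number of slots, shared by text lines and note lines. The greedy rule keeps every note on the same page as its anchor. This is not the authors' algorithm, which handles several text streams, each with its own float and footnote streams.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    text: list[str] = field(default_factory=list)   # lines from the main stream
    notes: list[str] = field(default_factory=list)  # lines from the note stream


def paginate(main: list[str], notes: dict[str, list[str]],
             anchors: dict[int, list[str]], capacity: int) -> list[Page]:
    """Greedily fill pages, keeping every note on the page of its anchor.

    main:     lines of the main text stream, one slot each
    notes:    note id -> lines of that note, one slot per line
    anchors:  position in `main` -> ids of notes anchored there; the integer
              position stands in for an iterator into the main container
    capacity: slots per page, shared by text lines and note lines
    """
    pages = [Page()]
    for pos, line in enumerate(main):
        note_lines = [n for nid in anchors.get(pos, []) for n in notes[nid]]
        page = pages[-1]
        used = len(page.text) + len(page.notes)
        # Break the page if the line and its notes do not fit together.
        # (A line whose notes alone exceed `capacity` still overflows its page.)
        if used > 0 and used + 1 + len(note_lines) > capacity:
            page = Page()
            pages.append(page)
        page.text.append(line)
        page.notes.extend(note_lines)
    return pages


pages = paginate(["l0", "l1", "l2", "l3"], {"a": ["na1", "na2"]}, {2: ["a"]}, capacity=4)
for number, page in enumerate(pages, 1):
    print(number, page.text, page.notes)
```

In this example, line `l2` and its two note lines do not fit on the first page. All three therefore move together to the second page.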
Document and archive: editing the past
B. Bachimont
Document engineering has a difficult task: to propose tools and methods to manipulate contents and make sense of them. This task is even harder when dealing with archives, insofar as document engineering must provide not only tools for expressing meaning but, above all, tools and methods that keep contents accessible in their integrity and intelligible according to their meaning. However, these objectives may be contradictory: access implies transforming contents to make them accessible through networks, tools and devices. Intelligibility may imply adapting contents to the current state of knowledge and capacity for understanding. But, in doing so, can we still speak of the authenticity, integrity, or even identity of documents? Document engineering has provided powerful means to express meaning and to turn an intention into a semiotic expression. Document repurposing has become a common way of exploiting libraries, archives, etc. By enabling the reuse of a specific part of a given content, repurposing techniques make it possible to renegotiate the meaning of that part entirely by changing its context and its interactivity: in short, the way people can consider and interpret this piece of content. Put this way, there could be an antinomy between archiving and document engineering. However, transforming documents and editing content is an efficient way to keep them alive and compelling for people. Preserving contents does not consist in simply storing them but in actively transforming them, adapting them technically and keeping them intelligible. Editing the past is then a new challenge, merging a content deontology with a document technology. This challenge implies redefining some classical notions such as authenticity, and highlights the need for new concepts and methods. Especially in a digital world, documents are permanently reconfigured by technical tools that produce variants, similar contents that call into question the usual definition of the identity of documents. Editing the past calls for a new critique of variants.
DOI: 10.1145/2361354.2361356 (pp. 1-2, 2012-09-04)
Citations: 0
Full-text search on multi-byte encoded documents
R. Wong, Fengming Shi, N. Lam
The Burrows-Wheeler transform (BWT) has become popular in text compression, full-text search, XML representation, and DNA sequence matching. Full-text search on BWT-encoded text can be performed very efficiently using backward search. This paper studies different approaches for applying BWT to multi-byte encoded (e.g. UTF-16) text documents. Previous work has studied BWT on word-based models, and BWT can be applied directly to multi-byte encodings by treating the document as single-byte coded. However, there has been no extensive study of how to use BWT on multi-byte encoded documents for efficient full-text search. In this paper, we therefore propose several ways to perform backward search efficiently on multi-byte text documents. We demonstrate our findings using Chinese text documents. Our experimental results show that our extensions to the standard BWT method offer faster search and use less runtime memory.
DOI: 10.1145/2361354.2361404 (pp. 227-236, 2012-09-04)
Citations: 3
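Backward search over a BWT is the core operation the abstract builds on. The sketch below shows it in its simplest form and treats each Unicode code point, rather than each byte, as one BWT symbol. That choice is an assumption standing in for the paper's multi-byte strategies. The naive rotation sort and the `str.count`-based rank function are also illustrative stand-ins for the suffix arrays and sampled rank structures a real index would use. Neither is the paper's implementation.

```python
def bwt(text: str, sentinel: str = "\0") -> str:
    """BWT over code points, by sorting all rotations (quadratic; demo only)."""
    assert sentinel not in text
    s = text + sentinel  # "\0" sorts before every other code point
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def backward_count(last: str, pattern: str) -> int:
    """Count occurrences of `pattern` using backward search on the BWT string `last`."""
    # C[c] = number of symbols in the text that are strictly smaller than c.
    freq: dict[str, int] = {}
    for ch in last:
        freq[ch] = freq.get(ch, 0) + 1
    C, total = {}, 0
    for ch in sorted(freq):
        C[ch] = total
        total += freq[ch]

    def occ(ch: str, i: int) -> int:
        # Occurrences of ch in last[:i]; a real index uses sampled rank tables instead.
        return last.count(ch, 0, i)

    lo, hi = 0, len(last)  # half-open range of sorted rotations matching so far
    for ch in reversed(pattern):
        if ch not in C:
            return 0
        lo, hi = C[ch] + occ(ch, lo), C[ch] + occ(ch, hi)
        if lo >= hi:
            return 0
    return hi - lo


index = bwt("文档工程与文档检索")
print(backward_count(index, "文档"))
```

Each Chinese character is one symbol here, so a match can never start in the middle of a character. With a byte-level BWT, that alignment has to be handled separately.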
An inheritance model for documents in web applications with sydonie
Jean-Marc Lecarpentier, Pierre-Yves Buard, Hervé Le Crosnier, Romain Brixtel
Each web site has to manage documents tailored to its specific needs. When building applications with a specific document model, web developers must make a choice: build from scratch, or use existing tools and adapt them to accommodate the model. We propose an inheritance model for documents, implemented in the Sydonie open-source web development framework. It offers a flexible environment for creating classes of documents. Sydonie's document model uses entity nodes inspired by the Functional Requirements for Bibliographic Records (FRBR). Document content and metadata are modeled using a set of relations between entity nodes and attribute objects. Classes of documents or attribute types can be defined through a declarative XML file. Our inheritance model makes it possible to define them at the framework level, the application profile level, or the application level. This demonstration explains the document definition process and the inheritance model implemented in the framework, and gives several examples of its advantages.
DOI: 10.1145/2361354.2361390 (pp. 153-156, 2012-09-04)
Citations: 1
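Inheritance across the framework, application profile and application levels can be pictured as a layered lookup. The sketch below assumes class definitions have already been parsed from the XML files into dictionaries that map attribute names to attribute types. It also assumes a more specific level overrides individual attributes of a more general one. The names `framework`, `profile`, `application` and `resolve_class` are hypothetical, and Sydonie's actual resolution rules may differ.

```python
# Hypothetical definitions, as if already parsed from the declarative XML files:
# document class -> {attribute name: attribute type}.
framework = {
    "article": {"title": "text", "body": "richtext"},
    "image": {"title": "text", "file": "binary"},
}
profile = {
    "article": {"body": "html", "abstract": "text"},  # refines the framework class
}
application = {
    "event": {"title": "text", "date": "date"},       # new, application-only class
}


def resolve_class(name: str, levels: list[dict]) -> dict[str, str]:
    """Merge the definitions of `name` from the most general level to the most
    specific one; a later (more specific) level overrides earlier attributes."""
    merged: dict[str, str] = {}
    for level in levels:  # e.g. [framework, profile, application]
        merged.update(level.get(name, {}))
    if not merged:
        raise KeyError(f"document class {name!r} is not defined at any level")
    return merged


levels = [framework, profile, application]
print(resolve_class("article", levels))
```

Attribute-level merging lets a profile refine one attribute (here `body`) without restating the whole class.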
Toward automated schema-directed code revision
R. Oliveira, P. Genevès, Nabil Layaïda
Updating XQuery programs in accordance with a change of the input XML schema is known to be a time-consuming and error-prone task. We propose an automatic method aimed at helping developers realign an XQuery program with the new schema. First, we introduce a taxonomy of possible problems induced by a schema change. This makes it possible to differentiate problems according to their severity levels, e.g. errors that require code revision, and semantic changes that should be brought to the developer's attention. Second, we provide the necessary algorithms to detect such problems using a solver that checks the satisfiability of XPath expressions.
DOI: 10.1145/2361354.2361377 (pp. 103-106, 2012-09-04)
Citations: 7
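Schema-directed checking rests on asking whether an XPath expression can still select anything under the new schema. The sketch below checks only the most severe category: an expression that could match before the change but never after it. It assumes a schema reduced to a map from element names to allowed child names, paths restricted to `/` and `//` steps, and no occurrence constraints. This graph search is a hypothetical simplification, not the logic-based XPath solver the paper uses. Detecting subtler semantic changes would need more than satisfiability.

```python
from collections import deque


def parse_path(path: str) -> list[tuple[str, str]]:
    """Split '/a/b//c' into (axis, name) steps; only '/' and '//' are supported."""
    steps, i = [], 0
    while i < len(path):
        if path.startswith("//", i):
            axis, i = "descendant", i + 2
        elif path.startswith("/", i):
            axis, i = "child", i + 1
        else:
            raise ValueError(f"unsupported path syntax: {path!r}")
        j = path.find("/", i)
        j = len(path) if j == -1 else j
        steps.append((axis, path[i:j]))
        i = j
    return steps


def satisfiable(path: str, schema: dict[str, set[str]], root: str) -> bool:
    """True if some document allowed by `schema` can contain a node matching `path`.

    Assumption: every child listed for an element can actually occur.
    """
    children = {**schema, "#document": {root}}
    current = {"#document"}
    for axis, name in parse_path(path):
        if axis == "child":
            reachable = {c for t in current for c in children.get(t, set())}
        else:  # descendant: reachable in one or more child steps
            reachable, queue = set(), deque(current)
            while queue:
                for c in children.get(queue.popleft(), set()):
                    if c not in reachable:
                        reachable.add(c)
                        queue.append(c)
        current = {t for t in reachable if name in ("*", t)}
        if not current:
            return False
    return True


def check_after_schema_change(path, old, new, root):
    """Flag a path that could match under the old schema but never under the new one."""
    if satisfiable(path, old, root) and not satisfiable(path, new, root):
        return "error"  # the query needs revision
    return "ok"


old = {"book": {"chapter"}, "chapter": {"title", "section"}, "section": {"title"}}
new = {"book": {"part"}, "part": {"chapter"}, "chapter": {"heading", "section"}, "section": {"heading"}}
print(check_after_schema_change("/book/chapter/title", old, new, "book"))
```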
Timesheets.js: when SMIL meets HTML5 and CSS3
Fabien Cazenave, V. Quint, C. Roisin
In this paper, we explore different ways to publish multimedia documents on the web. We propose a solution that takes advantage of the new multimedia features of web standards, namely HTML5 and CSS3. While JavaScript is fine for handling timing, synchronization and user interaction in specific multimedia pages, we advocate a more generic, document-oriented alternative relying primarily on declarative standards: HTML5 and CSS3 complemented by SMIL Timesheets. This approach is made possible by a Timesheets scheduler that runs in the browser. Various applications based on this solution illustrate the paper, ranging from media annotations to web documentaries.
DOI: 10.1145/2034691.2034700 (pp. 43-52, 2011-09-19)
Citations: 42
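The Timesheets approach replaces per-page scripting with declarative timing containers. The sketch below shows how a scheduler might compute when each item is active. It assumes only SMIL-style `seq` containers (children play one after another) and `par` containers (children start together), with fixed-duration leaves. It ignores `begin` offsets, events and repetition. The `Node` class and the `schedule` and `active` functions are hypothetical and are not the Timesheets.js API.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical timing tree: 'item' leaves have a duration; 'seq' and 'par' are containers."""
    kind: str                 # "item", "seq" or "par"
    name: str = ""
    dur: float = 0.0
    children: list["Node"] = field(default_factory=list)


def schedule(node: Node, begin: float = 0.0, out=None):
    """Return (end time, {item name: (begin, end)}) using seq/par semantics."""
    if out is None:
        out = {}
    if node.kind == "item":
        out[node.name] = (begin, begin + node.dur)
        return begin + node.dur, out
    end = begin
    for child in node.children:
        # seq: start after the previous child; par: every child starts with the container.
        child_begin = end if node.kind == "seq" else begin
        child_end, _ = schedule(child, child_begin, out)
        end = max(end, child_end)
    return end, out


def active(intervals, t):
    """Names of items active at time t (begin <= t < end), sorted."""
    return sorted(n for n, (b, e) in intervals.items() if b <= t < e)


slides = Node("seq", children=[
    Node("item", "intro", dur=5),
    Node("par", children=[Node("item", "video", dur=10), Node("item", "caption", dur=4)]),
    Node("item", "outro", dur=3),
])
end, timeline = schedule(slides)
print(end, active(timeline, 7))
```

In a browser, a scheduler of this kind would toggle the visibility or a CSS class of the corresponding HTML elements as each interval starts and ends.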
Building table formatting tools
Mihai Bilauca, P. Healy
In this paper we present an overview of the challenges to overcome when developing table authoring tools, including a review of logical table models, typographical issues and automated table layout optimization. We present a Table Drawing Tool prototype which implements an automated solution to the table layout optimization problem for tables with spanning cells, using a mathematical modelling method. We report on the performance improvements of this new optimization method compared to previous solutions.
DOI: 10.1145/2034691.2034696 (pp. 13-22, 2011-09-19)
Citations: 3
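The table layout optimization problem can be stated as choosing column widths that minimise total table height within a fixed table width. The sketch below states that problem under strong assumptions. Cell content is measured in characters, wrapped at the column width without hyphenation, there are no spanning cells, and candidate widths are tried exhaustively. This brute-force search is swapped in for illustration only. The paper itself uses a mathematical modelling method and supports spanning cells.

```python
import itertools
import math


def cell_height(chars: int, width: int) -> int:
    """Lines needed for `chars` characters wrapped at `width` characters per line."""
    return max(1, math.ceil(chars / width))


def table_height(table: list[list[int]], widths: tuple[int, ...]) -> int:
    """Each row is as tall as its tallest cell; the table height is the sum of rows."""
    return sum(max(cell_height(c, w) for c, w in zip(row, widths)) for row in table)


def best_widths(table, total_width, candidates):
    """Try every combination of candidate column widths that fits `total_width`
    and return (height, widths) with the smallest table height."""
    best = None
    for widths in itertools.product(candidates, repeat=len(table[0])):
        if sum(widths) > total_width:
            continue
        height = table_height(table, widths)
        if best is None or height < best[0]:
            best = (height, widths)
    return best


# Two rows, two columns; numbers are character counts of each cell.
table = [[10, 40], [5, 20]]
print(best_widths(table, total_width=30, candidates=range(5, 26, 5)))
```

Exhaustive search grows exponentially with the number of columns. That cost is why a formal optimization model is attractive for real tables.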
Detecting and resolving conflicts between adaptation aspects in multi-staged XML transformations
Sven Karol, Matthias Niederhausen, D. Kadner, U. Assmann, Klaus Meißner
Separation of Concerns (SoC) is a common principle for reducing the complexity of large software and hypermedia systems. Among a variety of approaches, adaptation aspects are a well-known solution for significantly improving SoC in adaptive hypermedia applications. To model adaptation aspects in XML-based hypermedia applications, we developed PX-Weave, a tool that allows developers to specify and weave such aspects in multi-staged XML transformation environments. However, while aspects increase modularity and thus decrease the complexity of software, they also introduce some complex problems. The most prominent one, aspect interaction, has received a lot of attention from researchers during the last decade. In this paper we investigate the problem of interaction between adaptation aspects. We present a combined approach for static and dynamic detection of aspect interactions in multi-staged XML-based hypermedia applications, which we implemented as an add-on to PX-Weave.
DOI: 10.1145/2034691.2034738 (pp. 229-238, 2011-09-19)
Citations: 2
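Aspect interaction can be illustrated with one simple conflict criterion. The sketch below assumes each adaptation aspect is given as an ElementTree path expression plus an action label. It treats two aspects as conflicting when they select the same element and at least one of them removes it. The aspect names, the actions and the sample document are hypothetical. The check runs against a concrete document instance, so it sits closer to the dynamic side. It does not flag, for example, a removal of an ancestor of a modified node. PX-Weave's combined static and dynamic analysis is more general.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical adaptation aspects: (name, ElementTree path expression, action).
ASPECTS = [
    ("hide-advanced", ".//section[@level='advanced']", "remove"),
    ("highlight-new", ".//section[@status='new']", "modify"),
    ("add-glossary", ".//term", "modify"),
]

DOC = """<course>
  <section id="s1" level="advanced" status="new"><term>monad</term></section>
  <section id="s2" level="basic" status="new"/>
  <section id="s3" level="advanced"/>
</course>"""


def find_conflicts(doc: ET.Element, aspects):
    """Report elements selected by two or more aspects when at least one removes them."""
    hits = defaultdict(list)  # id(element) -> [(aspect name, action), ...]
    nodes = {}                # id(element) -> element, keeps the objects alive
    for name, path, action in aspects:
        for elem in doc.iterfind(path):
            hits[id(elem)].append((name, action))
            nodes[id(elem)] = elem
    conflicts = []
    for key, applied in hits.items():
        if len(applied) > 1 and any(action == "remove" for _, action in applied):
            conflicts.append((nodes[key].get("id"), sorted(n for n, _ in applied)))
    return conflicts


print(find_conflicts(ET.fromstring(DOC), ASPECTS))
```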
Print-friendly page extraction for web printing service
Sam Liu, Conglun Yao
Printing Web pages from browsers usually results in unsatisfactory printouts because the pages are typically poorly formatted and contain non-informative content such as navigation menus and ads. For this reason, print-worthy Web pages such as articles generally contain hyperlinks (or links) that lead to print-friendly pages containing the salient content. For a more desirable Web printing experience, the main Web content should be extracted to produce well-formatted pages. This paper describes a cloud service for Web printing based on automatic content extraction from, and repurposing of, print-friendly pages. Content extraction from print-friendly pages is simpler and more reliable than from the original pages. However, print links are represented in HTML in many different ways, which makes robust print-link detection more difficult than it first appears. First, the link can be text-based, image-based, or both. For example, there is a lexicon of phrases used to indicate print-friendly pages, such as "print", "print article", "print-friendly version", etc. In addition, some links use printer-like image icons, with or without a print phrase. To complicate matters further, not all of the links contain a valid URL. Instead, the pages are dynamically generated either by client-side JavaScript or by the server, so that no URL is present. Experimental results suggest that our solution can achieve over 99% precision and 97% recall for print-friendly link extraction.
DOI: 10.1145/2034691.2034711 (pp. 89-92, 2011-09-19)
Citations: 0
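The detection problem described in the abstract can be illustrated with a simple rule-based pass over the HTML. The sketch below uses Python's standard `html.parser`. A link counts as a print link if its normalised text is in the phrase lexicon quoted in the abstract. It also counts if an image inside the link mentions "print" in its `src` or `alt`, which is an assumed heuristic. Script-driven links (`javascript:` or `#`) are reported with no URL. This is not the authors' classifier, and the tiny lexicon and sample HTML are illustrative only.

```python
from html.parser import HTMLParser

# Phrases quoted in the abstract; a production lexicon would be larger and multilingual.
PRINT_PHRASES = {"print", "print article", "print-friendly version"}


class PrintLinkFinder(HTMLParser):
    """Collect <a> elements whose text or embedded image suggests a print-friendly page."""

    def __init__(self):
        super().__init__()
        self.links = []        # href of each detected print link, or None if no usable URL
        self._in_link = False
        self._href = None
        self._text = []
        self._img_hint = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a":
            self._in_link, self._href = True, attrs.get("href")
            self._text, self._img_hint = [], False
        elif tag == "img" and self._in_link:
            # Assumed heuristic: icon file name or alt text mentions "print".
            hint = (attrs.get("src") or "") + " " + (attrs.get("alt") or "")
            self._img_hint = self._img_hint or "print" in hint.lower()

    def handle_data(self, data):
        if self._in_link:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link:
            text = " ".join("".join(self._text).split()).lower()
            if text in PRINT_PHRASES or self._img_hint:
                href = self._href
                # Script-driven links carry no URL to the print-friendly page.
                if href is None or href.startswith(("javascript:", "#")):
                    href = None
                self.links.append(href)
            self._in_link = False


html = """
<a href="/story/42?print=1">Print article</a>
<a href="/story/42">Read more</a>
<a href="javascript:window.print()"><img src="/icons/printer.png" alt=""></a>
<a href="/share">Share</a>
"""
finder = PrintLinkFinder()
finder.feed(html)
finder.close()
print(finder.links)
```

A `None` entry corresponds to the hard case in the abstract: the print-friendly page exists but must be obtained some other way than by following a URL.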
Reflowable documents composed from pre-rendered atomic components
Alexander J. Pinkney, S. Bagley, D. Brailsford
Mobile eBook readers are now commonplace in today's society, but their document layout algorithms remain basic, largely due to constraints imposed by short battery life. At present, with any eBook file format not based on PDF, the layout of the document as it appears to the end user is at the mercy of hidden reformatting and reflow algorithms interacting with the screen parameters of the device on which the document is rendered. Very little control is provided to the publisher or author, beyond some basic formatting options. This paper describes a method of producing well-typeset, scalable document layouts by embedding several pre-rendered versions of a document within one file. This enables many computationally expensive steps (e.g. hyphenation and line-breaking) to be carried out at document compilation time, rather than at 'view time'. This system has the advantage that end users are not constrained to a single, arbitrarily chosen view of the document, nor are they subjected to reading a poorly typeset version rendered on the fly. Instead, the device can choose a layout appropriate to its screen size and the end user's choice of zoom level, and the author and publisher can have fine-grained control over all layouts.
DOI: 10.1145/2034691.2034726 (pp. 163-166, 2011-09-19)
Citations: 6
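Once several pre-rendered layouts are embedded in the file, the view-time work reduces to a cheap selection step. The sketch below assumes each embedded layout is identified by a single measure width in CSS pixels. It also assumes zooming in simply divides the available width. Under those assumptions, the device picks the widest layout that still fits. The function name and the width model are hypothetical, not the paper's file format.

```python
from bisect import bisect_right


def choose_layout(layout_widths: list[int], screen_width: float, zoom: float = 1.0) -> int:
    """Pick the widest pre-rendered layout that fits the screen at the given zoom.

    layout_widths: sorted measure widths of the embedded layouts (assumed CSS px)
    Falls back to the narrowest layout when none fits.
    """
    available = screen_width / zoom
    i = bisect_right(layout_widths, available)
    return layout_widths[i - 1] if i > 0 else layout_widths[0]


widths = [320, 480, 600, 768]
print(choose_layout(widths, 600), choose_layout(widths, 600, zoom=1.5))
```

Because hyphenation and line-breaking were done when each layout was compiled, this lookup is the only layout decision left for the device at view time.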