
Proceedings of the ACM Symposium on Document Engineering: Latest Publications

The Notarial Archives, Valletta: Starting from Zero
T. Lupi
The main objective of this paper is to talk about my work as a book and paper conservator in the light of the current rehabilitation project at the Notarial Archives in St Christopher Street, Valletta. With its six centuries of manuscript material spread over two kilometres of shelving, the state of preservation of the archives has presented numerous challenges in recent years. The EU funds granted in recent months are a crucial investment that will ensure the safeguarding of the collection, but putting one's house in order is not just about money. A number of other considerations, such as careful planning, multidisciplinary collaboration, clever marketing, accessibility, team-building and creating a clear vision for the future, have been among the central factors that continue to contribute to the success of this project. The general preservation and conservation strategies being undertaken for the project are also discussed.
Citations: 0
Ruling analysis and classification of torn documents
Markus Diem, Florian Kleber, Robert Sablatnig
A ruling classification is presented in this paper. In contrast to state-of-the-art methods, which focus on ruling line removal, ruling lines are here analyzed for document clustering in the context of document snippet reassembly. First, a background patch is extracted from a snippet at a position which minimizes the inscribed content. A novel Fourier feature is then computed on the image patch. The classification into void, lined and checked is carried out using Support Vector Machines. Finally, accurate line localization is performed by means of projection profiles and robust line fitting. The ruling classification achieves an F-score of 0.987 evaluated on a dataset comprising real-world document snippets. In addition, line removal was evaluated on a synthetically generated dataset, where an F-score of 0.931 was achieved. This dataset is made publicly available so as to allow for benchmarking.
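To make the classification stage concrete, here is a minimal Python sketch: an orientation-binned Fourier-magnitude feature computed on a background patch, fed to an SVM that separates void, lined and checked rulings. The descriptor, patch size and training data below are illustrative assumptions, not the authors' exact implementation.

```python
import numpy as np
from sklearn.svm import SVC

def fourier_feature(patch: np.ndarray, n_bins: int = 16) -> np.ndarray:
    """Bin FFT magnitude by orientation: periodic ruling lines concentrate
    spectral energy at their dominant angle (illustrative stand-in for the
    paper's Fourier feature)."""
    spectrum = np.abs(np.fft.fftshift(np.fft.fft2(patch)))
    h, w = spectrum.shape
    ys, xs = np.mgrid[:h, :w]
    angles = np.arctan2(ys - h / 2, xs - w / 2) % np.pi   # fold to [0, pi)
    bins = np.clip((angles / np.pi * n_bins).astype(int), 0, n_bins - 1)
    feat = np.array([spectrum[bins == b].sum() for b in range(n_bins)])
    return feat / (feat.sum() + 1e-9)                     # normalize energy

# Hypothetical training patches with labels 0=void, 1=lined, 2=checked.
rng = np.random.default_rng(0)
X = [fourier_feature(rng.random((64, 64))) for _ in range(30)]
y = rng.integers(0, 3, size=30)
clf = SVC(kernel="rbf").fit(X, y)
print(clf.predict([fourier_feature(rng.random((64, 64)))]))
```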
Citations: 0
Classifying and ranking search engine results as potential sources of plagiarism
Kyle Williams, Hung-Hsuan Chen, C. Lee Giles
Source retrieval for plagiarism detection involves using a search engine to retrieve candidate sources of plagiarism for a given suspicious document so that more accurate comparisons can be made. An important consideration is that only documents that are likely to be sources of plagiarism should be retrieved, so as to minimize the number of unnecessary comparisons made. A supervised strategy for source retrieval is described whereby search results are classified and ranked as potential sources of plagiarism without retrieving the search result documents, using only the information available at search time. The performance of the supervised method is compared to a baseline method and shown to improve precision by up to 3.28%, recall by up to 2.6% and the F1 score by up to 3.37%. Furthermore, features are analyzed to determine which of them are most important for search result classification; features based on document and search result similarity appear to be the most important.
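As a rough illustration of the supervised step, the sketch below scores each search result from search-time information only (title, snippet, rank) and ranks results by predicted probability of being a plagiarism source. The specific features, classifier and toy data are assumptions for illustration rather than the authors' exact feature set.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics.pairwise import cosine_similarity

def result_features(suspicious, results):
    """Features computable without downloading the result documents:
    TF-IDF cosine of the suspicious document with title and snippet,
    plus the search engine's own rank."""
    texts = [suspicious] + [r["title"] + " " + r["snippet"] for r in results]
    vec = TfidfVectorizer().fit(texts)
    doc = vec.transform([suspicious])
    return [[cosine_similarity(doc, vec.transform([r["title"]]))[0, 0],
             cosine_similarity(doc, vec.transform([r["snippet"]]))[0, 0],
             float(r["rank"])] for r in results]

suspicious = "the quick brown fox jumps over the lazy dog"
results = [{"title": "quick brown fox", "snippet": "a fox jumps over a dog", "rank": 1},
           {"title": "unrelated page", "snippet": "stock prices fell today", "rank": 2}]
X = result_features(suspicious, results)
y = [1, 0]  # hypothetical labels: was this result a true source?
clf = LogisticRegression().fit(X, y)
# Rank results by predicted probability of being a source of plagiarism.
ranked = sorted(zip(results, clf.predict_proba(X)[:, 1]), key=lambda p: -p[1])
print([r["title"] for r, _ in ranked])
```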
Citations: 12
SimSeerX: a similar document search engine
Kyle Williams, Jian Wu, C. Lee Giles
The need to find similar documents arises in many settings, such as plagiarism detection or research paper recommendation. Manually constructing queries to find similar documents may be overly complex, thus motivating the use of whole documents as queries. This paper introduces SimSeerX, a search engine for similar document retrieval that receives whole documents as queries and returns a ranked list of similar documents. Key to the design of SimSeerX is that it is able to work with multiple similarity functions and document collections. We present the architecture and interface of SimSeerX, show its applicability with three different similarity functions, and demonstrate its scalability on a collection of 3.5 million academic documents.
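The pluggable-similarity design can be pictured as a registry of interchangeable similarity functions applied to a whole-document query. The two metrics and the tiny in-memory collection below are illustrative assumptions, not SimSeerX's actual implementation.

```python
from collections import Counter
import math

def jaccard(a: str, b: str) -> float:
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def cosine(a: str, b: str) -> float:
    ca, cb = Counter(a.split()), Counter(b.split())
    dot = sum(ca[t] * cb[t] for t in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

SIMILARITY_FUNCTIONS = {"jaccard": jaccard, "cosine": cosine}

def search(query_doc: str, collection: list[str], metric: str = "cosine", k: int = 10):
    """Whole document in, ranked list of (document, score) out."""
    sim = SIMILARITY_FUNCTIONS[metric]
    scored = [(doc, sim(query_doc, doc)) for doc in collection]
    return sorted(scored, key=lambda s: -s[1])[:k]

docs = ["plagiarism detection with search engines",
        "recommending research papers to readers",
        "digital printing workflow optimization"]
print(search("search engines for plagiarism detection", docs, metric="jaccard"))
```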
Citations: 13
JAR tool: using document analysis for improving the throughput of high performance printing environments
M. Kolberg, L. G. Fernandes, Mateus Raeder, Carolina Fonseca
Digital printers have consistently improved their speed in recent years. Meanwhile, the need for document personalization and customization has increased. As a consequence of these two facts, the traditional rasterization process has become a highly demanding computational step in the printing workflow. Moreover, Print Service Providers are now using multiple RIP engines to speed up the whole document rasterization process, and depending on the input document characteristics the rasterization process may not achieve the print-engine speed, creating an unwanted bottleneck. In this scenario, we developed a tool called Job Adaptive Router (JAR) aiming at improving the throughput of the rasterization process through a clever load balance among RIP engines, based on information obtained by analyzing the content of input documents. Furthermore, along with this tool we propose strategies that consider relevant characteristics of documents, such as transparency and reusability of images, to split the job in a more intelligent way. The obtained results confirm that the proposed tool improves the rasterization process performance.
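One way to picture the content-aware routing is the greedy sketch below: estimate a per-job rasterization cost from document characteristics, then always hand the next-heaviest job to the least-loaded RIP engine. The cost weights and feature names here are assumptions, not JAR's actual model.

```python
import heapq

def estimated_cost(job: dict) -> float:
    # Transparency and non-reusable images make pages costlier to RIP
    # (weights are illustrative assumptions).
    return job["pages"] * (1.0
                           + 2.0 * job["transparency_ratio"]
                           + 1.0 * (1.0 - job["image_reuse_ratio"]))

def assign(jobs: list[dict], n_engines: int) -> list[list[dict]]:
    """Greedy longest-processing-time assignment to the least-loaded engine."""
    engines = [(0.0, i) for i in range(n_engines)]  # (current load, engine id)
    heapq.heapify(engines)
    plan = [[] for _ in range(n_engines)]
    for job in sorted(jobs, key=estimated_cost, reverse=True):
        load, i = heapq.heappop(engines)
        plan[i].append(job)
        heapq.heappush(engines, (load + estimated_cost(job), i))
    return plan

jobs = [{"pages": 100, "transparency_ratio": 0.5, "image_reuse_ratio": 0.2},
        {"pages": 40, "transparency_ratio": 0.0, "image_reuse_ratio": 0.9},
        {"pages": 70, "transparency_ratio": 0.1, "image_reuse_ratio": 0.5}]
print(assign(jobs, n_engines=2))
```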
Citations: 0
What academics want when reading digitally
Juliane Franze, K. Marriott, Michael Wybrow
Researchers constantly read and annotate academic documents. While almost all documents are provided digitally, many are still printed and read on paper. We surveyed 162 academics in order to better understand their reading habits and preferences. We were particularly interested in understanding the barriers to digital reading and the features desired by academics for digital reading applications.
Citations: 17
Extracting web content for personalized presentation
Rodrigo Chamun, Daniele Pinheiro, Diego Jornada, J. B. Oliveira, I. Manssour
Printing web pages is usually a thankless task as the result is often a document with many badly-used pages and poor layout. Besides the actual content, superfluous web elements like menus and links are often present and in a printed version they are commonly perceived as an annoyance. Therefore, a solution for obtaining cleaner versions for printing is to detect parts of the page that the reader wants to consume, eliminating unnecessary elements and filtering the "true" content of the web page. In addition, the same solution may be used online to present cleaner versions of web pages, discarding any elements that the user wishes to avoid. In this paper we present a novel approach to implement such filtering. The method is interactive at first: The user samples items that are to be preserved on the page and thereafter everything that is not similar to the samples is removed from the page. This is achieved by comparing the path of all elements on the DOM representation of the page with the path of the elements sampled by the user and preserving only elements that have a path "similar" to the sample. The introduction of a similarity measure adds an important degree of adaptability to the needs of different users and applications. This approach is quite general and may be applied to any XML tree that has labeled nodes. We use HTML as a case study and present a Google Chrome extension that implements the approach as well as a user study comparing our results with commercial results.
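A minimal sketch of the path-similarity idea follows, assuming tag paths as the element representation and difflib's ratio as the similarity measure; the paper's exact path encoding and threshold are not reproduced here.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser

class PathCollector(HTMLParser):
    """Record the tag path from the root for every element in the document."""
    def __init__(self):
        super().__init__()
        self.stack, self.paths = [], []
    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)
        self.paths.append(tuple(self.stack))
    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()

def similar(p1, p2) -> float:
    return SequenceMatcher(None, p1, p2).ratio()

def keep(path, sampled_paths, threshold=0.8) -> bool:
    # Keep an element only if its path resembles one the user sampled.
    return any(similar(path, s) >= threshold for s in sampled_paths)

collector = PathCollector()
collector.feed("<html><body><div><p>text</p></div><ul><li>menu</li></ul></body></html>")
sampled = [("html", "body", "div", "p")]  # paths of user-sampled elements
print([p for p in collector.paths if keep(p, sampled)])  # menu paths filtered out
```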
Citations: 2
A platform for language independent summarization
L. Cabral, R. Lins, R. Mello, F. Freitas, B. T. Ávila, S. Simske, M. Riss
The text data available on the Internet is not only huge in volume, but also highly diverse in subject, quality and language. Such factors make it infeasible to efficiently scavenge useful information from it. Automatic text summarization is a possible solution for efficiently addressing such a problem, because it aims to sieve out the relevant information in documents by creating shorter versions of the text. However, most of the techniques and tools available for automatic text summarization are designed only for the English language, which is a severe restriction. The multilingual platforms that do exist support, at most, 2 languages. This paper proposes a language independent summarization platform that provides corpus acquisition, language classification, translation and text summarization for 25 different languages.
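As one hedged example of the kind of language-independent component such a platform needs, the sketch below is a frequency-based extractive summarizer that relies on no language-specific resources; the platform's actual classifiers, translators and summarizers are not reproduced here.

```python
import re
from collections import Counter

def summarize(text: str, n_sentences: int = 2) -> str:
    """Score sentences by average word frequency and keep the top ones
    in their original order; works on any language with word boundaries."""
    sentences = re.split(r"(?<=[.!?。])\s+", text.strip())
    freq = Counter(re.findall(r"\w+", text.lower()))
    def score(s):
        toks = re.findall(r"\w+", s.lower())
        return sum(freq[t] for t in toks) / (len(toks) or 1)
    best = sorted(sentences, key=score, reverse=True)[:n_sentences]
    return " ".join(s for s in sentences if s in best)

text = ("Automatic summarization sieves the relevant information in documents. "
        "It creates shorter versions of the text. "
        "Most available tools support only English.")
print(summarize(text))
```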
Citations: 13
Automated refactoring for size reduction of CSS style sheets
Martí Bosch, P. Genevès, Nabil Layaïda
Cascading Style Sheets (CSS) is a standard language for styling and formatting web documents. Its role in the web user experience is becoming increasingly important. However, CSS files tend to be designed from a result-driven point of view, without much attention devoted to the CSS file structure as long as it produces the desired results. Furthermore, the rendering intended in the browser is often checked and debugged against a single document instance. Style sheets normally apply to a set of documents, so modifications added while focusing on a particular instance might affect other documents of the set. We present a first prototype of a static CSS semantic analyzer and optimizer that is capable of automatically detecting and removing redundant property declarations and rules. We build on earlier work on tree logics to locate redundancies due to the semantics of selectors and properties. Existing purely syntactic CSS optimizers might be used in conjunction with our tool to perform complementary (and orthogonal) size reduction, toward the common goal of providing smaller and cleaner CSS files.
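The simplest class of redundancy such a tool removes can be sketched even at the syntactic level: a property declared twice under the same selector, where only the last declaration can take effect. The regex-level sketch below illustrates just that case; the paper's tree-logic analysis goes well beyond it, reasoning about which distinct selectors can match the same document nodes.

```python
import re

def drop_shadowed_declarations(css: str) -> str:
    """Within each rule, keep only the last declaration of each property;
    earlier ones are always overridden and can be deleted."""
    def clean(match):
        selector, body = match.group(1), match.group(2)
        seen, kept = set(), []
        # Walk declarations right-to-left so the winning one is kept.
        for decl in reversed([d.strip() for d in body.split(";") if d.strip()]):
            prop = decl.split(":")[0].strip().lower()
            if prop not in seen:
                seen.add(prop)
                kept.append(decl)
        return selector + "{" + "; ".join(reversed(kept)) + "}"
    return re.sub(r"([^{}]+)\{([^}]*)\}", clean, css)

print(drop_shadowed_declarations("p{color:red; margin:0; color:blue}"))
# -> p{margin:0; color:blue}
```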
Citations: 14
Generating summary documents for a variable-quality PDF document collection
Jacob Hughes, D. Brailsford, S. Bagley, C. Adams
The Cochrane Schizophrenia Group's Register of studies details all aspects of the effects of treating people with schizophrenia. It has been gathered over the last 20 years and consists of around 20,000 documents, overwhelmingly in PDF. Document collections of this sort -- on a given theme but gathered from a wide range of sources -- will generally have huge variability in the quality of the PDF, particularly with respect to the key property of text searchability. Summarising the results from the best of these papers, to allow evidence-based health care decision making, has so far been done by manually creating a summary document, starting from a visual inspection of the relevant PDF file. This labour-intensive process has resulted, to date, in only 4,000 of the papers being summarised -- with enormous duplication of effort and with many issues around the validity and reliability of the data extraction. This paper describes a pilot project to provide a computer-assisted framework in which any of the PDF documents could be searched for the occurrence of some 8,000 keywords and key phrases. Once keyword tagging has been completed the framework assists in the generation of a standard summary document, thereby greatly speeding up the production of these summaries. Early examples of the framework are described and its capabilities illustrated.
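The keyword-tagging core can be pictured as a single-pass scan of extracted document text against the phrase list. The tiny keyword list below is a made-up stand-in for the project's roughly 8,000 terms; at that scale a multi-pattern matcher such as Aho-Corasick would be the natural choice over a single compiled alternation.

```python
import re

# Hypothetical stand-ins for the register's keyword and key-phrase list.
keywords = ["randomised", "double blind", "placebo", "haloperidol"]

# Longest phrases first, so multi-word terms win over their substrings.
pattern = re.compile(
    r"\b(" + "|".join(re.escape(k) for k in sorted(keywords, key=len, reverse=True)) + r")\b",
    re.IGNORECASE,
)

def tag_document(text: str) -> dict:
    """Count keyword occurrences; the counts feed the summary document."""
    counts = {}
    for hit in pattern.findall(text):
        counts[hit.lower()] = counts.get(hit.lower(), 0) + 1
    return counts

text = "A randomised, double blind trial of haloperidol versus placebo."
print(tag_document(text))
```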
Citations: 5