
Proceedings of the ACM Symposium on Document Engineering: latest articles

Truncation: all the news that fits we'll print
J. Hailpern, N. Venkata, Marina Danilevsky
A news article generally contains a high-level overview of the facts early on, followed by paragraphs of more detailed information. This structure allows copy editors to truncate the later paragraphs of an article to satisfy space limitations without losing critical information. Existing approaches to the problem of automatic multi-article layout focus exclusively on maximizing content and aesthetics. However, no algorithm can determine how "good" a truncation point is based on the semantic content or the readability of the article. Yet disregarding the semantic information within the article can lead either to overly aggressive cutting, which eliminates key content and can confuse the reader, or to an overly generous truncation point, which leaves in superfluous content and makes automatic layout more difficult. This is one of the remaining challenges on the path from manual layouts to fully automated processes with high-quality output. In this work, we present a new semantic-focused approach to rating the quality of a truncation point. We built models based on the results of an extensive user study on over 700 news articles. Further results show that existing techniques over-cut content. We demonstrate the layout impact through a second evaluation that implements our models in the first layout approach to integrate both layout and semantic quality. The primary contribution of this work is the demonstration that semantic-based modeling is critical for high-quality automated document synthesis within a real-world context.
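The following minimal Python sketch makes the idea of rating truncation points concrete. It picks a paragraph-boundary cut under a space budget. The `quality` callback is a hypothetical stand-in for the paper's user-study models, which are not reproduced here. The line-count budget and the tie-breaking rule are also assumptions, not the authors' method.

```python
from typing import Callable, List, Optional


def best_truncation(
    paragraph_lines: List[int],
    budget: int,
    quality: Callable[[int], float],
) -> Optional[int]:
    """Return how many leading paragraphs to keep, or None if none fit.

    paragraph_lines: height of each paragraph, in lines, in article order.
    budget: lines available for this article on the page.
    quality(k): hypothetical score for cutting after paragraph k (1-based).
    """
    best_k: Optional[int] = None
    best_score = float("-inf")
    used = 0
    for k, lines in enumerate(paragraph_lines, start=1):
        used += lines
        if used > budget:
            break  # every later cut is even longer, so none of them fit
        score = quality(k)
        # Strict '>' keeps the shorter cut on ties, freeing space for other articles.
        if score > best_score:
            best_k, best_score = k, score
    return best_k


# Hypothetical scores: the third paragraph completes the story; later ones add little.
scores = {1: 0.4, 2: 0.7, 3: 0.9, 4: 0.9, 5: 0.95}
print(best_truncation([4, 6, 5, 3, 8], budget=18, quality=lambda k: scores[k]))  # 3
```

In this toy case, a purely space-driven rule would keep four paragraphs, because they fit the budget exactly. Scoring each cut lets a layout engine stop at three and spend the saved lines elsewhere.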
DOI: https://doi.org/10.1145/2644866.2644869 | pp. 165-174 | 2014-09-16
Citations: 1
The virtual splitter: refactoring web applications for the multiscreen environment
Mira Sarkis, C. Concolato, Jean-Claude Dufourd
Creating web applications for the multiscreen environment is still a challenge. One approach is to transform existing single-screen applications, but this has not yet been done automatically or generically. This paper proposes a refactoring system. It consists of a generic and extensible mapping phase that automatically analyzes the application content based on a semantic or visual criterion determined by the author or the user, and prepares it for the splitting process. The system then splits the application and delivers two instrumented applications ready for distribution across devices. At runtime, the system uses a mirroring phase to maintain the functionality of the distributed application and to support dynamic splitting. Developed as a Chrome extension, our approach is validated on several web applications, including a YouTube page and a video application from Mozilla.
DOI: https://doi.org/10.1145/2644866.2644893 | pp. 139-142 | 2014-09-16
Citations: 8
Image-based document management: aggregating collections of handwritten forms
J. Barrus, E. L. Schwartz
Many companies still operate critical business processes using paper-based forms, including customer surveys, inspections, contracts and invoices. Converting those handwritten forms to symbolic data is expensive and complicated. This paper presents an overview of the Image-Based Document Management (IBDM) system for analyzing handwritten forms without requiring conversion to symbolic data. Strokes captured in a questionnaire on a tablet are separated into fields that are then displayed in a spreadsheet. Rows represent documents while columns represent corresponding fields across all documents. IBDM allows a process owner to capture and analyze large collections of documents with minimal IT support. IBDM supports the creation of filters and queries on the data. IBDM also allows the user to request symbolic conversion of individual columns of data and permits the user to create custom views by reordering and sorting the columns. In other words, IBDM provides a "writing on paper" experience for the data collector and a web-based database experience for the analyst.
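The organization described above (one row per document, one column per form field) can be sketched as a small Python data model. This is an illustration built on several assumptions, not IBDM's implementation:

- The class and method names are hypothetical.
- Ink is stored as opaque bytes.
- The `recognize` callback stands in for whatever handwriting recognizer performs symbolic conversion.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Cell:
    ink: bytes                  # captured strokes for one field, kept as an image
    text: Optional[str] = None  # set only after symbolic conversion is requested


@dataclass
class FormTable:
    columns: List[str]  # one column per form field
    rows: List[Dict[str, Cell]] = field(default_factory=list)  # one row per document

    def add_document(self, cells: Dict[str, Cell]) -> None:
        self.rows.append(cells)

    def convert_column(self, name: str, recognize: Callable[[bytes], str]) -> None:
        # Conversion is opt-in and per column; other columns stay as ink.
        for row in self.rows:
            if row[name].text is None:
                row[name].text = recognize(row[name].ink)

    def filter(self, name: str, keep: Callable[[Optional[str]], bool]) -> "FormTable":
        return FormTable(self.columns, [r for r in self.rows if keep(r[name].text)])

    def view(self, order: List[str], sort_by: Optional[str] = None) -> List[List[Optional[str]]]:
        # Custom view: reorder columns and optionally sort rows by a converted column.
        rows = sorted(self.rows, key=lambda r: r[sort_by].text or "") if sort_by else self.rows
        return [[r[c].text for c in order] for r in rows]


sheet = FormTable(["name", "rating"])
sheet.add_document({"name": Cell(b"Ann"), "rating": Cell(b"4")})
sheet.add_document({"name": Cell(b"Bob"), "rating": Cell(b"2")})

# Convert only the rating column; decoding bytes is a stand-in for handwriting recognition.
sheet.convert_column("rating", recognize=lambda ink: ink.decode())
print(sheet.view(["rating", "name"], sort_by="rating"))  # [['2', None], ['4', None]]
```

Keeping unconverted fields as ink means the analyst pays for recognition only on the columns actually queried.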
DOI: https://doi.org/10.1145/2644866.2644891 | pp. 117-120 | 2014-09-16
Citations: 0
Fine-grained change detection in structured text documents
Hannes Dohrn, D. Riehle
Detecting and understanding changes between document revisions is an important task. The acquired knowledge can be used to classify the nature of a new document revision or to support a human editor in the review process. While purely textual change detection algorithms offer fine-grained results, they do not understand the syntactic meaning of a change. By representing structured text documents as XML documents, we can apply tree-to-tree correction algorithms to identify the syntactic nature of a change. Many algorithms for change detection in XML documents have been proposed, but most of them focus on the intricacies of generic XML data and emphasize speed over the quality of the result. Structured text, however, requires a change detection algorithm to pay close attention to the content of text nodes, whereas recent algorithms treat text nodes as black boxes. We present an algorithm that combines the advantages of the purely textual approach with those of tree-to-tree change detection by redistributing text from non-overlapping common substrings to the nodes of the trees. This allows us to spot changes not only in the structure but also in the text itself, achieving higher-quality, fine-grained results in linear time on average. The algorithm is evaluated on the corpus of structured text documents found in the English Wikipedia.
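The textual half of this idea can be sketched with Python's standard `difflib.SequenceMatcher`: find common substrings across revisions, then attribute them back to the text nodes that own them. The `get_matching_blocks()` method returns non-overlapping matching runs. This is a simplified illustration, not the authors' algorithm. It works on words rather than characters and assumes the text nodes have already been extracted from the XML trees. It also stops at attribution instead of feeding the result into a tree-to-tree matcher.

```python
from collections import Counter
from difflib import SequenceMatcher
from typing import List, Tuple

Leaf = Tuple[str, str]  # (text-node id, text content), in document order


def flatten(leaves: List[Leaf]) -> Tuple[List[str], List[str]]:
    """Split every text node into words, remembering which node owns each word."""
    words: List[str] = []
    owners: List[str] = []
    for node_id, text in leaves:
        for word in text.split():
            words.append(word)
            owners.append(node_id)
    return words, owners


def align_text_nodes(old: List[Leaf], new: List[Leaf]):
    """Attribute common word runs and leftover edits to the owning text nodes.

    Returns (shared, deleted, inserted):
      shared   - Counter of (old node, new node) -> words inside common runs
      deleted  - (old node, word) pairs outside every common run
      inserted - (new node, word) pairs outside every common run
    """
    a, a_owner = flatten(old)
    b, b_owner = flatten(new)
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    shared: Counter = Counter()
    hit_a, hit_b = set(), set()
    # Blocks are non-overlapping and end with a zero-size sentinel.
    for i, j, size in matcher.get_matching_blocks():
        for k in range(size):
            shared[(a_owner[i + k], b_owner[j + k])] += 1
            hit_a.add(i + k)
            hit_b.add(j + k)
    deleted = [(a_owner[i], a[i]) for i in range(len(a)) if i not in hit_a]
    inserted = [(b_owner[j], b[j]) for j in range(len(b)) if j not in hit_b]
    return shared, deleted, inserted


old = [("p1", "The quick brown fox"), ("p2", "jumps over the dog")]
new = [("p1", "The quick red fox"), ("p2", "jumps over the lazy dog")]
shared, deleted, inserted = align_text_nodes(old, new)
print(deleted)   # [('p1', 'brown')]
print(inserted)  # [('p1', 'red'), ('p2', 'lazy')]
```

Here the common run "fox jumps over the" crosses a node boundary. Attributing each word to its node is what lets a structure-aware diff report that `p1` had one word replaced while `p2` gained one.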
DOI: https://doi.org/10.1145/2644866.2644880 | pp. 87-96 | 2014-09-16
Citations: 4
On automatic text segmentation
Boris Dadachev, A. Balinsky, H. Balinsky
Automatic text segmentation, the task of breaking a text into topically consistent segments, is a fundamental problem in Natural Language Processing, Document Classification and Information Retrieval. Text segmentation can significantly improve the performance of various text mining algorithms by splitting heterogeneous documents into homogeneous fragments, thus facilitating subsequent processing. Applications ranging from screening radio communication transcripts to document summarization, from automatic document classification to information visualization, and from automatic filtering to security policy enforcement all rely on, or can benefit greatly from, automatic document segmentation. In this article, a novel approach to automatic text and data stream segmentation is presented and studied. The proposed algorithm takes advantage of the feature extraction and unusual behaviour detection algorithms developed in [4, 5]. It is entirely unsupervised and flexible enough to allow segmentation at different scales, such as short paragraphs and large sections. We also briefly review the most popular and important algorithms for automatic text segmentation and present detailed comparisons of our approach with several of these state-of-the-art algorithms.
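For contrast, a classic lexical-cohesion baseline in the spirit of Hearst's TextTiling can be sketched in a few lines of Python. It is explicitly not the authors' method, which builds on the feature extraction and unusual behaviour detection of [4, 5]. The window size, threshold and regex tokenizer are all assumptions. The `window` parameter hints at how scale can be tuned: wider windows roughly favour larger sections, narrower ones shorter paragraphs.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def _bag(sentences: List[str]) -> Counter:
    """Bag of lowercase word counts over a block of sentences."""
    return Counter(w for s in sentences for w in re.findall(r"[a-z]+", s.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[w] for w, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def segment(sentences: List[str], window: int = 2, threshold: float = 0.1) -> List[int]:
    """Return gap indices g such that a new segment starts at sentences[g]."""
    sims: Dict[int, float] = {}
    for g in range(1, len(sentences)):
        left = _bag(sentences[max(0, g - window):g])
        right = _bag(sentences[g:g + window])
        sims[g] = _cosine(left, right)
    boundaries = []
    for g, s in sims.items():
        # A boundary is a local dip in cohesion that also falls below the threshold.
        if s < threshold and s <= sims.get(g - 1, math.inf) and s <= sims.get(g + 1, math.inf):
            boundaries.append(g)
    return boundaries


sentences = [
    "cats purr and cats sleep",
    "a cat likes warm sleep",
    "cats chase mice",
    "stocks fell on monday",
    "markets and stocks rallied",
    "investors sold stocks",
]
print(segment(sentences))  # [3]
```

Because it is unsupervised, the baseline needs no training data, but it depends entirely on word overlap. This is the kind of limitation that richer feature-based methods aim to address.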
DOI: https://doi.org/10.1145/2644866.2644874 | pp. 73-80 | 2014-09-16
Citations: 9
Personalized document clustering with dual supervision
Yeming Hu, E. Milios, J. Blustein, Shali Liu
The potential of semi-supervised techniques to produce personalized clusters has not been explored. This is because semi-supervised clustering algorithms were previously evaluated using oracles based on underlying class labels. Although oracles allow clustering algorithms to be evaluated quickly and without labor-intensive labeling, they have the key disadvantage of always giving the same answer for the assignment of a document or feature. However, different human users might assign the same document and/or feature differently because of different but equally valid points of view. In this paper, we conduct a user study in which participants (users) group the same document collection into clusters according to their own understanding; these groupings are then used to evaluate semi-supervised clustering algorithms for user personalization. Through our user study, we observe that different users have their own personalized organizations of the same collection, and that a user's organization changes over time. Therefore, we propose that document clustering algorithms should be able to incorporate user input and produce personalized clusters based on it. We also confirm that semi-supervised algorithms with noisy user input can still produce organizations that better match the user's expectations (personalization) than traditional unsupervised ones. Finally, we demonstrate that labeling keywords for clusters at the same time as labeling documents can further improve clustering performance with respect to user personalization, compared with labeling documents alone.
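To illustrate how user input can steer clustering, here is a minimal Python sketch of seeded k-means. This is a standard semi-supervised technique in which user-labelled documents initialize the centroids and stay pinned to the user's chosen cluster. It is not the paper's dual-supervision method: keyword labels are not modelled, and documents are assumed to be already vectorized. The toy vectors are hypothetical.

```python
from typing import Dict, List

Vector = List[float]


def _dist2(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(vectors: List[Vector]) -> Vector:
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def seeded_kmeans(docs: List[Vector], seeds: Dict[int, int], k: int, max_iter: int = 50) -> List[int]:
    """Cluster document vectors into k groups, honoring user labels.

    seeds maps a document index to the cluster (0..k-1) the user put it in.
    Seeds initialize the centroids and stay pinned to their clusters.
    """
    if set(seeds.values()) != set(range(k)):
        raise ValueError("every cluster 0..k-1 needs at least one seed document")
    centroids = [_mean([docs[i] for i, c in seeds.items() if c == j]) for j in range(k)]
    assignment: List[int] = []
    for _ in range(max_iter):
        new_assignment = [
            seeds[i] if i in seeds
            else min(range(k), key=lambda j: _dist2(d, centroids[j]))
            for i, d in enumerate(docs)
        ]
        if new_assignment == assignment:
            break  # no document moved: converged
        assignment = new_assignment
        # Pinned seeds guarantee every cluster is non-empty here.
        centroids = [_mean([d for d, a in zip(docs, assignment) if a == j]) for j in range(k)]
    return assignment


docs = [[0, 0], [0, 1], [0, 9], [0, 10], [9, 0], [10, 0]]
# Two users organize the same collection differently through their seed labels.
print(seeded_kmeans(docs, seeds={0: 0, 3: 0, 5: 1}, k=2))  # [0, 0, 0, 0, 1, 1]
print(seeded_kmeans(docs, seeds={0: 0, 3: 1, 5: 0}, k=2))  # [0, 0, 1, 1, 0, 0]
```

The same six documents end up partitioned differently for each user. This mirrors, in miniature, the observation that users hold different but equally valid organizations of one collection.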
DOI: https://doi.org/10.1145/2361354.2361393 | pp. 161-170 | 2012-09-04
Citations: 14
Just-in-time personalized video presentations
Jack Jansen, Pablo César, R. Guimarães, D. Bulterman
Using high-quality video cameras on mobile devices, it is relatively easy to capture a significant volume of video content for community events such as local concerts or sporting events. A more difficult problem is selecting and sequencing individual media fragments that meet the personal interests of a viewer of such content. In this paper, we consider an infrastructure that supports the just-in-time delivery of personalized content. Based on user profiles and interests, tailored video mash-ups can be created at view-time and then further tailored to user interests via simple end-user interaction. Unlike other mash-up research, our system focuses on client-side compilation based on personal (rather than aggregate) interests. This paper concentrates on a discussion of language and infrastructure issues required to support just-in-time video composition and delivery. Using a high school concert as an example, we provide a set of requirements for dynamic content delivery. We then provide an architecture and infrastructure that meets these requirements. We conclude with a technical and user analysis of the just-in-time personalized video approach.
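The view-time selection and sequencing step can be sketched in Python as a greedy pass over tagged fragments. Everything here is an assumption for illustration:

- The `Fragment` fields are hypothetical.
- Relevance is measured by tag overlap with the viewer's interests.
- Selected fragments may not overlap in event time.

The sketch ignores the document language and delivery infrastructure that the paper focuses on.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set


@dataclass(frozen=True)
class Fragment:
    clip_id: str
    start: float          # capture time within the event, in seconds
    duration: float       # seconds
    tags: FrozenSet[str]  # e.g. performers visible in the clip


def overlaps(a: Fragment, b: Fragment) -> bool:
    return a.start < b.start + b.duration and b.start < a.start + a.duration


def compose_mashup(fragments: List[Fragment], interests: Set[str], max_duration: float) -> List[Fragment]:
    """Greedily keep the most relevant non-overlapping fragments, then play them in event order."""
    def relevance(f: Fragment) -> int:
        return len(f.tags & interests)

    ranked = sorted(
        (f for f in fragments if relevance(f) > 0),
        key=lambda f: (-relevance(f), f.start),  # most relevant first, earlier on ties
    )
    chosen: List[Fragment] = []
    total = 0.0
    for f in ranked:
        if total + f.duration <= max_duration and not any(overlaps(f, c) for c in chosen):
            chosen.append(f)
            total += f.duration
    return sorted(chosen, key=lambda f: f.start)


clips = [
    Fragment("cam1-a", 0, 30, frozenset({"choir"})),
    Fragment("cam2-a", 10, 30, frozenset({"alice", "choir"})),
    Fragment("cam1-b", 60, 20, frozenset({"band"})),
    Fragment("cam2-b", 90, 25, frozenset({"alice", "band"})),
]
print([f.clip_id for f in compose_mashup(clips, {"alice"}, max_duration=60)])  # ['cam2-a', 'cam2-b']
```

A viewer with different interests gets a different cut from the same pool of clips. That is the personal-rather-than-aggregate behaviour the abstract contrasts with other mash-up work.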
DOI: https://doi.org/10.1145/2361354.2361368 | pp. 59-68 | 2012-09-04
Citations: 10
Ad insertion in automatically composed documents
Niranjan Damera-Venkata, José Bento
We consider the problem of automatically inserting advertisements (ads) into machine-composed documents. We explicitly analyze the fundamental tradeoff between the expected revenue from ad insertion and the quality of the corresponding composed documents. We show that the optimal tradeoff a publisher can expect may be expressed as an efficient frontier in the revenue-quality space. We develop algorithms to compose documents that lie on this optimal tradeoff frontier. These algorithms can automatically choose distributions of ad sizes and ad placement locations to optimize revenue for a given quality, or quality for a given revenue. Such automation allows a market maker to accept highly personalized content from publishers who have no design or ad inventory management capability, and to distribute formatted documents to end users with aesthetically pleasing ad placement. The ad density/coverage may be controlled by the publisher or the end user on a per-document basis simply by sliding along the tradeoff frontier. Business models in which ad sales precede (ad-pull) or follow (ad-push) document composition are analyzed from a document engineering perspective.
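Consider a finite set of candidate layouts that have already been scored. The efficient frontier in revenue-quality space is the subset that no other layout dominates; a layout is dominated when another is at least as good on both revenue and quality and strictly better on one. The Python sketch below computes the frontier with a sort-and-sweep. It then picks the highest-revenue layout that meets a quality floor, mimicking a slider along the frontier. The `Layout` type, scores and candidates are hypothetical. The paper's algorithms compose documents directly on the frontier rather than filtering an enumerated list.

```python
from typing import List, NamedTuple, Optional


class Layout(NamedTuple):
    name: str
    revenue: float  # expected ad revenue
    quality: float  # document quality score, higher is better


def efficient_frontier(candidates: List[Layout]) -> List[Layout]:
    """Return the non-dominated layouts, ordered by increasing revenue."""
    # Sweep from highest revenue down: a layout survives only if its quality beats
    # every layout with at least as much revenue seen so far.
    ordered = sorted(candidates, key=lambda c: (-c.revenue, -c.quality))
    frontier: List[Layout] = []
    best_quality = float("-inf")
    for c in ordered:
        if c.quality > best_quality:
            frontier.append(c)
            best_quality = c.quality
    return frontier[::-1]


def max_revenue_at_quality(frontier: List[Layout], min_quality: float) -> Optional[Layout]:
    """Slide along the frontier: the best revenue whose quality still meets the floor."""
    feasible = [c for c in frontier if c.quality >= min_quality]
    return max(feasible, key=lambda c: c.revenue, default=None)


candidates = [
    Layout("A", 0, 1.0),  # no ads
    Layout("B", 3, 0.9),
    Layout("C", 3, 0.7),  # dominated by B
    Layout("D", 5, 0.6),
    Layout("E", 4, 0.5),  # dominated by D
    Layout("F", 8, 0.3),
]
frontier = efficient_frontier(candidates)
print([c.name for c in frontier])                  # ['A', 'B', 'D', 'F']
print(max_revenue_at_quality(frontier, 0.6).name)  # D
```

Raising the quality floor moves the choice toward `A` (no ads), and lowering it moves the choice toward `F`. This is the per-document control the abstract describes.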
DOI: https://doi.org/10.1145/2361354.2361358 | pp. 3-12 | 2012-09-04
Citations: 2
Challenges in generating bookmarks from TOC entries in e-books
Yogalakshmi Jayabal, Chandrashekar Ramanathan, M. Sheth
The task of extracting document structures from a digital e-book is difficult and is an active area of research. On the other hand, many e-books already have a table of contents (TOC) at the beginning of the document. This may lead us to believe that adding bookmarks to a digital document (e-book) based on the existing TOC would be trivial. In this paper, we highlight the challenges involved in automatically adding bookmarks to an existing e-book based on the TOC within the document. If we are able to reliably identify the specific location of each TOC entry within the document, the algorithms can easily be extended to identify document structures within e-books that have a TOC. We describe Booky, a tool we have built that attempts to automatically add PDF bookmarks to existing PDF-based e-books that include a TOC as part of the document content. The tool addresses most of the challenges identified, while leaving a few tricky scenarios open.
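The core step the abstract identifies is reliably locating each TOC entry inside the body. It can be sketched in Python in two stages: parse the TOC lines, then search the page texts for each title, starting after the front matter and moving forward. This is not Booky's implementation. The regex, the page-text input (assumed already extracted from the PDF) and the naive substring match are all assumptions. Real e-books can easily break it, for example with titles reworded or hyphenated in the body, or with running headers.

```python
import re
from typing import List, Optional, Tuple

# "Title ....... 12": title, then spaces and/or leader dots, then a printed page number.
TOC_LINE = re.compile(r"^(?P<title>.*?\S)[\s.]+(?P<page>\d+)\s*$")


def normalize(text: str) -> str:
    return re.sub(r"\s+", " ", text).strip().lower()


def parse_toc(lines: List[str]) -> List[Tuple[str, int]]:
    """Extract (title, printed page) pairs; lines without a trailing number are skipped."""
    entries = []
    for line in lines:
        m = TOC_LINE.match(line.strip())
        if m:
            entries.append((m.group("title"), int(m.group("page"))))
    return entries


def locate_entries(
    entries: List[Tuple[str, int]], pages: List[str], first_body_page: int
) -> List[Tuple[str, int, Optional[int]]]:
    """Find the physical page index where each TOC title first appears.

    Searching resumes from the previous hit so entries stay in document order,
    and pages before first_body_page (cover, the TOC itself) are never matched.
    Titles that cannot be found get None.
    """
    norm_pages = [normalize(p) for p in pages]
    located = []
    start = first_body_page
    for title, printed in entries:
        needle = normalize(title)
        hit = next((i for i in range(start, len(pages)) if needle in norm_pages[i]), None)
        located.append((title, printed, hit))
        if hit is not None:
            start = hit
    return located


toc = ["Contents", "1 Introduction ........ 1", "2 Methods .... 2", "2.1 Data 3"]
pages = [
    "My Book",
    "Contents 1 Introduction 1 2 Methods 2 2.1 Data 3",
    "1 Introduction\nWe study bookmarks.",
    "2 Methods\nOverview of the approach.",
    "2.1 Data\nWe used PDF e-books.",
]
for title, printed, physical in locate_entries(parse_toc(toc), pages, first_body_page=2):
    print(f"{title!r}: printed page {printed}, physical index {physical}")
```

In this toy example the front matter offsets the printed page numbers from the physical page indices. That offset is why the sketch searches for titles instead of trusting the printed numbers.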
DOI: 10.1145/2361354.2361363 (https://doi.org/10.1145/2361354.2361363) · Proceedings of the ACM Symposium on Document Engineering, pp. 37-40 · Published 2012-09-04
Citations: 6
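The Booky abstract's core step is locating each TOC entry inside the body. The Python sketch below shows one way to do it, but it is not Booky's implementation. It assumes the PDF text has already been extracted into one string per physical page (the hypothetical `pages` list). It parses TOC lines of the form `1.1 Motivation .... 2` with a regular expression. It then picks the single front-matter offset between printed page numbers and physical page indices that lets the most titles be found. Entries that cannot be located get `page_index=None`. Real e-books break these assumptions in the ways the paper discusses, for example titles split across lines, roman-numbered front matter, or the TOC page itself matching every title. All names here are illustrative.

```python
import re
from typing import List, NamedTuple, Optional

# Matches TOC lines such as "1.1 Motivation .... 2" or "Preface 5":
# an optional dotted section number, a title, dot/space leaders, a page number.
TOC_LINE = re.compile(r"^(?:(?P<num>\d+(?:\.\d+)*)\s+)?(?P<title>.+?)[\s.]+(?P<page>\d+)\s*$")


class TocEntry(NamedTuple):
    level: int          # nesting depth from the section number (1 if unnumbered)
    title: str
    printed_page: int   # page number as printed in the TOC


class Bookmark(NamedTuple):
    level: int
    title: str
    page_index: Optional[int]  # 0-based physical page, or None if not located


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace so line breaks do not block matches."""
    return " ".join(text.lower().split())


def parse_toc(lines: List[str]) -> List[TocEntry]:
    entries = []
    for line in lines:
        m = TOC_LINE.match(line.strip())
        if m is None:
            continue  # not a TOC entry (e.g. the "Contents" heading)
        num = m.group("num")
        level = num.count(".") + 1 if num else 1
        entries.append(TocEntry(level, m.group("title").strip(), int(m.group("page"))))
    return entries


def found_on(pages: List[str], title: str, index: int) -> bool:
    return 0 <= index < len(pages) and normalize(title) in normalize(pages[index])


def best_offset(pages: List[str], entries: List[TocEntry], max_offset: int) -> int:
    """Offset between printed page numbers and physical indices matching the most titles."""
    def score(offset: int) -> int:
        return sum(found_on(pages, e.title, e.printed_page - 1 + offset) for e in entries)
    return max(range(max_offset + 1), key=score)


def make_bookmarks(pages: List[str], toc_lines: List[str], max_offset: int = 50) -> List[Bookmark]:
    entries = parse_toc(toc_lines)
    offset = best_offset(pages, entries, max_offset)
    bookmarks = []
    for e in entries:
        index = e.printed_page - 1 + offset
        bookmarks.append(Bookmark(e.level, e.title, index if found_on(pages, e.title, index) else None))
    return bookmarks


# A tiny made-up book: a title page and a contents page precede printed page 1.
pages = [
    "My Book",
    "Contents\n1 Introduction .... 1\n1.1 Motivation .... 2\n2 Methods .... 3",
    "1 Introduction\nSome text.",
    "1.1 Motivation\nMore text.",
    "2 Methods\nEven more text.",
]
toc_lines = pages[1].splitlines()

if __name__ == "__main__":
    for b in make_bookmarks(pages, toc_lines):
        print(b)
```

Choosing one global offset by vote is a deliberately simple design. It tolerates a stray match on the contents page, but a real tool would also need per-entry fallbacks when the offset changes partway through a book.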
ALMcss: a javascript implementation of the CSS template layout module
César F. Acebal, B. Bos, M. Rodríguez, J. M. C. Lovelle
Traditionally, web standards in general, and Cascading Style Sheets (CSS) in particular, take a long time from when they are defined by the W3C until they are implemented by browser vendors. This has been a limitation not only for authors, who sometimes had to wait years before they could use certain CSS properties in their web pages, but also for the creators of the specification itself, who were not able to test their proposals in practice.

In this paper we present ALMcss, a JavaScript prototype that implements the CSS Template Layout Module, a proposed addition to CSS intended to make it a more capable layout language. The module has been developed inside the W3C CSS Working Group by two of the authors of this paper. We present the rationale of the module and an introduction to its syntax before discussing the design of our prototype.

ALMcss has served as a proof of concept that the Template Layout Module is not only feasible but can in fact be implemented in current web browsers using only JavaScript and the Document Object Model (DOM). In addition, ALMcss allows web designers to start using the new layout capabilities that the module provides today, even before it becomes an official W3C specification.
DOI: 10.1145/2361354.2361360 (https://doi.org/10.1145/2361354.2361360) · Proceedings of the ACM Symposium on Document Engineering, pp. 23-32 · Published 2012-09-04
Citations: 4
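The Python sketch below gives a feel for what the Template Layout Module asks a browser (or ALMcss) to do. It turns an ASCII-art template, written as rows of letters such as `"aab" "ccb"`, into rectangular slots. It then computes each slot's box from fixed column widths and row heights. This is an illustration under simplifying assumptions, not ALMcss code. ALMcss is written in JavaScript against the DOM, and the module also supports flexible track sizes that are ignored here. Treating `.` as an empty cell is an assumption. `parse_template`, `slot_boxes` and `Slot` are hypothetical names.

```python
from typing import Dict, List, NamedTuple, Tuple


class Slot(NamedTuple):
    row: int      # first row (0-based)
    col: int      # first column (0-based)
    rowspan: int
    colspan: int


def parse_template(rows: List[str]) -> Dict[str, Slot]:
    """Turn template rows such as ["aab", "ccb"] into rectangular slots.

    Each letter names a slot; '.' is treated as an empty cell (an assumption).
    Raises ValueError for ragged rows or non-rectangular slots.
    """
    if not rows or any(len(r) != len(rows[0]) for r in rows):
        raise ValueError("template rows must be non-empty and equally long")
    cells: Dict[str, List[Tuple[int, int]]] = {}
    for r, row in enumerate(rows):
        for c, name in enumerate(row):
            if name != ".":
                cells.setdefault(name, []).append((r, c))
    slots: Dict[str, Slot] = {}
    for name, positions in cells.items():
        top = min(r for r, _ in positions)
        left = min(c for _, c in positions)
        height = max(r for r, _ in positions) - top + 1
        width = max(c for _, c in positions) - left + 1
        # Cells are distinct and inside the bounding box, so the slot is a
        # rectangle exactly when it covers the whole box.
        if len(positions) != height * width:
            raise ValueError(f"slot {name!r} is not rectangular")
        slots[name] = Slot(top, left, height, width)
    return slots


def slot_boxes(slots: Dict[str, Slot], col_widths: List[int],
               row_heights: List[int]) -> Dict[str, Tuple[int, int, int, int]]:
    """Compute (x, y, width, height) for every slot from fixed track sizes."""
    boxes = {}
    for name, s in slots.items():
        x = sum(col_widths[:s.col])
        y = sum(row_heights[:s.row])
        w = sum(col_widths[s.col:s.col + s.colspan])
        h = sum(row_heights[s.row:s.row + s.rowspan])
        boxes[name] = (x, y, w, h)
    return boxes


if __name__ == "__main__":
    slots = parse_template(["aab", "ccb"])
    print(slot_boxes(slots, col_widths=[100, 200, 150], row_heights=[50, 300]))
```

In the example, slot `b` spans both rows of the right-hand column, while `a` and `c` each span the two left-hand columns. A prototype running in the browser would then position the elements assigned to each slot at the computed boxes.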