Recommending Colors and Fonts for Cover Page of EPUB Book. Haruka Kawaguchi, Nobutaka Suzuki. DOI: 10.1145/3209280.3229086
Suppose that you write a text, or find an interesting text on the Web, and want to create an e-book from it. When creating an e-book from such a text file, you have to create a cover page for it. However, existing conversion services and tools cannot automatically produce a cover page that reflects the impression of the text. In this paper, to support users in creating "good" cover pages for such texts, we propose a method for recommending colors and fonts for the cover pages of given texts or cover-less EPUB books. In our method, colors and fonts are selected so that they reflect the impression of the contents of the given text or EPUB book.

Improving Short Text Clustering by Similarity Matrix Sparsification. Md. Rashadul Hasan Rakib, Magdalena Jankowska, N. Zeh, E. Milios. DOI: 10.1145/3209280.3229114
Short text clustering is an important but challenging task. We investigate the impact of similarity matrix sparsification on the performance of short text clustering. We show that two sparsification methods (the proposed Similarity Distribution based method, and k-nearest neighbors), which aim to retain a prescribed number of similarity elements per text, improve the hierarchical clustering quality of short texts for various text similarities. Combined with a word-embedding-based similarity, these methods yield results competitive with state-of-the-art methods for short text clustering, especially in the general domain, and are faster than the main state-of-the-art baseline.

Automatic Term Extraction in Technical Domain using Part-of-Speech and Common-Word Features. N. Simon, Vlado Keselj. DOI: 10.1145/3209280.3229100
Extracting key terms from technical documents allows us to write effective documentation that is specific and clear, with minimal ambiguity and confusion caused by nearly synonymous but distinct terms. For instance, to avoid confusion, the same object should not be referred to by two different names (e.g., an object called a "hydraulic oil filter" should always be referred to by that name). In the modern world of commerce, clear terminology is the hallmark of successful RFPs (Requests for Proposal) and is therefore key to the growth of competitive organizations. While Automatic Term Extraction (ATE) is a well-developed area of study, its applications in the technical domain have been sparse and constrained to certain narrow areas such as biomedical research. We present an ATE method for the technical domain based on part-of-speech features and common-word information. The method is evaluated on a C programming language reference manual as well as a manual of aircraft maintenance guidelines, and shows results comparable to or better than reported state-of-the-art results.

Fashioning a Search Engine to Support Humanities Research. Frank Wm. Tompa. DOI: 10.1145/3209280.3209520
Scholarship in the humanities often requires the ability to search curated electronic corpora and to display search results in a variety of formats. Challenges that need to be addressed include transforming the texts into a suitable form, typically XML, and catering to the scholars' search and display needs. We describe our experience in creating such a search and display facility.

Text Mining and Recommender Systems for Predictive Policing. Isabelle Percy, A. Balinsky, H. Balinsky, S. Simske. DOI: 10.1145/3209280.3229112
We present results from a joint project between HP Labs, Cardiff University and Dyfed Powys Police on predictive policing. We demonstrate applications of various techniques from recommender systems and text mining to the problem of crime pattern recognition. Our main idea is to treat crime records for different regions and time periods as a corpus of text documents whose words are crime types. We apply tools from NLP and text document classification to analyse different regions in time and space, and we evaluate the performance of several text similarity measures and document clustering algorithms.

Exploring an AR-based User Interface for Authoring Multimedia Presentations. P. Mendes, R. Azevedo, Ruy Guilherme Silva Gomes de Oliveira, Carlos de Salles Soares Neto. DOI: 10.1145/3209280.3209534
This paper describes the BumbAR approach for composing multimedia presentations and evaluates it through a qualitative study based on the Technology Acceptance Model (TAM). The BumbAR proposal is based on the event-condition-action model of the Nested Context Model (NCM) and explores the use of augmented reality and real-world objects (markers) as an innovative user interface for specifying the behavior of, and relationships between, the media objects in a presentation. The qualitative study aimed at measuring users' attitudes towards using BumbAR and an augmented reality environment for authoring multimedia presentations. The results show that the participants found the BumbAR approach both useful and easy to use, and most of them (66.67%) found the system more convenient than traditional desktop-based authoring tools.

Semantic Interoperability for Electronic Business through a Novel Cross-Context Semantic Document Exchange Approach. Shuo Yang, Ran Wei, A. Shigarov. DOI: 10.1145/3209280.3209523
The e-marketplace is a common venue where entities situated in different contexts conduct business electronically. Since sellers and buyers may be located in areas with different languages, customs and even business standards, business documents may be edited and parsed heterogeneously in different contexts. So far, however, no satisfactory approach has been implemented for transferring a document from one context to another without generating ambiguity, and disputes may arise from differing interpretations of the same document. It is therefore important to guarantee consistent understanding across contexts. This paper proposes the Tabdoc approach, a cross-context semantic document exchange approach and a novel strategy for implementing semantic interoperability. It guarantees consistent understanding of business documents and enables automatic cross-context document processing. Experimental results demonstrate promising performance improvements over state-of-the-art methods.

A Market Analytics Approach to Restaurant Review Data. Olga Tsubiks, Vlado Keselj. DOI: 10.1145/3209280.3209524
We present a novel marketing method for consumer trend detection from online user-generated content, motivated by a gap identified in the market research literature. Existing approaches to trend analysis are generally based on the rating of trends by industry experts through survey questionnaires, interviews, or similar instruments. These methods have proved inherently costly and often suffer from bias. Our approach is based on information extraction techniques for identifying trends in large aggregations of social media data. It is a cost-effective method that reduces the possibility of errors associated with the design of the sample and the research instrument. The effectiveness of the approach is demonstrated in an experiment performed on restaurant review data. The accuracy of the results is at the level of current approaches in both information extraction and market research.

Towards a Universally Editable Portable Document Format. Tamir Hassan. DOI: 10.1145/3209280.3229083
PDF is the established format for the exchange of final-form print-oriented documents on the Web, and for a good reason: it is the only format that guarantees the preservation of layout across different platforms, systems and viewing devices. Its main disadvantage, however, is that a document, once converted to PDF, is very difficult to edit. As of today (2018), there is still no universal format for the exchange of editable formatted text documents on the Web; users can only exchange the application's source files, which do not benefit from the robustness and portability of PDF. This position paper describes how we can engineer such an editable format based on some of the principles of PDF. We begin by analysing the current status quo, and provide a summary of current approaches for editing existing PDFs, other relevant document formats, and ways to embed the document's structure into the PDF itself. We then ask ourselves what it really means for a formatted document to be editable, and discuss the related problem of enabling WYSIWYG direct manipulation even in cases where layout is usually computed or optimized using offline or batch methods (as is common with long-form documents). After defining our goals, we propose a framework for creating such editable portable documents and present a prototype tool that demonstrates our initial steps and serves as a proof of concept. We conclude by providing a roadmap for future work.

Vectorisation of Sketches with Shadows and Shading using COSFIRE filters. Alexandra Bonnici, Dorian Bugeja, G. Azzopardi. DOI: 10.1145/3209280.3209525
Engineering design makes use of freehand sketches to communicate ideas, allowing designers to externalise form concepts quickly and naturally. Such sketches serve as working documents that demonstrate the evolution of the design process. For the product design to progress, however, these sketches are often redrawn using computer-aided design tools to obtain virtual, interactive prototypes of the design. Although there are commercial software packages that extract the required information from freehand sketches, such packages typically do not handle the full complexity of sketched drawings, particularly the visual cues introduced to help the human observer interpret the sketch. In this paper, we tackle one such complexity, namely the use of shading and shadows, which help portray spatial and depth information in the sketch. We propose a vectorisation algorithm based on trainable COSFIRE filters for detecting junction points and subsequently tracing line paths to create a topology graph as a representation of the sketched object form. The vectorisation algorithm is evaluated on 17 sketches containing different shading patterns, drawn by different sketchers specifically for this work. Using these sketches, we show that the vectorisation algorithm can handle drawings with straight or curved contours containing shadow cues, reducing the salient-point error in junction-point location by 91% relative to the off-the-shelf Harris-Stephens corner detector, while the overall vectorial representations of the sketches achieved an average F-score of 0.92 against the ground truth. The results demonstrate the effectiveness of the proposed approach.
