{"title":"Proceedings of the ACM Symposium on Document Engineering 2018","authors":"","doi":"10.1145/3209280","DOIUrl":"https://doi.org/10.1145/3209280","url":null,"abstract":"","PeriodicalId":234145,"journal":{"name":"Proceedings of the ACM Symposium on Document Engineering 2018","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117006398","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Combining text and mathematics when searching in a corpus with extensive mathematical notation remains an open problem. Recent results for Tangent-3 on the math and text retrieval task at NTCIR-12, for example, have room for improvement, even though formula retrieval appeared to be fairly successful. This paper explores how to adapt the state-of-the-art BM25 text ranking method to work well when searching for math together with text. Following the approach proposed for the Tangent math search system, we use symbol layout trees to represent math formulae. We extract features from the symbol layout trees to serve as search terms to be ranked using BM25 and then explore the effects on retrieval performance of various classes of features. Based on the results, we recommend which features can be used effectively in a conventional text-based retrieval engine. We validate our overall approach using an NTCIR-12 math and text benchmark.
{"title":"Choosing Math Features for BM25 Ranking with Tangent-L","authors":"Dallas J. Fraser, Andrew Kane, Frank Wm. Tompa","doi":"10.1145/3209280.3209527","DOIUrl":"https://doi.org/10.1145/3209280.3209527","url":null,"abstract":"Combining text and mathematics when searching in a corpus with extensive mathematical notation remains an open problem. Recent results for Tangent-3 on the math and text retrieval task at NTCIR-12, for example, have room for improvement, even though formula retrieval appeared to be fairly successful. This paper explores how to adapt the state-of-the-art BM25 text ranking method to work well when searching for math together with text. Following the approach proposed for the Tangent math search system, we use symbol layout trees to represent math formulae. We extract features from the symbol layout trees to serve as search terms to be ranked using BM25 and then explore the effects on retrieval performance of various classes of features. Based on the results, we recommend which features can be used effectively in a conventional text-based retrieval engine. We validate our overall approach using a NTCIR-12 math and text benchmark.","PeriodicalId":234145,"journal":{"name":"Proceedings of the ACM Symposium on Document Engineering 2018","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132022632","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Douglas Paulo De Mattos, Débora C. Muchaluat-Saade
This paper proposes an interactive multimedia authoring tool called STEVE (Spatio-Temporal View Editor) and a new multimedia model called SIMM (Simple Interactive Multimedia Model). STEVE aims at allowing users with no knowledge of multimedia authoring languages and models to create hypermedia applications for web and digital TV systems in a user-friendly way. Compared with existing multimedia authoring tools, STEVE is the only tool that allows ordinary users to export hypermedia applications to HTML5 and NCL documents. STEVE uses an event-based temporal synchronization model called SIMM that exactly fits its needs. SIMM provides high-level temporal, spatial and interactivity relations to make authoring with STEVE easier. Usability tests show that, according to users, STEVE allowed them to create multimedia applications and export them as HTML5 and NCL documents in a few minutes without programming.
{"title":"STEVE","authors":"Douglas Paulo De Mattos, Débora C. Muchaluat-Saade","doi":"10.1145/3209280.3209521","DOIUrl":"https://doi.org/10.1145/3209280.3209521","url":null,"abstract":"This paper proposes an interactive multimedia authoring tool called STEVE (Spatio-Temporal View Editor) and a new multimedia model called SIMM (Simple Interactive Multimedia Model). STEVE aims at allowing users with no knowledge of multimedia authoring languages and models to create hypermedia applications for web and digital TV systems in a user-friendly way. Compared with existing multimedia authoring tools, STEVE is the unique tool that allows ordinary users to export hypermedia applications to HTML5 and NCL documents. STEVE uses an event-based temporal synchronization model called SIMM that exactly fits its needs. SIMM provides high-level temporal, spatial and interactivity relations to make authoring with STEVE easier. Usability tests show that, according to users, STEVE allowed them to create multimedia applications and export them as HTML5 and NCL documents in a few minutes without programming.","PeriodicalId":234145,"journal":{"name":"Proceedings of the ACM Symposium on Document Engineering 2018","volume":"201 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124502456","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A. Caponi, A. Iorio, F. Vitali, Paolo Alberti, M. Scatá
There are several domains in which documents are made of reusable pieces. Template languages have been widely studied by the document engineering community to deal with common structures and textual fragments. However, templating mechanisms are often hidden in mainstream word processors and are even unknown to ordinary users. This paper presents a pattern-based language for templates, serialized in HTML and exploited in a user-friendly WYSIWYG editor for writing technical documentation. We discuss the deployment of the editor by an engineering company in the railway domain, as well as some generalized lessons learned about templates.
{"title":"Exploiting patterns and templates for technical documentation","authors":"A. Caponi, A. Iorio, F. Vitali, Paolo Alberti, M. Scatá","doi":"10.1145/3209280.3209537","DOIUrl":"https://doi.org/10.1145/3209280.3209537","url":null,"abstract":"There are several domains in which the documents are made of reusable pieces. Template languages have been widely studied by the document engineering community to deal with common structures and textual fragments. Though, templating mechanisms are often hidden in mainstream word-precessors and even unknown by common users. This paper presents a pattern-based language for templates, serialized in HTML and exploited in a user-friendly WYSIWYG editor for writing technical documentation. We discuss the deployment of the editor by an engineering company in the railway domain, as well as some generalized lessons learned about templates.","PeriodicalId":234145,"journal":{"name":"Proceedings of the ACM Symposium on Document Engineering 2018","volume":"87 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115684924","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
diffi (diff improved) is a comparison tool whose primary goal is to describe the differences between the content of two documents regardless of their formats. diffi examines the stacks of abstraction levels of the two documents to be compared, finds which levels can be compared, selects one or more appropriate comparison algorithms and calculates the delta(s) between the two documents. Finally, the deltas are serialized using the extended unified patch format, an extension of the common unified patch format. The produced deltas describe the differences between all the comparable levels of the input documents. Users and developers of patch visualization tools thus have the choice to focus on their preferred level of abstraction.
{"title":"diffi","authors":"Gioele Barabucci","doi":"10.1145/3209280.3229084","DOIUrl":"https://doi.org/10.1145/3209280.3229084","url":null,"abstract":"diffi (diff improved) is a comparison tool whose primary goal is to describe the differences between the content of two documents regardless of their formats. diffi examines the stacks of abstraction levels of the two documents to be compared, finds which levels can be compared, selects one or more appropriate comparison algorithms and calculates the delta(s) between the two documents. Finally, the deltas are serialized using the extended unified patch format, an extension of the common unified patch format. The produced deltas describe the differences between all the comparable levels of the inputs documents. Users and developers of patch visualization tools have, thus, the choice to focus on their preferred level of abstraction.","PeriodicalId":234145,"journal":{"name":"Proceedings of the ACM Symposium on Document Engineering 2018","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128681385","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Millions of customer reviews for products are available online across hundreds of different websites. These reviews have a tremendous influence on the purchase decisions of new customers and on creating a positive brand image. Understanding which product issues are critical in determining product ratings is crucial for marketing teams. We have developed a solution that derives deep insights from customer reviews, going significantly beyond keyword-based analysis. Our solution can identify key customer issues voiced in the reviews and the impact of each of these on the final rating that a customer gives the product. This insight is actionable, as it helps identify which customer concerns are responsible for bad product ratings.
{"title":"Identifying the Relative Importance of Customer Issues on Product Ratings through Machine Learning","authors":"Himanshu Tiwari, Shameed Sait, Md Imbesat Hassan Rizvi, Niranjan Damera-Venkata","doi":"10.1145/3209280.3229113","DOIUrl":"https://doi.org/10.1145/3209280.3229113","url":null,"abstract":"Millions of customer reviews for products are available online across hundreds of different websites. These reviews have a tremendous influence on the purchase decision of new customers and in creating a positive brand image. Understanding which of the product issues are critical in determining the product ratings is crucial for marketing teams. We have developed a solution which can derive deep insights from customer reviews which goes significantly beyond keyword based analysis. Our solution can identify key customer issues voiced in the reviews and the impact of each of these on the final rating that a customer gives the product. This insight is very actionable as it helps identify which customer concerns are responsible for bad ratings of products.","PeriodicalId":234145,"journal":{"name":"Proceedings of the ACM Symposium on Document Engineering 2018","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130271650","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jie Mei, Xiang Jiang, Aminul Islam, A. Mohammad, E. Milios
Attention guides computation to focus on important parts of the input data. For pairwise input, existing attention approaches tend to be biased towards trivial repetitions (e.g. punctuation and stop words) between two texts, and thus fail to contribute reasonable guidance to model predictions. As a remedy, we suggest taking into account the corpus-level information via global-aware attention. In this paper, we propose an attention mechanism that makes use of intra-text, inter-text and global contextual information. We undertake an ablation study on paraphrase identification, and demonstrate that the proposed attention mechanism can obviate the downsides of trivial repetitions and provide interpretable word weightings.
{"title":"Integrating Global Attention for Pairwise Text Comparison","authors":"Jie Mei, Xiang Jiang, Aminul Islam, A. Mohammad, E. Milios","doi":"10.1145/3209280.3229119","DOIUrl":"https://doi.org/10.1145/3209280.3229119","url":null,"abstract":"Attention guides computation to focus on important parts of the input data. For pairwise input, existing attention approaches tend to bias towards trivial repetitions (e.g. punctuations and stop words) between two texts, and thus failed to contribute reasonable guidance to model predictions. As a remedy, we suggest taking into account the corpus-level information via global-aware attention. In this paper, we propose an attention mechanism that makes use of intratext, inter-text and global contextual information. We undertake an ablation study on paraphrase identification, and demonstrate that the proposed attention mechanism can obviate the downsides of trivial repetitions and provide interpretable word weightings.","PeriodicalId":234145,"journal":{"name":"Proceedings of the ACM Symposium on Document Engineering 2018","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115740609","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Visual Text Analytics has been an active area of interdisciplinary research (http://textvis.lnu.se/). This interactive tutorial is designed to give attendees an introduction to the area of information visualization, with a focus on linguistic visualization. After an introduction to the basic principles of information visualization and visual analytics, this tutorial will give an overview of the broad spectrum of linguistic and text visualization techniques, as well as their application areas [3]. This will be followed by a hands-on session that will allow participants to design their own visualizations using tools (e.g., Tableau), libraries (e.g., d3.js), or applying sketching techniques [4]. Some sample datasets will be provided by the instructor. Besides general techniques, special access will be provided to use the VisArgue framework [1] for the analysis of selected datasets.
{"title":"Visual Text Analytics: Techniques for Linguistic Information Visualization","authors":"Mennatallah El-Assady","doi":"10.1145/3209280.3232795","DOIUrl":"https://doi.org/10.1145/3209280.3232795","url":null,"abstract":"Visual Text Analytics has been an active area of interdisciplinary research (http://textvis.lnu.se/). This interactive tutorial is designed to give attendees an introduction to the area of information visualization, with a focus on linguistic visualization. After an introduction to the basic principles of information visualization and visual analytics, this tutorial will give an overview of the broad spectrum of linguistic and text visualization techniques, as well as their application areas [3]. This will be followed by a hands-on session that will allow participants to design their own visualizations using tools (e.g., Tableau), libraries (e.g., d3.js), or applying sketching techniques [4]. Some sample datasets will be provided by the instructor. Besides general techniques, special access will be provided to use the VisArgue framework [1] for the analysis of selected datasets.","PeriodicalId":234145,"journal":{"name":"Proceedings of the ACM Symposium on Document Engineering 2018","volume":"82 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134501423","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SlideDiff is a system that automatically creates an animated rendering of textual and media differences between two versions of a slide presentation. While previous work focused on either textual or image data, SlideDiff integrates both text and media changes, as well as their interactions, for example when adding an image forces nearby text boxes to shrink. Given two versions of a slide (not the full history of edits), SlideDiff detects the textual and image differences, and then animates the changes by mimicking what a user might have done, such as moving the cursor, typing text, resizing image boxes, adding images. This editing metaphor is well known to most users, helping them better understand what has changed, and fosters a sense of connection between remote workers, derived from communicating both the revision process as well as its results. After detection of text and image differences, the animations are rendered in HTML and CSS, including mouse cursor motion, text and image box selection and resizing, text deletion and insertion with its cursor. We discuss strategies for animating changes, in particular the importance of starting with large changes and finishing with smaller edits, and provide details of the implementation using modern HTML and CSS.
{"title":"SlideDiff","authors":"Laurent Denoue, S. Carter, M. Cooper","doi":"10.1145/3209280.3229107","DOIUrl":"https://doi.org/10.1145/3209280.3229107","url":null,"abstract":"SlideDiff is a system that automatically creates an animated rendering of textual and media differences between two versions of a slide presentation. While previous work focused on either textual or image data, SlideDiff integrates both text and media changes, as well as their interactions, for example when adding an image forces nearby text boxes to shrink. Given two versions of a slide (not the full history of edits), SlideDiff detects the textual and image differences, and then animates the changes by mimicking what a user might have done, such as moving the cursor, typing text, resizing image boxes, adding images. This editing metaphor is well known to most users, helping them better understand what has changed, and fosters a sense of connection between remote workers, derived from communicating both the revision process as well as its results. After detection of text and image differences, the animations are rendered in HTML and CSS, including mouse cursor motion, text and image box selection and resizing, text deletion and insertion with its cursor. 
We discuss strategies for animating changes, in particular the importance of starting with large changes and finishing with smaller edits, and provide details of the implementation using modern HTML and CSS.","PeriodicalId":234145,"journal":{"name":"Proceedings of the ACM Symposium on Document Engineering 2018","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114479435","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The text processing tool LaTeX has prevailed as a standard in many fields of exact sciences; it is evident that LaTeX is likely to be here to stay. From that perspective, it is important to explore what are the best possible ways to support the author in efficiently editing documents. There have been several approaches that provide graphical editing support for LaTeX. We argue that a true WYSIWYG (What You See Is What You Get) approach is a justified requirement for future systems and we present here the first cloud-based true WYSIWYG editor. This allows the author to edit the document in its print form directly in a web-based PDF viewer. Building such a system creates unique challenges compared to existing approaches. We identify these challenges and name workable solutions. We also provide a usability evaluation of the new system. In short, our finding is that editing LaTeX directly in the PDF view is possible for a wide range of edits and valuable for many major user groups and use cases; hence it is a fair requirement for future top-of-the-line LaTeX editors.
{"title":"SwiftLaTeX","authors":"Elliott Wen, Gerald Weber","doi":"10.1145/3209280.3209522","DOIUrl":"https://doi.org/10.1145/3209280.3209522","url":null,"abstract":"The text processing tool LATEX has prevailed as a standard in many fields of exact sciences; it is evident that LATEX is likely to be here to stay. From that perspective, it is important to explore what are the best possible ways to support the author in efficiently editing documents. There have been several approaches that provide graphical editing support for LATEX. We argue that a true WYSIWYG (What You See Is What You Get) approach is a justified requirement for future systems and we present here the first cloud-based true WYSIWYG editor. This allows the author to edit the document in its print form directly in a web-based PDF viewer. Building such a system creates unique challenges compared to existing approaches. We identify these challenges and name workable solutions. We also provide a usability evaluation of the new system. In short our finding is that editing LATEX directly in the PDF view is possible for a wide range of edits and valuable for many major user groups and use cases; hence it is a fair requirement for future top-of-the-line LATEX editors.","PeriodicalId":234145,"journal":{"name":"Proceedings of the ACM Symposium on Document Engineering 2018","volume":"67 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115084015","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}